Re: Killfiling, anyone?

David Champion Thu, 09 Aug 2012 11:24:24 -0700

* On 25 Jul 2012, John Long wrote: 
> Guys, what are you using for killfiling/mail filtering?


This isn't exactly what you're looking for since you want it filtered
pre-download, but it's perhaps something worth thinking about.  I've
been meaning to post it for years but never got around to it, so I
finally made time for this topic.

Background: I use procmail pretty heavily for delivery filtering,
because I know of nothing else with its power.  However, I absolutely
loathe procmail's performance and configuration syntax.  It's OK for the
kind of programmatic filtering I use it for, but less OK for the very
narrowly-scoped filtering that killfiling implies.  I should be able to
add a very simple expression to a file and go, without banging around in
procmail's bizarre notation, external programs to extract content, etc.

(I was never very interested in Sieve for my needs, which include a
lot of content filtering, not just sorting.  I don't want to get into
details, but it may be worth another look now that it's been 12 years
since I first looked at it.)

Stage 1: I decided I would do my killfiling in mutt, using mutt's
pattern expressions for simplicity and flexibility.  First I wrote a
simple shell script that generates mutt commands based on a 'killfile'
that contains simple mutt patterns -- e.g.:

    ~s '^Output from .cron. command'
    ~y mutt      ~s 'Design choices'
    ~t root@     ~s 'logwatch'
    ~y mailman   ~s '^.rt \#[0-9]+. List ([^ ]+) creation request'

I run this with a macro:
    macro index \;j "<enter-command>source 'mutt-killfile |'<enter>" "Delete junk mail."

So whenever I want to run the killfile I press ;j, and that tags all such
messages.  I eyeball the results and tag-delete.

Stage 2: I found that once I added enough patterns to killfile
everything, execution got really slow because mutt was running dozens
of regular expressions across my entire mailbox.  It would be faster to
adjust my script to aggregate these expressions into a single regexp.  I
did that, and performance improved dramatically.
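
A minimal sketch of that aggregation, not the attached script itself
(the function name aggregate is made up for illustration): each
killfile line is wrapped in parentheses and the pieces are joined with
mutt's pattern OR operator, so mutt runs one tag-pattern instead of one
per line.

```python
def aggregate(patterns):
    """Wrap each mutt pattern in parentheses and OR them together."""
    return '|'.join('(%s)' % p for p in patterns)

patterns = [
    "~t root@ ~s 'logwatch'",
    "~y mutt ~s 'Design choices'",
]
# One push command carries every pattern at once.
command = 'push "<tag-pattern>%s<enter>"' % aggregate(patterns)
print(command)
```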

Stage 3: Before long I encountered mutt's command length limit of 1024
characters.  My program needed to detect when its tagging commands
were encroaching on that limit, and break the aggregate expression up
into multiple commands.  I wrote this initially in Perl years ago,
but recently converted it to Python (as I eventually do with every
Perl program I've written) and added features.  I currently condense
about 180 patterns in my killfile into 8 actual mutt commands with this
approach.
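
The splitting step can be sketched roughly as follows.  This is a
simplified illustration, not the attached script: split_commands is a
hypothetical name, escaping is left out, and the 1020-character budget
is borrowed from the script's maxline setting.

```python
def split_commands(patterns, template='push "<tag-pattern>%p<enter>"',
                   maxline=1020):
    """Pack patterns into as few commands as stay within maxline characters."""
    def emit(group):
        return template.replace('%p', '|'.join('(%s)' % p for p in group))

    commands, group = [], []
    length = len(template)
    for pattern in patterns:
        cost = len(pattern) + 3   # '(' and ')' plus a possible '|'
        if group and length + cost > maxline:
            # the next pattern would not fit: flush and start a new command
            commands.append(emit(group))
            group, length = [], len(template)
        group.append(pattern)
        length += cost
    if group:
        commands.append(emit(group))
    return commands

# 180 made-up patterns, roughly the size of the killfile described above
patterns = ["~s 'spam%03d'" % i for i in range(180)]
commands = split_commands(patterns)
for c in commands:
    print(len(c))
```

The estimate is deliberately pessimistic (it counts a '|' for every
pattern and the two characters of %p), so each command comes out a
little under the budget rather than over it.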

This is what I use today.  With no arguments, this program reads three
files, if they exist:
    ~/.killfile
    ~/.mutt-killfile
    ~/.mutt/killfile

Alternatively, it can read from stdin or from one or more named files:
    usage: mutt-killfile
           mutt-killfile -
           mutt-killfile file [...]

It generates muttrc commands based on the patterns contained within,
as described above.  For example, the patterns above create this result:
    push "<tag-pattern>(~s '\^Output from .cron. command')|(~y mutt      ~s 'Design choices')|(~t root@     ~s 'logwatch')|(~y mailman   ~s '\^.rt \#[0-9]+. List ([\^ ]+) creation request')<enter>"

Additionally you can control the output template.  This input:
    template push "<delete-pattern>%p<enter>"
    ~s '^Output from .cron. command'
    ~y mutt      ~s 'Design choices'
    ~t root@     ~s 'logwatch'
    ~y mailman   ~s '^.rt \#[0-9]+. List ([^ ]+) creation request'

generates this:
    push "<delete-pattern>(~s '\^Output from .cron. command')|(~y mutt      ~s 'Design choices')|(~t root@     ~s 'logwatch')|(~y mailman   ~s '\^.rt \#[0-9]+. List ([\^ ]+) creation request')<enter>"

... which immediately deletes messages instead of tagging, like an
actual killfile for the non-paranoid.

A killfile also can source other killfiles.  Here's my ~/.mutt/killfile:
    source ~/.mutt/killfile.$DOMAIN

Since DOMAIN=uchicago.edu in my environment, that causes it to source
~/.mutt/killfile.uchicago.edu.
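
The attached script resolves such names with Python's
os.path.expandvars and os.path.expanduser.  Roughly, with a stand-in
value for DOMAIN:

```python
import os

def expand(path):
    """Expand $VARIABLES first, then a leading ~."""
    return os.path.expanduser(os.path.expandvars(path))

os.environ['DOMAIN'] = 'example.org'   # stand-in value for this demo only
print(expand('~/.mutt/killfile.$DOMAIN'))
```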

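The killfile may contain blank lines and comments for readability.  A
comment starts with '#' and runs to the end of the line; write '\#' for
a literal '#' inside a pattern.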

Each sourced killfile may have its own template.  If no template is
explicit in the file, it inherits from the previous file or the default.
When the file is done, the template reverts to the previous template.
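
A small sketch of that inheritance rule, assuming a plain stack of
templates.  The class and method names here are hypothetical; the
attached script keeps the same kind of stack in a list.

```python
class TemplateStack(object):
    """Track the active output template while killfiles source each other."""

    def __init__(self, default):
        self.stack = [default]

    def enter_file(self):
        # a sourced file starts out with whatever its parent was using
        self.stack.append(self.stack[-1])

    def set(self, template):
        # a 'template' line only affects the file being read
        self.stack[-1] = template

    def leave_file(self):
        # back in the parent file: its template is in force again
        self.stack.pop()

    def current(self):
        return self.stack[-1]

t = TemplateStack('push "<tag-pattern>%p<enter>"')
t.enter_file()                       # the top-level killfile
t.enter_file()                       # a killfile it sources
t.set('push "<delete-pattern>%p<enter>"')
inner = t.current()
t.leave_file()
outer = t.current()
print(inner)
print(outer)
```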

You can of course set a folder-hook to run this script upon entering a
folder, too, if you prefer full automation.  I suppose something like
this would do it:
        folder-hook . 'source "mutt-killfile |"'

Now that it's a Python program I may add the capability to automatically
add killfile entries for author, subject, etc., but so far the lack of
it hasn't bothered me enough to do so.

The main thing to beware of with this program is that it's exactly as
sensitive to escaping in patterns as mutt itself is.  You may encounter
problems using it that are hard to diagnose, since they boil down to
a few characters in a generated muttrc command up to ~1020 characters
long.
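
One way to narrow such problems down is to look at what a single
pattern turns into before it is buried in a long command.  Here is a
cut-down sketch of the encoding step, covering only double quotes and
carets; the attached script's encode() also handles '$' and tabs.

```python
def encode(pattern):
    """Rewrite a pattern so it survives inside a double-quoted muttrc string."""
    pattern = pattern.replace('"', "'")     # a bare " would end the push "..." string
    pattern = pattern.replace('^', '\\^')   # written as \^ in the generated command
    return pattern

before = '~s "^Output from .cron. command"'
after = encode(before)
print(before)
print(after)
```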

-- 
David Champion • d...@uchicago.edu • IT Services • University of Chicago

#!/usr/bin/env python

import os
import sys

class KFParseError(ValueError): pass
        
def expand(s):
        return os.path.expanduser(os.path.expandvars(s))

def encode(s):
        '''perform mutilations on patterns for muttrc parser'''
        s = s.replace('"', "'")
        s = s.replace('^', '\\^')
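        # emits two backslashes before each $ (same output as before,
        # without relying on the unrecognized '\$' escape)
        s = s.replace('$', '\\\\$')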
        while '\t\t' in s:
                s = s.replace('\t\t', '\t')
        s = s.replace('\t', ' ')
        return s

class Killfile(object):
        # mutt commands must be 1024 characters or fewer
        maxline = 1020

        def __init__(self):
                self.ctemplate = ['push "<tag-pattern>%p<enter>"']
                self.groups = {self.ctemplate[-1]: []}

        def read(self, *files):
                for file in files:
                        try:
                                fp = open(expand(file), 'r')
                        except IOError:
                                # missing or unreadable files are skipped
                                continue

                        # dup the ctemplate stack, read file, and pop
                        self.ctemplate.append(self.ctemplate[-1])
                        self.readfp(fp)
                        self.ctemplate.pop()

                        fp.close()

        def readfp(self, fp):
                for line in fp:
                        self.parse(line)

        def _clean(self, line):
                if '#' in line:
                        off = line.find('#')
                        if off == 0 or line[off-1] != '\\':
                                line = line[:off]
                return line.strip()

        def _settemplate(self, template):
                self.ctemplate[-1] = template.strip()
                if self.ctemplate[-1] not in self.groups:
                        self.groups[self.ctemplate[-1]] = []

        def parse(self, line):
                line = self._clean(line)
                if not line:
                        return
                elif line.startswith('[') and line.endswith(']'):
                        # obsolete syntax
                        self._settemplate(line[1:-1])
                        return
                elif line.startswith('template'):
                        self._settemplate(line[8:])
                        return
                elif line.startswith('source'):
                        filename = line[6:].strip()
                        self.read(filename)
                        return
                elif '~' not in line:
                        raise KFParseError('not a mutt pattern: ' + line)
                self.append(self.ctemplate[-1], line)

        def append(self, template, pattern):
                self.groups[template].append(pattern)

        def release(self):
                out = []
                for template, patterns in self.groups.items():
                        plist = []
                        test = len(template)
                        for pattern in patterns:
                                # Test length of command if pattern is added.
                                # Besides its own length, each pattern may add
                                # 3 characters of overhead to the command:
                                # (, ), and |.
                                test += len(encode(pattern)) + 3
                                if test > self.maxline and plist:
                                        # emit the current set, then resume
                                        out.append(self.emit(template, plist))
                                        plist = []
                                        test = len(template) + len(encode(pattern)) + 3
                                plist.append(pattern)

                        # emit the final set for the current template
                        if len(plist):
                                out.append(self.emit(template, plist))

                return out

        def emit(self, template, patterns):
                expr = '|'.join(['(%s)' % encode(p) for p in patterns])
                return template.replace('%p', expr)

def usage():
        p = os.path.basename(sys.argv[0])
        print('usage: %s' % p)
        print('       %s -' % p)
        print('       %s file [...]' % p)

def main(args):
        if args and args[0] in '-h --help'.split():
                usage()
                return 0

        k = Killfile()

        if args:
                for arg in args:
                        if arg == '-':
                                k.readfp(sys.stdin)
                        else:
                                k.read(arg)

        else:
                try:
                        # ignore exceptions on this file: it may belong to some
                        # other application.
                        k.read('~/.killfile')
                except:
                        pass
                k.read('~/.mutt-killfile')
                k.read('~/.mutt/killfile')

        for line in k.release():
                print(line)

if __name__ == '__main__':
        try:
                sys.exit(main(sys.argv[1:]))
        except KeyboardInterrupt:
                sys.stderr.write('\nbreak\n')
