* On 25 Jul 2012, John Long wrote:
> Guys, what are you using for killfiling/mail filtering?

This isn't exactly what you're looking for since you want it filtered
pre-download, but it's perhaps something worth thinking about. I've
been meaning to post it for years but never got around to it, so I
finally made time for this topic.

Background: I use procmail pretty heavily for delivery filtering,
because I know of nothing else with its power. However, I absolutely
loathe procmail's performance and configuration syntax. It's OK for
the kind of programmatic filtering I use it for, but less OK for the
very narrowly-scoped filtering that killfiling implies. I should be
able to add a very simple expression to a file and go, without banging
around in procmail's bizarre notation, external programs to extract
content, etc.

(I was never very interested in Sieve for my needs, which include a lot
of content filtering, not just sorting. I don't want to get into
details, but it may be worth another look now that it's been 12 years
since I first looked at it.)

Stage 1: I decided I would do my killfiling in mutt, using mutt's
pattern expressions for simplicity and flexibility. First I wrote a
simple shell script that generates mutt commands based on a 'killfile'
that contains simple mutt patterns -- e.g.:

    ~s '^Output from .cron. command'
    ~y mutt ~s 'Design choices'
    ~t root@ ~s 'logwatch'
    ~y mailman ~s '^.rt \#[0-9]+. List ([^ ]+) creation request'

I run this with a macro:

    macro index \;j "<enter-command>source 'mutt-killfile |'<enter>" "Delete junk mail."

So whenever I want to run the killfile I press ;j, and that tags all
matching messages. I eyeball the results and tag-delete.

Stage 2: I found that once I added enough patterns to killfile
everything, execution got really slow because mutt was running dozens
of regular expressions across my entire mailbox. It would be faster to
have my script aggregate these patterns into a single expression. I did
that, and performance improved dramatically.

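The Stage 2 aggregation can be sketched in a few lines of Python. This
is only an illustration, not the attached script: the aggregate()
helper is a hypothetical name, the default template is the tagging
command shown later in this message, and the escaping that the real
script applies is deliberately left out.

```python
# Hypothetical helper, not the attached script: it only illustrates the
# Stage 2 idea of OR-ing every killfile pattern into one mutt command.
# Escaping of ^, $ and double quotes is intentionally omitted here.

def aggregate(patterns, template='push "<tag-pattern>%p<enter>"'):
    """Join mutt patterns into one command by OR-ing parenthesized groups."""
    # Parenthesize each pattern so its implicit ANDs stay grouped under "|".
    expr = '|'.join('(%s)' % p for p in patterns)
    return template.replace('%p', expr)


patterns = [
    "~y mutt ~s 'Design choices'",
    "~t root@ ~s 'logwatch'",
]
print(aggregate(patterns))
```

Each killfile line becomes one parenthesized alternative, so mutt
evaluates a single combined expression instead of one command per line.
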
Stage 3: Before long I encountered mutt's command length limit of 1024
characters. My program needed to detect when its tagging commands were
encroaching on that limit, and break the aggregate expression up into
multiple commands. I wrote this initially in Perl years ago, but
recently converted it to Python (as I eventually do with every Perl
program I've written) and added features. I currently condense about
180 patterns in my killfile into 8 actual mutt commands with this
approach.

This is what I use today. With no arguments, this program reads three
files, if they exist:

    ~/.killfile
    ~/.mutt-killfile
    ~/.mutt/killfile

Alternatively it can read stdin or from a specific file name or names:

    usage: mutt-killfile
           mutt-killfile -
           mutt-killfile file [...]

It generates muttrc commands based on the patterns contained within, as
described above. For example, the patterns above create this result:

    push "<tag-pattern>(~s '\^Output from .cron. command')|(~y mutt ~s 'Design choices')|(~t root@ ~s 'logwatch')|(~y mailman ~s '\^.rt \#[0-9]+. List ([\^ ]+) creation request')<enter>"

Additionally you can control the output template. This input:

    template push "<delete-pattern>%p<enter>"
    ~s '^Output from .cron. command'
    ~y mutt ~s 'Design choices'
    ~t root@ ~s 'logwatch'
    ~y mailman ~s '^.rt \#[0-9]+. List ([^ ]+) creation request'

generates this:

    push "<delete-pattern>(~s '\^Output from .cron. command')|(~y mutt ~s 'Design choices')|(~t root@ ~s 'logwatch')|(~y mailman ~s '\^.rt \#[0-9]+. List ([\^ ]+) creation request')<enter>"

... which immediately deletes messages instead of tagging, like an
actual killfile for the non-paranoid.

A killfile also can source other killfiles. Here's my ~/.mutt/killfile:

    source ~/.mutt/killfile.$DOMAIN

Since DOMAIN=uchicago.edu in my environment, that causes it to source
~/.mutt/killfile.uchicago.edu.

The killfile may contain blank lines and comments for readability.
Each sourced killfile may have its own template.
If a file sets no template explicitly, it inherits the template of the
file that sourced it, or the default. When the file is done, the
template reverts to the previous one.

You can of course set a folder-hook to run this script upon entering a
folder, too, if you prefer full automation. I suppose something like
this would do it:

    folder-hook . 'source "mutt-killfile |"'

Now that it's a Python program I may add the capability to
automatically add killfile entries for author, subject, etc., but so
far the lack of it hasn't bothered me enough to write it.

The main thing to beware of with this program is that it's exactly as
sensitive to escaping in patterns as mutt itself is. You may encounter
problems using it that are hard to diagnose, since they boil down to a
few characters in a generated muttrc command up to ~1020 characters
long.

-- 
David Champion • d...@uchicago.edu • IT Services • University of Chicago

#!/usr/bin/env python

import os
import sys


class KFParseError(ValueError):
    pass


def expand(s):
    return os.path.expanduser(os.path.expandvars(s))


def encode(s):
    '''perform mutilations on patterns for muttrc parser'''
    s = s.replace('"', "'")
    s = s.replace('^', '\\^')
    # two backslashes, then the dollar sign
    s = s.replace('$', '\\\\$')
    while '\t\t' in s:
        s = s.replace('\t\t', '\t')
    s = s.replace('\t', ' ')
    return s


class Killfile(object):
    # mutt commands must be 1024 characters or fewer
    maxline = 1020

    def __init__(self):
        # stack of templates; the top entry is the one currently in effect
        self.ctemplate = ['push "<tag-pattern>%p<enter>"']
        # maps each template to the list of patterns collected under it
        self.groups = {self.ctemplate[-1]: []}

    def read(self, *files):
        for file in files:
            try:
                fp = open(expand(file), 'r')
            except IOError:
                # missing or unreadable files are silently skipped
                continue
            # dup the ctemplate stack, read file, and pop
            self.ctemplate.append(self.ctemplate[-1])
            try:
                self.readfp(fp)
            finally:
                # restore the previous template even if parsing fails
                self.ctemplate.pop()
                fp.close()

    def readfp(self, fp):
        for line in fp:
            self.parse(line)

    def _clean(self, line):
        # strip a trailing comment unless the first '#' is backslash-escaped
        if '#' in line:
            off = line.find('#')
            if off == 0 or line[off-1] != '\\':
                line = line[:off]
        return line.strip()

    def _settemplate(self, template):
        self.ctemplate[-1] = template.strip()
        if self.ctemplate[-1] not in self.groups:
            self.groups[self.ctemplate[-1]] = []

    def parse(self, line):
        line = self._clean(line)
        if not line:
            return
        elif line.startswith('[') and line.endswith(']'):
            # obsolete syntax
            self._settemplate(line[1:-1])
            return
        elif line.startswith('template'):
            self._settemplate(line[8:])
            return
        elif line.startswith('source'):
            filename = line[6:].strip()
            self.read(filename)
            return
        elif '~' not in line:
            raise KFParseError('not a mutt pattern: ' + line)
        self.append(self.ctemplate[-1], line)

    def append(self, template, pattern):
        self.groups[template].append(pattern)

    def release(self):
        out = []
        for template, patterns in self.groups.items():
            plist = []
            test = len(template)
            for pattern in patterns:
                # Test length of command if pattern is added.
                # Besides its own length, each pattern may add 3 characters of
                # overhead to the command: (, ), and |.
                test += len(encode(pattern)) + 3
                # never emit an empty set, even if one pattern is too long
                if test > self.maxline and plist:
                    # emit the current set, then resume
                    out.append(self.emit(template, plist))
                    plist = []
                    test = len(template) + len(encode(pattern)) + 3
                plist.append(pattern)
            # emit the final set for the current template
            if len(plist):
                out.append(self.emit(template, plist))
        return out

    def emit(self, template, patterns):
        expr = '|'.join(['(%s)' % encode(p) for p in patterns])
        return template.replace('%p', expr)


def usage():
    p = os.path.basename(sys.argv[0])
    print('usage: %s' % p)
    print('       %s -' % p)
    print('       %s file [...]' % p)


def main(args):
    if args and args[0] in '-h --help'.split():
        usage()
        return 0

    k = Killfile()
    if args:
        for arg in args:
            if arg == '-':
                k.readfp(sys.stdin)
            else:
                k.read(arg)
    else:
        try:
            # ignore exceptions on this file: it may belong to some other
            # application.
            k.read('~/.killfile')
        except Exception:
            pass
        k.read('~/.mutt-killfile')
        k.read('~/.mutt/killfile')

    for line in k.release():
        print(line)


if __name__ == '__main__':
    try:
        sys.exit(main(sys.argv[1:]))
    except KeyboardInterrupt:
        sys.stderr.write('\nbreak\n')
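
Since escaping problems in a ~1020-character command are hard to spot
by eye, a rough sanity check on the generated lines may help before
sourcing them. The sketch below is hypothetical and not part of the
script above: check_commands() is an invented name, the limit is the
1024-character figure quoted in the message, and the quote and
parenthesis counts are crude heuristics, not mutt's real parser. A
pattern that legitimately contains an escaped or bracketed parenthesis
would be flagged falsely.

```python
# Hypothetical sanity checker for generated muttrc commands. The checks
# are heuristics only and do not reproduce mutt's own parsing rules.
MAX_COMMAND = 1024  # limit quoted in the message, assumed to apply per line


def check_commands(lines, limit=MAX_COMMAND):
    """Return (line_number, problem) pairs for suspicious commands."""
    problems = []
    for number, line in enumerate(lines, 1):
        line = line.rstrip('\n')
        if len(line) > limit:
            problems.append((number, 'too long (%d characters)' % len(line)))
        if line.count('"') % 2:
            problems.append((number, 'odd number of double quotes'))
        if line.count('(') != line.count(')'):
            problems.append((number, 'unbalanced parentheses'))
    return problems


sample = [
    'push "<tag-pattern>(~t root@ ~s \'logwatch\')<enter>"',
    'push "<tag-pattern>(~s \'oops\'<enter>"',  # closing paren is missing
]
for number, problem in check_commands(sample):
    print('line %d: %s' % (number, problem))
```

To check real output, pass it the lines printed by mutt-killfile, for
example by calling check_commands(sys.stdin) from a small wrapper.
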