On 01/06/05 08:41 AM, Jeff Koch sat at the `puter and typed:
>
> Has anyone come up with a script or method that would allow users to
> forward their false positive and false negative emails back to an address
> on the mailserver where they can be used to train the Bayes database. I
> understand that Bayes needs the email in its original format so the script
> has to strip off the forwarding enclosure.
>
> Thanks in advance.

Cool idea. I have one that allows a user to send an email with a list of addresses to whitelist or blacklist. They send it to their own address with a +whitelist or +blacklist extension. Frinstance, I could send to [EMAIL PROTECTED] and whitelist an address. Naturally, it requires a password in there as well, but it works.

This really only boils down to a procmail recipe at the server end, but I did write a quick mutt macro that uses formail to parse the From address out of the message and autosend it using a script with about 20 lines of Perl code. It also assumes your MTA can handle plussed folders, but this can be worked around with a subject scan or something similar.

I wonder if the same thing could work with this idea. One would have to be careful what was passed into Bayes. Anyone know exactly what would need to be encapsulated, and how? I'm guessing it would require some Perl at the server end, called from procmail, but the message would have to be encapsulated carefully at the client end to avoid piping the encapsulation headers through the learner.

Just because it's remotely relevant, I use maildir now with my mail server. This allows easy confirmation of spam, because maildir keeps new and read email in different subdirectories. Anything in the .../cur directory has been read, so anything in the spam folder's cur directory should be confirmed spam. Autolearned spam goes into a different folder altogether. In my years with SA, autolearned spam has had a 0% FP rate, so I don't feel I even have to bother checking it anymore.

I wrote a script that uses Mail::SpamAssassin to parse the confirmed spam, then move it to a spamdump folder. I did some shameless borrowing from sa-learn, giving credit in the script, of course. By default, the spamdump is recreated each month, leaving the old one to be purged at the user's will. I made the script extremely flexible, so you can configure pretty much anything of consequence.

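To illustrate the new/cur split described above, here is a minimal sketch in Python (it is not the Perl script discussed in this post). It follows the convention used here that anything in cur counts as read. The function names `split_by_read_state` and `spamdump_name`, and the `spamdump-YYYY-MM` naming, are assumptions made up for the example.

```python
import datetime
import mailbox


def split_by_read_state(maildir_path):
    """Return (read_keys, unread_keys) for one maildir folder.

    Follows the convention from the post: a message in cur/ counts as
    read, a message in new/ counts as unread. A stricter check would
    also look for the "S" (seen) flag via MaildirMessage.get_flags().
    """
    box = mailbox.Maildir(maildir_path, create=False)
    read_keys, unread_keys = [], []
    for key in box.iterkeys():
        message = box.get_message(key)
        if message.get_subdir() == "cur":
            read_keys.append(key)
        else:
            unread_keys.append(key)
    return sorted(read_keys), sorted(unread_keys)


def spamdump_name(today):
    """Hypothetical naming scheme: one archive folder per calendar month."""
    return "spamdump-{:04d}-{:02d}".format(today.year, today.month)
```

Run against a spam folder, the first list holds the candidates for learning as confirmed spam and the second holds mail nobody has looked at yet. Feeding the messages to the Bayes learner and moving them into the monthly folder are left out on purpose.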
The reason I did this is that I wanted to be able to confirm spam and have it learned as spam, then moved away. The configuration uses a list of directories expected to contain confirmed spam.

I also wanted to have autolearned spam moved out without trying to relearn it. This is done with another list of directories, containing autolearned spam. I wanted to include both read and unread autolearned spam - remember, I'm getting 100% accuracy in this set - so I simply included both directories in the list.

Naturally, it will also use a list of directories that contain confirmed ham, and learn them as such, but these will be left where they are. No good hiding the user's real mail, right? At some point I hope to keep track of the last time the script was run and use that to parse only files with a last mod or create time since the last run. Whether that approach is better than just rechecking all of them may be debatable.

There is a configuration switch to autoreport all learned spam. This is off by default, and I haven't used it yet.

Once a month (when the new spamdump is created) the script will force a sync and expire. This can be done every time the script runs by turning on a config switch.

Anyone interested in checking it out to provide feedback? There are a couple things that might be considered downsides or TODO items:

* The configuration method is a bit technical (has to be valid Perl), but it's pretty powerful if you use your imagination. At some point, I hope to find a way to do configuration through the Mail::SpamAssassin::Conf module for consistency, but I'm not sure how it will handle list definition, or even if that module was written to be used by other scripts.
* It is limited to directory based mail, no mbox or mbx files - it was written solely with maildir in mind.
* New spam archive folders are created with a system call - to maildirmake by default, but that can be changed to a mkdir -p command if necessary.
  I've done a quick scan for a Perl module to create the maildir, but haven't found one yet. Courier IMAP doesn't have one; it uses a C/C++ utility to do it.
* Just because a file winds up in the confirmed spam directory doesn't guarantee it will be learned, but it will be scanned. It isn't uncommon to see a message come through that has enough in common with a message already learned as spam to be skipped. The script doesn't forget and relearn by default, so it might not catch the case of an autolearned FN. To do this, I may need to duplicate the Mail::SpamAssassin::ArchiveIterator object and use one to forget all messages, then use the other to relearn them as spam. I haven't found a way to tell Mail::SpamAssassin->learn() to force a relearn yet.
* There's a LOT of commentary in the script, but it's not a real POD yet.

There's still quite a bit to do, but it's been working great on my system for about a week now. I have the verbosity turned up a bit, and the nightly crons send me the output. So far so good. I hope to make it worthy of submission to the SA project, but it still requires some work.

Lou
--
Louis LeBlanc               [EMAIL PROTECTED]
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org

Not one hundred percent efficient, of course ... but nothing ever is.
                -- Kirk, "Metamorphosis", stardate 3219.8