Re: learning from IMAP spam collection

2009-05-25 Michael Monnerie
On Tuesday 19 May 2009 martin f krafft wrote: thus spoke Jeff Mincy j...@delphioutpost.com [2009.05.19.1445 +0200]: Use prefix matching instead! formail -b -t -I X-Spam- msg This is undoubtedly more of a sledgehammer approach, but I don't see how it would/could be unsafe, really.
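Where formail is not installed, the prefix-matching idea (drop every header field whose name begins with X-Spam-) can be approximated in plain awk. This is a minimal sketch, not formail's implementation: the function name strip_spam_headers is hypothetical, it assumes one message on stdin with LF line endings, and it leaves the body untouched, so it does not undo any body rewriting.

```shell
# Hypothetical helper: approximates the prefix-matching idea behind
# "formail -b -t -I X-Spam-" for a single message on stdin.
# Assumes LF line endings; the body is copied through unchanged.
strip_spam_headers() {
  awk '
    in_body { print; next }                   # body: copy verbatim
    $0 == "" { in_body = 1; print; next }     # first blank line ends the header
    /^[ \t]/ { if (!drop) print; next }       # folded continuation line
    {
      # A new header field: drop it if its name starts with "X-Spam-"
      drop = (tolower(substr($0, 1, 7)) == "x-spam-")
      if (!drop) print
    }
  '
}

# Usage: strip_spam_headers < message.eml > cleaned.eml
```

Matching is case-insensitive and folded continuation lines are dropped along with their field, so a multi-line X-Spam-Status report disappears as a whole.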

Re: learning from IMAP spam collection

2009-05-25 Yet Another Ninja
On 5/25/2009 9:47 AM, Michael Monnerie wrote: On Tuesday 19 May 2009 martin f krafft wrote: thus spoke Jeff Mincy j...@delphioutpost.com [2009.05.19.1445 +0200]: Use prefix matching instead! formail -b -t -I X-Spam- msg This is undoubtedly more of a sledgehammer approach, but I don't

Re: learning from IMAP spam collection

2009-05-25 Michael Monnerie
On Monday 25 May 2009 Yet Another Ninja wrote: fetchmail from the spam box, set fetchmail to deliver via SMTP, procmail pipes it through ripmime, save the spam message part, drop the original, use the spam part to learn... Ah, ripmime is the hint... Thanks. Regards, zmi -- // Michael Monnerie, Ing.BSc-

Re: learning from IMAP spam collection

2009-05-20 martin f krafft
Thus spoke Jeff Mincy j...@delphioutpost.com [2009.05.19.1445 +0200]: formail -b -t -I X-Spam-Status: -I X-Spam-Flag: -I X-Spam-Checker-Version: -I X-Spam-Rbl: -I X-Spam-Pyzor: -I X-Spam-DCC: -I X-Spam-Level: -I X-Spam-Bayes: -I X-Spam-Relay: -I X-Spam-Report: -I X-Spam-AWL: -I X-Spam-Karma:

Re: learning from IMAP spam collection

2009-05-19 Michael Monnerie
On Sunday 17 May 2009 Rick Macdougall wrote: Why not use http://www.sonologic.nl/pub/Projects/ImapSaLearn/imap-sa-learn.pl.txt I've improved it a bit: http://zmi.at/x/imap-sa-learn.pl * debug 1 or 2 selectable * no debug is good for interactive use, debug 1 for scripts, debug 2 for real

Re: learning from IMAP spam collection

2009-05-19 Michael Monnerie
On Sunday 17 May 2009 Michael Monnerie wrote: Why is it so extremely slow and CPU-consuming just to remove any existing markup? There really seems to be no other way than calling spamassassin -d to remove existing markup. I guess I will create an account where a script takes all messages

Re: learning from IMAP spam collection

2009-05-19 Michael Monnerie
On Tuesday 19 May 2009 Michael Monnerie wrote: On Sunday 17 May 2009 Rick Macdougall wrote: Why not use http://www.sonologic.nl/pub/Projects/ImapSaLearn/imap-sa-learn.pl.txt I've improved it a bit: http://zmi.at/x/imap-sa-learn.pl * debug 1 or 2 selectable * no debug is good for

Re: learning from IMAP spam collection

2009-05-19 Martin Gregorie
On Tue, 2009-05-19 at 03:03 +0200, Michael Monnerie wrote: Yes, I want to use spamc. But what parameters does it need to remove existing spam markup, just like spamassassin -d does? I don't think it does that, but it should be easy enough to add the option and submit the result as a patch.

Re: learning from IMAP spam collection

2009-05-19 Jeff Mincy
From: Michael Monnerie michael.monne...@is.it-management.at Date: Tue, 19 May 2009 09:34:53 +0200 On Sunday 17 May 2009 Michael Monnerie wrote: Why is it so extremely slow and CPU-consuming just to remove any existing markup? There really seems to be no other way than

RE: learning from IMAP spam collection

2009-05-19 Michael Monnerie
I don't think it does that, but it should be easy enough to add the option and submit the result as a patch. spamc seemed pretty straightforward last time I looked at its source. Yeah, maybe some good hacker could do that. I'm not a programmer, unfortunately. Regards, zmi

RE: learning from IMAP spam collection

2009-05-19 Martin Gregorie
On Tue, 2009-05-19 at 15:05 +0200, Michael Monnerie wrote: Nope. It needs to modify the body as well. We put a lengthy "this is SPAM" text at the beginning of recognized spam, with the original mail attached. This way, users cannot accidentally click on stupid Viagra links. So
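Because this wrapper rewrites the body, stripping headers is not enough: the learner needs the attached original. If the wrapper attaches it as a message/rfc822 part of a multipart message (SpamAssassin's own report_safe 1 mode does this; whether this custom wrapper does is an assumption), a rough awk sketch can pull that part out. The function name extract_rfc822 is hypothetical; it only follows the top-level boundary, expects LF line endings, and ignores encrypted or doubly wrapped mail. A real MIME tool such as ripmime, which comes up elsewhere in this thread, is more robust.

```shell
# Hypothetical helper: print the first message/rfc822 part of a
# multipart message read on stdin (headers and body of the attached mail).
# Assumes LF line endings and a boundary= parameter in the top-level header.
extract_rfc822() {
  awk '
    BEGIN { state = "hdr" }
    state == "hdr" {                          # top-level header block
      if ($0 == "") { state = "preamble"; next }
      i = index(tolower($0), "boundary=")
      if (i > 0 && b == "") {
        b = substr($0, i + 9)
        if (b ~ /^"/) { b = substr(b, 2); sub(/".*$/, "", b) }   # quoted value
        else sub(/[; \t].*$/, "", b)                             # bare token
      }
      next
    }
    b != "" && ($0 == "--" b || $0 == "--" b "--") {   # part delimiter
      if (state == "part") exit               # end of the wanted part
      state = "parthdr"; is_rfc822 = 0; next
    }
    state == "parthdr" {                      # headers of the current part
      if ($0 == "") { state = is_rfc822 ? "part" : "skip"; next }
      if (tolower($0) ~ /^content-type:[ \t]*message\/rfc822/) is_rfc822 = 1
      next
    }
    state == "part" { print }                 # inside the attached original
  '
}
```

The output can then be passed to the learner in place of the wrapped message.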

Re: learning from IMAP spam collection

2009-05-19 LuKreme
On 19-May-2009, at 06:45, Jeff Mincy wrote: You can use formail to remove headers. It is way faster than spamassassin -d. The only trick is listing all of the headers that can be added by SpamAssassin. formail -b -t -I X-Spam-Status: -I X-Spam-Flag: -I X-Spam-Checker-Version: -I

Re: learning from IMAP spam collection

2009-05-19 LuKreme
On 19-May-2009, at 09:56, Martin Gregorie wrote: That's a much more complex problem than your original requirement to strip out headers. You won't get good solutions if you hide part of the problem. His original problem was the very slow speed of spamassassin -d. OP, from post #1: I like to

Re: learning from IMAP spam collection

2009-05-19 Martin Schütte
Michael Monnerie wrote: Nope. It needs to modify the body as well. [...] And sometimes messages are encrypted twice, when they arrive over certain paths. But that's an extra mess. If the processing is that difficult, you might consider saving a copy of every incoming mail (before filters)

Re: learning from IMAP spam collection

2009-05-18 Martin Gregorie
On Sun, 2009-05-17 at 19:11 -0600, LuKreme wrote: On 17-May-2009, at 01:42, Michael Monnerie wrote: fetchmail -asnp IMAP --folder autolearn --user $username -m formail -s |spamassassin -d /tmp/x $mailserver Switch to using spamc/spamd and this way of using SA is OK. Start the spamd

Re: learning from IMAP spam collection

2009-05-18 Michael Monnerie
On Sunday 17 May 2009 Rick Macdougall wrote: Why not use http://www.sonologic.nl/pub/Projects/ImapSaLearn/imap-sa-learn.pl.txt Oh, looks interesting. But there's a comment missing in the header; it says: # Feed mail from an imap mail folder to sa-learn. Options: and then nothing. Are there no

Re: learning from IMAP spam collection

2009-05-18 Michael Monnerie
On Sunday 17 May 2009 Andrzej Adam Filip wrote: Do you access the IMAP on the same host/via unencrypted LAN connection? [Translated: Can you use Net::IMAP::Simple module to access the folder? Do you use Dovecot IMAP on the same host? No. The spambox is different from the IMAP store, working

Re: learning from IMAP spam collection

2009-05-18 Michael Monnerie
On Monday 18 May 2009 Martin Gregorie wrote: Switch to using spamc/spamd and this way of using SA is OK. Start the spamd daemon as part of your boot sequence. Replace spamassassin -d with spamc in your fetchmail command. This way there's no spamassassin per-message startup overhead. The

Re: learning from IMAP spam collection

2009-05-18 Michael Monnerie
On Sunday 17 May 2009 Chris wrote: Here's a script I've been using for years now on my imap folders. Works great. I've left some of the information in so you can see how it's formatted. Reports to Razor, Pyzor, DCC and, if set up, to SpamCop. http://pastebin.com/m39ad4cf9 Thank you Chris,

Re: learning from IMAP spam collection

2009-05-18 Michael Monnerie
On Sunday 17 May 2009 Jari Fredriksson wrote: Why is there no mode -L spam -C report to spamc? Could do both at once. I think -C report does a) remove markup b) send reports to ALL c) learn as spam. All with the same command. Hm. And where would the output without markup go? That's

Re: learning from IMAP spam collection

2009-05-17 Michael Monnerie
On Sunday 17 May 2009 Michael Monnerie wrote: To clarify my posting, here are some additions: Question 1: Do I need to call spamc twice, once with -L spam and once with -C report? Do I understand correctly that -L trains my Bayes database, while -C reports to SpamCop etc.? The man page of spamc
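If spamc is run twice per message, once with -L spam and once with -C report, the message can be read from a pipe only once, so it has to be buffered first. A minimal sketch, assuming spamd was started with --allow-tell (needed for -L and -C); the function name learn_and_report and the SPAMC override variable are hypothetical, added only so the sketch can be tried without a running spamd. spamc's exit statuses are not interpreted here.

```shell
# Hypothetical helper: feed one message (stdin) to spamc twice,
# first to train Bayes (-L spam), then to report it (-C report).
# SPAMC may be set to another command for testing; default is spamc.
learn_and_report() {
  lr_tmp=$(mktemp) || return 1
  cat > "$lr_tmp"                         # buffer stdin: a pipe can be read once
  "${SPAMC:-spamc}" -L spam   < "$lr_tmp" > /dev/null
  "${SPAMC:-spamc}" -C report < "$lr_tmp" > /dev/null
  rm -f "$lr_tmp"
}
```

Used as an MDA, each fetched message then costs two short spamc round trips to spamd instead of a full spamassassin startup.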

Re: learning from IMAP spam collection

2009-05-17 Michael Monnerie
I finally measured again. It takes 1h7m to fetch from IMAP plus remove all markup: # time fetchmail -kasnp IMAP --folder $spamfolder --user $spamuser -m formail -s |spamassassin -d /tmp/x $mailhost real 67m10.352s user 51m41.350s sys 3m27.170s Regards, zmi -- // Michael Monnerie, Ing.BSc
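To put the measurement in proportion: user plus sys time comes to about 82% of the wall-clock time. The shell's time counts CPU used by child processes once they have been waited for, so the figures should include the formail and spamassassin processes fetchmail starts. That suggests most of the 67 minutes was CPU work rather than waiting on the IMAP fetch. The helper names below are hypothetical, and to_seconds only understands the NmS.SSSs format shown above.

```shell
# Hypothetical helpers for the numbers reported by time above.
# to_seconds converts a duration like "67m10.352s" into seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# cpu_share REAL USER SYS prints (user + sys) / real as a percentage.
cpu_share() {
  awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
      -v s="$(to_seconds "$3")" 'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

cpu_share 67m10.352s 51m41.350s 3m27.170s   # prints 82.1
```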

Re: learning from IMAP spam collection

2009-05-17 Rick Macdougall
Michael Monnerie wrote: I finally measured again. It takes 1h7m to fetch from IMAP plus remove all markup: # time fetchmail -kasnp IMAP --folder $spamfolder --user $spamuser -m formail -s |spamassassin -d /tmp/x $mailhost real 67m10.352s user 51m41.350s sys 3m27.170s Regards, zmi Why

Re: learning from IMAP spam collection

2009-05-17 Jari Fredriksson
----- Original Message ----- From: Michael Monnerie michael.monne...@is.it-management.at To: users@spamassassin.apache.org Sent: Sunday, May 17, 2009 1:15 PM Subject: Re: learning from IMAP spam collection Why is there no mode -L spam -C report to spamc? Could do both at once. I think -C

Re: learning from IMAP spam collection

2009-05-17 John Hardin
On Sun, 17 May 2009, Michael Monnerie wrote: I finally measured again. It takes 1h7m to fetch from IMAP plus remove all markup: I think the largest part of your problem is the fetch part. The way this is usually set up is that the training mailbox files reside on the same server that is doing the

Re: learning from IMAP spam collection

2009-05-17 Chris
On Sun, 2009-05-17 at 09:42 +0200, Michael Monnerie wrote: Dear experts, I have a question regarding spam/ham learning, regarding performance. I store spam in a mail folder accessible via IMAP. Then I want to feed this into Bayes. For this, I do: fetchmail -asnp IMAP --folder autolearn

Re: learning from IMAP spam collection

2009-05-17 Andrzej Adam Filip
Michael Monnerie michael.monne...@is.it-management.at wrote: Dear experts, I have a question regarding spam/ham learning, regarding performance. I store spam in a mail folder accessible via IMAP. Then I want to feed this into Bayes. [...] Could you answer a few extra questions needed to

Re: learning from IMAP spam collection

2009-05-17 LuKreme
On 17-May-2009, at 01:42, Michael Monnerie wrote: fetchmail -asnp IMAP --folder autolearn --user $username -m formail -s |spamassassin -d /tmp/x $mailserver Fetchmail first so you can get ALL the messages at once. THEN run SpamAssassin. This will be a lot shorter, I'll bet, than what you are