Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns
On Tue, 2009-10-27 at 19:28 -0400, Timo Sirainen wrote:
> On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
> > The mail that is being trained is different than its respective
> > source in the mbox file. The trained one shows added, trailing
> > carriage-return chars for all headers, which are not in the headers
> > in the mbox file.
> >
> > This breaks sa-learn -- both these variations are different, and SA
> > would learn *both* when run against each one separately.
> >
> > How comes? Any insight?
>
> Probably because incoming mails have CRLF linefeeds. Antispam plugin
> could drop these by wrapping the mail_get_stream()'s returned input
> stream to i_stream_create_lf().

I'm not sure this is what we want -- shouldn't we keep it as pristine as possible?

However, I don't understand Karsten anyway, which message is the trained one? Karsten, please list the three relevant messages: the one first handed to SA _before_ dovecot gets involved, the one stored, and the one handed to SA via antispam.

johannes
Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns
On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:
> > Probably because incoming mails have CRLF linefeeds. Antispam plugin
> > could drop these by wrapping the mail_get_stream()'s returned input
> > stream to i_stream_create_lf().
>
> I'm not sure this is what we want -- shouldn't we keep it as pristine
> as possible?

The linefeeds don't really matter much. For example IMAP and SMTP require sending CRLF linefeeds and Dovecot always converts them to just LFs before saving the messages. Then depending on how the input comes, it may have CRLF or LF linefeeds.
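To check whether two copies of a message differ only in their line endings, as described above, a small shell helper along these lines may be useful. This is only a sketch: `same_modulo_cr` is a hypothetical name, and it deletes every carriage return (not only those at the end of a line) before comparing checksums.

```shell
#!/bin/sh
# same_modulo_cr FILE_A FILE_B
# Succeeds (exit status 0) if the two files are identical once all
# carriage returns have been removed, i.e. if they differ at most in
# CRLF versus LF line endings. Hypothetical helper for illustration.
same_modulo_cr() {
    a=$(tr -d '\r' < "$1" | cksum)
    b=$(tr -d '\r' < "$2" | cksum)
    [ "$a" = "$b" ]
}
```

Called as `same_modulo_cr stored.eml trained.eml` (both file names are placeholders), it reports through its exit status whether the stored and the trained copy carry the same content apart from carriage returns.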
Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns
On Wed, 2009-10-28 at 03:07 -0400, Timo Sirainen wrote:
> On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:
> > > Probably because incoming mails have CRLF linefeeds. Antispam
> > > plugin could drop these by wrapping the mail_get_stream()'s
> > > returned input stream to i_stream_create_lf().
> >
> > I'm not sure this is what we want -- shouldn't we keep it as
> > pristine as possible?
>
> The linefeeds don't really matter much. For example IMAP and SMTP
> require sending CRLF linefeeds and Dovecot always converts them to
> just LFs before saving the messages. Then depending on how the input
> comes, it may have CRLF or LF linefeeds.

Indeed. But I think Karsten is saying that his mail comes with CRLF via SMTP, and is seen by SA with CRLF, and then when it comes back to SA via antispam, it now has just LF. In a sense, dovecot is normalising linefeeds to LF, which seems to be causing him problems.

johannes
Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns
On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
> The mail that is being trained is different than its respective source
> in the mbox file. The trained one shows added, trailing carriage-return
> chars for all headers, which are not in the headers in the mbox file.
>
> This breaks sa-learn -- both these variations are different, and SA
> would learn *both* when run against each one separately.
>
> How comes? Any insight?

Probably because incoming mails have CRLF linefeeds. Antispam plugin could drop these by wrapping the mail_get_stream()'s returned input stream to i_stream_create_lf().
Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns
*nudge* Anyone? Since Timo seems to be on a list processing spree lately, here's hoping. :)

On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
> Guys,
>
> Dovecot 1.0.15 [1], just built the latest antispam-plugin 1.2 (tarball)
> for testing, mailtrain backend for SA integration. Both built from
> custom spec files.
>
> The mail that is being trained is different than its respective source
> in the mbox file. The trained one shows added, trailing carriage-return
> chars for all headers, which are not in the headers in the mbox file.
>
> This breaks sa-learn -- both these variations are different, and SA
> would learn *both* when run against each one separately.
>
> How comes? Any insight? How could I fix this, other than wrapping the
> sa-learn inside another shell script and have sed strip off the noise?
>
> This becomes more of an issue, once I switch from sa-learn to the
> lightning-fast spamc training variant.
>
> TIA
>
> guenther
>
> [1] Yes, I know, sorry. Don't want to change everything at the same
>     time, and the target system I'm experimenting for runs that
>     version, too.

--
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
[Dovecot] antispam-plugin 1.2 and trailing carriage-returns
Guys,

Dovecot 1.0.15 [1], just built the latest antispam-plugin 1.2 (tarball) for testing, mailtrain backend for SA integration. Both built from custom spec files.

The mail that is being trained is different than its respective source in the mbox file. The trained one shows added, trailing carriage-return chars for all headers, which are not in the headers in the mbox file.

This breaks sa-learn -- both these variations are different, and SA would learn *both* when run against each one separately.

How comes? Any insight? How could I fix this, other than wrapping the sa-learn inside another shell script and have sed strip off the noise?

This becomes more of an issue, once I switch from sa-learn to the lightning-fast spamc training variant.

TIA

guenther

[1] Yes, I know, sorry. Don't want to change everything at the same time, and the target system I'm experimenting for runs that version, too.

--
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
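For reference, the sed workaround mentioned in this message could look roughly like the sketch below. It assumes the mailtrain backend pipes the message to the training command on standard input and passes its options as arguments. The names `strip_cr`, `learn_without_cr`, and `LEARN_CMD` are hypothetical and belong to neither antispam-plugin nor SpamAssassin.

```shell
#!/bin/sh
# strip_cr: remove one trailing carriage return from each line on stdin.
# Carriage returns elsewhere in a line are left untouched.
strip_cr() {
    # Build a literal CR so the sed expression does not rely on \r support.
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}

# learn_without_cr: normalise line endings, then hand the message to the
# training command along with any arguments given to this function.
# LEARN_CMD is an assumed override hook; it defaults to sa-learn.
learn_without_cr() {
    strip_cr | ${LEARN_CMD:-sa-learn} "$@"
}
```

A wrapper script built on this would end with `learn_without_cr "$@"`, so that whatever arguments the plugin is configured to pass reach the training command unchanged. The same filter should work in front of a spamc-based trainer by setting `LEARN_CMD` accordingly.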