Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-10-28 Thread Johannes Berg
On Tue, 2009-10-27 at 19:28 -0400, Timo Sirainen wrote:
 On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
  The mail that is being trained is different than its respective source
  in the mbox file. The trained one shows added, trailing carriage-return
  chars for all headers, which are not in the headers in the mbox file.
  
  This breaks sa-learn -- both these variations are different, and SA
  would learn *both* when run against each one separately.
  
  How comes? Any insight? 
 
 Probably because incoming mails have CRLF linefeeds. Antispam plugin
 could drop these by wrapping the mail_get_stream()'s returned input
 stream to i_stream_create_lf().

I'm not sure this is what we want -- shouldn't we keep it as pristine as
possible?

However, I don't understand Karsten anyway, which message is the
trained one? Karsten, please list the three relevant messages: the one
first handed to SA _before_ dovecot gets involved, the one stored, and
the one handed to SA via antispam.

johannes
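
For comparing the three copies requested above, a minimal sketch along these lines could count the line endings in each header block. The function names and the labels are made up for illustration; nothing here comes from Dovecot or the antispam plugin.

```python
import re


def header_block(message: bytes) -> bytes:
    """Return everything up to and including the first blank line."""
    match = re.search(rb"\r?\n\r?\n", message)
    return message[:match.end()] if match else message


def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, LFs without a preceding CR, and CRs without a following LF."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def describe(label: str, message: bytes) -> str:
    """Summarise the header line endings of one copy of a message."""
    counts = count_line_endings(header_block(message))
    return (f"{label}: {counts['crlf']} CRLF, "
            f"{counts['lf']} bare LF, {counts['cr']} bare CR")
```

Reading each copy in binary mode (for example `open(path, "rb").read()`) and passing it to `describe()` with a label such as "stored" or "via antispam" should show at which step the carriage returns appear or disappear.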


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-10-28 Thread Timo Sirainen

On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:


  Probably because incoming mails have CRLF linefeeds. Antispam plugin
  could drop these by wrapping the mail_get_stream()'s returned input
  stream to i_stream_create_lf().

 I'm not sure this is what we want -- shouldn't we keep it as pristine as
 possible?


The linefeeds don't really matter much. For example IMAP and SMTP
require sending CRLF linefeeds and Dovecot always converts them to
just LFs before saving the messages. Then depending on how the input
comes, it may have CRLF or LF linefeeds.




Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-10-28 Thread Johannes Berg
On Wed, 2009-10-28 at 03:07 -0400, Timo Sirainen wrote:
 On Oct 28, 2009, at 3:02 AM, Johannes Berg wrote:
 
  Probably because incoming mails have CRLF linefeeds. Antispam plugin
  could drop these by wrapping the mail_get_stream()'s returned input
  stream to i_stream_create_lf().
 
  I'm not sure this is what we want -- shouldn't we keep it as pristine as
  possible?
 
 The linefeeds don't really matter much. For example IMAP and SMTP
 require sending CRLF linefeeds and Dovecot always converts them to
 just LFs before saving the messages. Then depending on how the input
 comes, it may have CRLF or LF linefeeds.

Indeed. But I think Karsten is saying that his mail comes with CRLF via
SMTP, and is seen by SA with CRLF, and then when it comes back to SA via
antispam, it now has just LF. In a sense, dovecot is normalising
linefeeds to LF, which seems to be causing him problems.

johannes




Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-10-27 Thread Timo Sirainen
On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
 The mail that is being trained is different than its respective source
 in the mbox file. The trained one shows added, trailing carriage-return
 chars for all headers, which are not in the headers in the mbox file.
 
 This breaks sa-learn -- both these variations are different, and SA
 would learn *both* when run against each one separately.
 
 How comes? Any insight? 

Probably because incoming mails have CRLF linefeeds. Antispam plugin
could drop these by wrapping the mail_get_stream()'s returned input
stream to i_stream_create_lf().
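
To illustrate the kind of conversion such a wrapper would presumably perform, here is a rough Python analogue that turns CRLF into LF on a chunked stream. It is not Dovecot's implementation, and `lf_only` is a made-up name. The one subtle case is a CRLF pair split across two chunks.

```python
from typing import Iterable, Iterator


def lf_only(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the input with every CRLF replaced by LF; bare CRs are kept."""
    pending_cr = False  # True if the previous chunk ended in a CR
    for chunk in chunks:
        if not chunk:
            continue
        if pending_cr:
            # Re-attach the held-back CR so a split CRLF is seen as a pair.
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold the CR back until we know whether an LF follows it.
            chunk = chunk[:-1]
            pending_cr = True
        out = chunk.replace(b"\r\n", b"\n")
        if out:
            yield out
    if pending_cr:
        # The stream ended on a CR with no LF after it; pass it through.
        yield b"\r"
```

Joining the output, as in `b"".join(lf_only(chunks))`, gives the message with LF-only line endings regardless of where the chunk boundaries fall.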




Re: [Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-10-17 Thread Karsten Bräckelmann
*nudge*  Anyone? Since Timo seems to be on a list processing spree
lately, here's hoping. :)

On Tue, 2009-09-01 at 22:20 +0200, Karsten Bräckelmann wrote:
 Guys,
 
 Dovecot 1.0.15 [1], just built the latest antispam-plugin 1.2 (tarball)
 for testing, mailtrain backend for SA integration. Both built from
 custom spec files.
 
 The mail that is being trained is different than its respective source
 in the mbox file. The trained one shows added, trailing carriage-return
 chars for all headers, which are not in the headers in the mbox file.
 
 This breaks sa-learn -- both these variations are different, and SA
 would learn *both* when run against each one separately.
 
 How comes? Any insight? How could I fix this, other than wrapping the
 sa-learn inside another shell script and have sed strip off the noise?
 This becomes more of an issue, once I switch from sa-learn to the
 lightning-fast spamc training variant.
 
 TIA
 
   guenther
 
 
 [1] Yes, I know, sorry. Don't want to change everything at the same
 time, and the target system I'm experimenting for runs that version,
 too.

-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



[Dovecot] antispam-plugin 1.2 and trailing carriage-returns

2009-09-01 Thread Karsten Bräckelmann
Guys,

Dovecot 1.0.15 [1], just built the latest antispam-plugin 1.2 (tarball)
for testing, mailtrain backend for SA integration. Both built from
custom spec files.

The mail that is being trained is different than its respective source
in the mbox file. The trained one shows added, trailing carriage-return
chars for all headers, which are not in the headers in the mbox file.

This breaks sa-learn -- both these variations are different, and SA
would learn *both* when run against each one separately.

How comes? Any insight? How could I fix this, other than wrapping the
sa-learn inside another shell script and have sed strip off the noise?
This becomes more of an issue, once I switch from sa-learn to the
lightning-fast spamc training variant.

TIA

  guenther


[1] Yes, I know, sorry. Don't want to change everything at the same
time, and the target system I'm experimenting for runs that version,
too.
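
The wrapper workaround mentioned above could look roughly like this, written in Python rather than shell and sed. It is only a sketch: `strip_trailing_cr`, `train` and `main` are made-up names, and the sa-learn invocation in `main` assumes that sa-learn is on the PATH and reads the message from standard input when no file is given.

```python
import subprocess
import sys
from typing import Sequence


def strip_trailing_cr(message: bytes) -> bytes:
    """Drop the CR from every CRLF pair, leaving LF-only line endings."""
    return message.replace(b"\r\n", b"\n")


def train(message: bytes, command: Sequence[str]) -> int:
    """Pipe the normalised message to the training command; return its exit code."""
    result = subprocess.run(list(command), input=strip_trailing_cr(message))
    return result.returncode


def main(argv: Sequence[str]) -> int:
    # Assumption: called as "wrapper --spam" or "wrapper --ham",
    # with the message to train on standard input.
    return train(sys.stdin.buffer.read(), ["sa-learn", *argv])
```

Saved as a script with `raise SystemExit(main(sys.argv[1:]))` as its last line, this could be called in place of sa-learn. The same `train()` call should work for a spamc-based trainer by passing a different command list.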

-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}