Re: collecting mail for sa-learn, how to?

2008-07-17 Thread Karsten Bräckelmann
Are you actually READING this list?

Sent Jul 11, Jul 14, and now again Jul 17. Identical text, including
typos. Got quite a few replies and discussion. No follow up by you,
though.

Please stop sending the same question over and over again, if you are
not reading the replies.

  guenther


-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: collecting mail for sa-learn, how to?

2008-07-17 Thread Andy Smith
Sorry, I just saw that. Until today, my attempts to mail the subscribe address
on this list weren't resulting in an autoreply etc.
I only received confirmation that I was subscribed to this list some 20 mins
ago; I'm taking a look at the replies now.


Thanks, Andy.


- Original Message - 
From: Karsten Bräckelmann [EMAIL PROTECTED]

To: Andy Smith [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Thursday, July 17, 2008 2:23 PM
Subject: Re: collecting mail for sa-learn, how to?



> Are you actually READING this list?
>
> Sent Jul 11, Jul 14, and now again Jul 17. Identical text, including
> typos. Got quite a few replies and discussion. No follow up by you,
> though.
>
> Please stop sending the same question over and over again, if you are
> not reading the replies.
>
>   guenther



Re: collecting mail for sa-learn, how to?

2008-07-17 Thread Karsten Bräckelmann
On Thu, 2008-07-17 at 13:39 +0200, Andy Smith wrote:
> Sorry, I just saw that. Until today, my attempts to mail the subscribe address
> on this list weren't resulting in an autoreply etc.
> I only received confirmation that I was subscribed to this list some 20 mins
> ago; I'm taking a look at the replies now.

You can find a wealth of funky, almost-usable archives here:
  http://wiki.apache.org/spamassassin/MailingLists

Lots of bling. Less usability, alas.  Enjoy

  guenther



Re: collecting mail for sa-learn, how to?

2008-07-17 Thread Andy Smith

Hi All,

Thanks very much for all the replies and discussion around my original
post, and apologies for not replying more promptly. I've only just managed
to subscribe to the list successfully, and I confused myself looking at the
forum archives (I think there had been some delays before my posts
appeared, blah blah blah :P ).


Anyway, thanks for clarifying the requirements of sa-learn.
I think the best option sounds like it will be this:

> This is what I do:
> Forwarding the unrecognised message to an account which will process the
> message through sal-wrapper.pl. You will find further information here:
> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

Thanks to Stefan for this suggestion.

The reason is that it doesn't impose the need for an IMAP client on the
users, and I think it's the simplest option from the point of view of the
end users, i.e. "if you receive spam and wish to report it, please simply
forward the email and it will be analysed by the spam filter software".
Sounds good to me :P

I've downloaded this and will do some eval on my systems.

Thanks a lot!! Andy.



Re: collecting mail for sa-learn, how to?

2008-07-15 Thread Diego Pomatta

DAve wrote:
> We have had good luck by setting the email clients of *trusted* users
> to leave their mail on the server for 1 day. The users can then login
> to their webmail and move the spam to a SPAM folder and a selection of
> ham to a HAM folder. I train bayes on those folders each night.
>
> By retaining the messages I train with for seven days, I can go back
> and relearn any improperly classified messages if needed.
>
> The key part is *trusted* users.

Heh, in my case I really don't like having to re-train anything. I like 
to be sure when I train that if I tell sa-learn that a mail is spam, it 
is 100% spam. That's why I weekly collect spammy mail from a bunch of 
trusted users and re-filter it myself before passing it to sa-learn.


/Diego
[ Ensign , you may impress *me*. -- Worf ]


Re: collecting mail for sa-learn, how to?

2008-07-15 Thread DAve

John Hardin wrote:
> On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
>> John Hardin wrote:
>>> On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
>>>> andys wrote:
>>>>> for a mail server running email for multiple domains what is the
>>>>> typical/recommended way to collect emails which arent detected as spam to
>>>>> be processed by sa-learn? Users are downloading mail via POP3, so once a
>>>>> users sees a mail and decides that it is in fact spam its already been
>>>>> removed from the mail server. If the user forwards the mail to a special
>>>>> mailbox for processing then the mail is obviously now different from the
>>>>> original spam, the user is the sender etc. Will sa-learn still work using
>>>>> this method? and if not what else can I implement that would work?
>>>>> thanks for any comments, Andy :P
>>>>
>>>> We have had good luck by setting the email clients of *trusted* users
>>>> to leave their mail on the server for 1 day. The users can then login to
>>>> their webmail and move the spam to a SPAM folder and a selection of ham
>>>> to a HAM folder. I train bayes on those folders each night.
>>>
>>> That requires IMAP, though, correct?
>>
>> That depends on the webmail software he uses
>
> ...where does Andy mention webmail?

He doesn't, which is why I made no assumption that IMAP would be a
requirement. It may not be.


DAve


--
Don't tell me I'm driving the cart!


Re: collecting mail for sa-learn, how to?

2008-07-15 Thread DAve

Diego Pomatta wrote:
> DAve wrote:
>> We have had good luck by setting the email clients of *trusted* users
>> to leave their mail on the server for 1 day. The users can then login
>> to their webmail and move the spam to a SPAM folder and a selection of
>> ham to a HAM folder. I train bayes on those folders each night.
>>
>> By retaining the messages I train with for seven days, I can go back
>> and relearn any improperly classified messages if needed.
>>
>> The key part is *trusted* users.
>
> Heh, in my case I really don't like having to re-train anything. I like
> to be sure when I train that if I tell sa-learn that a mail is spam, it
> is 100% spam. That's why I weekly collect spammy mail from a bunch of
> trusted users and re-filter it myself before passing it to sa-learn.




I haven't yet, but keeping the files for a few days just in case 
certainly doesn't hurt, and could prove useful. They could also be used 
to create a new bayes db in a hurry if something goes wrong with your 
existing db.


DAve

--
Don't tell me I'm driving the cart!


Re: collecting mail for sa-learn, how to?

2008-07-15 Thread Diego Pomatta

DAve wrote:
> Diego Pomatta wrote:
>> Heh, in my case I really don't like having to re-train anything. I
>> like to be sure when I train that if I tell sa-learn that a mail is
>> spam, it is 100% spam. That's why I weekly collect spammy mail from a
>> bunch of trusted users and re-filter it myself before passing it to
>> sa-learn.
>
> I haven't yet, but keeping the files for a few days just in case
> certainly doesn't hurt, and could prove useful. They could also be
> used to create a new bayes db in a hurry if something goes wrong with
> your existing db.
>
> DAve


Yes, I keep the spam mail in a mbox folder/file for that purpose, too.

Diego
[Scott me up, Beammy!]


RE: collecting mail for sa-learn, how to?

2008-07-15 Thread Robert - elists

 
> Heh, in my case I really don't like having to re-train anything. I like
> to be sure when I train that if I tell sa-learn that a mail is spam, it
> is 100% spam. That's why I weekly collect spammy mail from a bunch of
> trusted users and re-filter it myself before passing it to sa-learn.
 

Diego and list,

Isn't the timeliness of the training of spam important?

Isn't spam trained immediately (close to realtime) more effective than spam
trained well after spammer mail runs?

 - rh



Re: collecting mail for sa-learn, how to?

2008-07-15 Thread DAve

Robert - elists wrote:

>> Heh, in my case I really don't like having to re-train anything. I like
>> to be sure when I train that if I tell sa-learn that a mail is spam, it
>> is 100% spam. That's why I weekly collect spammy mail from a bunch of
>> trusted users and re-filter it myself before passing it to sa-learn.
>
> Diego and list,
>
> Isn't the timeliness of the training of spam important?
>
> Isn't spam trained immediately (close to realtime) more effective than spam
> trained well after spammer mail runs?
>
>  - rh


In my experience yes. We train each evening within hours of the users 
doing their selections.


DAve


--
Don't tell me I'm driving the cart!


Re: collecting mail for sa-learn, how to?

2008-07-15 Thread John Hardin

On Tue, 2008-07-15 at 08:55 -0400, DAve wrote:

> They could also be used
> to create a new bayes db in a hurry if something goes wrong with your
> existing db.

Absolutely. If you're manually training you want to retain your training
corpora to troubleshoot, correct errors, and rebuild from scratch if
needed.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control enables genocide while doing little to reduce crime.
---
 Tomorrow: the 63rd anniversary of the dawn of the Atomic Age



Re: collecting mail for sa-learn, how to?

2008-07-15 Thread mouss

Robert - elists wrote:

>> Heh, in my case I really don't like having to re-train anything. I like
>> to be sure when I train that if I tell sa-learn that a mail is spam, it
>> is 100% spam. That's why I weekly collect spammy mail from a bunch of
>> trusted users and re-filter it myself before passing it to sa-learn.
>
> Diego and list,
>
> Isn't the timeliness of the training of spam important?
>
> Isn't spam trained immediately (close to realtime) more effective than spam
> trained well after spammer mail runs?


It would be even more effective to train your bayes before spam is
received :) Come on...


For me, the goal of bayes is to detect mail that is legitimate because
it resembles legitimate mail. The fact that spammers change their
practices doesn't matter, because legitimate users do not.


Of course, learning as fast as possible is helpful for blocking new spam,
but I am not going to watch my mailbox in real time just for that. This
would be worse than hitting the delete button.




Re: collecting mail for sa-learn, how to?

2008-07-14 Thread Stefan Jakobs
On Friday 11 July 2008 17:29, andys wrote:
> Hi,

Hello,

>   for a mail server running email for multiple domains what is the
> typical/recommended way to collect emails which arent detected as spam to
> be processed by sa-learn? Users are downloading mail via POP3, so once a
> users sees a mail and decides that it is in fact spam its already been
> removed from the mail server. If the user forwards the mail to a special
> mailbox for processing then the mail is obviously now different from the
> original spam, the user is the sender etc. Will sa-learn still work using
> this method? and if not what else can I implement that would work?

This is what I do:
Forwarding the unrecognised message to an account which will process the
message through sal-wrapper.pl. You will find further information here:
https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

> thanks for any comments, Andy :P

Greetings
Stefan




Re: collecting mail for sa-learn, how to?

2008-07-14 Thread John Hardin

On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
> On Friday 11 July 2008 17:29, andys wrote:
>> for a mail server running email for multiple domains what is the
>> typical/recommended way to collect emails which arent detected as spam to
>> be processed by sa-learn? Users are downloading mail via POP3, so once a
>> users sees a mail and decides that it is in fact spam its already been
>> removed from the mail server. If the user forwards the mail to a special
>> mailbox for processing then the mail is obviously now different from the
>> original spam, the user is the sender etc. Will sa-learn still work using
>> this method? and if not what else can I implement that would work?
>
> This is what I do:
> Forwarding the unrecognised message to an account which will process the
> message through sal-wrapper.pl. You will find further information here:
> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

Forwarding alters the message, you will not get reliable results.

You can, of course, use auto-learn and let SA take care of it.

If you want your users to classify, the best way is to use IMAP instead
of POP, and provide server-side training folders that sa-learn can see.
If IMAP is not an option then this obviously won't work.

If procmail is in use as the LDA, you could set up a rule to clone to a
local ham folder to do scheduled training. You could get creative with
rules and have it collect a randomly-chosen subset of the ham traffic,
or only train where the score is low and the message is not already
BAYES_00 or the score is high and the message is not already BAYES_99.
However, this would be cloning users' mail (even if only temporarily),
and you should obtain their consent before doing this.
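
The selection rule in the last paragraph could be sketched roughly as follows. This is only an illustration in Python, not a procmail recipe: the function name training_class and the two thresholds are hypothetical, and it assumes the X-Spam-Status header has the usual "score=... tests=..." layout that SpamAssassin writes by default.

```python
import re
from typing import Optional

# Hypothetical thresholds; tune them for your own site.
LOW_SCORE = 1.0
HIGH_SCORE = 10.0


def training_class(spam_status: str) -> Optional[str]:
    """Classify an X-Spam-Status header value as 'ham', 'spam' or None.

    None means the message is not selected for training.
    """
    # Unfold the header: collapse whitespace, then rejoin a test list that
    # was wrapped after a comma.
    value = re.sub(r",\s+", ",", re.sub(r"\s+", " ", spam_status))

    score_match = re.search(r"\bscore=(-?\d+(?:\.\d+)?)", value)
    if score_match is None:
        return None
    score = float(score_match.group(1))

    tests_match = re.search(r"\btests=(\S*)", value)
    tests = set(tests_match.group(1).split(",")) if tests_match else set()

    # Low score, but Bayes is not yet sure it is ham: worth learning as ham.
    if score <= LOW_SCORE and "BAYES_00" not in tests:
        return "ham"
    # High score, but Bayes is not yet sure it is spam: worth learning as spam.
    if score >= HIGH_SCORE and "BAYES_99" not in tests:
        return "spam"
    return None
```

Messages for which this returns None would simply not be cloned into a training folder.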




Re: collecting mail for sa-learn, how to?

2008-07-14 Thread Stefan Jakobs
On Monday 14 July 2008 16:27, John Hardin wrote:
> On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
>> On Friday 11 July 2008 17:29, andys wrote:
>>> for a mail server running email for multiple domains what is the
>>> typical/recommended way to collect emails which arent detected as spam
>>> to be processed by sa-learn? Users are downloading mail via POP3, so
>>> once a users sees a mail and decides that it is in fact spam its
>>> already been removed from the mail server. If the user forwards the
>>> mail to a special mailbox for processing then the mail is obviously now
>>> different from the original spam, the user is the sender etc. Will
>>> sa-learn still work using this method? and if not what else can I
>>> implement that would work?
>>
>> This is what I do:
>> Forwarding the unrecognised message to an account which will process the
>> message through sal-wrapper.pl. You will find further information here:
>> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper
>
> Forwarding alters the message, you will not get reliable results.

Sorry, I should have been clearer. The unrecognised message is attached to
the forwarding message. sal-wrapper will unpack the message from the
attachment and feed it to sa-learn.
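
To illustrate the unpacking step, here is a minimal sketch in Python. It is not sal-wrapper.pl (whose internals are not shown in this thread); it only assumes that the user forwards the spam as an attachment, so that it arrives as a message/rfc822 MIME part. The function names are hypothetical.

```python
from email import message_from_bytes
from email.message import Message
from typing import List


def extract_attached_messages(raw_forward: bytes) -> List[bytes]:
    """Return the raw bytes of every message attached to a forwarded mail."""
    found: List[bytes] = []
    _collect(message_from_bytes(raw_forward), found)
    return found


def _collect(part: Message, found: List[bytes]) -> None:
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-element list
        # holding the embedded (original) message.
        payload = part.get_payload()
        if isinstance(payload, list) and payload:
            found.append(payload[0].as_bytes())
        return  # do not descend into the original message itself
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)
```

Each returned byte string carries the original headers rather than those of the forwarding user, and is what would be handed to sa-learn. A plain inline forward contains no message/rfc822 part, so the function returns an empty list for it.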

<snip>

Greetings
Stefan




Re: collecting mail for sa-learn, how to?

2008-07-14 Thread Diego Pomatta

andys wrote:

> Hi,
>  for a mail server running email for multiple domains what is the
> typical/recommended way to collect emails which arent detected as spam
> to be processed by sa-learn? Users are downloading mail via POP3, so
> once a users sees a mail and decides that it is in fact spam its
> already been removed from the mail server. If the user forwards the
> mail to a special mailbox for processing then the mail is obviously
> now different from the original spam, the user is the sender etc. Will
> sa-learn still work using this method? and if not what else can I
> implement that would work?
>
> thanks for any comments, Andy :P



I have a similar situation here.
What I do is instruct several key users to move the spam that still
slips through to a spam folder in their client. I then copy or move
those folders regularly (once a week or so) over the network to my
computer, import them all into a folder in my Mozilla Thunderbird, and
check the mails (because sometimes what users think is spam actually
isn't). The headers remain intact.


Then I feed my Thunderbird spam folder (mbox format) to sa-learn.
I happen to use Thunderbird, which uses the mbox file format to store
mails, but there are programs out there that convert Outlook or Outlook
Express folders to mbox format, too.

Many parts of this process can be automated with scripts.
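
As one example of such a script, the collection step might look roughly like this. It is a sketch under assumptions, not the setup described above: it assumes the users' spam folders are already available as mbox files, and the function name merge_mboxes is made up for illustration. It merges them into a single mbox for review and skips messages whose Message-ID has already been seen.

```python
import mailbox
from typing import Iterable, Set


def merge_mboxes(sources: Iterable[str], target: str) -> int:
    """Append all messages from the source mbox files to target.

    Messages whose Message-ID was already copied are skipped.
    Returns the number of messages added.
    """
    seen: Set[str] = set()
    added = 0
    dest = mailbox.mbox(target)  # created if it does not exist yet
    try:
        for source in sources:
            src = mailbox.mbox(source, create=False)
            try:
                for message in src:
                    msg_id = message.get("Message-ID")
                    if msg_id is not None and msg_id in seen:
                        continue  # same spam reported by several users
                    if msg_id is not None:
                        seen.add(msg_id)
                    dest.add(message)
                    added += 1
            finally:
                src.close()
    finally:
        dest.close()  # close() also flushes pending writes
    return added
```

After checking the merged file by hand, it could be passed to sa-learn with its --spam and --mbox options.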

Regards.
/Diego



Re: collecting mail for sa-learn, how to?

2008-07-14 Thread DAve

andys wrote:

> Hi,
>
> for a mail server running email for multiple domains what is the
> typical/recommended way to collect emails which arent detected as spam to
> be processed by sa-learn? Users are downloading mail via POP3, so once a
> users sees a mail and decides that it is in fact spam its already been
> removed from the mail server. If the user forwards the mail to a special
> mailbox for processing then the mail is obviously now different from the
> original spam, the user is the sender etc. Will sa-learn still work using
> this method? and if not what else can I implement that would work?
> thanks for any comments, Andy :P


We have had good luck by setting the email clients of *trusted* users 
to leave their mail on the server for 1 day. The users can then login to 
their webmail and move the spam to a SPAM folder and a selection of ham 
to a HAM folder. I train bayes on those folders each night.


By retaining the messages I train with for seven days, I can go back and 
relearn any improperly classified messages if needed.


The key part is *trusted* users.
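
A nightly job along these lines could be sketched as below. This is an assumption-laden illustration in Python, not the actual setup described here: it assumes the SPAM and HAM folders are directories holding one message per file, uses file modification time as a stand-in for "when the message was filed", and the function names are hypothetical. The only sa-learn options used are --spam and --ham.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 7 * 24 * 60 * 60  # keep trained mail for seven days


def sa_learn_commands(spam_dir: Path, ham_dir: Path) -> List[List[str]]:
    """Build the two nightly sa-learn invocations."""
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def expired_files(folder: Path, now: Optional[float] = None) -> List[Path]:
    """List files in folder last modified before the retention window."""
    if now is None:
        now = time.time()
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and now - path.stat().st_mtime > RETENTION_SECONDS
    )


def nightly_run(spam_dir: Path, ham_dir: Path) -> None:
    """Train on both folders, then delete mail older than seven days."""
    for command in sa_learn_commands(spam_dir, ham_dir):
        subprocess.run(command, check=True)
    for folder in (spam_dir, ham_dir):
        for path in expired_files(folder):
            path.unlink()
```

Because the messages stay in place for a week, a misfiled message can be moved to the other folder and picked up by the next run.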

DAve


--
Don't tell me I'm driving the cart!


Re: collecting mail for sa-learn, how to?

2008-07-14 Thread John Hardin

On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
> andys wrote:
>> Hi,
>>
>> for a mail server running email for multiple domains what is the
>> typical/recommended way to collect emails which arent detected as spam to
>> be processed by sa-learn? Users are downloading mail via POP3, so once a
>> users sees a mail and decides that it is in fact spam its already been
>> removed from the mail server. If the user forwards the mail to a special
>> mailbox for processing then the mail is obviously now different from the
>> original spam, the user is the sender etc. Will sa-learn still work using
>> this method? and if not what else can I implement that would work?
>> thanks for any comments, Andy :P
>
> We have had good luck by setting the email clients of *trusted* users
> to leave their mail on the server for 1 day. The users can then login to
> their webmail and move the spam to a SPAM folder and a selection of ham
> to a HAM folder. I train bayes on those folders each night.

That requires IMAP, though, correct?

That actually may work for Andy - set up both POP and IMAP, and for
selected users have them use IMAP rather than POP and provide them with
server-side ham and spam training folders. That won't require all users
to use IMAP, with the resulting storage requirements on the server.





Re: collecting mail for sa-learn, how to?

2008-07-14 Thread DAve

John Hardin wrote:

> On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
>> andys wrote:
>>> Hi,
>>>
>>> for a mail server running email for multiple domains what is the
>>> typical/recommended way to collect emails which arent detected as spam to
>>> be processed by sa-learn? Users are downloading mail via POP3, so once a
>>> users sees a mail and decides that it is in fact spam its already been
>>> removed from the mail server. If the user forwards the mail to a special
>>> mailbox for processing then the mail is obviously now different from the
>>> original spam, the user is the sender etc. Will sa-learn still work using
>>> this method? and if not what else can I implement that would work?
>>> thanks for any comments, Andy :P
>>
>> We have had good luck by setting the email clients of *trusted* users
>> to leave their mail on the server for 1 day. The users can then login to
>> their webmail and move the spam to a SPAM folder and a selection of ham
>> to a HAM folder. I train bayes on those folders each night.
>
> That requires IMAP, though, correct?


That depends on the webmail software he uses and the location and
permissions of his mailboxes. We use a webmail product utilizing IMAP,
but there are some that do not require IMAP services to be running.




> That actually may work for Andy - set up both POP and IMAP, and for
> selected users have them use IMAP rather than POP and provide them with
> server-side ham and spam training folders. That won't require all users
> to use IMAP, with the resulting storage requirements on the server.


Even if his webmail requires IMAP, he doesn't need to make his users use
IMAP. We provide IMAP only for webmail, not for mail clients. IMAP
access is available only on 127.0.0.1. I would think that would work for
him as well. That is why we have the POP client leave the message on the
server for 1 day, so that a spam message is still accessible to webmail
after it arrives in the POP client's mail folder.


DAve


--
Don't tell me I'm driving the cart!


Re: collecting mail for sa-learn, how to?

2008-07-14 Thread John Hardin

On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
> John Hardin wrote:
>> On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
>>> andys wrote:
>>>> for a mail server running email for multiple domains what is the
>>>> typical/recommended way to collect emails which arent detected as spam to
>>>> be processed by sa-learn? Users are downloading mail via POP3, so once a
>>>> users sees a mail and decides that it is in fact spam its already been
>>>> removed from the mail server. If the user forwards the mail to a special
>>>> mailbox for processing then the mail is obviously now different from the
>>>> original spam, the user is the sender etc. Will sa-learn still work using
>>>> this method? and if not what else can I implement that would work?
>>>> thanks for any comments, Andy :P
>>>
>>> We have had good luck by setting the email clients of *trusted* users
>>> to leave their mail on the server for 1 day. The users can then login to
>>> their webmail and move the spam to a SPAM folder and a selection of ham
>>> to a HAM folder. I train bayes on those folders each night.
>>
>> That requires IMAP, though, correct?
>
> That depends on the webmail software he uses

...where does Andy mention webmail?
