Re: collecting mail for sa-learn, how to?
Are you actually READING this list?

Sent Jul 11, Jul 14, and now again Jul 17. Identical text, including typos. Got quite a few replies and discussion. No follow up by you, though.

Please stop sending the same question over and over again, if you are not reading the replies.

  guenther
Re: collecting mail for sa-learn, how to?
Soz, I just saw that. Until today my attempts to mail the subscribe address on this list weren't resulting in an autoreply etc. I only received confirmation that I was subscribed to this list some 20 mins ago. I'm taking a look now at the replies.

thanks
Andy.

----- Original Message -----
From: Karsten Bräckelmann [EMAIL PROTECTED]
To: Andy Smith [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Thursday, July 17, 2008 2:23 PM
Subject: Re: collecting mail for sa-learn, how to?

> Are you actually READING this list?
>
> Sent Jul 11, Jul 14, and now again Jul 17. Identical text, including typos. Got quite a few replies and discussion. No follow up by you, though.
>
> Please stop sending the same question over and over again, if you are not reading the replies.
>
>   guenther
Re: collecting mail for sa-learn, how to?
On Thu, 2008-07-17 at 13:39 +0200, Andy Smith wrote:
> Soz, I just saw that. Until today my attempts to mail the subscribe address on this list weren't resulting in an autoreply etc. I only received confirmation that I was subscribed to this list some 20 mins ago. I'm taking a look now at the replies.

You can find a wealth of funky, almost-usable archives here:

  http://wiki.apache.org/spamassassin/MailingLists

Lots of bling. Less usability, alas. Enjoy.

  guenther
Re: collecting mail for sa-learn, how to?
Hi All,

Thanks very much for all the replies and discussion around my original post, and apologies for not replying more promptly. I've only just managed to subscribe to the list successfully, and I managed to confuse myself looking at the forum archives (I think there had been some delays before my posts appeared, blah blah blah :P ).

Anyway, thanks for clarifying the requirements of sa-learn. I think the best option sounds like it will be this:

> This is what I do: Forwarding the unrecognised message to an account which will process the message through sal-wrapper.pl. You will find further information here:
> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

Thanks to Stefan for this suggestion. The reason is that it doesn't impose the need for an IMAP client on the users, and I think it's the simplest option from the point of view of the end users, i.e. "if you receive spam and wish to report it, please simply forward the email and it will be analysed by the spam filter software".

Sounds good to me :P I've downloaded this and will do some evaluation on my systems.

Thanks a lot!!
Andy.
Re: collecting mail for sa-learn, how to?
DAve wrote:
> We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.
>
> By retaining the messages I train with for seven days, I can go back and relearn any improperly classified messages if needed.
>
> The key part is *trusted* users.

Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.

/Diego

[ Ensign, you may impress *me*. -- Worf ]
Re: collecting mail for sa-learn, how to?
John Hardin wrote:
> On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
> > John Hardin wrote:
> > > On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
> > > > andys wrote:
> > > > > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> > > > >
> > > > > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
> > > > >
> > > > > Thanks for any comments, Andy :P
> > > >
> > > > We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.
> > >
> > > That requires IMAP, though, correct?
> >
> > That depends on the webmail software he uses
>
> ...where does Andy mention webmail?

He doesn't, which is why I made no assumption that IMAP would be a requirement. It may not be.

DAve

--
Don't tell me I'm driving the cart!
Re: collecting mail for sa-learn, how to?
Diego Pomatta wrote:
> DAve wrote:
> > We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.
> >
> > By retaining the messages I train with for seven days, I can go back and relearn any improperly classified messages if needed.
> >
> > The key part is *trusted* users.
>
> Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.

I haven't had to yet, but keeping the files for a few days just in case certainly doesn't hurt, and could prove useful. They could also be used to create a new bayes db in a hurry if something goes wrong with your existing db.

DAve

--
Don't tell me I'm driving the cart!
Re: collecting mail for sa-learn, how to?
DAve wrote:
> Diego Pomatta wrote:
> > Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.
>
> I haven't had to yet, but keeping the files for a few days just in case certainly doesn't hurt, and could prove useful. They could also be used to create a new bayes db in a hurry if something goes wrong with your existing db.
>
> DAve

Yes, I keep the spam mail in an mbox folder/file for that purpose, too.

Diego

[Scott me up, Beammy!]
RE: collecting mail for sa-learn, how to?
> Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.

Diego and list,

Isn't the timeliness of the training of spam important?

Isn't spam trained immediately (close to realtime) more effective than spam trained well after the spammers' mail runs?

- rh
Re: collecting mail for sa-learn, how to?
Robert - elists wrote:
> > Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.
>
> Diego and list,
>
> Isn't the timeliness of the training of spam important?
>
> Isn't spam trained immediately (close to realtime) more effective than spam trained well after the spammers' mail runs?
>
> - rh

In my experience, yes. We train each evening, within hours of the users making their selections.

DAve

--
Don't tell me I'm driving the cart!
Re: collecting mail for sa-learn, how to?
On Tue, 2008-07-15 at 08:55 -0400, DAve wrote:
> They could also be used to create a new bayes db in a hurry if something goes wrong with your existing db.

Absolutely. If you're manually training, you want to retain your training corpora to troubleshoot, correct errors, and rebuild from scratch if needed.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Gun Control enables genocide while doing little to reduce crime.
-----------------------------------------------------------------------
 Tomorrow: the 63rd anniversary of the dawn of the Atomic Age
Re: collecting mail for sa-learn, how to?
Robert - elists wrote:
> > Heh, in my case I really don't like having to re-train anything. I like to be sure when I train that if I tell sa-learn that a mail is spam, it is 100% spam. That's why I collect spammy mail weekly from a bunch of trusted users and re-filter it myself before passing it to sa-learn.
>
> Diego and list,
>
> Isn't the timeliness of the training of spam important?
>
> Isn't spam trained immediately (close to realtime) more effective than spam trained well after the spammers' mail runs?

It would be even more effective to train your bayes before the spam is received :)

Come on... For me, the goal of bayes is to detect mail that is legitimate because it resembles legitimate mail. The fact that spammers change their practices doesn't matter, because legitimate users do not.

Of course, learning as fast as possible is helpful for blocking new spam, but I am not going to watch my mailbox in real time just for that. That would be worse than just hitting the delete button.
Re: collecting mail for sa-learn, how to?
On Friday 11 July 2008 17:29, andys wrote:
> Hi,

Hello,

> For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
>
> Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?

This is what I do: I forward the unrecognised message to an account which processes the message through sal-wrapper.pl. You will find further information here:

  https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

> Thanks for any comments, Andy :P

Greetings
Stefan
Re: collecting mail for sa-learn, how to?
On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
> On Friday 11 July 2008 17:29, andys wrote:
> > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> >
> > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
>
> This is what I do: I forward the unrecognised message to an account which processes the message through sal-wrapper.pl. You will find further information here:
> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

Forwarding alters the message; you will not get reliable results.

You can, of course, use auto-learn and let SA take care of it.

If you want your users to classify, the best way is to use IMAP instead of POP, and provide server-side training folders that sa-learn can see. If IMAP is not an option then this obviously won't work.

If procmail is in use as the LDA, you could set up a rule to clone mail to a local ham folder for scheduled training. You could get creative with rules and have it collect a randomly-chosen subset of the ham traffic, or only train where the score is low and the message is not already BAYES_00, or where the score is high and the message is not already BAYES_99.

However, this would be cloning users' mail (even if only temporarily), and you should obtain their consent before doing this.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
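For what it's worth, the "only train where the score is low and not already BAYES_00, or high and not already BAYES_99" idea can be sketched outside procmail too. The following is only a rough illustration, not a tested recipe: it assumes the usual `X-Spam-Status` layout with `score=` and `tests=` fields (older releases wrote `hits=` instead), and the two thresholds are made-up values you would have to tune yourself.

```python
import re

# Hypothetical thresholds, not taken from the thread; tune for your site.
LOW_SCORE = 1.0    # at or below this, treat the message as a ham candidate
HIGH_SCORE = 10.0  # at or above this, treat the message as a spam candidate

SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")
BAYES_RE = re.compile(r"\bBAYES_\d+\b")


def training_class(spam_status):
    """Return "ham", "spam", or None for an X-Spam-Status header value.

    A message is selected only when bayes has NOT already reached the
    matching extreme, so training on it can still teach bayes something.
    """
    match = SCORE_RE.search(spam_status)
    if match is None:
        return None  # no score found, leave the message alone
    score = float(match.group(1))
    bayes_rules = set(BAYES_RE.findall(spam_status))
    if score <= LOW_SCORE and "BAYES_00" not in bayes_rules:
        return "ham"
    if score >= HIGH_SCORE and "BAYES_99" not in bayes_rules:
        return "spam"
    return None
```

Anything that returns None is simply not copied to a training folder. The consent caveat about cloning users' mail applies to this just as much as to a procmail rule.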
Re: collecting mail for sa-learn, how to?
On Monday 14 July 2008 16:27, John Hardin wrote:
> On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
> > On Friday 11 July 2008 17:29, andys wrote:
> > > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> > >
> > > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
> >
> > This is what I do: I forward the unrecognised message to an account which processes the message through sal-wrapper.pl. You will find further information here:
> > https://po2.uni-stuttgart.de/~rusjako/sal-wrapper
>
> Forwarding alters the message; you will not get reliable results.

Sorry, I should have been clearer. The unrecognised message is sent as an attachment to the forwarding message. sal-wrapper unpacks the original message from the attachment and feeds it to sa-learn.

[snip]

Greetings
Stefan
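To illustrate the "forward as attachment, then unpack" idea in general terms: the sketch below is NOT sal-wrapper's code (I have not reproduced it here), just a minimal illustration using Python's standard `email` package. It assumes the mail client attaches the original as a `message/rfc822` part; the function name is made up for the example.

```python
from email import message_from_bytes, policy


def extract_attached_messages(raw):
    """Return the raw bytes of every message/rfc822 attachment in `raw`.

    The wrapper mail carries the forwarding user's headers; the attached
    message keeps the headers of the original mail, which is the part
    that would be handed to sa-learn.
    """
    outer = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in outer.walk():
        if part.get_content_type() != "message/rfc822":
            continue
        payload = part.get_payload()
        # A parsed message/rfc822 part holds a list with one message object.
        if isinstance(payload, list) and payload:
            found.append(payload[0].as_bytes())
    return found
```

Each returned item could then be piped to `sa-learn --spam` on standard input. A plain inline forward (no attachment) yields an empty list, which is a reasonable signal to bounce the report back to the user.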
Re: collecting mail for sa-learn, how to?
andys wrote:
> Hi,
>
> For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
>
> Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
>
> Thanks for any comments, Andy :P

I have a similar situation here. What I do is instruct several key users to move the spam that still slips through to a spam folder in their client. I then copy or move those folders regularly (once a week or so) over the network to my computer, import them all into a folder in my Mozilla Thunderbird, and check the mails (because sometimes what users think is spam actually isn't). The headers remain intact. Then I feed my Thunderbird spam folder (mbox format) to sa-learn.

I happen to use Thunderbird, which uses the mbox file format to store mail, but there are programs out there that convert Outlook or Outlook Express folders to mbox format, too.

Many parts of this process can be automated with scripts.

Regards.
/Diego
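As one example of the kind of script that could automate part of this, here is a small sketch that sanity-checks a collected mbox file before it is handed to sa-learn. Treating a missing `Received` header as a sign that a message lost its original headers is my own rough assumption, and the function names are invented for the example; `--spam`, `--ham` and `--mbox` are regular sa-learn options.

```python
import mailbox


def mbox_summary(path):
    """Return (total messages, messages with no Received header)."""
    box = mailbox.mbox(path, create=False)
    try:
        total = 0
        missing = 0
        for msg in box:
            total += 1
            # Rough proxy: mail that travelled through a server normally
            # carries at least one Received header.
            if msg.get_all("Received") is None:
                missing += 1
        return total, missing
    finally:
        box.close()


def learn_mbox_command(path, kind="spam"):
    """Build the sa-learn command line for one mbox file (not executed)."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", path]
```

If the second number is not zero, it is probably worth looking at those messages by hand before training, which fits the manual re-check step described above.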
Re: collecting mail for sa-learn, how to?
andys wrote:
> Hi,
>
> For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
>
> Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
>
> Thanks for any comments, Andy :P

We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.

By retaining the messages I train with for seven days, I can go back and relearn any improperly classified messages if needed.

The key part is *trusted* users.

DAve

--
Don't tell me I'm driving the cart!
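A nightly job along these lines might look roughly like the sketch below. It is not DAve's actual setup: it assumes one message per file in each training folder (maildir-style), the helper names are invented, and the seven-day figure is simply the retention period mentioned above. As far as I know, sa-learn skips messages it has already learned, so re-running it over retained files should be harmless.

```python
import subprocess
import time
from pathlib import Path

RETENTION_DAYS = 7  # retention period mentioned in the message above


def sa_learn_command(kind, folder):
    """Build the sa-learn call for a folder of messages (not executed)."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, folder]


def expired_files(folder, now, days=RETENTION_DAYS):
    """Return files in `folder` last modified more than `days` days ago."""
    cutoff = now - days * 86400
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def nightly_run(spam_dir, ham_dir):
    """Train on both folders, then drop messages past the retention window."""
    subprocess.run(sa_learn_command("spam", str(spam_dir)), check=True)
    subprocess.run(sa_learn_command("ham", str(ham_dir)), check=True)
    now = time.time()
    for folder in (spam_dir, ham_dir):
        for stale in expired_files(folder, now):
            stale.unlink()
```

Called from cron as something like `nightly_run(Path(".../SPAM"), Path(".../HAM"))`, with the real folder locations filled in. To correct a mistake within the seven days, the retained file can be moved to the other folder and learned again, or removed from the database with `sa-learn --forget`.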
Re: collecting mail for sa-learn, how to?
On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
> andys wrote:
> > Hi,
> >
> > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> >
> > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
> >
> > Thanks for any comments, Andy :P
>
> We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.

That requires IMAP, though, correct?

That actually may work for Andy: set up both POP and IMAP, and have selected users use IMAP rather than POP, providing them with server-side ham and spam training folders. That won't require all users to use IMAP, with the resulting storage requirements on the server.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
Re: collecting mail for sa-learn, how to?
John Hardin wrote:
> On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
> > andys wrote:
> > > Hi,
> > >
> > > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> > >
> > > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
> > >
> > > Thanks for any comments, Andy :P
> >
> > We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.
>
> That requires IMAP, though, correct?

That depends on the webmail software he uses and on the location and permissions of his mailboxes. We use a webmail product that utilizes IMAP; there are some that do not require IMAP services to be running.

> That actually may work for Andy: set up both POP and IMAP, and have selected users use IMAP rather than POP, providing them with server-side ham and spam training folders. That won't require all users to use IMAP, with the resulting storage requirements on the server.

Even if his webmail requires IMAP, he doesn't need to make his users use IMAP. We provide IMAP only for webmail, not for mail clients. IMAP access is available only on 127.0.0.1. I would think that would work for him as well.

That is why we have the POP client leave the message on the server for 1 day: so that a spam message is still accessible to webmail after it arrives in the POP client's mail folder.

DAve

--
Don't tell me I'm driving the cart!
Re: collecting mail for sa-learn, how to?
On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
> John Hardin wrote:
> > On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
> > > andys wrote:
> > > > For a mail server running email for multiple domains, what is the typical/recommended way to collect emails which aren't detected as spam to be processed by sa-learn?
> > > >
> > > > Users are downloading mail via POP3, so once a user sees a mail and decides that it is in fact spam, it's already been removed from the mail server. If the user forwards the mail to a special mailbox for processing, then the mail is obviously now different from the original spam: the user is the sender, etc. Will sa-learn still work using this method? And if not, what else can I implement that would work?
> > > >
> > > > Thanks for any comments, Andy :P
> > >
> > > We have had good luck by setting the email clients of *trusted* users to leave their mail on the server for 1 day. The users can then log in to their webmail and move the spam to a SPAM folder and a selection of ham to a HAM folder. I train bayes on those folders each night.
> >
> > That requires IMAP, though, correct?
>
> That depends on the webmail software he uses

...where does Andy mention webmail?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/