Re: sanitizing/normalizing messages for feeding sa-learn

2014-08-28 Thread Matus UHLAR - fantomas
imap and feeding sa-learn, but they've been a bit adulterated by the time they're retrieved, and i believe some cleanup is probably necessary prior to feeding sa-learn. Should not be that necessary. Hopefully Zimbra does not alter messages as badly as Outlook/Exchange does (what should I tell you

sanitizing/normalizing messages for feeding sa-learn

2014-08-27 Thread btb
-learn, but they've been a bit adulterated by the time they're retrieved, and i believe some cleanup is probably necessary prior to feeding sa-learn. here are two samples: http://dpaste.com/0B6S3FN.txt [claimed to be spam] http://dpaste.com/3ZZ733Z.txt [claimed to be not spam] the original message

Re: sanitizing/normalizing messages for feeding sa-learn

2014-08-27 Thread Quanah Gibson-Mount
delivered to a mailbox. i intend on retrieving these messages via imap and feeding sa-learn, but they've been a bit adulterated by the time they're retrieved, and i believe some cleanup is probably necessary prior to feeding sa-learn. That seems rather convoluted, given that Zimbra already trains

Re: sanitizing/normalizing messages for feeding sa-learn

2014-08-27 Thread listsb-spamassassin
. this generates a message [containing the selected message] which is ultimately delivered to a mailbox. i intend on retrieving these messages via imap and feeding sa-learn, but they've been a bit adulterated by the time they're retrieved, and i believe some cleanup is probably necessary prior
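
The retrieval step described above can be sketched in Python. This is only a sketch under one loud assumption: that the report message carries the selected message as a message/rfc822 attachment. The thread does not confirm how the original is embedded, and the function names are hypothetical. The extracted bytes are re-serialized by the email library, so they are not guaranteed to be byte-identical to what was originally delivered.

```python
import email
from email.message import Message
from typing import List


def extract_embedded(msg: Message) -> List[bytes]:
    """Collect the messages attached as message/rfc822 parts of a wrapper.

    ASSUMPTION: the wrapper embeds the original as a message/rfc822 part.
    Embedded messages are returned whole and are not searched further.
    """
    if msg.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-element list
        # holding the embedded message object.
        return [msg.get_payload(0).as_bytes()]
    found: List[bytes] = []
    if msg.is_multipart():
        for part in msg.get_payload():
            found.extend(extract_embedded(part))
    return found


def extract_from_bytes(raw: bytes) -> List[bytes]:
    """Parse raw wrapper bytes (e.g. as fetched over IMAP) and extract."""
    return extract_embedded(email.message_from_bytes(raw))
```

Each returned item could then be written to its own file and handed to sa-learn, rather than learning from the wrapper message itself.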

Re: Feeding SA-learn

2008-01-24 Thread Anthony Peacock
John Thompson wrote: On 2008-01-23, Anthony Peacock [EMAIL PROTECTED] wrote: My intention was to manually feed the few spam messages that slip thru undetected. By the time I get a hold of those, they are in the recipient's mail client inbox, not in the server. I was thinking, if I save the

Re: Feeding SA-learn

2008-01-24 Thread Diego Pomatta
John Thompson wrote: On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote: I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk (53.172k). The msf file must be some kind of index. I just feed the biggest one to sa-learn? Yup. Use sa-learn --spam --mbox

Re: Feeding SA-learn

2008-01-24 Thread Anthony Peacock
Diego Pomatta wrote: John Thompson wrote: On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote: I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk (53.172k). The msf file must be some kind of index. I just feed the biggest one to sa-learn? Yup. Use

Re: Feeding SA-learn

2008-01-24 Thread John Thompson
On 2008-01-24, Anthony Peacock [EMAIL PROTECTED] wrote: John Thompson wrote: Isn't that what cron is for? :-) I have a cron job on my imap server to regularly feed ham and spam through sa-learn. I have a cron job that runs the learning process nightly. I was referring to the process

Re: Feeding SA-learn

2008-01-24 Thread John Thompson
On 2008-01-24, Mark Johnson [EMAIL PROTECTED] wrote: John Thompson wrote: Isn't that what cron is for? :-) I have a cron job on my imap server to regularly feed ham and spam through sa-learn. Do you delete the messages from the IMAP folder after you learn them? If so, how do you go

Re: Feeding SA-learn

2008-01-24 Thread Mark Johnson
John Thompson wrote: No. I use Thunderbird and just set the Junk filter controls to expire junk messages after a couple weeks. Interesting idea! Thanks for the tips! You have no idea how much time and how many steps this is going to save me. -- Mark Johnson

Re: Feeding SA-learn

2008-01-23 Thread Diego Pomatta
Anthony Peacock wrote: Can I feed a plain text file representing just the body of a message to sa-learn? /Diego Yes you can, who's to stop it? I just sent your message body as --ham, and it told me it learned one message. I meant without the headers, just the body. ok thanks

Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock
Diego Pomatta wrote: Anthony Peacock wrote: Can I feed a plain text file representing just the body of a message to sa-learn? /Diego Yes you can, who's to stop it? I just sent your message body as --ham, and it told me it learned one message. I meant without the headers, just the

Re: Feeding SA-learn

2008-01-23 Thread Diego Pomatta
Anthony Peacock wrote: Well the short answer is, yes you can. The slightly longer answer is that you won't get as good results doing this, as the Bayes system uses tokens found in the complete message. By only learning on the body you will not gain any advantage for tokens found in

Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock
Diego Pomatta wrote: Anthony Peacock wrote: Well the short answer is, yes you can. The slightly longer answer is that you won't get as good results doing this, as the Bayes system uses tokens found in the complete message. By only learning on the body you will not gain any advantage for
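
Since the advice above turns on whether a saved file still carries its headers, a quick check before feeding it to sa-learn may help. The following is a minimal sketch using Python's standard email parser; the function name is hypothetical, and the check is only a heuristic: a body-only file whose first line happens to look like "Word: text" would be misread as a header.

```python
import email
from typing import List


def header_names(raw: bytes) -> List[str]:
    """Return the header field names the parser finds at the top of raw.

    An empty list suggests the file is body-only, so learning from it
    would contribute no header tokens to Bayes.
    """
    msg = email.message_from_bytes(raw)
    return list(msg.keys())


full = b"From: a@example.com\nSubject: hello\n\nCan I feed this to sa-learn?\n"
body_only = b"Hey list,\nCan I feed this to sa-learn?\n"

print(header_names(full))       # header names present in a complete message
print(header_names(body_only))  # empty for a headerless body
```

Saving the complete message from the mail client, where it allows that, avoids the problem entirely.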

Re: Feeding SA-learn

2008-01-23 Thread Mark Johnson
Depends on the client. For instance, Thunderbird stores its folders in mbox format, so sa-learn can work against those files as-is. Other email clients can save emails in text format complete with headers. I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk

Re: Feeding SA-learn

2008-01-23 Thread Anthony Peacock
Mark Johnson wrote: Depends on the client. For instance, Thunderbird stores its folders in mbox format, so sa-learn can work against those files as-is. Other email clients can save emails in text format complete with headers. I use Thunderbird. There are two files for that folder: Junk.msf

Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Anthony Peacock [EMAIL PROTECTED] wrote: My intention was to manually feed the few spam messages that slip thru undetected. By the time I get a hold of those, they are in the recipient's mail client inbox, not in the server. I was thinking, if I save the mail as EML files,

Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Diego Pomatta [EMAIL PROTECTED] wrote: I use Thunderbird. There are two files for that folder: Junk.msf (7k) and Junk (53.172k). The msf file must be some kind of index. I just feed the biggest one to sa-learn? Yup. Use sa-learn --spam --mbox Junk to learn your spam. You'll
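
The sa-learn --spam --mbox Junk invocation quoted above can be wrapped in a small Python helper, for example to confirm that the file really parses as an mbox before learning from it. This is a sketch, not part of the thread: the function names are hypothetical, it assumes sa-learn is on the PATH, and the path passed in should be the Junk mailbox file itself, not the Junk.msf index.

```python
import mailbox
import subprocess
from typing import List


def count_messages(mbox_path: str) -> int:
    """Count the messages Python's mbox reader finds in the file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(mbox_path: str, is_spam: bool) -> List[str]:
    """Build the sa-learn command line for an mbox file."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def learn(mbox_path: str, is_spam: bool) -> None:
    """Run sa-learn on the mbox file; raises if sa-learn exits non-zero."""
    if count_messages(mbox_path) == 0:
        return  # nothing to learn from
    subprocess.run(sa_learn_command(mbox_path, is_spam), check=True)
```

For instance, learn("Junk", is_spam=True) would run the same command as the one suggested in the thread, provided the file contains at least one message.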

Re: Feeding SA-learn

2008-01-23 Thread John Thompson
On 2008-01-23, Mark Johnson [EMAIL PROTECTED] wrote: My emails are stored on an IMAP server and what you suggested wasn't I use Thunderbird as my mail client but have found that I needed to use Evolution to save the messages in mbox format, which was always a hassle. mbox is already the

Re: Feeding SA-learn

2008-01-23 Thread Mark Johnson
John Thompson wrote: Isn't that what cron is for? :-) I have a cron job on my imap server to regularly feed ham and spam through sa-learn. Do you delete the messages from the IMAP folder after you learn them? If so, how do you go about that? I'm pretty sure if I deleted the mail files

Feeding SA-learn

2008-01-21 Thread Diego Pomatta
Hey list, Can I feed a plain text file representing just the body of a message to sa-learn? /Diego

Re: Feeding SA-learn

2008-01-21 Thread Jari Fredriksson
Hey list, Can I feed a plain text file representing just the body of a message to sa-learn? /Diego Yes you can, who's to stop it? I just sent your message body as --ham, and it told me it learned one message.

Re: Feeding SA-learn

2008-01-21 Thread Diego Pomatta
Jari Fredriksson wrote: Hey list, Can I feed a plain text file representing just the body of a message to sa-learn? /Diego Yes you can, who's to stop it? I just sent your message body as --ham, and it told me it learned one message. I meant without the headers, just the body. ok

Re: Feeding SA-learn

2008-01-21 Thread Anthony Peacock
Diego Pomatta wrote: Jari Fredriksson wrote: Hey list, Can I feed a plain text file representing just the body of a message to sa-learn? /Diego Yes you can, who's to stop it? I just sent your message body as --ham, and it told me it learned one message. I meant without the