Re: Train and use bayes on different adresses
On Thu, 26 Jun 2008, Florian Lindner wrote:

> Hello,
>
> I use (honestly: I plan) the following procedure to filter my spam using SA:
>
> All mail (for my family and me) is piped through spamc. required_score is set to a high value of 9 to avoid false positives. Mail that is detected as spam is deleted.

Refine that a bit. Leave the threshold at 5 so that suspicious messages get marked, but delete at a high level (e.g. 10+).

> All SA filtering is done on the server side. On the client side, additional filtering is done by the statistical filters of Apple Mail and Thunderbird.
>
> Now I want to train the server SA filter by moving the junk mails (which have slipped through SA) on the client into an IMAP folder. This is done only with the mail I receive, not the mail the rest of the family receives.

Why not let others train? Just give each user training folders.

> Will this setup cause any problems? I ask because the bayes filter I train with only my email is used for all email.

It's better if you train with all users' email. Note that *you* may actually be doing the training, but it's still their email.

Some tools that may help you set things up are available here:

http://www.impsec.org/~jhardin/antispam/

They cover hooking up spamc via procmail, special handling at a given score, and training from per-user spam and ham boxes.

The only difference between what you're suggesting and what I'm doing today is that I have two mail servers, one at a hosted site and one at home (fed by fetchmail from the hosted server), so I have some extra glue moving the training folders from the home server's IMAP folders back out to the hosted server where SA runs. All my family have training folders, but I pretty much do all the training classification whenever I'm doing administrative stuff to their systems.
--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Users mistake widespread adoption of Microsoft Office as the development of a standard document format.
---
8 days until the 232nd anniversary of the Declaration of Independence
Re: Train and use bayes on different adresses
On 26.06.2008 at 18:26, John Hardin wrote:

> On Thu, 26 Jun 2008, Florian Lindner wrote:
>
> > Hello,
> >
> > I use (honestly: I plan) the following procedure to filter my spam using SA:
> >
> > All mail (for my family and me) is piped through spamc. required_score is set to a high value of 9 to avoid false positives. Mail that is detected as spam is deleted.
>
> Refine that a bit. Leave the threshold at 5 so that suspicious messages get marked, but delete at a high level (e.g. 10+).

What should be done with marked messages?

> > All SA filtering is done on the server side. On the client side, additional filtering is done by the statistical filters of Apple Mail and Thunderbird.
> >
> > Now I want to train the server SA filter by moving the junk mails (which have slipped through SA) on the client into an IMAP folder. This is done only with the mail I receive, not the mail the rest of the family receives.
>
> Why not let others train? Just give each user training folders.

The rest of the family is rather computer-agnostic, and I'm happy that they get along well with the Thunderbird filter.

> > Will this setup cause any problems? I ask because the bayes filter I train with only my email is used for all email.
>
> It's better if you train with all users' email. Note that *you* may actually be doing the training, but it's still their email.

Another option would be to disable the statistical filter on the server completely for my family and leave that entirely up to Thunderbird. I would use another SA config, with statistics, for myself. How would I implement this? Is it sufficient to use spamc -F nostat.cf, with use_bayes 0 in that config file, for my family and plain spamc for me? Are these two spamc invocations separated from each other?

> Some tools that may help you set things up are available here:
>
> http://www.impsec.org/~jhardin/antispam/

It's very interesting but way too sophisticated for my situation and audience.

> They cover hooking up spamc via procmail, special handling at a given score, and training from per-user spam and ham boxes.
>
> The only difference between what you're suggesting and what I'm doing today is that I have two mail servers, one at a hosted site and one at home (fed by fetchmail from the hosted server), so I have some extra glue moving the training folders from the home server's IMAP folders back out to the hosted server where SA runs. All my family have training folders, but I pretty much do all the training classification whenever I'm doing administrative stuff to their systems.

Regards,

Florian
Re: Train and use bayes on different adresses
On Thu, 26 Jun 2008, Florian Lindner wrote:

> On 26.06.2008 at 18:26, John Hardin wrote:
>
> > On Thu, 26 Jun 2008, Florian Lindner wrote:
> >
> > > Hello,
> > >
> > > I use (honestly: I plan) the following procedure to filter my spam using SA:
> > >
> > > All mail (for my family and me) is piped through spamc. required_score is set to a high value of 9 to avoid false positives. Mail that is detected as spam is deleted.
> >
> > Refine that a bit. Leave the threshold at 5 so that suspicious messages get marked, but delete at a high level (e.g. 10+).
>
> What should be done with marked messages?

If they are spam, the user can drop them into their spam training folder - the assumption is that bayes doesn't recognize them well enough yet, but that isn't always the case. If you want to minimize the number of weakly scored spams that your users have to see, and you are less sensitive to FPs (which your original proposal suggests), then you'd just delete at a lower score (e.g. 9+ or 8+).

Generally speaking, it's a bad idea to fiddle with the threshold, as all the base rulesets are scored by the masscheck process with the assumption that 5 is spammy.

> > > All SA filtering is done on the server side. On the client side, additional filtering is done by the statistical filters of Apple Mail and Thunderbird.
> > >
> > > Now I want to train the server SA filter by moving the junk mails (which have slipped through SA) on the client into an IMAP folder. This is done only with the mail I receive, not the mail the rest of the family receives.
> >
> > Why not let others train? Just give each user training folders.
>
> The rest of the family is rather computer-agnostic, and I'm happy that they get along well with the Thunderbird filter.

That's reasonable. In my experience, what you'll see when you review the mailbox is a few false positives that you can copy to the user's ham training folder for them. They will generally just delete any spams unless you stress repeatedly that spams which leak through should go into the spam training folder rather than the trash, and you may be able to tell the MUA's classifier to save to the spam training folder rather than deleting.

> > > Will this setup cause any problems? I ask because the bayes filter I train with only my email is used for all email.
> >
> > It's better if you train with all users' email. Note that *you* may actually be doing the training, but it's still their email.
>
> Another option would be to disable the statistical filter on the server completely for my family and leave that entirely up to Thunderbird. I would use another SA config, with statistics, for myself. How would I implement this? Is it sufficient to use spamc -F nostat.cf, with use_bayes 0 in that config file, for my family and plain spamc for me? Are these two spamc invocations separated from each other?

I'd recommend against that, personally. Bayes is very helpful even if you can't get your users to train it themselves. You might want to have Thunderbird move spams to the spam training folder as I suggested; that way bayes will be led by Thunderbird, and the classification at the server (which is where it should be) will get better.

> > Some tools that may help you set things up are available here:
> >
> > http://www.impsec.org/~jhardin/antispam/
>
> It's very interesting but way too sophisticated for my situation and audience.

Most of it will be visible only to you. My wife and MiL don't worry about training and they get along well. Then again, it also depends on how allergic to receiving _any_ spam your users are.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
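The per-user spam and ham training folders discussed above could be fed to Bayes with sa-learn, which takes --spam or --ham followed by a mail directory. The following is only a minimal sketch in Python: it assumes Maildir++ storage on the server, and the folder names (.train-spam, .train-ham) and the example path are hypothetical, not taken from the thread. It only prints the commands unless dry_run is switched off, and sa-learn would have to run under the account whose Bayes database spamd actually consults.

```python
import subprocess
from pathlib import PurePosixPath


def sa_learn_commands(user_maildir, spam_folder=".train-spam", ham_folder=".train-ham"):
    """Build sa-learn command lines for one user's training folders.

    The folder names are hypothetical; in Maildir++ a subfolder is a
    dot-prefixed directory whose read messages live in its "cur" directory.
    """
    base = PurePosixPath(user_maildir)
    return [
        ["sa-learn", "--spam", str(base / spam_folder / "cur")],
        ["sa-learn", "--ham", str(base / ham_folder / "cur")],
    ]


def train(user_maildir, dry_run=True):
    """Print the commands (dry run) or actually execute them."""
    for cmd in sa_learn_commands(user_maildir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical mailbox location, shown as a dry run only.
    train("/home/florian/Maildir")
```

Run from cron, a loop over all mailboxes would let one person do the classification for everybody, as John describes, without the users having to learn anything new.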
Re: Train and use bayes on different adresses
On 26.06.2008 at 19:31, John Hardin wrote:

> On Thu, 26 Jun 2008, Florian Lindner wrote:
>
> > On 26.06.2008 at 18:26, John Hardin wrote:
> >
> > > On Thu, 26 Jun 2008, Florian Lindner wrote:
> > >
> > > > Hello,
> > > >
> > > > I use (honestly: I plan) the following procedure to filter my spam using SA:
> > > >
> > > > All mail (for my family and me) is piped through spamc. required_score is set to a high value of 9 to avoid false positives. Mail that is detected as spam is deleted.
> > >
> > > Refine that a bit. Leave the threshold at 5 so that suspicious messages get marked, but delete at a high level (e.g. 10+).
> >
> > What should be done with marked messages?
>
> If they are spam, the user can drop them into their spam training folder - the assumption is that bayes doesn't recognize them well enough yet, but that isn't always the case. If you want to minimize the number of weakly scored spams that your users have to see, and you are less sensitive to FPs (which your original proposal suggests), then you'd just delete at a lower score (e.g. 9+ or 8+).
>
> Generally speaking, it's a bad idea to fiddle with the threshold, as all the base rulesets are scored by the masscheck process with the assumption that 5 is spammy.

Sorry, I don't understand this. What is the difference between changing the threshold and deleting all spam messages, and leaving the threshold at 5 and deleting mail with 9 points? Does the threshold change anything other than

    if score > threshold: mark spam
    else: mark ham

after all tests have been run?

> > > > All SA filtering is done on the server side. On the client side, additional filtering is done by the statistical filters of Apple Mail and Thunderbird.
> > > >
> > > > Now I want to train the server SA filter by moving the junk mails (which have slipped through SA) on the client into an IMAP folder. This is done only with the mail I receive, not the mail the rest of the family receives.
> > >
> > > Why not let others train? Just give each user training folders.
> >
> > The rest of the family is rather computer-agnostic, and I'm happy that they get along well with the Thunderbird filter.
>
> That's reasonable. In my experience, what you'll see when you review the mailbox is a few false positives that you can copy to the user's ham training folder for them. They will generally just delete any spams unless you stress repeatedly that spams which leak through should go into the spam training folder rather than the trash, and you may be able to tell the MUA's classifier to save to the spam training folder rather than deleting.

For my family I want to leave it as it is.

> > > > Will this setup cause any problems? I ask because the bayes filter I train with only my email is used for all email.
> > >
> > > It's better if you train with all users' email. Note that *you* may actually be doing the training, but it's still their email.
> >
> > Another option would be to disable the statistical filter on the server completely for my family and leave that entirely up to Thunderbird. I would use another SA config, with statistics, for myself. How would I implement this? Is it sufficient to use spamc -F nostat.cf, with use_bayes 0 in that config file, for my family and plain spamc for me? Are these two spamc invocations separated from each other?
>
> I'd recommend against that, personally. Bayes is very helpful even if you can't get your users to train it themselves.

Can I use two different bayes DBs? One for my family without training (just the auto-train functions) and one for me that is trained? spamc is invoked from the maildrop MDA. I can't change the system user I invoke spamc from, but the best solution would be two kinds of spamc invocation that act as if they came from different users.

> You might want to have Thunderbird move spams to the spam training folder as I suggested; that way bayes will be led by Thunderbird, and the classification at the server (which is where it should be) will get better.
>
> > > Some tools that may help you set things up are available here:
> > >
> > > http://www.impsec.org/~jhardin/antispam/
> >
> > It's very interesting but way too sophisticated for my situation and audience.
>
> Most of it will be visible only to you. My wife and MiL don't worry about training and they get along well. Then again, it also depends on how allergic to receiving _any_ spam your users are.

I want to optimize it primarily for me; it's working fine for my family.

Florian
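On the "two kinds of spamc invocation" question: as far as the spamc manual describes it, -F names a file of additional spamc command-line flags rather than a SpamAssassin rules file, so a use_bayes 0 line placed there would probably have no effect. An option that may fit better is spamc -u, which asks spamd to process the message as a named user, so that each name can have its own preferences and Bayes data. The sketch below, in Python, only shows how the user name might be chosen per recipient. The names "florian" and "family" are hypothetical, and it assumes spamd has been set up to load per-user configuration (for instance with a virtual config directory), which the thread does not confirm.

```python
def spamc_command(recipient, trained_users=("florian",), shared_user="family"):
    """Pick the spamd user name for a recipient address.

    `trained_users` and `shared_user` are hypothetical names: recipients
    listed in `trained_users` get their own preferences and Bayes data,
    everyone else shares a single untrained (auto-learn only) account.
    """
    local_part = recipient.split("@", 1)[0].lower()
    user = local_part if local_part in trained_users else shared_user
    # spamc -u overrides the user name that spamc sends to spamd.
    return ["spamc", "-u", user]


if __name__ == "__main__":
    for address in ("florian@example.org", "mum@example.org"):
        print(address, "->", " ".join(spamc_command(address)))
```

With this split, the system user that maildrop runs under stays the same; only the name passed to spamd differs between the two kinds of delivery.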
Re: Train and use bayes on different adresses
On Thu, 26 Jun 2008, Florian Lindner wrote:

> > Generally speaking, it's a bad idea to fiddle with the threshold, as all the base rulesets are scored by the masscheck process with the assumption that 5 is spammy.
>
> Sorry, I don't understand this. What is the difference between changing the threshold and deleting all spam messages, and leaving the threshold at 5 and deleting mail with 9 points?

Raising the threshold will result in more emails that are obviously spam to a human being coming into their mailbox without a [SPAM] tag. It will make your antispam efforts look less effective - you're guaranteeing yourself more false negatives.

Leaving the threshold at 5 and deleting at the higher threshold will result in lower-scoring (i.e. possibly-not-spam) spams being delivered with a [SPAM] tag as a warning, while the higher-scoring (9+, obvious spam) spams don't get delivered.

> For my family I want to leave it as it is.

Fair enough.

> Can I use two different bayes DBs? One for my family without training (just the auto-train functions) and one for me that is trained?

...that I don't know. Others may be able to comment.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
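The difference between the two policies can be made concrete with a small Python sketch. It is an illustration only, not how SpamAssassin or any delivery agent implements it: the function names are made up, 5 is the default required_score, and 9 is the discard level used as an example in the thread.

```python
# Illustrative values: 5.0 is SpamAssassin's default required_score,
# 9.0 is the discard level mentioned as an example in the thread.
TAG_SCORE = 5.0
DISCARD_SCORE = 9.0


def disposition(score, tag_score=TAG_SCORE, discard_score=DISCARD_SCORE):
    """Threshold left at 5, separate (higher) discard level."""
    if score >= discard_score:
        return "discard"         # obvious spam, never delivered
    if score >= tag_score:
        return "deliver-tagged"  # suspicious, delivered with a [SPAM] tag
    return "deliver"             # treated as ham


def raised_threshold(score, required_score=9.0):
    """Original plan: required_score raised to 9, everything marked is deleted."""
    return "discard" if score >= required_score else "deliver"


if __name__ == "__main__":
    for s in (2.0, 6.5, 12.0):
        print(s, disposition(s), raised_threshold(s))
```

A message scoring 6.5 shows the difference: with the raised threshold it arrives looking like ordinary mail, while with the two-level policy it arrives carrying the warning tag. Messages scoring 9 or more are discarded either way.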
Re: Train and use bayes on different adresses
On Thursday, 26 June 2008, Florian Lindner wrote:

> Can I use two different bayes DBs? One for my family without training (just the auto-train functions) and one for me that is trained?

You don't really want that. A trained bayes helps everyone. You don't need to have received all the spam that your family gets, either. Don't forget that bayes also auto-learns. So just take your own ham and spam, keep training bayes, and let it learn. Feed all e-mail through it, and the results will be good.

Regards, zmi

--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: curl -s http://zmi.at/zmi.asc | gpg --import
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net Key-ID: 1C1209B4