On 09.01.2015 at 02:01, David Flanigan wrote:
> Excellent feature; I look forward to using it. It does lead me to another question, however. Using a spam honeypot would lead to a large corpus of SPAM. My corpus of HAM, by its very nature, would be much smaller. Are there any negative implications to training the Bayesian filters with thousands (or tens of thousands) of SPAM messages but only a couple hundred HAM messages?
A perfect Bayes database has at least a 50/50 ratio, and if in doubt, more ham is better. In any case I would review the messages before training them: most are open-relay tests or 100% identical crap, and it is not worth tainting the Bayes database 1000 times with the same copy of that kind of junk.
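As a rough illustration of that advice (not the poster's actual workflow), the sketch below drops exact-duplicate message bodies and then caps the spam set at the size of the ham set before anything is handed to sa-learn. The one-file-per-message layout and the function names unique_messages() and select_training_set() are hypothetical; hashing the body only catches byte-identical copies, so near-duplicates still need the manual review mentioned above.

```python
import hashlib
from pathlib import Path


def unique_messages(paths):
    """Yield paths in order, skipping any message whose body duplicates an earlier one.

    Headers are ignored because copies of the same spam run usually differ
    only in Received:, Message-ID: and Date: lines.
    """
    seen = set()
    for path in paths:
        raw = Path(path).read_bytes().replace(b"\r\n", b"\n")
        # Headers end at the first empty line; everything after it is the body.
        _headers, _sep, body = raw.partition(b"\n\n")
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield path


def select_training_set(spam_paths, ham_paths):
    """Deduplicate both corpora, then cap spam at the ham count (at most 50/50)."""
    ham = list(unique_messages(ham_paths))
    spam = list(unique_messages(spam_paths))[: len(ham)]
    return spam, ham
```

The two returned lists could then be passed as file arguments to "sa-learn --spam" and "sa-learn --ham" respectively. Which spam messages survive the cap simply follows input order here; any smarter selection is left open.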
Below are my current Bayes counts. The 1,3M "bayes_seen" is certainly a bug, because it contains random message parts, which also means that already-trained messages are not recognized as such; hence I use find with a 24-hour limit, since I keep the corpus forever.
____________________________________
[root@mail-gw:~]$ sa-learn.sh
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
09-01-2015 02:32:19: Proceed SPAM Samples
09-01-2015 02:32:19: Proceed HAM Samples
09-01-2015 02:32:19: Done
0.000 0 3 0 non-token data: bayes db version
0.000 0 7511 0 non-token data: nspam
0.000 0 7565 0 non-token data: nham
0.000 0 1015029 0 non-token data: ntokens
0.000 0 993467899 0 non-token data: oldest atime
0.000 0 1420765927 0 non-token data: newest atime
0.000 0 1420765955 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
total 27M
-rw------- 1 sa-milt sa-milt  13K 2015-01-09 02:30 bayes_journal
-rw------- 1 sa-milt sa-milt 1,3M 2015-01-09 00:59 bayes_seen
-rw------- 1 sa-milt sa-milt  39M 2015-01-09 02:12 bayes_toks
-rw------- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
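The poster's sa-learn.sh itself is not shown, so the following is only a sketch of the two checks around a run like the one above, assuming Python 3 and a one-file-per-message corpus directory. recent_files() mirrors the 24-hour window of "find DIR -type f -mtime -1", and parse_magic() / spam_share() read the "non-token data" lines in the format printed above (as produced by "sa-learn --dump magic") to confirm the spam/ham balance. All function names are hypothetical.

```python
import time
from pathlib import Path


def recent_files(corpus_dir, max_age_hours=24.0, now=None):
    """Return files under corpus_dir modified within the last max_age_hours,
    roughly what 'find DIR -type f -mtime -1' selects for the default of 24."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    return sorted(
        p for p in Path(corpus_dir).rglob("*")
        if p.is_file() and p.stat().st_mtime > cutoff
    )


def parse_magic(dump):
    """Map each 'non-token data' label to its value (the third numeric column)."""
    marker = "non-token data: "
    values = {}
    for line in dump.splitlines():
        if marker not in line:
            continue
        columns, label = line.split(marker, 1)
        values[label.strip()] = int(columns.split()[2])
    return values


def spam_share(dump):
    """Fraction of trained messages that were spam: nspam / (nspam + nham)."""
    v = parse_magic(dump)
    return v["nspam"] / (v["nspam"] + v["nham"])
```

Fed the counts above (7511 spam, 7565 ham), spam_share() returns about 0.498, i.e. the database is essentially at the 50/50 balance recommended earlier in this reply.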
> On 2015-01-08 18:13, David B Funk wrote:
>> On Thu, 8 Jan 2015, Alex Regan wrote:
>>>>> How about using a domain specifically for creating a honeypot, of
>>>> you only need an email@address; there's no point in registering a domain solely for this. Some might think it's better, but I see no real advantage to it over using a well-known existing domain. In fact, if you examine your logs you might see one already there you can use; for example, I use a few
>>> This represents the largest problem I have, because any well-known existing domain has zen running at SMTP level, which makes it impossible to whitelist for a specific account. I'd have to disable RBLs at SMTP connect time, as well as greylisting...
>> In sendmail, there's the "delay_checks" feature which, if enabled, will postpone the RBL/blacklist & milter checks until after the 'RCPT to:' SMTP phase. This enables things such as 'SPAMFRIENDS' filters in your access DB, making it possible to use RBL/blacklists/milters and still let all senders get messages to specific selected recipients (e.g. "postmaster" or selected spamtrap/honeypot addresses).
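For the sendmail approach described above, a minimal sketch, assuming a stock m4-built configuration (sendmail.mc) and an access map in /etc/mail/access; paths vary by distribution, and the honeypot@ local part is hypothetical. With the `friend' argument, delay_checks runs check_mail and check_relay (where FEATURE(`dnsbl') does its lookups) from check_rcpt instead, looks the recipient up in the access map under the Spam: tag, and skips those checks when the right-hand side is FRIEND:

```
dnl # sendmail.mc: defer sender/relay checks to RCPT TO, with per-recipient exceptions
FEATURE(`delay_checks', `friend')dnl
FEATURE(`dnsbl', `zen.spamhaus.org')dnl
```

```
# /etc/mail/access: recipients (in any local domain) that bypass the delayed checks
Spam:postmaster@	FRIEND
Spam:honeypot@	FRIEND
```

After editing, sendmail.cf has to be regenerated from the .mc file and the access database rebuilt (for example with "makemap hash /etc/mail/access < /etc/mail/access") before restarting sendmail.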