On 09.01.2015 at 02:01, David Flanigan wrote:
Excellent feature - I look forward to using it.

It does lead me to another question, however. Using a spam honeypot would
lead to a large corpus of SPAM. My corpus of HAM, by its very nature,
would be much smaller. Are there any negative implications to training
the Bayesian filters with thousands (or tens of thousands) of SPAM messages
but only a couple hundred HAM messages?

A perfect Bayes database has at least a 50/50 ratio, and when in doubt more ham is better. In any case, I would review the messages before training them: most are open-relay tests or 100% identical crap, and it's not worth tainting the Bayes database 1000 times with the same copy of such crap.
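The poster does not say how he reviews his samples. As a rough illustration of the "100% identical" part only, here is a minimal Python sketch that keeps one message per distinct body. It assumes one message per file and treats two messages as identical when their bodies match after collapsing whitespace, so headers such as Received: are ignored. The function names are hypothetical. The sketch only catches exact copies; open-relay tests and near-duplicates still need a human look.

```python
import hashlib
from pathlib import Path


def body_fingerprint(raw: bytes) -> str:
    """Hash only the message body, so copies that differ just in
    headers (Received:, Date:, Message-ID:) get the same fingerprint."""
    raw = raw.replace(b"\r\n", b"\n")
    _headers, sep, body = raw.partition(b"\n\n")
    if not sep:  # no blank line separating headers: hash the whole thing
        body = raw
    # Collapse whitespace runs so trivial reflowing does not matter.
    normalized = b" ".join(body.split())
    return hashlib.sha256(normalized).hexdigest()


def dedupe(messages: dict[str, bytes]) -> list[str]:
    """Return the name of the first message seen for each distinct body."""
    seen: set[str] = set()
    keep: list[str] = []
    for name, raw in messages.items():
        fp = body_fingerprint(raw)
        if fp not in seen:
            seen.add(fp)
            keep.append(name)
    return keep


def dedupe_dir(folder: str) -> list[str]:
    """Apply dedupe() to a (hypothetical) one-file-per-message folder."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    return dedupe({str(p): p.read_bytes() for p in files})
```

Only the files that `dedupe_dir()` returns would then be passed to sa-learn. Each spam run then teaches the filter once per distinct body instead of once per copy.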

Below are my current Bayes counts.

The 1.3M "bayes_seen" is surely a bug, because it contains random message parts, which also leads to already-trained messages not being recognized. Hence I use a find with a 24-hour limit, since I keep the corpus forever.
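The exact find invocation is not shown. Below is a minimal Python sketch of the same selection. It assumes a flat folder of sample files and reads "24-hour limit" as "modified within the last 24 hours", roughly what `find DIR -type f -mtime -1` would list. The names `recent_files()` and `is_recent()` are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional

DAY = 24 * 60 * 60  # seconds in the 24-hour window


def is_recent(mtime: float, now: float, max_age: float = DAY) -> bool:
    """True if a file with this modification time is younger than max_age."""
    return now - mtime < max_age


def recent_files(folder: str, now: Optional[float] = None) -> List[Path]:
    """Files in `folder` modified within the last 24 hours."""
    if now is None:
        now = time.time()
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and is_recent(p.stat().st_mtime, now)
    )
```

Feeding only these files to sa-learn on each run means the ever-growing archive is never rescanned. Skipping old messages then no longer depends on bayes_seen recognizing them.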
____________________________________

[root@mail-gw:~]$ sa-learn.sh
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)

09-01-2015 02:32:19: Proceed SPAM Samples

09-01-2015 02:32:19: Proceed HAM Samples

09-01-2015 02:32:19: Done

0.000          0          3          0  non-token data: bayes db version
0.000          0       7511          0  non-token data: nspam
0.000          0       7565          0  non-token data: nham
0.000          0    1015029          0  non-token data: ntokens
0.000          0  993467899          0  non-token data: oldest atime
0.000          0 1420765927          0  non-token data: newest atime
0.000          0 1420765955          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

total 27M
-rw------- 1 sa-milt sa-milt  13K 2015-01-09 02:30 bayes_journal
-rw------- 1 sa-milt sa-milt 1,3M 2015-01-09 00:59 bayes_seen
-rw------- 1 sa-milt sa-milt  39M 2015-01-09 02:12 bayes_toks
-rw------- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
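The counters above have the layout of `sa-learn --dump magic` output, with the value of each non-token entry in the third column. The small Python sketch below extracts those values and computes the share of ham among trained messages. `parse_magic()` and `ham_share()` are hypothetical helpers. The 0.5 threshold is the rule of thumb from this message, not a SpamAssassin requirement.

```python
import re
from typing import Dict

# Matches lines like:
# "0.000          0       7511          0  non-token data: nspam"
# The value of interest is captured from the third numeric column.
MAGIC_RE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text: str) -> Dict[str, int]:
    """Map each non-token label (nspam, nham, ntokens, ...) to its value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        m = MAGIC_RE.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values


def ham_share(values: Dict[str, int]) -> float:
    """Fraction of all trained messages that were ham."""
    spam, ham = values["nspam"], values["nham"]
    total = spam + ham
    return ham / total if total else 0.0
```

With the counts above (7511 spam, 7565 ham), the ham share comes out just over 0.5. The scenario in the question would fall far below that: tens of thousands of spam against a couple hundred ham.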


On 2015-01-08 18:13, David B Funk wrote:

On Thu, 8 Jan 2015, Alex Regan wrote:
How about using a domain specifically for creating a honeypot, of
you only need an email address; there's no point in
registering a domain solely for this. Some might think it's better,
but I see no real advantage to it over using a well-known existing
domain. In fact, if you examine your logs you might see one already
there you can use, for example, I use a few
This represents the largest problem I have, because any well-known
existing domain has Spamhaus ZEN running at the SMTP level, which makes it
impossible to whitelist a specific account. I'd have to disable
RBLs at SMTP connect time, as well as greylisting...
In sendmail, there's the "delay_checks" feature which, if enabled,
will postpone the RBL/blacklist & milter checks until after the 'RCPT to:'
SMTP phase. This enables things such as 'SPAMFRIENDS' filters
in your access DB, making it possible to use RBLs/blacklists/milters and
still let all senders get messages to specific selected recipients
(e.g. "postmaster" or selected spamtrap/honeypot addresses).
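To make the ordering concrete, here is a simplified Python model of the decision. It is not sendmail's actual ruleset code. The RBL verdict is applied only once the recipient is known, and recipients on a friend list are exempt. In sendmail itself this corresponds to FEATURE(`delay_checks', `friend') together with access-map entries such as "Spam:abuse@ FRIEND". The function names, the friend-list format and the matching rules below are assumptions made for illustration.

```python
def is_spam_friend(recipient: str, friends: set[str]) -> bool:
    """True if the recipient is exempt from RBL checks.

    Hypothetical matching rules: an entry like "postmaster@" matches that
    local part in any domain; a full address matches only itself."""
    addr = recipient.strip().strip("<>").lower()
    local_part = addr.split("@", 1)[0]
    return addr in friends or f"{local_part}@" in friends


def rcpt_verdict(recipient: str, client_listed: bool, friends: set[str],
                 delay_checks: bool = True) -> str:
    """Return "accept" or "reject" for one RCPT TO: command.

    client_listed: whether the connecting IP is on an RBL (e.g. ZEN).
    delay_checks:  if False, the RBL result is enforced at connect time,
                   before any recipient is known, so no exception applies.
    """
    if not client_listed:
        return "accept"
    if not delay_checks:
        return "reject"
    return "accept" if is_spam_friend(recipient, friends) else "reject"
```

With `delay_checks=False` the model rejects a listed client no matter who the mail is addressed to, which is the situation described above. Delaying the check is what makes a per-recipient exception for a honeypot address possible.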
