Re: Bayes auto-learn - not happening

David Jones Wed, 09 Aug 2017 05:53:17 -0700

On 08/08/2017 08:02 PM, Ian Zimmerman wrote:

On 2017-08-08 15:20, Scott wrote:

Another new one: big score, auto-learn disabled.  This one is fairly small.

X-Spam-Status: Yes, score=29.428 tag=-9999 tag2=5 kill=6.4
         tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2, DIGEST_MULTIPLE=0.001,
         FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
         HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
         HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
         HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
         NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
         RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
         RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
         SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
         T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
         WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no

Can you tell if this one has the 3 point match?


Scott,

when I tried to use the autolearn feature I was as confused as you are.
As far as I remember, the 3 point each from header and body is not the
only requirement; the full truth is that some rules are "privileged" and
can contribute to autolearning while others cannot.  I found it opaque
in the extreme and essentially unpredictable, and so I stopped
autolearning and hacked up some scripts that put a duplicate of each ham
message into a folder which is then processed by sa-learn from a
cronjob, with sufficient delay that I can review the contents and remove
any false negatives; and similarly with spam, excluding the utterly
horrible category which just goes to /dev/null.
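The delayed cron step described above could be sketched roughly as follows. This is only an illustration, not the actual scripts: the function name, the use of file modification time as the "delay", and the 48-hour figure in the usage note are all assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional


def ready_for_training(folder: Path, min_age_hours: float,
                       now: Optional[float] = None) -> List[Path]:
    """Return messages in `folder` last modified at least `min_age_hours` ago.

    Hypothetical helper: anything newer is left alone so it can still be
    reviewed (and removed if misfiled) before sa-learn sees it.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stat().st_mtime <= cutoff
    )
```

A cron job could then pass the result of, say, `ready_for_training(ham_folder, 48)` to `sa-learn --ham`, and do the same for the spam folder with `sa-learn --spam`.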

It may not be possible for you to adopt such a process if your volume is
high, but OTOH in that case you probably have users to help you :)

I think this is what RW is telling you, too.

FWIW, this is documented (sort of) by:

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

Same here. I had a little success with autolearn. When I started splitting out messages into a spam and ham folder and using a cron script to train explicitly, the BAYES hits became very accurate and helped with zero-hour spam, which is the hardest to block.

I set up an iRedMail server on a local-only subdomain and send/BCC copies of messages over to it. Then I can use simple Inbox rules to sort or discard them. Then I cron'd spam and ham training based on the Maildir "cur" folders. This requires me to do a quick scan of the unread messages. When I mark them as read, they get sa-learn'd. Takes a few minutes a day and drastically improved the mail filtering.
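For anyone wanting to try the same thing, here is a minimal sketch of the "train only what has been marked as read" step. It is an illustration under assumptions, not the exact cron script: the function names and the folder paths in the trailing comment are hypothetical. It relies on the Maildir convention that flags follow ":2," in the file name and that "S" means seen, and on sa-learn's --spam and --ham options.

```python
import subprocess
from pathlib import Path
from typing import List


def is_seen(filename: str) -> bool:
    """Return True if a Maildir file name carries the 'S' (seen) flag."""
    # Maildir stores flags after the ":2," marker, e.g. "1502283197.host:2,RS".
    _, marker, flags = filename.rpartition(":2,")
    return bool(marker) and "S" in flags


def seen_messages(cur_dir: Path) -> List[Path]:
    """List the messages in a Maildir 'cur' directory that were marked as read."""
    return sorted(p for p in cur_dir.iterdir() if p.is_file() and is_seen(p.name))


def train(cur_dir: Path, kind: str) -> None:
    """Feed the read messages to sa-learn; `kind` is 'spam' or 'ham'."""
    messages = seen_messages(cur_dir)
    if messages:  # nothing to do when no message has been reviewed yet
        subprocess.run(["sa-learn", f"--{kind}", *map(str, messages)], check=True)


# Example calls from a cron-driven script (paths are hypothetical):
#   train(Path.home() / "Maildir" / ".Spam" / "cur", "spam")
#   train(Path.home() / "Maildir" / "cur", "ham")
```

As far as I know, sa-learn keeps track of what it has already learned, so running this repeatedly over the same folder should mostly cost time rather than skew the database.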

A side effect of this has allowed me to easily spot some new spam campaigns and messages that are scoring just below the block threshold so I can add them to local custom rules. Sometimes these are legit senders with good opt-out, so I add them to a whitelist_auth entry.


--
David Jones
