Re: bayes filtlering

Bill Cole Tue, 23 Jun 2015 11:53:49 -0700

On 23 Jun 2015, at 8:34, Roman Gelfand wrote:

Periodically, I am running the following command on my spam box...
sa-learn --no-sync --spam /mbx/adomain.com/auser/Maildir/.Junk/{cur,new}
It seems to work.  However, I continue to get this message type.  Why?
Here is SA message.
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.adomain.com
X-Spam-Level: ***
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS,URIBL_BLOCKED autolearn=no
        version=3.3.2

Your configuration appears to use the default scores for the rules that are being hit there and for the "required" threshold. A 100% certain Bayes judgment (technically anything >99.9%) only adds up to a score of 3.7 with the default scores, and the default threshold is 5.0, so you need *something more* than a Bayes certainty to get SA to call anything spam, using the default configuration. Without seeing the actual mail, what "more" might be is a generic theoretical discussion.
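A rough sketch of that arithmetic in Python, for anyone who wants to see where the 3.6 comes from. The per-rule values are what I believe the stock scores to be for the rules in the quoted header; they are assumptions, not read from the poster's configuration, so check them against your own installed rules files.

```python
# Assumed stock scores for the rules listed in the quoted X-Spam-Status header.
# These are illustrative assumptions; verify against your own rules files.
ASSUMED_SCORES = {
    "BAYES_99": 3.5,
    "BAYES_999": 0.2,
    "DKIM_SIGNED": 0.1,
    "DKIM_VALID": -0.1,
    "DKIM_VALID_AU": -0.1,
    "HTML_MESSAGE": 0.001,
    "SPF_PASS": -0.001,
    "URIBL_BLOCKED": 0.001,
}
REQUIRED = 5.0  # default "required" threshold mentioned above


def total_score(tests, scores=ASSUMED_SCORES):
    """Sum the scores of the rules that hit."""
    return sum(scores[name] for name in tests)


def is_spam(tests, required=REQUIRED, scores=ASSUMED_SCORES):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(tests, scores) >= required


hits = list(ASSUMED_SCORES)
print(f"Bayes alone: {total_score(['BAYES_99', 'BAYES_999']):.1f}")
print(f"All hits:    {total_score(hits):.1f}")
print(f"Spam?        {is_spam(hits)}")
```

Under those assumed scores the two Bayes rules contribute 3.7, the DKIM rules net out slightly negative, and the total stays well short of 5.0.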

However, in this case there's an obvious first thing to fix: stop using a shared DNS resolver.

The URIBL_BLOCKED "rule" is a message from the operators of the uribl.com service that the DNS resolver used for a query is explicitly refused service. The most common reason for this is excess query volume from a resolver. The only likely reasons for you to hit this are:

1. You are scanning so much mail with SA that you must be a large commercial operation capable of helping to support the uribl.com service as "free for most," so they require you to do so. This seems unlikely for someone newly setting up SA...

2. You are using a DNS resolver that is shared by a large number of other people and in aggregate you are all pounding the uribl.com nameservers as if you are a commercial service provider or large business.

The solution for (2) is a step that should be part of running *ANY* MTA that accepts mail from the world at large: bring up a caching recursive (NOT forwarding) resolver DNS daemon on the same host (or in multi-host environments: same physical LAN) as the MTA and use it as the resolver for the MTA. In addition to being able to use services like uribl.com and Spamhaus that block large resolvers who don't support them, having your own resolver makes DNS resolution substantially faster on average for your MTA. With a modern MTA doing basic spam control, DNS resolution time is a substantial contributor to session lifetime, which is a major determinant of overall capacity. Another positive advantage is that many shared resolvers (especially those run by ISPs) do non-standard things in response to some queries designed to either assist and protect web-surfing users or line their own pockets, depending on the particular resolver and one's PoV. None of those tricks are helpful for an MTA, and some can be positively harmful, so you shouldn't do resolution for an MTA through such a server. A caching-only recursive nameserver isn't a substantial load and isn't difficult to configure, and many OS distributions include such a configuration in the base OS (e.g. FreeBSD) or as the default config in packages of ISC BIND and/or other DNS daemons.
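As a quick sanity check on which resolver a host is using, something like the following could classify the `nameserver` entries in a resolv.conf-style file. This is only a sketch: the helper names and the sample addresses are made up for illustration, and a loopback address does not by itself prove the daemon is recursive rather than forwarding (a local stub such as systemd-resolved's 127.0.0.53 forwards to upstream servers), so check what is actually listening there.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (both '#' and ';' are used).
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def classify(address):
    """Rough guess at where a resolver lives, based on its address alone."""
    # Drop any IPv6 zone suffix such as '%eth0' before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    if ip.is_loopback:
        return "local host"
    if ip.is_private or ip.is_link_local:
        return "private network"
    return "public (possibly shared)"


# Hypothetical example contents; on a real host read /etc/resolv.conf instead.
sample_conf = """\
# example only
nameserver 127.0.0.1
nameserver 192.168.1.1
nameserver 8.8.8.8
"""

for server in nameservers(sample_conf):
    print(f"{server}: {classify(server)}")
```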
