On Tuesday, January 17, 2006, 8:45:45 AM, Matt wrote:

M> Pete,

M> I reviewed my Hold range going back to Monday morning and I wasn't able
M> to find anything out of the ordinary.  I also searched my logs from my
M> URIBL tool that queries SURBL among other things, and I wasn't able to
M> find any hits for those domains that you pointed out.  I guess that I 
M> wasn't affected.

That's good. It was very short-lived on our system... it appeared only
this morning, and we were there (at that minute) to see it. I wasn't
sure at the time how bad the problem was, and with things like
.earthlink.net and .w3.org being tagged it looked serious - better
safe than sorry.

M> As far as promoting such domains to Sniffer through automated means
M> goes, I believe that this helps substantiate the need for adding extra
M> qualifications.  For instance, the chances of a 2-letter dot-com domain
M> being a legitimately taggable spam domain are almost zero.  To a lesser
M> extent the same is true as you add on more characters.  Also, it would
M> be very helpful for such situations, and for false positives in
M> general, if you were to track long-standing domains that appear in ham
M> and avoid adding them automatically when cross-checking these
M> blacklists.  There are many different ways to accomplish this.  I have
M> found over time that foreign free E-mail services can get picked up by
M> Sniffer.  Because these services are frequently forged, and legitimate
M> traffic is low enough that people don't often notice or report false
M> positives, these rules stay high in strength and live a very long
M> time.  You can in fact prevent this from happening to a large extent
M> with further validation.  SURBL is subject to false positives on such
M> things, but they expire such rules using techniques that prevent them
M> from being long-term issues; these cross-checked false positives,
M> however, can sometimes take on a life of their own in Sniffer.
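The extra qualifications Matt suggests could be sketched roughly as a
pre-check before any automated promotion. This is only an illustration
with hypothetical names (`HAM_HISTORY`, `qualifies_for_auto_rule`),
not Sniffer's actual logic:

```python
# Hypothetical sketch of the extra qualifications described above:
# reject very short dot-com candidates and anything with a known
# history of appearing in ham, before a domain rule is auto-added.

HAM_HISTORY = {"earthlink.net", "w3.org"}  # long-standing "good" domains

def qualifies_for_auto_rule(domain: str) -> bool:
    """Return True only if the candidate passes both sanity checks."""
    name, _, tld = domain.rpartition(".")
    # The chances of a 2-letter dot-com being taggable spam are ~zero.
    if tld == "com" and len(name) <= 2:
        return False
    # Never auto-add a domain that regularly appears in legitimate mail.
    if domain in HAM_HISTORY:
        return False
    return True
```

Here `HAM_HISTORY` stands in for whatever long-term ham statistics a
real implementation would keep.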

We have very few foreign customers - that is changing - but in the
meantime that sets up a couple of dynamics: (1) nobody reports it as
a false positive because there are very few (if any) people in our
system that use the service, and (2) most of the messages coming from
those services to our US customers are, in fact, spam sent by abusing
those networks. In these cases, until someone reports a false positive
against one of these rules we really don't have any practical way of
tipping the balance. We can't be personally familiar with every system
everywhere, so often we must go with the evidence we have, and in these
cases that is most frequently a lot of spam and no other indications.

With regard to tracking long-standing "good" domains, we're working on
mechanisms in v3 that gather statistics on "friendly" message features
so that we can be alerted any time something like this comes through.
Real-time "feature" reputation mechanisms will help to steer more
accurate and more aggressive automated tools that we can leverage to
capture more spam/malware very quickly, and will help prevent us from
creating rules that match "friendly" features without more thorough
research.

As for promoting domains or other message features by automatic means,
the criteria are always under review. We generally manually review the
same messages after the bots (this is how we noticed w3.org, declude,
earthlink, et al...), and the criteria are pretty strong.

For example, not only must a message be presented to us through a
harvested address (in most cases), but after that it must hit more than
one blacklist - and then, if the bot finds something useful and it
also matches SURBL, it will be added...
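That promotion gate could be sketched like this (the function name and
signature are my own invention for illustration, not Sniffer's actual
code):

```python
# Rough sketch of the promotion constraints described above: a candidate
# is added only when the source message arrived via a harvested
# (spamtrap) address, hit more than one blacklist, and the candidate
# itself also matches SURBL.

def promote_rule(via_harvested: bool, blacklist_hits: int,
                 matches_surbl: bool) -> bool:
    """Every constraint in the battery must be satisfied."""
    return via_harvested and blacklist_hits > 1 and matches_surbl
```

The point of the conjunction is that no single blacklist hit - not even
a SURBL match - is enough on its own.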

All that by way of saying: though the rule might reference SURBL in
its name, that's really more for research purposes than anything
else. It was definitely much harder for the rule to get into our
system than that -- the only way these rules get in there is by
satisfying a battery of constraints.

Any bad rule that lasts any time in our system is there because it
wasn't reported, which generally means there were no meaningful false
positives out there -- especially if the rule strength is high...

http://www.sortmonster.com/MessageSniffer/Performance/RuleStrengths.jsp

Above 2.0 there are 100s of messages per day being tagged by a rule as
measured by only about 150 systems that send in logs. Each one of
those is an opportunity to trigger a false positive report. It seems
unlikely (theoretically) that this could go on for very long without
somebody noticing and reporting a false positive. Still, at this level
a rule must have been sourced through a harvested address (clean
spamtrap) in order to survive in the core after an FP report.

That said, once an FP report arrives on such a rule, if it is anywhere
near the "gray area" I research it pretty thoroughly before making the
local/global adjustment decision. (Recall that even if we keep the
rule in the core, every system is able to block a rule or mitigate it
with white rules.)

Rules sourced by user submission must be above 3.0 for the same
treatment - indicating that 1000s of messages per day are being
tagged. If a rule survives at that level without a false positive
report for any period of time it is unlikely to be an error - at least
according to the majority of our customers.
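The two survival thresholds described above could be modeled like this
(a sketch with assumed names; the real strength metric is the one
published on the RuleStrengths page):

```python
# Sketch of the survival policy described above: after a false-positive
# report, a rule stays in the core only if its measured strength is
# above the threshold for how it was originally sourced.

SPAMTRAP_THRESHOLD = 2.0    # rules sourced via harvested addresses
SUBMISSION_THRESHOLD = 3.0  # rules sourced via user submissions

def survives_fp_report(strength: float, source: str) -> bool:
    """source is either "spamtrap" or "submission"."""
    threshold = (SPAMTRAP_THRESHOLD if source == "spamtrap"
                 else SUBMISSION_THRESHOLD)
    return strength > threshold
```

The higher bar for user-submitted rules reflects the weaker provenance
of that source compared to a clean spamtrap.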

We can't have a zero error rate - each customer has a different
perspective on what is or is not spam, for example. So what we do is
remain extremely sensitive to any false positive reports. We even use
some data collected from quarantine retrieval systems to help us
review rules that might be false positives.

We're always improving the process though... nothing is immune from
review.

Hope this helps,

_M




This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html
