> Can you recommend an alternate process, or changes to the existing
> process that would be an improvement and would continue to achieve
> these goals? We are always looking for ways to improve.
I've been thinking about this recently. I'm mostly concerned with FPs on the highest-weighted tests, like Sniffer, so I was thinking about grouping held messages by the highest-weight test they failed... something I'm considering for the spam queue review app I'm working on in "spare" time. I do have a framework in place in the app to assign a keystroke to an action, such as copying the message, altering the copy to send to your false@ address, and releasing the original back for delivery. That makes FP processing on my end much easier, with one keystroke doing everything we need.

This also got me thinking about the flip side: spam reporting. There's a significant untapped load of spam that Sniffer doesn't fail but that we filter. I was thinking about creating a filter that would copy your spam@ address on messages that get moved to our archive (we archive held spam for 30 days in case we missed an FP) but that did not fail Sniffer. This would be after we have already processed for FPs.

Thoughts?

Darin.

----- Original Message -----
From: "Pete McNeil" <[EMAIL PROTECTED]>
To: "Message Sniffer Community" <sniffer@sortmonster.com>
Sent: Tuesday, June 06, 2006 7:29 PM
Subject: [sniffer]Re[2]: [sniffer]A design question - how many DNS based tests?

Hello Matt,

Tuesday, June 6, 2006, 12:37:56 PM, you wrote:

<snip/>

> appropriately and tend to hit less often, but the FP issues with
> Sniffer have grown due to cross checking automated rules with other
> lists that I use, causing two hits on a single piece of data. For
> instance, if SURBL has an FP on a domain, it is possible that
> Sniffer will pick that up too based on an automated cross reference,
> and it doesn't take but one additional minor test to push something
> into Hold on my system.

Please note: it has been quite some time now since the cross-reference style rule-bots were removed from our system. In fact, at the present time we have no automated systems that add new domain rules.
Another observation I might point out is that many RBLs will register a hit on the same IP - weighting systems using RBLs actually depend on this. An IP rule hit in SNF should be treated similarly to other RBL-type tests. This is one of the reasons we code IP rules to group 63 - so that they are "tumped" by a rule hit in any other group and are therefore easily isolated from the other rules.

<snip/>

> handling false positive reports with Sniffer is cumbersome for both
> me and Sniffer.

The current process has a number of important goals:

* Capture as much information as possible about any false positive so that we can improve our rule coding processes.
* Preserve the relationship with the customer and ensure that each case reaches a well-informed conclusion with the customer's full knowledge.
* Protect the integrity of the rulebase.

This link provides a good description of our false positive handling process:

http://kb.armresearch.com/index.php?title=Message_Sniffer.FAQ.FalsePositives

Can you recommend an alternate process, or changes to the existing process that would be an improvement and would continue to achieve these goals? We are always looking for ways to improve.

> I would hope that any changes
> seek to increase accuracy above all else. Sniffer does a very good
> job of keeping up with spam, and its main issues with leakage are
> caused by not being real-time, but that's ok with me. At the same
> time Sniffer is the test most often a part of false positives, being
> a contributing factor in about half of them.

Log data shows that SNF tags on average more than 74% of all email traffic, and typically a significantly higher percentage of spam. Since SNF is represented so highly in email traffic as a whole, it is likely that SNF would also be represented highly in the false positives on any given system, relative to other tests with lower capture rates.
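The base-rate point above can be made concrete with a quick back-of-the-envelope calculation. Except for the 74% capture rate mentioned above, all figures are hypothetical and chosen only for illustration; this is a deliberate simplification, not a model of any real system:

```python
# Hypothetical illustration: a test that fires on most traffic will
# appear in most false positives simply because of its volume, even
# if it is no less accurate than lower-volume tests.

p_high = 0.74     # fraction of all mail tagged by a high-volume test
p_low = 0.15      # assumed fraction tagged by a lower-volume test

n_fps = 10_000    # imagine 10,000 messages held in error system-wide

# If a test's firing were independent of whatever caused each error,
# it would be a "contributing factor" roughly in proportion to its rate.
print(f"high-volume test present in ~{p_high * n_fps:.0f} FPs")
print(f"low-volume test present in ~{p_low * n_fps:.0f} FPs")
```

The point is only that counting how often a test appears in FPs, without normalizing by how often it fires at all, will always make the highest-volume test look like the biggest offender.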
You've also indicated that you weight SNF differently than your other tests - presumably giving it more weight (this is frequently the case on many systems). How much do you feel these factors contribute to your findings?

> About 3/4 of all FP's (things that are blocked by my system) are
> some form of automated or bulk E-mail. That's not to say that other
> tests are more accurate; they are just scored more appropriately and
> tend to hit less often, but the FP issues with Sniffer have grown
> due to cross checking automated rules with other lists that I use,
> causing two hits on a single piece of data,

With regard to "causing two hits on a single piece of data": SNF employs a wide variety of techniques to classify messages, so it is likely that a match in SNF will coincide with a match in some other test. In fact, as I pointed out earlier, filtering systems that apply weights to tests depend on this very fact to some extent.

What makes weighting systems powerful is that when more than one test triggers on a piece of data, such as an IP or URI fragment, the events leading up to that match were distinct for each of the matching tests. This is the critical component in reducing errors through a "voting process": Test A uses process A to reach conclusion Z. Test B uses process B to reach conclusion Z. Process A is different from process B, so the inherent errors in process A are different from the errors in process B, and we presume it is unlikely that an error in Test A will occur under the same conditions as an error in Test B.

If a valid test result is the "signal" we want, and an erroneous test result is "noise" on top of that signal, then it follows: by combining the results of Test A and Test B we have the opportunity to increase the signal-to-noise ratio, to the extent that our assumptions about the errors are true.
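The voting argument can be sketched numerically. The 2% error rates below are hypothetical, chosen only to illustrate how independent errors multiply while correlated (copied) errors do not:

```python
# Sketch of the "voting" argument: two tests whose error modes are
# independent, versus one test that merely copies the other.

fp_a = 0.02   # assumed false-positive rate of Test A on legitimate mail
fp_b = 0.02   # assumed false-positive rate of Test B on legitimate mail

# Independent errors: the chance that both misfire on the same ham
# message is the product of the individual rates.
both_wrong_independent = fp_a * fp_b    # 0.04% instead of 2%

# Correlated errors (Test B is effectively a copy of Test A): requiring
# agreement buys nothing, because both fail on the same messages.
both_wrong_copied = fp_a                # still 2%

print(f"independent errors: {both_wrong_independent:.4%}")
print(f"copied errors:      {both_wrong_copied:.4%}")
```

This is why the distinctness of each test's underlying process matters more than the raw number of tests that agree.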
In fact, if no error occurs in both A and B under the same circumstances, then defining a new test C as (A+B)/2 will produce a signal that is "twice as clear" as test A or B on its own.

If I follow what you have said about false positives and SNF matching other tests, then you are describing a situation where the process for SNF and the alternate tests are the same - or, put another way, that SNF somehow represents a copy of the other test and so will also contain the same errors. If that were the case, then the equation would change and the advantage of combining the tests would evaporate, because the errors (noise) would be amplified as much as the desired result (signal).

I assure you that this is NOT the case with SNF. There are no components of SNF's filtering scheme that are copied from any other system. This is one of our primary design constraints, precisely because "starting from scratch" ensures SNF's results are distinct from other tests.

Previously, when we had bots that cross-referenced other tests as part of their validation process, the result of an SNF rule being present _DID_ represent a unique process (unique perspective) for that result. That is, the "meaning" of an IP matching in SNF was distinct from the meaning of the same IP matching in another test - even if that test had been used as part of the validation process. This is because the origination of the SNF rule was based on a distinct process - most commonly a spam reaching our spamtraps some significant number of times from the same source IP, and more commonly, several different spam reaching our spamtraps from a given source IP. Additionally, IP rules removed from SNF are removed permanently, so the absence of an IP match in SNF where one existed in the alternate test carries its own special meaning. In that context, an SNF IP match would add significant value to the equation because it would serve to validate the match in the alternate test.
Put another way: the vast majority of IP matches in the alternate test would not be present in SNF, so those that are present in SNF are significant because they represent that BOTH distinct testing processes agreed on the result.

These days, our new bots use entirely different processes to create IP rules, and they do not validate these rules against any other single RBL. To the extent that validation might have made the previous test cases less distinct, the new test cases are certainly more distinct, so you should probably reconsider how you view SNF's bot-generated rules.

---

In summary, if FPs with SNF have grown, it is not due to cross checking, since that process is no longer used. Also, if FPs with SNF have grown at all, then we need to understand your data better: overall, the rate of reported false positives is measurably lower even while our subscriber base has grown significantly.

http://reports.messagesniffer.com/Performance/FalseReportsRates.jsp

> and the growth of the Sniffer userbase which has become more likely
> to report first-party advertising as spam, either manually or
> through an automated submission mechanism.

Firstly, we handle spam submitted by humans (any humans) with different rules than spam that hits our spamtraps. We also generally discourage the use of broadly deployed, automated spam submission, precisely because these kinds of submissions tend not to agree between individual users, and frequently not with the policies of the system's administrator(s).

In any case, first-party advertising is not automatically considered legitimate traffic on many systems, and a significant portion of our userbase does appear to hold this view. Another segment of our customer base seems to prefer to take each case on its own - this seems to be the largest group, and it is also our strategy for the core rulebase.
The fact that many "first-party advertisements" run afoul of our spamtraps is also an important factor, since the only way those addresses could make it onto their lists is through harvesting - either directly or indirectly. This is a clear indicator that some of this content is reaching people who may not have a first-party relationship at all, or who, if they do, may have explicitly opted out of any further contact with the advertiser and especially its affiliates. (I have heard this complaint more than once, and have made the complaint myself.) It is also true that this kind of traffic frequently contains obfuscation and tracking mechanisms that are also used by hard-core spammers, and that there is a segment of advertisers that will leverage both legitimate and illegitimate bulk mail providers and "marketing services", either by choice or by accident. All of these things make the subject of "first-party advertising" problematic at best.

Nonetheless, we almost never code a rule for what appears to be legitimate first-party advertising, and even the questionable items must be heavily submitted before we will consider coding for them. Layered on top of this is the fact that our system prevents us from repeating rules, our protocols tend to force us to create very specific forms of rules (that would likely match if sourced from similar messages), and rules that have already been removed due to false positives remain in the system as reminders of what not to code. As a result we almost never make the same "mistake" twice, and we tend to learn quickly as a group.

Our strategy in these cases is to keep the core rulebase focused on the preferences of the greatest segment of our subscriber base, and to customize for individual subscribers in cases where their policy disagrees. This customization process most frequently occurs as a result of our false positive handling process...
though it is worth noting that the vast majority of reported false positives result in rules being removed from the core rulebase. To date, only a very small fraction of our customers have any customization.

Ongoing development work and upcoming features are focused on improving accuracy (on both the spam and ham sides of the equation), improving response time, increasing SNF's flexibility and breadth, reducing complexity, maintenance & administration, and improving speed & efficiency.

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

#############################################################
This message is sent to you because you are subscribed to
the mailing list <sniffer@sortmonster.com>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to <[EMAIL PROTECTED]>
#############################################################