RE: [Declude.JunkMail] Interesting test results

2003-03-25 Thread Madscientist
| What we are doing is to track the 2000 (user configurable) most
| recent spammer IP addresses. The list is maintained as an MRU style
| list (sorted with the most recent at the top). If incoming messages
| reach a user defined score, the IP address of the spammer is added
| to the list.

snip

| Here is what we found. After about 3 weeks of data collection, only
| about 1 in 400 incoming spams is identified by a DNS lookup, and NOT
| on the list of the 2000 most recent spammers. Also, of all the spams
| we receive on all accounts, about 43% are on the recent spammer
| list, meaning that almost half of the spams we receive are from
| senders that have spammed us before.

snip

This is one of the capabilities we're building into Message Sniffer v3.
Our testing has shown similar results; however, there are some
complexities with these tests, particularly where gray sources are
found. As a result, our implementation will resolve the IP address and
other network-centric tests first, as features of the message. These
features then become part of the input stream for the Bayesian hinting
engine.

(It should be noted that the Bayesian hinting engine is really more a
blend of fuzzy logic, neural networks, and naive Bayesian learning
techniques... it's just easier to use the current buzz-word to describe
it...)

So far our simulations indicate some profound accuracy improvements when
new spam arrives and, surprisingly, also when non-spam from gray
senders arrives. The early analysis indicates that the learning engine
is picking up second and third order patterns associated with these
message features... This has the effect of gating some heuristics
which are ambiguous under other circumstances, so that they only count
when they can be accurate.

It seems obvious that as a weighted test, the top n most used IPs are
a good bet - similarly a suggestion for research would be to apply a
logarithmic scale to the MRU list position and use that as a weight...
This scheme can be particularly useful if the list is dynamically scaled
because the relative weights of different list positions can be
maintained as the number of entries on the list changes... This is a
similar mechanism to our Rule Strength analysis which is used to gate
out rules that are currently inactive. (See
http://www.sortmonster.com/MessageSniffer/Performance/CurrentRuleStrength.jsp)
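
As a rough illustration of the logarithmic weighting idea above, here is
a minimal sketch. The function name, the max_weight parameter, and the
choice to normalize by the current list size are all assumptions made
for the example; nothing here is taken from Message Sniffer or
SpamManager. Normalizing by log(1 + list_size) is one way to keep the
relative weights of list positions stable when the list grows or
shrinks.

```python
import math


def mru_weight(position, list_size, max_weight=10.0):
    """Weight for an IP at a given MRU position (0 = most recent).

    Hypothetical scheme: the weight falls off with the log of the
    position and is scaled by the log of the current list size, so the
    top entry always gets max_weight and the weight approaches zero at
    the bottom of the list, whatever the list size is.
    """
    if list_size <= 0 or not 0 <= position < list_size:
        return 0.0  # not on the list: contributes nothing
    return max_weight * (1.0 - math.log1p(position) / math.log1p(list_size))


if __name__ == "__main__":
    for pos in (0, 9, 99, 999, 1999):
        print(pos, round(mru_weight(pos, 2000), 2))
```

With this shape, most of the weight is concentrated near the top of the
list, and entries close to falling off the bottom count for very little.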

Another important factor we have found for these kinds of tests is that
there tends to be a periodicity to message rates from some networks...
the result of this is that in a linear MRU paradigm some networks will
appear and disappear from the list, resulting in late blocking on the
same period. That is, a batch of unwanted content will come through and
cause the IP to go to the top of the list, but then the flow falls off
and the IP is dropped. Next time unwanted content comes in from that IP
it is let through the filter for a time because the IP is not on the
list... shortly it will be blocked again, but during that build up time
a significant amount of the content might be delivered.

A counter to this pulsing effect is to develop an increasing
persistence for the more highly listed IPs so that they tend to stay on
the list through the down period. Another important balance for
persistence, however, is to reduce its effects based on any ambiguous or
false positive hits... in fact it turns out that this persistence
reduction should have a persistence of its own so that periodic
false-positive indications can be suppressed when there is mixed content
from the source.
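
One way to picture the persistence mechanism described above is the
sketch below. It is only an assumed model, not the Message Sniffer
implementation: every name (PersistenceTracker, strength, doubt, the
decay factors, drop_below) is hypothetical. Each spam hit adds to an
IP's strength, which decays a little on every tick, so heavy senders
survive a quiet period while one-off senders fall away. Ambiguous or
false-positive hits add to a separate doubt value that decays on its own
schedule, which is one reading of "persistence reduction with a
persistence of its own".

```python
from dataclasses import dataclass


@dataclass
class Entry:
    strength: float = 0.0  # accumulated evidence of spam from this IP
    doubt: float = 0.0     # accumulated ambiguous / false-positive evidence


class PersistenceTracker:
    """Hypothetical decaying-score list; all parameters are assumptions."""

    def __init__(self, strength_decay=0.9, doubt_decay=0.95, drop_below=0.5):
        self.strength_decay = strength_decay
        self.doubt_decay = doubt_decay
        self.drop_below = drop_below
        self.entries = {}

    def spam_hit(self, ip):
        self.entries.setdefault(ip, Entry()).strength += 1.0

    def doubtful_hit(self, ip):
        self.entries.setdefault(ip, Entry()).doubt += 1.0

    def tick(self):
        """Advance one time period: decay both values, drop faded entries."""
        for ip in list(self.entries):
            entry = self.entries[ip]
            entry.strength *= self.strength_decay
            entry.doubt *= self.doubt_decay
            if entry.strength < self.drop_below and entry.doubt < self.drop_below:
                del self.entries[ip]

    def is_listed(self, ip):
        return ip in self.entries

    def effective(self, ip):
        """Strength after subtracting doubt, never below zero."""
        entry = self.entries.get(ip)
        return 0.0 if entry is None else max(0.0, entry.strength - entry.doubt)
```

Because doubt decays more slowly than strength in the default settings,
a source with mixed content stays suppressed for a while even after its
next burst begins.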

Note that periodicity, gating, and persistence mechanisms are useful on
many heuristics - not just IP based tests.

I hope these thoughts spark some new ones that prove helpful...

:-)

_M

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type unsubscribe Declude.JunkMail.  The archives can be found
at http://www.mail-archive.com.


[Declude.JunkMail] Interesting test results

2003-03-24 Thread brian

Hi Scott and all,

We added a test to SpamManager that has produced some really interesting
results.

What we are doing is to track the 2000 (user configurable) most recent spammer
IP addresses. The list is maintained as an MRU style list (sorted with the
most recent at the top). If incoming messages reach a user defined score, the
IP address of the spammer is added to the list.
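
For readers who want something concrete, the list described above could
be sketched roughly as follows. This is not SpamManager's code; the
class name, the default threshold of 10.0, and the method names are
assumptions made for illustration. Only the 2000-entry default and the
"most recent at the top" ordering come from the description.

```python
from collections import OrderedDict


class RecentSpammerList:
    """MRU list of IPs that recently sent mail scoring at or above a threshold."""

    def __init__(self, max_size=2000, score_threshold=10.0):
        self.max_size = max_size                # user configurable
        self.score_threshold = score_threshold  # hypothetical default
        self._ips = OrderedDict()               # last key = most recent

    def record(self, ip, score):
        """Add or refresh ip when score reaches the threshold."""
        if score < self.score_threshold:
            return False
        self._ips[ip] = True
        self._ips.move_to_end(ip)               # move to the "top" of the list
        while len(self._ips) > self.max_size:
            self._ips.popitem(last=False)       # evict the least recent entry
        return True

    def __contains__(self, ip):
        return ip in self._ips

    def position(self, ip):
        """Return 0 for the most recent entry, or None if ip is not listed."""
        for index, key in enumerate(reversed(self._ips)):
            if key == ip:
                return index
        return None
```

A repeat offender is moved back to the top each time it is recorded, so
only senders that go quiet drift down and eventually fall off.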

As part of our testing procedure for our own lists, we validate the results of
our spam trap accounts and internal email accounts against most of the public
DNS lookup databases and the 3 we subscribe to mostly to determine their
weighting.

Prior to implementing this test, roughly 40% of spam we received also got hits
from one or more of the DNS lookup databases with SpamCop having the best
results (false positives ignored).

Here is what we found. After about 3 weeks of data collection, only about 1 in
400 incoming spams is identified by a DNS lookup, and NOT on the list of the
2000 most recent spammers. Also, of all the spams we receive on all accounts,
about 43% are on the recent spammer list, meaning that almost half of the
spams we receive are from senders that have spammed us before.

In analyzing this data, we found that spam trap accounts that were set up at
the same time, and use the same methods, have a totally different mailing list
distribution after a couple of months. This analysis supports our supposition
that a locally maintained list of spammers is going to be a lot more accurate
than some centrally maintained DNS lookup database. Also we routinely get lots
of spam reported to us that we have never seen, also indicating that spam
mailing lists evolve into lists that tend to be very unique, and that a few
originators are responsible for a majority of spam for each account.

I was thinking that it would probably be a relatively simple matter to add
such a test in a future version of Declude. If an incoming message reached a
certain weight, it could be added to a recent spammer list. This list could be
checked along with other internal tests _before_ DNS tests are performed, and
this could push a weighting up high enough that external DNS lookups could be
skipped. 

The effect of this is that by using an individualized IP address scheme,
processing time per message could be greatly reduced resulting in less
resource problems, and faster delivery times.
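
The ordering proposed above (local list first, DNS tests only if still
needed) might look something like this sketch. It does not reflect how
Declude works internally; the function name, the weights, and the idea
of passing DNS tests in as callables are all assumptions for the
example.

```python
def score_message(ip, internal_score, recent_spammers, dns_tests,
                  list_weight=10.0, hold_weight=20.0):
    """Return (score, dns_lookups_done).

    recent_spammers: any container supporting "ip in recent_spammers".
    dns_tests: callables taking an IP and returning a weight; they stand
    in for external DNS blacklist lookups (hypothetical interface).
    """
    score = internal_score
    if ip in recent_spammers:
        score += list_weight          # cheap local test runs first
    if score >= hold_weight:
        return score, False           # already over the limit: skip DNS
    for test in dns_tests:
        score += test(ip)             # slow external lookups only if needed
    return score, True


if __name__ == "__main__":
    def fake_dnsbl(ip):
        return 5.0                    # pretend every IP is listed

    print(score_message("192.0.2.1", 12.0, {"192.0.2.1"}, [fake_dnsbl]))
    print(score_message("192.0.2.9", 12.0, {"192.0.2.1"}, [fake_dnsbl]))
```

The saving depends on how often the local list alone pushes a message
over the hold weight, which the 43% figure above suggests could be
fairly often.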

Anyway, I thought this would make an interesting topic for discussion.

Brian Milburn



Re: [Declude.JunkMail] Interesting test results

2003-03-24 Thread R. Scott Perry

> Here is what we found. After about 3 weeks of data collection, only about 1 in
> 400 incoming spams is identified by a DNS lookup, and NOT on the list of the
> 2000 most recent spammers.

That is quite impressive.

> I was thinking that it would probably be a relatively simple matter to add
> such a test in a future version of declude. If an incoming message reached a
> certain weight, it could be added to a recent spammer list. This list could be
> checked along with other internal tests _before_ DNS tests are performed, and
> this could push a weighting up high enough that external DNS lookups could be
> skipped.
> The effect of this is that by using a individualized IP address scheme,
> processing time per message could be greatly reduced resulting in less
> resource problems, and faster delivery times.

That sounds like an excellent idea -- I'm going to investigate to see
whether this may be possible or not.  Circumventing the DNS lookups would
be very useful.

-Scott



RE: [Declude.JunkMail] Interesting test results

2003-03-24 Thread Colbeck, Andrew
> I was thinking that it would probably be a relatively simple matter to add
> such a test in a future version of declude. If an incoming message reached
> a certain weight, it could be added to a recent spammer list. This list
> could be checked along with other internal tests _before_ DNS tests are
> performed, and this could push a weighting up high enough that external
> DNS lookups could be skipped.
>
> The effect of this is that by using a individualized IP address scheme,
> processing time per message could be greatly reduced resulting in less
> resource problems, and faster delivery times.

SP> That sounds like an excellent idea -- I'm going to investigate to see
SP> whether this may be possible or not.  Circumventing the DNS lookups would
SP> be very useful.
SP> -Scott

Mr. Obvious here... the same technique could be used in the negative to pass
through frequent mail from *low* scoring servers.

That may mean that a server from which we receive a lot of mail, which
suddenly finds itself or its subnet on numerous RBLs, may still deliver its
mail successfully to us, based on its previous good behaviour.
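
A minimal sketch of that negative variant, assuming a simple per-IP
count of low-scoring messages: the class name, the thresholds, and the
size of the credit are made up for illustration and would need tuning
against real traffic.

```python
from collections import Counter


class GoodSenderList:
    """Counts low-scoring messages per IP and grants a negative weight."""

    def __init__(self, low_score=2.0, min_messages=20, credit=-8.0):
        self.low_score = low_score        # at or below this counts as "good"
        self.min_messages = min_messages  # history needed before trusting
        self.credit = credit              # negative weight for trusted IPs
        self.counts = Counter()

    def record(self, ip, score):
        if score <= self.low_score:
            self.counts[ip] += 1

    def adjustment(self, ip):
        """Weight to add to a message's score (0.0 for unknown senders)."""
        return self.credit if self.counts[ip] >= self.min_messages else 0.0
```

The adjustment would be added to a message's total before the hold
decision, so a server with a long clean history could absorb a sudden
RBL listing.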

Andrew.


RE: [Declude.JunkMail] Interesting test results

2003-03-24 Thread R. Scott Perry

> SP> That sounds like an excellent idea -- I'm going to investigate to see
> SP> whether this may be possible or not.  Circumventing the DNS lookups would
> SP> be very useful.
> Mr. Obvious here... the same technique could be used in the negative to pass
> through frequent mail from *low* scoring servers.
> That may mean that a server from which we receive a lot of mail, which
> suddenly finds itself or its subnet on numerous RBLs, may still deliver its
> mail successfully to us, based on its previous good behaviour.

That sounds like it would work very well as well.  :)

-Scott