RE: [Declude.JunkMail] Interesting test results
| What we are doing is to track the 2000 (user configurable) most recent
| spammer IP addresses. The list is maintained as an MRU style list
| (sorted with the most recent at the top). If incoming messages reach a
| user defined score, the IP address of the spammer is added to the list.

[snip]

| Here is what we found. After about 3 weeks of data collection, only
| about 1 in 400 incoming spams is identified by a DNS lookup, and NOT on
| the list of the 2000 most recent spammers. Also, of all the spams we
| receive on all accounts, about 43% are on the recent spammer list,
| meaning that almost half of the spams we receive are from senders that
| have spammed us before.

[snip]

This is one of the capabilities we're building into Message Sniffer v3. Our testing has shown similar results; however, there are some complexities with these tests, particularly where gray sources are found.

As a result, our implementation will resolve the IP address and other network-centric tests first, as features of the message. These features then become part of the input stream for the Bayesian hinting engine. (It should be noted that the Bayesian hinting engine is really more a blend of fuzzy logic, neural networks, and naive Bayesian learning techniques... it's just easier to use the current buzz-word to describe it...)

So far our simulations indicate some profound accuracy improvements when new spam arrives, and surprisingly also when non-spam from gray senders arrives. The early analysis indicates that the learning engine is picking up second and third order patterns associated with these message features... This has the effect of gating some heuristics which are ambiguous under other circumstances, so that they only count when they can be accurate.

It seems obvious that as a weighted test, the top n most used IPs are a good bet. Similarly, a suggestion for research would be to apply a logarithmic scale to the MRU list position and use that as a weight...
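As a rough illustration of that suggestion, the sketch below maps a list position to a weight on a logarithmic scale. The function name `position_weight`, the 0-based position, and the normalisation by list size are all assumptions made for this example; nothing here is taken from Message Sniffer or SpamManager.

```python
import math


def position_weight(position: int, list_size: int) -> float:
    """Return a weight in [0, 1] for a 0-based MRU list position.

    Hypothetical formula: the top of the list (position 0) gets 1.0 and
    the weight falls off logarithmically toward the bottom. Dividing by
    log(1 + list_size) keeps the result in the same range whatever the
    current length of the list is.
    """
    if not 0 <= position < list_size:
        return 0.0  # not on the list (or the list is empty)
    return 1.0 - math.log(1 + position) / math.log(1 + list_size)


if __name__ == "__main__":
    for size in (200, 2000):
        top = position_weight(0, size)
        middle = position_weight(size // 2, size)
        print(f"size={size}: top={top:.3f} middle={middle:.3f}")
```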
This scheme can be particularly useful if the list is dynamically scaled, because the relative weights of different list positions can be maintained as the number of entries on the list changes... This is a similar mechanism to our Rule Strength analysis, which is used to gate out rules that are currently inactive. (See http://www.sortmonster.com/MessageSniffer/Performance/CurrentRuleStrength.jsp)

Another important factor we have found for these kinds of tests is that there tends to be a periodicity to message rates from some networks... The result of this is that in a linear MRU paradigm some networks will appear and disappear from the list, resulting in late blocking on the same period. That is, a batch of unwanted content will come through and cause the IP to go to the top of the list, but then the flow falls off and the IP is dropped. Next time unwanted content comes in from that IP, it is let through the filter for a time because the IP is not on the list... Shortly it will be blocked again, but during that build-up time a significant amount of the content might be delivered.

A counter to this pulsing effect is to develop an increasing persistence for the more highly listed IPs, so that they tend to stay on the list through the down period.

Another important balance for persistence, however, is to reduce its effects based on any ambiguous or false positive hits... In fact it turns out that this persistence reduction should have a persistence of its own, so that periodic false-positive indications can be suppressed when there is mixed content from the source.

Note that periodicity, gating, and persistence mechanisms are useful on many heuristics, not just IP based tests.

I hope these thoughts spark some new ones that prove helpful... :-)

_M

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]
---
This E-mail came from the Declude.JunkMail mailing list.
To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type unsubscribe Declude.JunkMail. The archives can be found at http://www.mail-archive.com.
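The persistence idea in the message above can be sketched roughly as follows. This is only one possible reading: the class names, the `tick()` time step, and every numeric default are assumptions chosen for illustration, not details of any real product. Spam hits build up a score that decays over time, so heavily listed IPs outlast a quiet period. Ambiguous or false-positive hits build up a separate "doubt" value that decays more slowly, which gives the persistence reduction a persistence of its own.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    score: float = 0.0  # persistence earned from spam hits
    doubt: float = 0.0  # reduction earned from ambiguous / false-positive hits


class PersistentList:
    """Hypothetical IP list where listing strength decays instead of expiring."""

    def __init__(self, gain=1.0, score_decay=0.5,
                 doubt_gain=1.0, doubt_decay=0.1, cap=8.0):
        self.gain = gain
        self.score_decay = score_decay
        self.doubt_gain = doubt_gain
        self.doubt_decay = doubt_decay  # slower than score_decay on purpose
        self.cap = cap
        self.entries: dict[str, Entry] = {}

    def spam_hit(self, ip: str) -> None:
        entry = self.entries.setdefault(ip, Entry())
        entry.score = min(self.cap, entry.score + self.gain)

    def ambiguous_hit(self, ip: str) -> None:
        entry = self.entries.setdefault(ip, Entry())
        entry.doubt = min(self.cap, entry.doubt + self.doubt_gain)

    def tick(self) -> None:
        """Advance one time period: decay both values, drop empty entries."""
        for ip in list(self.entries):
            entry = self.entries[ip]
            entry.score = max(0.0, entry.score - self.score_decay)
            entry.doubt = max(0.0, entry.doubt - self.doubt_decay)
            if entry.score == 0.0 and entry.doubt == 0.0:
                del self.entries[ip]

    def is_listed(self, ip: str) -> bool:
        entry = self.entries.get(ip)
        return entry is not None and entry.score - entry.doubt > 0.0
```

With these defaults an IP that was hit four times survives three quiet periods, while an IP that was hit once does not. How long a "period" is, and how the gains and decays should be tuned, would have to come from real traffic data.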
[Declude.JunkMail] Interesting test results
Hi Scott and all,

We added a test to SpamManager that has produced some really interesting results.

What we are doing is to track the 2000 (user configurable) most recent spammer IP addresses. The list is maintained as an MRU style list (sorted with the most recent at the top). If incoming messages reach a user-defined score, the IP address of the spammer is added to the list.

As part of our testing procedure for our own lists, we validate the results of our spam trap accounts and internal email accounts against most of the public DNS lookup databases and the 3 we subscribe to, mostly to determine their weighting. Prior to implementing this test, roughly 40% of spam we received also got hits from one or more of the DNS lookup databases, with SpamCop having the best results (false positives ignored).

Here is what we found. After about 3 weeks of data collection, only about 1 in 400 incoming spams is identified by a DNS lookup, and NOT on the list of the 2000 most recent spammers. Also, of all the spams we receive on all accounts, about 43% are on the recent spammer list, meaning that almost half of the spams we receive are from senders that have spammed us before.

In analyzing this data, we found that spam trap accounts that were set up at the same time, and use the same methods, have a totally different mailing list distribution after a couple of months. This analysis supports our supposition that a locally maintained list of spammers is going to be a lot more accurate than some centrally maintained DNS lookup database. Also, we routinely get lots of spam reported to us that we have never seen, which also indicates that spam mailing lists evolve into lists that tend to be unique, and that a few originators are responsible for a majority of spam for each account.

I was thinking that it would probably be a relatively simple matter to add such a test in a future version of Declude. If an incoming message reached a certain weight, its sending IP address could be added to a recent spammer list. This list could be checked along with other internal tests _before_ DNS tests are performed, and this could push a weighting up high enough that external DNS lookups could be skipped. The effect of this is that by using an individualized IP address scheme, processing time per message could be greatly reduced, resulting in fewer resource problems and faster delivery times.

Anyway, I thought this would make an interesting topic for discussion.

Brian Milburn
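For readers who want to experiment with the idea, here is a minimal sketch of such a list and of the "check locally first, skip DNS if already over the threshold" flow. It is not SpamManager's or Declude's code: `RecentSpammerList`, `score_message`, the `dns_lookup` callback, and the weights and threshold are all hypothetical names and values picked for the example.

```python
from collections import OrderedDict


class RecentSpammerList:
    """MRU list of spammer IPs, most recent at the top, capped at max_size."""

    def __init__(self, max_size: int = 2000):
        self.max_size = max_size
        self._ips: OrderedDict[str, bool] = OrderedDict()

    def add(self, ip: str) -> None:
        self._ips[ip] = True
        self._ips.move_to_end(ip, last=False)  # move to the top of the list
        while len(self._ips) > self.max_size:
            self._ips.popitem(last=True)  # drop the least recent entry

    def __contains__(self, ip: str) -> bool:
        return ip in self._ips

    def as_list(self) -> list[str]:
        return list(self._ips)


def score_message(ip, internal_score, recent, dns_lookup,
                  list_weight=10, spam_threshold=20):
    """Score one message; dns_lookup(ip) is only called when needed."""
    score = internal_score
    if ip in recent:
        score += list_weight
    if score < spam_threshold:
        # Local tests were inconclusive, so pay for the external lookup.
        score += dns_lookup(ip)
    if score >= spam_threshold:
        recent.add(ip)  # also refreshes an IP that was already listed
    return score
```

The `dns_lookup` argument stands in for whatever DNS blacklist checks are configured; passing it in as a function keeps the sketch free of network code and makes it easy to count how many lookups were avoided.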
Re: [Declude.JunkMail] Interesting test results
> Here is what we found. After about 3 weeks of data collection, only
> about 1 in 400 incoming spams is identified by a DNS lookup, and NOT on
> the list of the 2000 most recent spammers.

That is quite impressive.

> I was thinking that it would probably be a relatively simple matter to
> add such a test in a future version of Declude. If an incoming message
> reached a certain weight, it could be added to a recent spammer list.
> This list could be checked along with other internal tests _before_ DNS
> tests are performed, and this could push a weighting up high enough
> that external DNS lookups could be skipped. The effect of this is that
> by using an individualized IP address scheme, processing time per
> message could be greatly reduced, resulting in fewer resource problems
> and faster delivery times.

That sounds like an excellent idea -- I'm going to investigate to see whether this may be possible or not. Circumventing the DNS lookups would be very useful.

-Scott
RE: [Declude.JunkMail] Interesting test results
> I was thinking that it would probably be a relatively simple matter to
> add such a test in a future version of Declude. If an incoming message
> reached a certain weight, it could be added to a recent spammer list.
> This list could be checked along with other internal tests _before_ DNS
> tests are performed, and this could push a weighting up high enough
> that external DNS lookups could be skipped. The effect of this is that
> by using an individualized IP address scheme, processing time per
> message could be greatly reduced, resulting in fewer resource problems
> and faster delivery times.

SP> That sounds like an excellent idea -- I'm going to investigate to see
SP> whether this may be possible or not. Circumventing the DNS lookups
SP> would be very useful.
SP> -Scott

Mr. Obvious here... the same technique could be used in the negative, to pass through frequent mail from *low* scoring servers.

That may mean that a server from which we receive a lot of mail, and which suddenly finds itself or its subnet on numerous RBLs, may still deliver its mail successfully to us, based on its previous good behaviour.

Andrew.
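A rough sketch of this "negative" variant is shown below. Everything in it is an assumption made for illustration: the `GoodSenderList` class, the idea of counting consecutive low-scoring messages, resetting the count on a single high-scoring message, and the specific thresholds and credit.

```python
from collections import Counter


class GoodSenderList:
    """Hypothetical record of servers that keep sending low-scoring mail."""

    def __init__(self, min_messages: int = 50, low_score: int = 5):
        self.min_messages = min_messages
        self.low_score = low_score
        self.clean_counts: Counter[str] = Counter()

    def record(self, ip: str, final_score: int) -> None:
        if final_score <= self.low_score:
            self.clean_counts[ip] += 1
        else:
            # Assumption: one high-scoring message resets the history.
            self.clean_counts.pop(ip, None)

    def is_trusted(self, ip: str) -> bool:
        return self.clean_counts[ip] >= self.min_messages


def adjust_rbl_score(ip: str, rbl_score: int, good: GoodSenderList,
                     credit: int = 10) -> int:
    """Offset RBL hits for a trusted sender, never going below zero."""
    if good.is_trusted(ip):
        return max(0, rbl_score - credit)
    return rbl_score
```

Whether a reset on a single bad message is too strict, and how large the credit should be relative to the RBL weights, are tuning questions this sketch does not try to answer.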
RE: [Declude.JunkMail] Interesting test results
> SP> That sounds like an excellent idea -- I'm going to investigate to see
> SP> whether this may be possible or not. Circumventing the DNS lookups
> SP> would be very useful.
>
> Mr. Obvious here... the same technique could be used in the negative,
> to pass through frequent mail from *low* scoring servers.
>
> That may mean that a server from which we receive a lot of mail, and
> which suddenly finds itself or its subnet on numerous RBLs, may still
> deliver its mail successfully to us, based on its previous good
> behaviour.

That sounds like it would work very well as well. :)

-Scott