Jeff, I had a look at your list at a random time a few days ago.  I
noticed that the top 90% or so of the reports looked pretty solid.  When
I looked, almost all of the bottom 10% of the reports were highly
suspect.  This is where the yahoo and geocities and other whitelist stuff
was showing up.  Some other reports (and I can't remember what any of them
were) also seemed somewhat suspect, even though they probably weren't on a
whitelist.

I concluded that the blocking test should use only the top 90% of your
reports and ignore the reports with less than 10% of the score of the
highest-scoring report.  Now, perhaps this percentage fluctuates with time;
I certainly haven't made multiple checks to see.  And maybe after whitelist
removal the rest of the bottom 10% really is spam.
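
The cutoff above can be read two ways: keep the top 90% of entries by
rank, or drop entries scoring under 10% of the top score.  Here is a rough
Python sketch of both readings.  The function names, the (domain, score)
layout and the sample numbers are all made up for illustration; nothing
here reflects how the actual list is stored or processed.

```python
# Hypothetical input: (domain, score) pairs. All values below are invented.

def keep_top_share(reports, share_percent=90):
    """Reading 1: keep the top share_percent of entries by rank."""
    ranked = sorted(reports, key=lambda r: r[1], reverse=True)
    keep = len(ranked) * share_percent // 100  # integer math, rounds down
    return ranked[:keep]

def keep_above_fraction_of_max(reports, min_percent=10):
    """Reading 2: drop entries scoring under min_percent of the top score."""
    if not reports:
        return []
    top = max(score for _, score in reports)
    # Compare score/top >= min_percent/100 without floating point.
    return [(d, s) for d, s in reports if s * 100 >= top * min_percent]

reports = [
    ("spam-a.example", 200),
    ("spam-b.example", 120),
    ("spam-c.example", 45),
    ("maybe.example", 19),
    ("big-host.example", 3),
]

by_rank = keep_top_share(reports)
by_score = keep_above_fraction_of_max(reports)
print([d for d, _ in by_rank])
print([d for d, _ in by_score])
```

On this toy data the two readings disagree: the rank cutoff drops only the
last entry, while the score cutoff also drops the entry at 19, since that
is under 10% of 200.  Which one is meant matters if the scores fall off
steeply.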

But I think it would be an interesting experiment to compare the reliability
of the top 90% to the reliability of the entire collection.
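
One way such a comparison might be set up, as a sketch only: take
"reliability" to mean the fraction of listed domains that a hand check
confirms as spam.  That definition, the function name and the sample data
are assumptions for illustration, not measurements of the real list.

```python
def reliability(listed, confirmed_spam):
    """Fraction of listed domains that are in the hand-checked spam set."""
    listed = set(listed)
    if not listed:
        return 0.0
    return len(listed & set(confirmed_spam)) / len(listed)

# Invented example: ten domains ranked by score, highest first.
ranked = ["d%d.example" % i for i in range(10)]
# Pretend a manual check confirmed the first eight as spam.
confirmed_spam = set(ranked[:8])

top_90 = ranked[: len(ranked) * 90 // 100]  # first nine entries

print("top 90%:", reliability(top_90, confirmed_spam))
print("entire list:", reliability(ranked, confirmed_spam))
```

If the bad entries really do cluster at the bottom, the first figure should
come out higher than the second; if the two are close, the cutoff is not
buying much.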

        Loren
