Re: Razor FP on simple http link (by itself)

2017-05-05 Thread RW
On Fri, 5 May 2017 11:37:38 -0400
Rob McEwen wrote:


> Does RAZOR extract domains from links and check them against a bad 
> domain database... sort of how SURBL works... and/or check the IP
> that they resolve to? (I don't think so, but now I have to ask just
> to be sure!)
> 
> If not... this seems to go beyond checksum-checking of parts of a 
> message - this seems much more surgical/specific than that.
> 
> Don't get me wrong... I'm a big fan of razor and of other 
> checksum-technologies. But I'm sort of shaken by this because I
> always thought a FP for razor would be much more difficult due to
> larger portions of a message having to match a checksum in
> order to have a hit. (sort of like a larger "fingerprint" that is not
> easily duplicated in another innocent message, allegedly making FPs
> practically impossible)

razor2 supports multiple hash engines, but currently only engine 8 is
used. It is based on a hash of the URI domain name and the message size
in multiples of (I think) 100 bytes.
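
To illustrate why a bare link could be enough to match under that
description, here is a minimal sketch of a signature built only from a
URI's host and a coarse size bucket. This is not Razor's actual engine 8
algorithm: the name `toy_signatures`, the use of SHA-1, the URL regex,
and the 100-byte bucket are all assumptions made for illustration.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Assumed granularity; the description above only says "(I think) 100 bytes".
BUCKET_BYTES = 100

# Hypothetical, deliberately simple matcher for http/https links.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def toy_signatures(body):
    """Return one illustrative signature per distinct URI host in the body."""
    size_bucket = len(body.encode("utf-8")) // BUCKET_BYTES
    signatures = set()
    for url in URL_RE.findall(body):
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue
        # Only the host and the size bucket feed the hash; nothing else
        # in the message contributes to the signature.
        key = f"{host}:{size_bucket}".encode("utf-8")
        signatures.add(hashlib.sha1(key).hexdigest())
    return signatures


reported_body = "http://example.com"
innocent_body = "See http://example.com"

# Both bodies are under 100 bytes and share a host, so they collide.
print(toy_signatures(reported_body) == toy_signatures(innocent_body))

# Without a scheme, this toy matcher finds no URI and yields no signature.
print(toy_signatures("example.com"))
```

Under these assumptions, any short message containing the same link
produces the same signature as one that was reported, which would be
consistent with a hit on a body consisting of nothing but the link.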


Razor FP on simple http link (by itself)

2017-05-05 Thread Rob McEwen
I use SA as a "helper app" within my custom-written spam filter. So I'll 
have SA give me an opinion about certain marginal messages, and then my 
spam filter factors the SA score into its own scoring.


Recently, a prominent law firm for whom I host mail was complaining 
about FPs where messages from a prominent real estate company were not 
making it to them. Interestingly, their messages kept hitting RAZOR, 
with SA giving the following response:


1.7 RAZOR2_CHECK   Listed in Razor2 (http://razor.sf.net/)
0.4 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
   [cf: 100]
2.4 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
   above 50%
   [cf: 100]
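
For a sense of scale, the three Razor rules in that report can be
summed. The sketch below uses the scores exactly as shown; the 5.0
threshold is SpamAssassin's stock `required_score` default and is only
an assumption here, since the custom filter applies its own weighting.

```python
from decimal import Decimal

# Scores copied from the SA report above.
razor_hits = {
    "RAZOR2_CHECK": Decimal("1.7"),
    "RAZOR2_CF_RANGE_51_100": Decimal("0.4"),
    "RAZOR2_CF_RANGE_E8_51_100": Decimal("2.4"),
}

# Assumed threshold: SpamAssassin's default, not necessarily this setup's.
REQUIRED_SCORE = Decimal("5.0")

total = sum(razor_hits.values(), Decimal("0"))
print(total)                   # 4.5
print(REQUIRED_SCORE - total)  # 0.5
```

With those numbers, the Razor rules alone would leave only half a point
of headroom under a 5.0 threshold.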

In testing, I narrowed it all the way down to simply the following 
(alone!) hitting on razor:


either
http://www.example.com
or
http://example.com

(except with the sender's domain, of course)

...either one was triggering this razor score. I even put that as the 
ONLY body text of another message (so a totally different header), and 
it still triggered. But either variation WITHOUT the "http://" part did 
not trigger.


Interestingly, this domain name happens to resolve to an IP that is 
currently blacklisted on Zen. (I know, that is really, really bad!) 
Unfortunately, that confuses the issue!


Does RAZOR extract domains from links and check them against a bad 
domain database... sort of how SURBL works... and/or check the IP that 
they resolve to? (I don't think so, but now I have to ask just to be sure!)


If not... this seems to go beyond checksum-checking of parts of a 
message - this seems much more surgical/specific than that.


Don't get me wrong... I'm a big fan of razor and of other 
checksum technologies. But I'm sort of shaken by this because I always 
thought a FP for razor would be much more difficult due to larger 
portions of a message having to match a checksum in order to have a 
hit. (sort of like a larger "fingerprint" that is not easily 
duplicated in another innocent message, allegedly making FPs practically 
impossible)


While this kind of more surgical strike can be beneficial in blocking 
more spam, it seems to change the paradigm of what I (mistakenly?) 
thought to be RAZOR's potential for collateral damage.


Is this "extracurricular activity"? Or did I misunderstand RAZOR's 
checksum technique?


--
Rob McEwen