>> Their physical and structural proximity is not noted. Synthetic
>> tokens based on hostname or IP address in the urls will be generated
>> if you add x-pick_apart_urls:True to the Tokenizer section of your
>> config file.
Dave> That doesn't sound like it's doing what I'm asking about.
No, it's not. However, you might be surprised how helpful generating
tokens for the /8, /16, /24 and /32 address blocks can be. What I was
implying is that maybe you don't need the spoof detection you were asking
for if the address tokens generated from the spammer's IP address are
spammy.
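To make the address-block idea concrete, here is a rough sketch using only
the Python standard library. It is not the SpamBayes tokenizer code; the
function name ip_block_tokens and the "url-ip:" token format are made up
for illustration.

```python
import ipaddress


def ip_block_tokens(ip, prefixes=(8, 16, 24, 32)):
    """Yield one token per enclosing address block of an IPv4 address.

    The "url-ip:" prefix and the token layout are hypothetical; they only
    illustrate the idea of one token per /8, /16, /24 and /32 block.
    """
    addr = ipaddress.IPv4Address(ip)
    for bits in prefixes:
        # strict=False masks off the host bits instead of raising an error.
        net = ipaddress.ip_network(f"{addr}/{bits}", strict=False)
        yield f"url-ip:{net.network_address}/{bits}"


if __name__ == "__main__":
    for token in ip_block_tokens("192.0.2.77"):
        print(token)
```

Two spam messages whose links resolve to neighbouring hosts would share the
/8, /16 and /24 tokens even when the /32 tokens differ, which is why the
coarser blocks can pick up spammy evidence on their own.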
Dave> I want a special token that is generated each time a link's text
Dave> is just a URL and the link and the URL text don't point to the
Dave> same place.
That will require actually parsing the HTML at some level. SpamBayes just
sees a stream of tokens. It doesn't really know much (if anything) about
compound structure.
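For what it's worth, a minimal sketch of that kind of parsing, using the
standard library's html.parser, might look like the following. It is not
part of SpamBayes. The class name, the helper functions and the
"url-text-mismatch" token are all hypothetical, and it assumes "same place"
can be approximated by comparing hostnames only.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


def looks_like_url(text):
    """True if the visible link text is itself a bare URL."""
    return bool(re.match(r"(?i)^(https?://|www\.)\S+$", text))


def host_of(url):
    """Lower-cased hostname of a URL, or '' if it cannot be parsed."""
    if "://" not in url:
        url = "http://" + url
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


class LinkMismatchFinder(HTMLParser):
    """Collect (shown_text, href) pairs whose hostnames differ."""

    def __init__(self):
        super().__init__()
        self.href = None      # href of the <a> currently open, if any
        self.text = []        # text chunks seen inside that <a>
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            shown = "".join(self.text).strip()
            if looks_like_url(shown) and host_of(shown) != host_of(self.href):
                self.mismatches.append((shown, self.href))
            self.href = None


def spoof_tokens(html):
    """Return one hypothetical token per mismatched link in the HTML."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return ["url-text-mismatch"] * len(finder.mismatches)
```

Feeding it a link whose text reads http://www.example.com/login but whose
href points at http://203.0.113.9/x yields one token; links with ordinary
text such as "click here" are ignored.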
Dave> Messages with this property are always spam and account for a
Dave> large percentage of my unsures.
Try these two settings in the Tokenizer section of your config file
x-pick_apart_urls:True
x-lookup_ip:True
and see if they help.
Dave> From what you say above it looks like pick_apart_urls will
Dave> generate tokens describing different parts of a given URL, but
Dave> will do nothing to help capture this particular spammy
Dave> relationship between enclosed text and actual link.
Dave> Or did I misunderstand you?
No, I probably misunderstood myself. The IP address hacker is the
x-lookup_ip option, I believe. They are both helpful, though.
Skip
_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev