>> Their physical and structural proximity is not noted. Synthetic
>> tokens based on hostname or IP address in the urls will be generated
>> if you add x-pick_apart_urls:True to the Tokenizer section of your
>> config file.
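For illustration only, here is a rough Python sketch of what synthetic tokens derived from a URL's hostname or IP address could look like. The function name url_host_tokens, the "url-host:" and "url-ip:" prefixes, and the choice of /8, /16, /24 and /32 blocks are assumptions made for this example. It is not SpamBayes' tokenizer code, and the real option may emit differently shaped tokens.

```python
import ipaddress
from urllib.parse import urlsplit


def url_host_tokens(url):
    """Yield illustrative tokens for the host part of a URL (hypothetical format)."""
    host = urlsplit(url).hostname
    if not host:
        return
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: emit the full hostname and its parent domains,
        # stopping before the bare top-level label.
        labels = host.split(".")
        for i in range(max(1, len(labels) - 1)):
            yield "url-host:" + ".".join(labels[i:])
        return
    if addr.version != 4:
        # Keep the sketch simple: no block tokens for IPv6 literals.
        yield f"url-ip:{addr}"
        return
    # One token per enclosing IPv4 address block.
    for prefix in (8, 16, 24, 32):
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        yield f"url-ip:{net}"
```

With tokens like these, a classifier that has seen spam from one address can still pick up evidence from a neighbouring address in the same block.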
Dave> That doesn't sound like it's doing what I'm asking about.

No, it's not. However, you might be surprised how helpful generating
tokens for the /8, /16, /24 and /32 address blocks can be. What I was
implying is that maybe you don't need the spoof detection you were asking
for if the address tokens generated from the spammer's IP address are
spammy.

Dave> I want a special token that is generated each time a link's text
Dave> is just a URL and the link and the URL text don't point to the
Dave> same place.

That will require actually parsing the HTML at some level. SpamBayes just
sees a stream of tokens. It doesn't really know much (if anything) about
compound structure.

Dave> Messages with this property are always spam and account for a
Dave> large percentage of my unsures.

Try these two settings

    x-pick_apart_urls:True
    x-lookup_ip:True

and see if they help.

Dave> From what you say above it looks like pick_apart_urls will
Dave> generate tokens describing different parts of a given URL, but
Dave> will do nothing to help capture this particular spammy
Dave> relationship between enclosed text and actual link.

Dave> Or did I misunderstand you?

No, I probably misunderstood myself. The IP address hacker is the
x-lookup_ip option, I believe. They are both helpful though.

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
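As a rough illustration of the HTML parsing that the link-text check discussed in this thread would need, here is a minimal Python sketch using the standard library's html.parser. Everything in it is hypothetical: the class LinkMismatchFinder, the function link_mismatch_tokens, and the "url-text-mismatch" token are invented for this example and are not part of SpamBayes. It also simplifies "don't point to the same place" to "the hostnames differ".

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def _host(url):
    """Return the hostname of url, or None if it cannot be parsed."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        return None


class LinkMismatchFinder(HTMLParser):
    """Collect <a> links whose visible text is a URL on a different host."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>
        self.mismatches = []   # (link text, href) pairs that disagree

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            href_host = _host(self._href)
            # Only treat the link text as a URL if it looks like one.
            text_host = _host(text) if "://" in text else None
            if href_host and text_host and href_host != text_host:
                self.mismatches.append((text, self._href))
            self._href = None


def link_mismatch_tokens(html):
    """Return one synthetic token per mismatched link (hypothetical token name)."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return ["url-text-mismatch"] * len(finder.mismatches)
```

For example, link_mismatch_tokens('<a href="http://203.0.113.9/login">http://www.example.com/</a>') yields a single "url-text-mismatch" token, while a link whose text is ordinary words, or a URL on the same host as its href, yields none.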