David> Something that comes up over and over in spam is a link of the
David> form:
David> <a href="http://url/of/spammers/site">
David> http://url/of/some/legit/site
David> </a>

David> Does SpamBayes have a token that represents that information and
David> an option I can set that will use it?

The SpamBayes tokenizer essentially splits the message at word
boundaries, so the two URLs are considered separately. Their physical
and structural proximity is not noted. Synthetic tokens based on the
hostname or IP address in the URLs will be generated if you add

    x-pick_apart_urls:True

to the Tokenizer section of your config file.

For completeness, here is my current set of tokenizer settings (I
haven't changed them in a long while):

    [Tokenizer]
    record_header_absence:True
    summarize_email_prefixes:True
    summarize_email_suffixes:True
    mine_received_headers:True
    x-pick_apart_urls:True
    x-fancy_url_recognition:False
    x-lookup_ip:True
    lookup_ip_cache:~/tmp/dnscache.pck
    x-image_size:True
    x-crack_images:True
    x-ocr_engine:gocr
    max_image_size:100000
    crack_image_cache:~/tmp/imagecache.pck

Skip
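
P.S. Since the tokenizer does not relate the href to the link text, here is a minimal sketch of how such a mismatch token could be computed outside SpamBayes, using only the Python standard library. The class name, the function name and the "url-mismatch" token are all made up for illustration. None of this is part of SpamBayes, and it is not wired into the tokenizer.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def _host(url):
    """Return the lowercased hostname of url, or None if there is none."""
    try:
        return urlparse(url).hostname
    except ValueError:
        # urlparse rejects some malformed netlocs (e.g. broken IPv6 literals)
        return None


class LinkMismatchFinder(HTMLParser):
    """Hypothetical helper: collect (href_host, text_host) pairs for
    anchors whose visible text itself looks like a URL."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # visible text seen inside that <a>
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            href_host = _host(self._href)
            text_host = _host("".join(self._text).strip())
            # Only record anchors where both sides yield a hostname.
            if href_host and text_host:
                self.pairs.append((href_host, text_host))
            self._href = None


def mismatch_tokens(html):
    """Return one illustrative token per anchor: 'yes' if the hostnames differ."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return ["url-mismatch:%s" % ("yes" if h != t else "no")
            for h, t in finder.pairs]
```

Anchors whose visible text is ordinary words ("click here") produce no token at all, so the token only fires for the case David describes. Whether it would actually help classification is something that would need testing against real mail.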