Re: [spambayes-dev] effective tokenizer for wiki text

2006-10-30, Tony Meyer
[Skip] Why not just create an email message out of the input? If the headers are identical in every message they won't generate any useful tokens and the message body will be all that yields useful clues. OTOH, if you have login or IP address information for the spammers, you might
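A minimal sketch of the wrap-as-email idea, using only Python's standard `email` package. The header values (`wiki@localhost`, `filter@localhost`, `wiki edit`) and the helpers `wiki_text_to_message`, `header_tokens`, and `body_tokens` are hypothetical illustrations. The toy whitespace tokenizer is not SpamBayes's tokenizer. In SpamBayes itself, the wrapped message would go to its own tokenizer instead. The point is only that constant headers produce the same tokens for every edit, so only the body can separate spam from ham.

```python
from email.message import EmailMessage

# Hypothetical constant headers: every wrapped wiki edit gets identical values.
FIXED_HEADERS = {
    "From": "wiki@localhost",
    "To": "filter@localhost",
    "Subject": "wiki edit",
}


def wiki_text_to_message(text):
    """Wrap a chunk of wiki text in an email message with constant headers."""
    msg = EmailMessage()
    for name, value in FIXED_HEADERS.items():
        msg[name] = value
    # set_content also adds MIME-Version, Content-Type and
    # Content-Transfer-Encoding headers (the same for plain ASCII text).
    msg.set_content(text)
    return msg


def header_tokens(msg):
    """Toy tokenizer (not SpamBayes): 'header:word' for each header word."""
    return {
        f"{name.lower()}:{word}"
        for name, value in msg.items()
        for word in str(value).lower().split()
    }


def body_tokens(msg):
    """Toy tokenizer (not SpamBayes): lowercase whitespace split of the body."""
    return set(msg.get_content().lower().split())


if __name__ == "__main__":
    spam = wiki_text_to_message("Buy cheap pills at http://example.com/pills")
    ham = wiki_text_to_message("Fixed a broken link on the FAQ page")
    # Header tokens are identical, so they cannot help the classifier.
    print(header_tokens(spam) == header_tokens(ham))
    # Any useful clues come from the body alone.
    print(sorted(body_tokens(spam) - body_tokens(ham)))
```

Because a Bayesian filter scores tokens by how differently they occur in ham and spam, tokens that appear in every message end up near-neutral. They cost a little work but do not skew the result.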

Re: [spambayes-dev] effective tokenizer for wiki text

2006-10-30, Matt Good
On Tue, 2006-10-31 at 13:51 +1300, Tony Meyer wrote: [Matt] Yes, I think it would be fine to start testing the filter that way, but I figured since the custom tokenizer had been suggested it was worth looking into what would be required and what the advantages might be. [Skip] Maybe