Hi all,

Does Bayes tokenize on word boundaries, and therefore ignore hyphens, or does 
it keep them as part of the token?  I've seen a lot of spam lately that inserts 
random hyphens between key spammy words (like "economic-crisis"), presumably in 
an attempt to bypass word filters and/or Bayes.  So would word1-word2 get 
tokenized as a single item or as two words?
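
To make the question concrete, here is a minimal Python sketch of the two 
behaviours I have in mind.  It is only an illustration and not the actual 
Bayes tokenizer code, which I haven't checked; the function names and the 
sample string are made up.

```python
import re

def tokenize_whitespace(text):
    # Hypothetical tokenizer: split on whitespace only, so a hyphenated
    # pair such as "economic-crisis" survives as one token.
    return text.lower().split()

def tokenize_word_boundaries(text):
    # Hypothetical tokenizer: keep only runs of ASCII letters and digits,
    # so a hyphen acts as a separator between two tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

sample = "Beat the economic-crisis today"
print(tokenize_whitespace(sample))       # hyphenated pair kept together
print(tokenize_word_boundaries(sample))  # hyphenated pair split in two
```

With the first approach "economic-crisis" is learned as its own token, 
separate from "economic" and "crisis"; with the second, the hyphen makes no 
difference to what Bayes sees.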

If hyphens are currently included, then perhaps Bayes should be updated to 
ignore hyphens and/or tokenize at word boundaries?

Cheers.

--- Amir
