Thanks for all the good feedback; I'll certainly be following up on it. I did find one reason why I wasn't getting good matches: when I looked more carefully at the Perl data structure, I found that the 'features' hash only contained alphabetic characters. So, for example, in the string 'WARRIOR 14-160 14-160', only the 'warrior' part was being used. Also, with 'BMW 318i' and 'BMW 525i', the numbers were being ignored, and with something like 'A/T', there were two separate features, 'a' and 't'.
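To make the difference concrete, here is a minimal sketch in plain Perl of the two behaviours. It is not AI::Categorizer's actual tokenizer: both helper names (alpha_tokens, word_tokens) are hypothetical, and the alphabetic-only version is only my guess at what produces the features I'm seeing.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates what I observed, i.e. only runs of
# alphabetic characters survive as features (an assumption, not the
# module's real code).
sub alpha_tokens {
    my ($text) = @_;
    return lc($text) =~ /[a-z]+/g;
}

# Hypothetical helper: what I would like instead, i.e. every
# whitespace-separated word kept whole as one feature.
sub word_tokens {
    my ($text) = @_;
    return split ' ', lc $text;
}

for my $s ('WARRIOR 14-160 14-160', 'BMW 318i', 'A/T') {
    printf "%-22s alpha: [%s]  words: [%s]\n",
        $s,
        join(',', alpha_tokens($s)),
        join(',', word_tokens($s));
}
```

With the second helper, 'A/T' stays a single feature 'a/t' and '318i' is kept intact, which is the behaviour I'm after.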
So my further question is how to get NaiveBayes to use whitespace-separated words as features ('318i', 'a/t') and not just the alphabetic parts. Is there a simple option for this when calling new AI::Categorizer? -- Jason Armstrong