Hello folks,

I see that it's possible to use Mahout to train a Naive Bayes classifier using n-grams as features. (Strictly speaking, I guess Mahout can be used to generate sequence files containing n-grams; I suspect the Naive Bayes trainer is indifferent to the form of the features it trains on.) Is there any facility for generating character n-grams instead of word n-grams?
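To be clear about what I mean: character n-grams are the overlapping substrings of n characters in a piece of text, rather than runs of n consecutive words. A minimal sketch in plain Java of the kind of output I'm after. The class and method names are hypothetical and not part of Mahout, and it assumes the text is already normalized (case, whitespace) before n-grams are taken:

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {
    /**
     * Returns the overlapping character n-grams of the given text.
     * Hypothetical helper for illustration only; not a Mahout API.
     */
    public static List<String> charNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n across the string, one char at a time.
        // Note: substring works on UTF-16 code units, so characters outside
        // the Basic Multilingual Plane could be split across n-grams.
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Trigrams of "bayes": bay, aye, yes
        System.out.println(charNGrams("bayes", 3));
    }
}
```

If the trainer really is indifferent to the form of its features, strings like these could presumably stand in for word tokens in the sequence files.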
Thanks, Dean.
