Hi Dean, I might be wrong, but try googling for "shingling"; it could be something to start with.
Cheers
Jens

2013/10/9 Ted Dunning <[email protected]>

> Yes. Should work to use character n-grams. There are oddities in the
> stats because the different n-grams are not independent, but Naive Bayes
> methods are in such a state of sin that it shouldn't hurt any worse.
>
> No... I don't think that there is a capability built in to generate the
> character n-grams. Should be relatively trivial to build.
>
> On Wed, Oct 9, 2013 at 3:18 AM, Dean Jones <[email protected]> wrote:
>
> > Hello folks,
> >
> > I see that it's possible to use mahout to train a naive bayes
> > classifier using n-grams as features (or I guess, strictly speaking,
> > mahout can be used to generate sequence files containing n-grams; I
> > suspect the naive bayes trainer is indifferent to the form of features
> > it trains on). Is there any facility to generate character n-grams
> > instead of word n-grams?
> >
> > Thanks,
> >
> > Dean.
>
> <http://www.hightechmg.com>
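Since Ted says there is no built-in generator and that one "should be relatively trivial to build", here is a minimal sketch of what that might look like in Java. It assumes plain-text documents and produces overlapping character n-grams (the same idea as character "shingles") plus per-document counts. The class and method names (`CharNGrams`, `charNGrams`, `countNGrams`) are hypothetical and are not part of Mahout; writing the counts into the sequence files the Naive Bayes trainer consumes is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not a Mahout API.
public class CharNGrams {

    // Returns all overlapping character n-grams of `text`, in order of appearance.
    // Works on UTF-16 chars; supplementary characters (e.g. emoji) would need
    // a codePoints()-based variant to avoid splitting surrogate pairs.
    public static List<String> charNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Counts each n-gram, giving a term-frequency map usable as features.
    // LinkedHashMap keeps first-seen order so output is deterministic.
    public static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String gram : charNGrams(text, n)) {
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: pad with spaces so word boundaries become part of the features.
        String doc = " " + "banana".toLowerCase() + " ";
        System.out.println(countNGrams(doc, 3));
    }
}
```

Running `main` prints the trigram counts for `" banana "`, where `"ana"` is counted twice because the n-grams overlap. That overlap is exactly the non-independence Ted mentions, which Naive Bayes simply ignores.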
