Yes. Using character n-grams should work. The statistics get a bit odd
because overlapping n-grams are not independent, but Naive Bayes methods
are already in such a state of sin that it shouldn't make things any worse.

No, I don't think there is a built-in capability to generate character
n-grams. It should be relatively trivial to build.
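As a rough illustration, here is a minimal Java sketch that slides a window
of width n over the text. The class and method names (CharNGrams,
charNGrams, toTokenLine) are hypothetical and not part of Mahout. The
toTokenLine helper assumes you would write the grams out as
whitespace-separated tokens so an existing word-level pipeline treats each
gram as one term; that hand-off is an assumption, not a documented Mahout
feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class CharNGrams {

    /**
     * Hypothetical helper: returns all overlapping character n-grams of
     * length n, in order of occurrence. Text shorter than n yields an
     * empty list. Works on UTF-16 chars, so supplementary characters
     * could be split across grams.
     */
    public static List<String> charNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n one character at a time.
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    /**
     * Hypothetical helper: joins grams into one space-separated line,
     * replacing whitespace inside each gram with '_' so a whitespace
     * tokenizer would see every gram as a single token.
     */
    public static String toTokenLine(List<String> grams) {
        return grams.stream()
                .map(g -> g.replaceAll("\\s", "_"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("naive", 3));              // [nai, aiv, ive]
        System.out.println(toTokenLine(charNGrams("no sin", 3))); // no_ o_s _si sin
    }
}
```

Note that adjacent grams share n-1 characters, which is exactly the
non-independence mentioned above.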



On Wed, Oct 9, 2013 at 3:18 AM, Dean Jones <[email protected]> wrote:

> Hello folks,
>
> I see that it's possible to use mahout to train a naive bayes
> classifier using n-grams as features (or I guess, strictly speaking,
> mahout can be used to generate sequence files containing n-grams; I
> suspect the naive bayes trainer is indifferent to the form of features
> it trains on). Is there any facility to generate character n-grams
> instead of word n-grams?
>
> Thanks,
>
> Dean.
>
