I would recommend the SGD classifiers. With more than 40 or so categories,
I would also consider using the SGD classifiers hierarchically.
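
As a rough illustration of what hierarchical use could look like (a sketch
only, not Mahout code): one top-level classifier picks a coarse group, and a
separate classifier per group picks the final category, so no single model has
to separate all ~150 categories at once. The class name, the group names, and
the toy keyword rules below are all hypothetical stand-ins; in practice each
callable would wrap a trained SGD model.

```python
from typing import Callable, Dict

# A classifier here is any callable mapping a document to a label.
Classifier = Callable[[str], str]


class HierarchicalClassifier:
    """Two-level classifier: choose a coarse group, then a category in it."""

    def __init__(self, top: Classifier, per_group: Dict[str, Classifier]):
        self.top = top              # predicts the coarse group
        self.per_group = per_group  # one second-level classifier per group

    def classify(self, doc: str) -> str:
        group = self.top(doc)
        # The second-level model only separates categories inside its group.
        return self.per_group[group](doc)


# Hypothetical keyword rules standing in for trained models.
def top(doc: str) -> str:
    return "science" if "atom" in doc or "cell" in doc else "sports"


def science(doc: str) -> str:
    return "physics" if "atom" in doc else "biology"


def sports(doc: str) -> str:
    return "football" if "goal" in doc else "tennis"


model = HierarchicalClassifier(top, {"science": science, "sports": sports})
```

With ~150 categories, something like a dozen groups of a dozen or so
categories each would keep every individual model well under the 40-category
mark. How to choose the groups is left open here.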

On Fri, Dec 2, 2011 at 5:46 PM, Tom Pierce <[email protected]> wrote:

> Hi,
>
> I've run into the same or a similar error; I've filed MAHOUT-911 with
> a set of Wikipedia categories you can use to trigger this condition
> using the Wikipedia/NaiveBayes example recipe (classifier application
> fails in either mapreduce or sequential mode).
>
> -tom
>
> On Wed, Nov 16, 2011 at 7:51 AM, Lyall Morrison
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I'm trying to classify some unsorted text files into different categories
> > using a Bayesian classifier, and it's going well until I try to run a
> > classifier with more than about 30 categories in it (the limit is between
> > 27 and 32, I haven't nailed it down yet).
> >
> > The training process claims to work fine up to the ~150 categories I have
> > identified, but actually running the classifier with a model with too many
> > categories in it causes it to hang without reporting any errors.
> >
> > Can anyone tell me if there is a known limit here or suggest an easy way to
> > diagnose this? My next resort is source diving, which I would prefer to
> > avoid if I can.
> >
> > If I'm reading it correctly, the version I'm using is Mahout 0.5-SNAPSHOT
> > which I haven't been keeping up to date as I feel better using a static
> > codebase while I'm mucking around - at least that way if something stops
> > working I know it's my fault ;)
> >
> > Thanks for your time,
> >
> > Lyall Morrison
>
