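I would recommend the SGD classifiers instead. With more than about 40 categories, I would also consider using SGD classifiers hierarchically: one classifier picks a group of categories, then a separate classifier per group picks the category within it.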
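
To make the hierarchical idea concrete, here is a rough sketch. It is plain Python rather than Mahout code, and everything in it is an assumption made for illustration: the tiny one-vs-rest logistic regression stands in for a real SGD learner (in Mahout that would be something like OnlineLogisticRegression), and the class names, toy categories, and category-to-group mapping are made up.

```python
import math
from collections import defaultdict


def featurize(text):
    """Bag-of-words counts plus a constant bias feature (illustrative only)."""
    feats = defaultdict(float)
    for tok in text.lower().split():
        feats[tok] += 1.0
    feats["__bias__"] = 1.0
    return dict(feats)


class SGDClassifier:
    """Tiny one-vs-rest logistic regression trained by plain SGD (stand-in learner)."""

    def __init__(self, labels, lr=0.5, epochs=50):
        self.labels = sorted(set(labels))
        self.lr = lr
        self.epochs = epochs
        self.w = {label: defaultdict(float) for label in self.labels}

    def _score(self, label, feats):
        return sum(self.w[label][f] * v for f, v in feats.items())

    def fit(self, docs, labels):
        for _ in range(self.epochs):
            for feats, y in zip(docs, labels):
                for label in self.labels:
                    p = 1.0 / (1.0 + math.exp(-self._score(label, feats)))
                    grad = p - (1.0 if label == y else 0.0)
                    for f, v in feats.items():
                        self.w[label][f] -= self.lr * grad * v
        return self

    def predict(self, feats):
        return max(self.labels, key=lambda label: self._score(label, feats))


class HierarchicalClassifier:
    """Top-level model picks a group; a per-group model picks the category."""

    def __init__(self, group_of):
        self.group_of = group_of  # hypothetical mapping: category -> group
        self.top = None
        self.leaf = {}

    def fit(self, docs, categories):
        groups = [self.group_of[c] for c in categories]
        self.top = SGDClassifier(groups).fit(docs, groups)
        for g in set(groups):
            idx = [i for i, gi in enumerate(groups) if gi == g]
            sub_docs = [docs[i] for i in idx]
            sub_cats = [categories[i] for i in idx]
            self.leaf[g] = SGDClassifier(sub_cats).fit(sub_docs, sub_cats)
        return self

    def predict(self, feats):
        group = self.top.predict(feats)
        return self.leaf[group].predict(feats)


# Toy data: four made-up categories in two made-up groups.
group_of = {"football": "sports", "tennis": "sports",
            "physics": "science", "biology": "science"}
train = [
    ("goal striker league match", "football"),
    ("racket serve court match", "tennis"),
    ("quantum particle energy theory", "physics"),
    ("cell gene species theory", "biology"),
]
docs = [featurize(text) for text, _ in train]
cats = [cat for _, cat in train]
model = HierarchicalClassifier(group_of).fit(docs, cats)

print(model.predict(featurize("striker scored a goal")))
print(model.predict(featurize("gene expression in a cell")))
```

Each individual model only has to separate a handful of labels, so no single classifier ever sees all ~150 categories. The price is that a wrong choice at the top level cannot be recovered further down, so the grouping needs some care.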

On Fri, Dec 2, 2011 at 5:46 PM, Tom Pierce <[email protected]> wrote:
> Hi,
>
> I've run into the same or a similar error; I've filed MAHOUT-911 with
> a set of Wikipedia categories you can use to trigger this condition
> using the Wikipedia/NaiveBayes example recipe (classifier application
> fails in either mapreduce or sequential mode).
>
> -tom
>
> On Wed, Nov 16, 2011 at 7:51 AM, Lyall Morrison
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I'm trying to classify some unsorted text files into different categories
> > using a Bayesian classifier, and it's going well until I try to run a
> > classifier with more than about 30 categories in it (the limit is between
> > 27 and 32, I haven't nailed it down yet).
> >
> > The training process claims to work fine up to the ~150 categories I have
> > identified, but actually running the classifier with a model with too many
> > categories in it causes it to hang without reporting any errors.
> >
> > Can anyone tell me if there is a known limit here or suggest an easy way to
> > diagnose this? My next resort is source diving, which I would prefer to
> > avoid if I can.
> >
> > If I'm reading it correctly, the version I'm using is Mahout 0.5-SNAPSHOT
> > which I haven't been keeping up to date as I feel better using a static
> > codebase while I'm mucking around - at least that way if something stops
> > working I know it's my fault ;)
> >
> > Thanks for your time,
> >
> > Lyall Morrison
