Hi everyone,

I'm trying to classify some unsorted text files into different categories using a Bayesian classifier, and it's going well until I try to run a classifier with more than about 30 categories in it (the limit is somewhere between 27 and 32; I haven't nailed it down yet).
The training process claims to work fine with up to the ~150 categories I have identified, but actually running the classifier against a model with too many categories in it causes it to hang without reporting any errors. Can anyone tell me if there is a known limit here, or suggest an easy way to diagnose this? My next resort is source diving, which I would prefer to avoid if I can.

If I'm reading it correctly, the version I'm using is Mahout 0.5-SNAPSHOT, which I haven't been keeping up to date, as I feel better using a static codebase while I'm mucking around. At least that way, if something stops working, I know it's my fault ;)

Thanks for your time,
Lyall Morrison
