When I use "trainclassifier", I am able to run the 20 Newsgroups example just fine. I'm also able to train on my own data, up to around 10M training documents.
Once I have more training data than that, I find that "trainclassifier" succeeds but "testclassifier" fails. I have no idea whether this is a training problem or a testing problem. The errors reported by "testclassifier" are at http://pastebin.com/YKqbjAQH. I suspect that I am training on too much data and need to increase the minDf, but I don't see a way to do that with "trainclassifier". While looking around for a fix, I read that "trainclassifier" is the old way, and that "trainnb" fixed some unusual back-end errors (which I suspect is what I'm getting). What is the difference between the two? Is there any reason for me to start figuring out how to use "trainnb"?
