2014-07-24 16:43 GMT+02:00 Kartik Kumar Perisetla <kartik.p...@gmail.com>:
> I actually used part of the text of one Wikipedia article which was used in
> training. I was expecting it to detect the category for which it was used as
> a training instance. But it predicted some other category and thus I
> thought it did not give an accurate prediction.
>
> Please correct my understanding if it's wrong here.

Models can underfit, that is, fail to give perfect predictions even on the
training set. For text classification, as for other tasks, underfitting can
be caused by problems at the feature extraction level, inadequate model
parameter settings (e.g. the strength of the model regularization), an
inadequate model class, or label noise (bad quality of the class labels
themselves).

A good way to understand model underfitting and overfitting (in relation to
the training set size) is to plot learning curves, both for the score on the
training set and on the validation set, see for instance:

http://scikit-learn.org/stable/auto_examples/plot_learning_curve.html

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
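
To make the learning-curve suggestion in the reply concrete, here is a minimal sketch. It is not the code from the linked example. It assumes a recent scikit-learn where `learning_curve` lives in `sklearn.model_selection` (in 2014 it was in `sklearn.learning_curve`), and it uses a synthetic dataset from `make_classification` as a stand-in for vectorized text. The model and its `C=0.01` setting are illustrative choices, picked so that strong regularization makes underfitting likely.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a vectorized text corpus (assumption: not the poster's data).
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           n_classes=3, random_state=0)

# Small C means strong regularization, chosen here to make underfitting likely.
model = LogisticRegression(C=0.01, max_iter=1000)

# Fit on growing subsets of each training fold; score on that subset and on the held-out fold.
train_sizes, train_scores, valid_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# Average over the 5 cross-validation folds.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, mean_train, mean_valid):
    print(f"n_train={n:4d}  train accuracy={tr:.3f}  validation accuracy={va:.3f}")
```

If the training accuracy stays well below 1.0 as the training set grows, the model cannot fit even the data it has seen, which is the underfitting case described above. A training score near 1.0 with a much lower validation score points to overfitting instead. The two columns can also be plotted against `train_sizes` with any plotting library.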