Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Olivier Grisel
2014-07-24 16:43 GMT+02:00 Kartik Kumar Perisetla:
> I actually used part of text of one wikipedia article which was used in
> training. I was expecting it to detect the category for which it was used as
> training instance. But it predicted as some other category and thus I
> thought it did not g

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Kartik Kumar Perisetla
I actually used part of the text of one Wikipedia article that was used in training. I was expecting it to detect the category for which it was used as a training instance. But it predicted some other category, and thus I thought it did not give an accurate prediction. Please correct my understanding if

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Lars Buitinck
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla:
> Also, Could someone please throw some light on how HashingVectorizer works?

https://larsmans.github.io/ilps-hashing-trick/
https://en.wikipedia.org/wiki/Feature_hashing
http://metaoptimize.com/qa/questions/6943/what-is-the-hashing-trick
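
As a rough illustration of the hashing trick those links describe, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the function name hash_features, the CRC32 hash, the regex tokenizer and the tiny n_features value are all assumptions chosen to keep the example short. The point it shows is that each token is mapped straight to a column index by a hash function, so no vocabulary has to be stored and unseen words still get a column.

```python
import re
import zlib


def hash_features(text, n_features=16):
    """Map a text to a fixed-length count vector via the hashing trick.

    Illustrative only: CRC32 and this tokenizer are stand-ins, not what
    scikit-learn's HashingVectorizer uses internally.
    """
    vec = [0] * n_features
    for token in re.findall(r"\w+", text.lower()):
        # The hash value modulo the vector length picks the column.
        index = zlib.crc32(token.encode("utf-8")) % n_features
        vec[index] += 1
    return vec


print(hash_features("the cat sat on the mat", n_features=8))
```

Because the vector length is fixed, different tokens can collide in the same column. With a large enough number of features such collisions tend to be rare, which is the trade-off the hashing trick makes for not keeping a dictionary in memory.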

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Olivier Grisel
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla:
> Hello,
>
> I am creating a content classifier using scikit-learn through
> HashingVectorizer( using this as reference:
> http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html).
>
> The training dataset I am u

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Eustache DIEMERT
> But when I test the prediction for a new sentence or text, it gives wrong prediction.

How do you measure that? Having a few badly classified instances does not necessarily mean the learning has failed. A good classification accuracy for text classification is typically > 80%; what is yours?
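
One way to answer the "how do you measure that" question is to hold out part of the labelled data and compute accuracy on it, rather than judging from a handful of hand-picked sentences. The sketch below is a minimal pure-Python version under stated assumptions: split_holdout and accuracy are hypothetical helper names, the labels are made up for illustration, and the predictions would in practice come from the trained classifier.

```python
import random


def split_holdout(examples, test_fraction=0.25, seed=0):
    """Shuffle labelled examples and split off a held-out test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Returns (train, test); the test part is never used for fitting.
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels, for illustration only.
y_true = ["management", "sports", "management", "sports"]
y_pred = ["management", "sports", "sports", "sports"]
print(accuracy(y_true, y_pred))  # 3 of 4 predictions are correct
```

A single number over a held-out set is easier to compare against a rough benchmark than a few individual misclassifications.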

[Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-23 Thread Kartik Kumar Perisetla
Hello, I am creating a content classifier using scikit-learn through HashingVectorizer (using this as reference: http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html). The training dataset I am using is Wikipedia. For example, for "management" category I am tr