2014-07-24 16:43 GMT+02:00 Kartik Kumar Perisetla:
> I actually used part of text of one wikipedia article which was used in
> training. I was expecting it to detect the category for which it was used as
> training instance. But it predicted as some other category and thus I
> thought it did not g
I actually used part of the text of one Wikipedia article that was used in
training. I was expecting it to detect the category for which it was used
as a training instance. But it predicted some other category, and thus I
thought it did not give an accurate prediction.
Please correct my understanding if
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla:
> Also, could someone please shed some light on how HashingVectorizer works?
https://larsmans.github.io/ilps-hashing-trick/
https://en.wikipedia.org/wiki/Feature_hashing
http://metaoptimize.com/qa/questions/6943/what-is-the-hashing-trick
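
In case a concrete illustration helps alongside those links, here is a
minimal sketch of the hashing trick in plain Python. It is not
scikit-learn's implementation: HashingVectorizer uses a signed 32-bit
MurmurHash3 and returns sparse matrices, while this sketch assumes MD5 as a
stand-in hash, whitespace tokenization, and a tiny dense vector. The names
hash_token and hashing_vectorize are hypothetical.

```python
import hashlib


def hash_token(token, n_features):
    """Map a token to (column index, sign) with a stable hash.

    Assumption: MD5 stands in for the MurmurHash3 that scikit-learn uses.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # The first four bytes pick the column; no vocabulary is stored.
    index = int.from_bytes(digest[:4], "little") % n_features
    # One more byte picks a sign, so colliding tokens tend to cancel out
    # rather than always adding up.
    sign = 1 if digest[4] % 2 == 0 else -1
    return index, sign


def hashing_vectorize(text, n_features=16):
    """Turn a text into a fixed-length vector of signed token counts."""
    vector = [0.0] * n_features
    for token in text.lower().split():
        index, sign = hash_token(token, n_features)
        vector[index] += sign
    return vector


vec = hashing_vectorize("management of teams and projects")
```

The vector length depends only on n_features, never on the corpus, which is
why the out-of-core example can vectorize documents without a fit step.
Distinct tokens may collide in the same column; a large n_features makes
that rare.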
---
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla:
> Hello,
>
> I am creating a content classifier using scikit-learn through
> HashingVectorizer (using this as reference:
> http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html).
>
> The training dataset I am u
> But when I test the prediction for a new sentence or text, it gives wrong
> prediction.
How do you measure that?
Having a few badly classified instances does not necessarily mean the
learning has failed.
A good classification accuracy for text classification is typically above
80%; what is yours?
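
To make "measuring" concrete, here is a minimal sketch of computing accuracy
on held-out documents, that is, documents not used for training. The labels
and predictions below are made-up placeholders, not results from your
classifier, and the accuracy helper is hypothetical;
sklearn.metrics.accuracy_score computes the same quantity.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of held-out documents whose predicted label is correct."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one labelled document")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical held-out labels and classifier output, for illustration only.
y_true = ["management", "sports", "management", "science", "sports"]
y_pred = ["management", "sports", "science", "science", "sports"]

held_out_accuracy = accuracy(y_true, y_pred)
print("accuracy: %.2f" % held_out_accuracy)
```

A score over a few hundred held-out documents says much more than one or
two hand-picked sentences.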
Hello,
I am creating a content classifier using scikit-learn through
HashingVectorizer (using this as reference:
http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html
).
The training dataset I am using is Wikipedia. For example, for "management"
category I am tr