Hey there!

I have seen a few research papers in the past that combined tf-idf 
based features with LDA topic-model features and managed to increase 
their accuracy by a useful margin.

I now wanted to do the same. As a simple first step I just appended the 
topic features to the existing tf-idf features for each train and test 
sample and ran my standard LinearSVC on it - oh, by the way, thanks for 
resolving the confusion with dense and sparse in 0.12 ;)
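For reference, the concatenation step looks roughly like this (a minimal sketch with random placeholder matrices standing in for my real tf-idf and LDA outputs; shapes and names are made up for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.svm import LinearSVC

# Hypothetical data: 100 documents, 1000 tf-idf features, 20 LDA topics.
rng = np.random.RandomState(0)
tfidf_train = csr_matrix(rng.rand(100, 1000))        # stand-in for the tf-idf matrix
topics_train = rng.dirichlet(np.ones(20), size=100)  # stand-in for LDA topic proportions
y_train = rng.randint(0, 2, size=100)

# Append the dense topic features column-wise to the sparse tf-idf block.
X_train = hstack([tfidf_train, csr_matrix(topics_train)]).tocsr()

clf = LinearSVC().fit(X_train, y_train)
print(X_train.shape)  # (100, 1020)
```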

The problem now is that the results are overall almost identical: some 
classes perform slightly better and some slightly worse.

I am not exactly sure whether this is a data problem or stems from my 
lack of understanding of such feature-combination techniques.

Is it possible that the huge number of tf-idf features somehow drowns 
out the rather small number of topic features? Do I maybe need some 
feature scaling, since tf-idf and LDA features are of a different 
nature?
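To make that question concrete, one modification I could imagine is standardizing the topic block before concatenation so it is not dwarfed by the tf-idf values (again just a sketch with placeholder data; `topics` stands in for my real topic matrix):

```python
import numpy as np
from sklearn.preprocessing import scale

# Placeholder LDA topic proportions for 100 documents and 20 topics.
rng = np.random.RandomState(0)
topics = rng.dirichlet(np.ones(20), size=100)

# Standardize each topic column to zero mean and unit variance, so the
# small topic block contributes on a comparable scale to the tf-idf block.
topics_scaled = scale(topics)

print(topics_scaled.mean(axis=0).round(6))  # each column now centered at ~0
```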

Or maybe it is due to the classifier, and I need a different one?

I would be happy if someone could shed a little light on my problem ;)

Regards,
Philipp

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
