Hi Philipp.
First, you should ensure that the features all have approximately the same 
scale.
For example, they could all lie between zero and one. If the LDA features
are much smaller than the other ones, they will probably have little 
influence on the classifier.
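
To make the scaling point concrete, here is a minimal sketch. It assumes
the tfidf features are a sparse matrix and the topic features a dense array.
The names ``X_tfidf`` and ``X_topics`` and the random data are made up for
illustration, and ``MinMaxScaler`` is only one possible way to rescale:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins: 4 documents, 6 tfidf terms, 2 topic features.
rng = np.random.RandomState(0)
X_tfidf = sparse.csr_matrix(rng.rand(4, 6))
X_topics = rng.rand(4, 2) * 0.01  # deliberately on a much smaller scale

# Rescale each topic column to the range [0, 1].
# With real data, fit on the training set and only transform the test set.
scaler = MinMaxScaler()
X_topics_scaled = scaler.fit_transform(X_topics)

# Append the rescaled topic columns to the sparse tfidf matrix.
X_combined = sparse.hstack(
    [X_tfidf, sparse.csr_matrix(X_topics_scaled)]
).tocsr()
```

The combined matrix stays sparse, so it can be passed to LinearSVC in the
same way as the plain tfidf matrix.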

Which LDA package did you use?

I am not very experienced with this kind of model, but maybe it would be helpful
to look at some univariate statistics, like ``feature_selection.chi2``, to see
if the LDA features are actually helpful.
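
A rough sketch of what I mean, on made-up data (the arrays and the names
``X_tfidf``, ``X_topics`` and ``n_tfidf`` are hypothetical). Note that
``chi2`` expects non-negative features, which should hold for tfidf values
and topic proportions:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical data: 6 documents in 2 classes.
y = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.RandomState(0)
X_tfidf = rng.rand(6, 5)

# Two made-up topic features: the first follows the class label,
# the second has the same total in both classes.
X_topics = np.array([
    [0.1, 0.5],
    [0.1, 0.4],
    [0.1, 0.6],
    [0.9, 0.5],
    [0.9, 0.6],
    [0.9, 0.4],
])

X = np.hstack([X_tfidf, X_topics])

# chi2 returns one score and one p-value per feature column.
scores, pvalues = chi2(X, y)

# The topic features are the last columns of the combined matrix.
n_tfidf = X_tfidf.shape[1]
topic_scores = scores[n_tfidf:]
```

Comparing ``topic_scores`` with the scores of the tfidf columns gives a first
impression of whether the topic features carry any class information at all.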

Cheers,
Andy


----- Original Message -----
From: "Philipp Singer" <kill...@gmail.com>
To: scikit-learn-general@lists.sourceforge.net
Sent: Friday, 14 September 2012 13:47:30
Subject: [Scikit-learn-general] Combining TFIDF and LDA features

Hey there!

I have seen a few research papers in the past that combined tfidf-based 
features with LDA topic model features and could increase their accuracy 
by a useful margin.

I now wanted to do the same. As a simple first step I just appended the 
topic features to the existing tfidf features of each train and test 
sample and ran my standard LinearSVC on the result - oh, btw, thanks that 
the confusion with dense and sparse is now resolved in 0.12 ;)

The problem now is that the results are overall about the same. Some 
classes perform better and some worse.

I am not exactly sure if this is a data problem, or comes from my lack 
of understanding of such feature extension techniques.

Is it possible that the huge number of tfidf features somehow overrules 
the rather small number of topic features? Do I maybe have to do some 
feature modification, because tfidf and LDA features are of a different 
nature?

Maybe it is also due to the classifier and I need something else?

Would be happy if someone could shed a little light on my problems ;)

Regards,
Philipp

------------------------------------------------------------------------------
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
