Hey there! I have seen a few research papers in the past that combined tf-idf-based features with LDA topic-model features and managed to increase their accuracy by a useful amount.
I now wanted to do the same. As a first step, I simply appended the topic features to the existing tf-idf features of each train and test sample and ran my standard LinearSVC on it (by the way, thanks for resolving the dense/sparse confusion in 0.12 ;)).

The problem is that the results are essentially unchanged overall: some classes perform a bit better and some a bit worse. I am not sure whether this is a data problem or comes from my lack of understanding of such feature-combination techniques. Is it possible that the huge number of tf-idf features somehow drowns out the rather small number of topic features? Do I perhaps need to transform the features in some way, since tf-idf and LDA features are of a different nature? Or is it down to the classifier, and I need something else?

Would be happy if someone could shed a little light on my problem ;)

Regards,
Philipp

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
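[Editor's note: the feature-appending step described above can be sketched as below. This is a minimal illustration, not the poster's actual code; the matrix shapes, the random toy data, and the `scale` factor are made-up assumptions. Stacking the dense topic proportions next to a sparse tf-idf matrix requires converting them to sparse, and rescaling the small topic block is one common way to keep it from being dominated by the much wider tf-idf block.]

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Toy stand-ins for the real matrices (hypothetical shapes):
#   X_tfidf:  n_samples x n_terms  sparse tf-idf matrix
#   X_topics: n_samples x n_topics dense LDA topic proportions
rng = np.random.RandomState(0)
X_tfidf = csr_matrix(rng.rand(5, 1000) * (rng.rand(5, 1000) > 0.95))
X_topics = rng.dirichlet(np.ones(10), size=5)  # rows sum to 1

# Rescale the topic block so it is not drowned out by the much
# larger tf-idf block; the factor 1.0 is an arbitrary starting
# point and would normally be tuned (e.g. by cross-validation).
scale = 1.0
X_combined = hstack([X_tfidf, csr_matrix(X_topics * scale)]).tocsr()

print(X_combined.shape)  # (5, 1010)
```

The combined matrix can then be passed to a linear classifier such as LinearSVC exactly like the tf-idf matrix alone; since an L2-regularized linear model is sensitive to relative feature scale, increasing `scale` gives the handful of topic features more influence on the decision.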