On 14.09.2012 14:53, Andreas Müller wrote:
> Hi Philipp.

Hey Andreas!

> First, you should ensure that the features all have approximately the same
> scale.
> For example, they should all be between zero and one. If the LDA features
> are much smaller than the other ones, they will probably not be weighted
> much.
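Values that each lie between 0 and 1 can still be on a very different scale from the TF-IDF block, so one quick way to check Andy's point is to compare per-row norms. This is a minimal sketch, assuming random Dirichlet rows as a hypothetical stand-in for Mallet's 500-topic output, and assuming the TF-IDF block uses `TfidfVectorizer`'s default L2 row normalization. Rescaling the topic block to unit L2 norm is one possible choice, not something the thread settled on.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical stand-in for Mallet's output: one topic distribution per
# document over 500 topics (each row sums to 1), drawn from a Dirichlet.
rng = np.random.RandomState(0)
n_docs, n_topics = 100, 500
lda = rng.dirichlet(np.full(n_topics, 0.1), size=n_docs)

# Each row is a probability distribution, so its L1 norm is 1 ...
l1 = lda.sum(axis=1)
# ... but with 500 topics its L2 norm is well below 1, whereas rows from
# TfidfVectorizer (norm="l2" by default) have L2 norm exactly 1.
l2 = np.linalg.norm(lda, axis=1)
print("LDA L1 norms:", l1.min().round(3), l1.max().round(3))
print("LDA L2 norms:", l2.min().round(3), l2.max().round(3))

# One possible rescaling: give every LDA row unit L2 norm, matching the
# row length of the TF-IDF block.
lda_l2 = normalize(lda, norm="l2")
print("rescaled L2 norms:", np.linalg.norm(lda_l2, axis=1).round(3)[:3])
```

Row-wise normalization keeps the shape of each topic distribution while matching the length of the TF-IDF rows; a single global weight on the topic block is a simpler alternative.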

LDA features sum to 1 for each sample, because they describe the probability
that the sample belongs to each of the topics (500 in this case). So they are
already between 0 and 1.

>
> Which LDA package did you use?

We used Mallet's LDA implementation, because in our experience it has the most
established smoothing procedures: http://mallet.cs.umass.edu/

By the way, if we train on the LDA features alone, we get reasonable results,
a bit worse than pure TF-IDF.

>
> I am not very experienced with this kind of model, but maybe it would be
> helpful to look at some univariate statistics, like
> ``feature_selection.chi2``, to see if the LDA features are actually helpful.

Yeah, this is something I could look into. I have already tried feature
selection with chi2, but I have not actually looked at the specific statistics.

>
> Cheers,
> Andy

Regards,
Philipp

>
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 13:47:30
> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>
> Hey there!
>
> In the past I have seen a few research papers that combined TF-IDF-based
> features with LDA topic model features and increased their accuracy by a
> useful margin.
>
> I now wanted to do the same. As a simple first step, I appended the topic
> features to the existing TF-IDF features of each training and test sample
> and ran my standard LinearSVC on the result (oh, by the way, thanks that the
> confusion with dense and sparse is now resolved in 0.12 ;) ).
>
> The problem now is that the overall results are essentially the same. Some
> classes perform better and some worse.
>
> I am not sure whether this is a data problem or comes from my lack of
> understanding of such feature extension techniques.
>
> Is it possible that the huge number of TF-IDF features somehow overrules
> the rather small number of topic features? Do I maybe have to do some
> feature modification, because TF-IDF and LDA features are of a different
> nature?
>
> Maybe it is also due to the classifier, and I need something else?
>
> I would be happy if someone could shed a little light on my problems ;)
>
> Regards,
> Philipp
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
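Philipp's setup (TF-IDF features with the topic proportions appended, fed to LinearSVC) and Andy's chi2 suggestion can be sketched together. This is a minimal sketch, assuming a toy corpus, random Dirichlet rows as a stand-in for Mallet's topic proportions, and a hypothetical `combine` helper with a hypothetical `lda_weight` factor for up-weighting the small topic block. The weight is not from the thread and would need tuning, e.g. by cross-validation.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from sklearn.svm import LinearSVC

# Toy corpus and labels (hypothetical data, for illustration only).
docs = ["the cat sat on the mat", "dogs chase cats", "stocks fell sharply",
        "markets rallied today", "a cat and a dog", "bond yields rose"]
y = np.array([0, 0, 1, 1, 0, 1])

# Sparse TF-IDF block; rows are L2-normalized by default.
X_tfidf = TfidfVectorizer().fit_transform(docs)
n_tfidf = X_tfidf.shape[1]

# Stand-in for Mallet topic proportions: dense, each row sums to 1.
rng = np.random.RandomState(0)
X_lda = rng.dirichlet(np.ones(5), size=len(docs))

def combine(X_tfidf, X_lda, lda_weight=1.0):
    """Append the (weighted) dense LDA block to the sparse TF-IDF block.

    `lda_weight` is a hypothetical knob: values > 1 give the few topic
    columns more influence relative to the much larger TF-IDF block.
    """
    return sp.hstack([X_tfidf, sp.csr_matrix(lda_weight * X_lda)]).tocsr()

# Train the linear SVM on the combined matrix with an up-weighted LDA block.
X = combine(X_tfidf, X_lda, lda_weight=2.0)
clf = LinearSVC().fit(X, y)

# chi2 needs non-negative features (true for both blocks). Its score scales
# linearly with a column's values, so compare the blocks without weighting.
X1 = combine(X_tfidf, X_lda, lda_weight=1.0)
scores, pvalues = chi2(X1, y)
print("mean chi2, TF-IDF block:", np.nanmean(scores[:n_tfidf]))
print("mean chi2, LDA block:   ", np.nanmean(scores[n_tfidf:]))
```

Scaling a column by a factor w multiplies its chi2 score by w, which is why the sketch computes the statistics on the unweighted matrix; otherwise the comparison would mostly reflect the chosen weight rather than how informative the topics are.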