Okay, so I ran a quick chi2 check, and some of the LDA features get high chi2 scores (i.e. low p-values), so they should be helpful at least.
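For anyone who wants to repeat this kind of check, here is a minimal sketch of how it could look. It is not the exact script I ran: the helper name chi2_by_block and the toy data are made up for illustration, and it assumes the LDA topic proportions are available as a dense array with one row per document. A feature looks useful when its chi2 score is high, which is the same as its p-value being low.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_selection import chi2


def chi2_by_block(X_tfidf, X_lda, y):
    """Stack tf-idf and LDA columns, run chi2 once, split the result per block.

    Hypothetical helper for illustration; both blocks must be non-negative.
    """
    n_tfidf = X_tfidf.shape[1]
    # Append the topic columns to the right of the sparse tf-idf matrix.
    X = hstack([X_tfidf, csr_matrix(X_lda)]).tocsr()
    scores, pvalues = chi2(X, y)
    tfidf_part = (scores[:n_tfidf], pvalues[:n_tfidf])
    lda_part = (scores[n_tfidf:], pvalues[n_tfidf:])
    return tfidf_part, lda_part


# Toy stand-in data (made up): 6 documents, 4 tf-idf columns, 3 topics.
rng = np.random.RandomState(0)
y = np.array([0, 0, 0, 1, 1, 1])
X_tfidf = csr_matrix(rng.rand(6, 4))
# Topic proportions per document; every row sums to 1.
X_lda = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
])

(tfidf_scores, tfidf_p), (lda_scores, lda_p) = chi2_by_block(X_tfidf, X_lda, y)

# List topics from lowest to highest p-value (most class-dependent first).
for topic in np.argsort(lda_p):
    print("topic %d: chi2=%.3f p=%.3f" % (topic, lda_scores[topic], lda_p[topic]))
```

In the toy data, topic 2 is spread evenly over both classes, so it gets a chi2 score of about zero, while topics 0 and 1 differ between the classes and get higher scores. On real data the interesting part is comparing where the 500 topic columns rank against the tf-idf columns.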
On 14.09.2012 15:06, Andreas Müller wrote:
> I'd be interested in the outcome.
> Let us know when you get it to work :)
>
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 14:00:48
> Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>
> On 14.09.2012 14:53, Andreas Müller wrote:
>> Hi Philipp.
>
> Hey Andreas!
>> First, you should ensure that the features all have approximately the same
>> scale.
>> For example they should all be between zero and one - if the LDA features
>> are much smaller than the other ones, then they will probably not be
>> weighted much.
>
> LDA features sum up to 1 for one sample, because they describe the
> probability of one sample to belong to the different topics (in this
> case 500). So basically, they are between 0 and 1.
>>
>> Which LDA package did you use?
>
> We used Mallet's LDA implementation, because from experience they have
> the most established smoothing processes. http://mallet.cs.umass.edu/
>
> If we just train on the LDA features we btw get reasonable results, a
> bit worse than pure TFIDF.
>>
>> I am not very experienced with this kind of model, but maybe it would be
>> helpful to look at some univariate statistics, like
>> ``feature_selection.chi2``, to see if the LDA features are actually helpful.
>
> Yeah, this would be something I could look into. I have already tried to
> do feature selection with chi2 but not actually looked at the specific
> statistics.
>>
>> Cheers,
>> Andy
>
> Regards,
> Philipp
>>
>>
>> ----- Original Message -----
>> From: "Philipp Singer" <kill...@gmail.com>
>> To: scikit-learn-general@lists.sourceforge.net
>> Sent: Friday, 14 September 2012 13:47:30
>> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>>
>> Hey there!
>>
>> I have seen in the past a few research papers that combined tfidf
>> based features with LDA topic model features and they could increase
>> their accuracy by some useful extent.
>>
>> I now wanted to do the same. As a simple step I just appended the topic
>> features to each train and test sample with the existing tfidf features
>> and performed my standard LinearSVC - oh btw thanks that the confusion
>> with dense and sparse is now resolved in 0.12 ;) - on it.
>>
>> The problem now is that the results are overall practically the same. Some
>> classes perform better and some worse.
>>
>> I am not exactly sure if this is a data problem, or comes from my lack
>> of understanding of such feature extension techniques.
>>
>> Is it possible that the huge amount of tfidf features somehow overrules
>> the rather small number of topic features? Do I maybe have to do some
>> feature modification - because tfidf and LDA features are of different
>> nature?
>>
>> Maybe it is also due to the classifier and I need something else?
>>
>> Would be happy if someone could shed a little light on my problems ;)
>>
>> Regards,
>> Philipp
>>
>> ------------------------------------------------------------------------------
>> Got visibility?
>> Most devs has no idea what their production app looks like.
>> Find out how fast your code is with AppDynamics Lite.
>> http://ad.doubleclick.net/clk;262219671;13503038;y?
>> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general