On 14.09.2012 15:28, Philipp Singer wrote:
> Okay, so I did a quick chi2 check and it seems like some LDA features
> have high p-values, so they should be helpful at least.

Oh, sorry. We want the lowest p-values, right? But the conclusion is the
same: many of the LDA features have low p-values.

>
> On 14.09.2012 15:06, Andreas Müller wrote:
>> I'd be interested in the outcome.
>> Let us know when you get it to work :)
>>
>>
>> ----- Original Message -----
>> From: "Philipp Singer" <kill...@gmail.com>
>> To: scikit-learn-general@lists.sourceforge.net
>> Sent: Friday, 14 September 2012 14:00:48
>> Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>>
>> On 14.09.2012 14:53, Andreas Müller wrote:
>>> Hi Philipp.
>>
>> Hey Andreas!
>>> First, you should ensure that the features all have approximately the
>>> same scale.
>>> For example, they should all be between zero and one. If the LDA
>>> features are much smaller than the other ones, they will probably
>>> not be weighted much.
>>
>> The LDA features sum to 1 for each sample, because they describe the
>> probability that the sample belongs to each of the topics (500 in this
>> case). So basically, they are between 0 and 1.
>>>
>>> Which LDA package did you use?
>>
>> We used Mallet's LDA implementation, because in our experience it has
>> the most established smoothing procedures. http://mallet.cs.umass.edu/
>>
>> By the way, if we train on the LDA features alone we get reasonable
>> results, a bit worse than pure TFIDF.
>>>
>>> I am not very experienced with this kind of model, but maybe it would
>>> be helpful to look at some univariate statistics, like
>>> ``feature_selection.chi2``, to see if the LDA features are actually
>>> helpful.
>>
>> Yeah, this is something I could look into. I have already tried to
>> do feature selection with chi2, but I have not actually looked at the
>> specific statistics.
>>>
>>> Cheers,
>>> Andy
>>
>> Regards,
>> Philipp
>>>
>>>
>>> ----- Original Message -----
>>> From: "Philipp Singer" <kill...@gmail.com>
>>> To: scikit-learn-general@lists.sourceforge.net
>>> Sent: Friday, 14 September 2012 13:47:30
>>> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>>>
>>> Hey there!
>>>
>>> In the past I have seen a few research papers that combined tfidf-based
>>> features with LDA topic model features and increased their accuracy
>>> by a useful margin.
>>>
>>> I now wanted to do the same. As a simple first step I just appended the
>>> topic features to the existing tfidf features of each train and test
>>> sample and ran my standard LinearSVC on the result (oh, btw, thanks
>>> that the confusion with dense and sparse is now resolved in 0.12 ;)).
>>>
>>> The problem now is that the overall results are essentially the same.
>>> Some classes perform better and some worse.
>>>
>>> I am not exactly sure whether this is a data problem or comes from my
>>> lack of understanding of such feature extension techniques.
>>>
>>> Is it possible that the huge number of tfidf features somehow overrules
>>> the rather small number of topic features? Do I maybe have to do some
>>> feature modification, because tfidf and LDA features are of a different
>>> nature?
>>>
>>> Maybe it is also due to the classifier and I need something else?
>>>
>>> I would be happy if someone could shed a little light on my problems ;)
>>>
>>> Regards,
>>> Philipp
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Got visibility?
>>> Most devs has no idea what their production app looks like.
>>> Find out how fast your code is with AppDynamics Lite.
>>> http://ad.doubleclick.net/clk;262219671;13503038;y?
>>> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
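
For readers following the thread, the setup under discussion (appending per-document topic proportions to a sparse tfidf matrix, checking the topic columns with ``feature_selection.chi2``, and training a LinearSVC) might look roughly like the sketch below. This is not Philipp's actual code. The toy corpus, the labels, and the topic matrix `X_lda` are hypothetical stand-ins; in the thread the topic proportions came from Mallet, whereas here they are random Dirichlet draws used only so that the example runs.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from sklearn.svm import LinearSVC

# Toy corpus and labels (hypothetical stand-ins for the real data).
docs = [
    "the striker scored a late goal",
    "the keeper saved the penalty",
    "the team won the cup final",
    "the market closed higher today",
    "the bank raised interest rates",
    "shares fell after the earnings report",
]
y = np.array([0, 0, 0, 1, 1, 1])

# Sparse tfidf block, one row per document.
X_tfidf = TfidfVectorizer().fit_transform(docs)

# Stand-in for topic proportions exported from an external LDA tool:
# each row is non-negative and sums to 1. These random draws carry no
# real topic information; they only mimic the shape of such features.
rng = np.random.RandomState(0)
n_topics = 5
X_lda = rng.dirichlet(np.ones(n_topics), size=len(docs))

# Append the dense topic block to the tfidf block, keeping the result sparse.
X = hstack([X_tfidf, csr_matrix(X_lda)]).tocsr()

# Univariate chi2 test per feature. Lower p-values indicate a stronger
# association between a feature and the class labels.
chi2_stats, p_values = chi2(X, y)
lda_p_values = p_values[-n_topics:]  # the topic columns were appended last
print("p-values of the topic columns:", np.round(lda_p_values, 3))

# Train the linear SVM on the combined feature matrix.
clf = LinearSVC().fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Note that `TfidfVectorizer` L2-normalizes each row by default, while topic proportions sum to 1 per row, so the two blocks are not on the same norm even though both lie between 0 and 1. One way to probe the "overruled by tfidf" question would be to multiply `X_lda` by a block weight before stacking and choose that weight by cross-validation; whether this helps on a given dataset is an open question, not something the thread establishes.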