On 14.09.2012 15:10, amir rahimi wrote:
> Have you done tests using some other classifiers, such as gradient
> boosting, which has a kind of internal feature selection?

Actually not, but I do want to try that out if the runtime allows it.

> On Fri, Sep 14, 2012 at 5:36 PM, Andreas Müller
> <amuel...@ais.uni-bonn.de> wrote:
>
> I'd be interested in the outcome.
> Let us know when you get it to work :)
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 14:00:48
> Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>
> On 14.09.2012 14:53, Andreas Müller wrote:
> > Hi Philipp.
>
> Hey Andreas!
>
> > First, you should ensure that the features all have approximately
> > the same scale. For example, they should all be between zero and
> > one - if the LDA features are much smaller than the other ones,
> > then they will probably not be weighted much.
>
> The LDA features sum to 1 for each sample, because they describe the
> probability that the sample belongs to each of the topics (500 in
> this case). So basically, they are between 0 and 1.
>
> > Which LDA package did you use?
>
> We used Mallet's LDA implementation because, in our experience, it
> has the most established smoothing procedures.
> http://mallet.cs.umass.edu/
>
> By the way, if we train on the LDA features alone, we get reasonable
> results, a bit worse than pure TFIDF.
>
> > I am not very experienced with this kind of model, but maybe it
> > would be helpful to look at some univariate statistics, like
> > ``feature_selection.chi2``, to see if the LDA features are actually
> > helpful.
>
> Yeah, this would be something I could look into. I have already tried
> feature selection with chi2, but have not actually looked at the
> specific statistics.
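
A minimal sketch of what looking at the per-feature chi2 statistics could look like, in case it helps. Everything here is an assumption for illustration: the toy corpus and labels are made up, and the Dirichlet draws only stand in for the per-document topic proportions that would come out of Mallet (5 topics instead of 500). It appends the topic columns to the TF-IDF matrix with scipy.sparse.hstack and compares the chi2 scores and value ranges of the two blocks:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

# Toy corpus and labels, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "my dog chased the cat",
    "stocks fell on monday",
    "the market rallied on friday",
    "investors sold their stocks",
]
y = np.array([0, 0, 0, 1, 1, 1])

# Sparse, non-negative TF-IDF features.
X_tfidf = TfidfVectorizer().fit_transform(docs)

# Hypothetical stand-in for topic proportions exported from an LDA tool:
# every row is non-negative and sums to 1.
rng = np.random.RandomState(0)
n_topics = 5
X_topics = rng.dirichlet(np.ones(n_topics), size=len(docs))

# Append the topic columns to the TF-IDF columns, keeping the result sparse.
X_all = sparse.hstack([X_tfidf, sparse.csr_matrix(X_topics)]).tocsr()

# chi2 requires non-negative features; both blocks satisfy that.
scores, pvalues = chi2(X_all, y)

# Split the scores back into the two feature blocks.
n_tfidf = X_tfidf.shape[1]
tfidf_scores = scores[:n_tfidf]
topic_scores = scores[n_tfidf:]

print("mean chi2, tfidf block:", tfidf_scores.mean())
print("mean chi2, topic block:", topic_scores.mean())
print("largest value, tfidf block:", X_tfidf.max())
print("largest value, topic block:", X_topics.max())
```

On real data, topic columns whose chi2 scores sit far below those of the tfidf columns might suggest they add little, and a much smaller value range in the topic block would point back to the scaling question above.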
> >
> > Cheers,
> > Andy
>
> Regards,
> Philipp
>
> > ----- Original Message -----
> > From: "Philipp Singer" <kill...@gmail.com>
> > To: scikit-learn-general@lists.sourceforge.net
> > Sent: Friday, 14 September 2012 13:47:30
> > Subject: [Scikit-learn-general] Combining TFIDF and LDA features
> >
> > Hey there!
> >
> > In the past I have seen a few research papers that combined
> > tfidf-based features with LDA topic model features and increased
> > their accuracy by a useful margin.
> >
> > I now wanted to do the same. As a simple first step, I just appended
> > the topic features to the existing tfidf features of each train and
> > test sample and ran my standard LinearSVC on the result - oh, btw,
> > thanks that the confusion with dense and sparse is now resolved in
> > 0.12 ;)
> >
> > The problem now is that the results are nearly identical overall.
> > Some classes perform better and some worse.
> >
> > I am not exactly sure whether this is a data problem or comes from
> > my lack of understanding of such feature extension techniques.
> >
> > Is it possible that the huge number of tfidf features somehow
> > overrules the rather small number of topic features? Do I maybe have
> > to do some feature modification, because tfidf and LDA features are
> > of a different nature?
> >
> > Maybe it is also due to the classifier and I need something else?
> >
> > Would be happy if someone could shed a little light on my problems ;)
> >
> > Regards,
> > Philipp
> >
> > ------------------------------------------------------------------------------
> > Got visibility?
> > Most devs has no idea what their production app looks like.
> > Find out how fast your code is with AppDynamics Lite.
> > http://ad.doubleclick.net/clk;262219671;13503038;y?
> > http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> --
> ----------------------------------------------------------------------
> #include <stdio.h>
> double d[]={9299037773.178347,2226415.983937417,307.0};
> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
> ----------------------------------------------------------------------