2012/9/14 Andreas Müller <amuel...@ais.uni-bonn.de>:
> Hi Philipp.
> First, you should ensure that the features all have approximately the same
> scale.
> For example they should all be between zero and one - if the LDA features
> are much smaller than the other ones, then they will probably not be weighted
> much.

I totally agree - I had such an issue in my research as well (combining
word presence features with SVD embeddings). I followed Blitzer et al.
2006 and normalized** both feature groups separately - e.g. you could
normalize the word presence features such that the L1 norm equals 1 and do
the same for the SVD embeddings. In my work I had the impression, though,
that L1|L2 normalization was inferior to simply scaling the embeddings by a
constant alpha such that the average L2 norm is 1.[1]

** normalization here means row-level normalization - similar to document
length normalization in TF/IDF.

HTH,
Peter

Blitzer et al. 2006, Domain Adaptation with Structural Correspondence
Learning, http://john.blitzer.com/papers/emnlp06.pdf

[1] This is also described here:
http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use

>
> Which LDA package did you use?
>
> I am not very experienced with this kind of model, but maybe it would be
> helpful
> to look at some univariate statistics, like ``feature_selection.chi2``, to see
> if the LDA features are actually helpful.
>
> Cheers,
> Andy
>
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 13:47:30
> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>
> Hey there!
>
> I have seen in the past a few research papers that combined tfidf
> based features with LDA topic model features and they could increase
> their accuracy by some useful extent.
>
> I now wanted to do the same. As a simple step I just appended the topic
> features to each train and test sample with the existing tfidf features
> and performed my standard LinearSVC - oh btw thanks that the confusion
> with dense and sparse is now resolved in 0.12 ;) - on it.
>
> The problem now is that the results are overall almost the same. Some
> classes perform better and some worse.
>
> I am not exactly sure if this is a data problem, or comes from my lack
> of understanding of such feature extension techniques.
>
> Is it possible that the huge number of tfidf features somehow overrules
> the rather small number of topic features? Do I maybe have to do some
> feature modification - because tfidf and LDA features are of different
> nature?
>
> Maybe it is also due to the classifier and I need something else?
>
> Would be happy if someone could shed a little light on my problems ;)
>
> Regards,
> Philipp
>
> ------------------------------------------------------------------------------
> Got visibility?
> Most devs has no idea what their production app looks like.
> Find out how fast your code is with AppDynamics Lite.
> http://ad.doubleclick.net/clk;262219671;13503038;y?
> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
Peter Prettenhofer
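
P.S. For illustration, here is a minimal sketch of the two scaling options
discussed in this thread: row-level L1 normalization of the word features,
and scaling the dense block by a single constant alpha so that its average
L2 norm is 1. It uses plain NumPy on tiny made-up arrays. The helper names
(normalize_rows, scale_to_unit_average_norm) and the toy data are
hypothetical and not part of scikit-learn. With a real sparse tfidf matrix
you would probably stack the blocks with scipy.sparse.hstack instead of
np.hstack.

```python
import numpy as np


def normalize_rows(block, norm_order=2):
    """Row-level normalization: every non-zero row ends up with norm 1."""
    norms = np.linalg.norm(block, ord=norm_order, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero rows untouched
    return block / norms


def scale_to_unit_average_norm(block):
    """Multiply the whole block by one constant alpha so that the
    average row-wise L2 norm becomes 1. Returns (scaled_block, alpha)."""
    mean_norm = np.linalg.norm(block, axis=1).mean()
    if mean_norm == 0.0:
        return block.copy(), 1.0  # all-zero block: nothing to scale
    alpha = 1.0 / mean_norm
    return block * alpha, alpha


# Toy stand-ins (made up): 3 documents, 4 word features, 2 topic features.
X_words = np.array([[1.0, 0.0, 1.0, 1.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [1.0, 1.0, 0.0, 1.0]])
X_topics = np.array([[0.02, 0.01],
                     [0.00, 0.03],
                     [0.01, 0.01]])

# Option 1: normalize each row of the word block to L1 norm 1.
X_words_l1 = normalize_rows(X_words, norm_order=1)

# Option 2: one global scaling constant for the much smaller topic block.
X_topics_scaled, alpha = scale_to_unit_average_norm(X_topics)

# Append the topic features to the word features, sample by sample.
X_combined = np.hstack([X_words_l1, X_topics_scaled])
```

The constant alpha has to be computed on the training set only and then
reused unchanged for the test set, otherwise the two sets end up on
different scales.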