Have you tried other classifiers, such as gradient boosting, which performs
a kind of internal feature selection?
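
A rough sketch of what such a test could look like. This is only an illustration on synthetic data: the array names, sizes and the label rule are made up, and the random arrays merely stand in for the real TFIDF and topic features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 200, 20, 5

# Hypothetical stand-ins for the real data: noisy "tfidf-like" columns
# and "topic-like" rows that sum to 1.
X_tfidf = rng.rand(n_samples, n_tfidf)
X_topics = rng.dirichlet(np.ones(n_topics), size=n_samples)

# Synthetic labels that depend only on the first topic column.
y = (X_topics[:, 0] > 0.2).astype(int)

# Dense stacking is fine for this toy size.
X = np.hstack([X_tfidf, X_topics])
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, one value per column, summing to 1.
importances = clf.feature_importances_
topic_share = importances[n_tfidf:].sum()
print("importance mass on the topic columns: %.2f" % topic_share)
```

On real data the interesting number would be how much importance the topic block receives compared to the TFIDF block.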

On Fri, Sep 14, 2012 at 5:36 PM, Andreas Müller <amuel...@ais.uni-bonn.de> wrote:

> I'd be interested in the outcome.
> Let us know when you get it to work :)
>
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 14:00:48
> Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>
> On 14.09.2012 14:53, Andreas Müller wrote:
> > Hi Philipp.
>
> Hey Andreas!
> > First, you should ensure that the features all have approximately the
> > same scale.
> > For example they should all be between zero and one - if the LDA features
> > are much smaller than the other ones, then they will probably not be
> > weighted much.
>
> LDA features sum up to 1 for each sample, because they describe the
> probability that the sample belongs to each of the topics (in this
> case 500). So basically, they are between 0 and 1.
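
A small sketch of the scale question, under stated assumptions. The documents are toy strings, the topic proportions are random placeholders for the Mallet output, and `topic_weight` is a hypothetical knob, not something from the thread. Values that sum to 1 over 500 topics average 0.002 each, so a topic row can have a much smaller L2 norm than a TFIDF row, which `TfidfVectorizer` scales to unit L2 norm by default.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell on monday",
    "markets rallied on friday",
]
# Sparse matrix; rows have unit L2 norm with the default norm="l2".
X_tfidf = TfidfVectorizer().fit_transform(docs)

# Placeholder for the real topic proportions: each row sums to 1.
rng = np.random.RandomState(0)
n_topics = 500
X_topics = rng.dirichlet(np.ones(n_topics), size=len(docs))

tfidf_norms = np.sqrt(np.asarray(X_tfidf.multiply(X_tfidf).sum(axis=1))).ravel()
topic_norms_raw = np.linalg.norm(X_topics, axis=1)
print("tfidf row norms:", tfidf_norms)
print("raw topic row norms:", topic_norms_raw)

# One possible fix (an assumption, not the only option): give the topic
# block unit L2 row norm too, then weight it explicitly.
X_topics_l2 = normalize(X_topics, norm="l2")
topic_weight = 1.0  # hypothetical; could be tuned by cross-validation
X_combined = sparse.hstack(
    [X_tfidf, sparse.csr_matrix(topic_weight * X_topics_l2)]
).tocsr()
print("combined shape:", X_combined.shape)
```

The combined matrix stays sparse, so it can be passed to a linear model in the same way as the plain TFIDF matrix.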
> >
> > Which LDA package did you use?
>
> We used Mallet's LDA implementation, because from experience they have
> the most established smoothing processes. http://mallet.cs.umass.edu/
>
> If we just train on the LDA features we btw get reasonable results, a
> bit worse than pure TFIDF.
> >
> > I am not very experienced with this kind of model, but maybe it would be
> > helpful to look at some univariate statistics, like
> > ``feature_selection.chi2``, to see if the LDA features are actually helpful.
>
> Yeah, this would be something I could look into. I have already tried to
> do feature selection with chi2 but have not actually looked at the specific
> statistics.
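
Looking at the chi2 statistics per feature block could be sketched roughly as follows. The data is synthetic and all names and sizes are made up: the "tfidf" columns are pure noise and only the first topic column depends on the class, so this shows the mechanics only, not what the real features would give.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 300, 50, 10
y = rng.randint(0, 2, size=n_samples)

# Uninformative stand-in for the TFIDF block (chi2 needs non-negative values).
X_tfidf = rng.rand(n_samples, n_tfidf)

# Stand-in topic proportions: class 1 puts more mass on topic 0.
alpha = np.ones((n_samples, n_topics))
alpha[y == 1, 0] = 5.0
X_topics = np.vstack([rng.dirichlet(a) for a in alpha])

X = np.hstack([X_tfidf, X_topics])
scores, pvalues = chi2(X, y)

# Compare the two blocks and list the highest scoring columns.
topic_scores = scores[n_tfidf:]
ranking = np.argsort(scores)[::-1]
print("best tfidf score:", scores[:n_tfidf].max())
print("best topic score:", topic_scores.max())
print("top 5 columns:", ranking[:5])
```

If none of the topic columns showed up near the top of such a ranking on the real data, that would suggest they add little beyond the TFIDF features.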
> >
> > Cheers,
> > Andy
>
> Regards,
> Philipp
> >
> >
> > ----- Original Message -----
> > From: "Philipp Singer" <kill...@gmail.com>
> > To: scikit-learn-general@lists.sourceforge.net
> > Sent: Friday, 14 September 2012 13:47:30
> > Subject: [Scikit-learn-general] Combining TFIDF and LDA features
> >
> > Hey there!
> >
> > I have seen a few research papers in the past that combined tfidf
> > based features with LDA topic model features and were able to increase
> > their accuracy to a useful extent.
> >
> > I now wanted to do the same. As a simple step I just appended the topic
> > features to the existing tfidf features of each train and test sample
> > and ran my standard LinearSVC - oh btw thanks that the confusion
> > with dense and sparse is now resolved in 0.12 ;) - on it.
> >
> > The problem now is that the results are overall nearly identical. Some
> > classes perform better and some worse.
> >
> > I am not exactly sure if this is a data problem, or comes from my lack
> > of understanding of such feature extension techniques.
> >
> > Is it possible that the huge number of tfidf features somehow overrules
> > the rather small number of topic features? Do I maybe have to do some
> > feature modification - because tfidf and LDA features are of different
> > nature?
> >
> > Maybe it is also due to the classifier and I need something else?
> >
> > Would be happy if someone could shed a little light on my problems ;)
> >
> > Regards,
> > Philipp
> >
> >
> ------------------------------------------------------------------------------
> > Got visibility?
> > Most devs has no idea what their production app looks like.
> > Find out how fast your code is with AppDynamics Lite.
> > http://ad.doubleclick.net/clk;262219671;13503038;y?
> > http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
>
>



-- 
----------------------------------------------------------------------
#include <stdio.h>
double d[]={9299037773.178347,2226415.983937417,307.0};
main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
----------------------------------------------------------------------