On 14.09.2012 15:10, amir rahimi wrote:
> Have you done tests using some other classifiers such as gradient
> boosting which has a kind of internal feature selection?

Not yet, but I want to try that out if the runtime allows it.
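
A minimal sketch of what such a test could look like. This is not the actual setup from this thread: the data is synthetic, and the names X_tfidf, X_lda, n_tfidf and n_topics are placeholders chosen for illustration. It also assumes a recent scikit-learn release in which GradientBoostingClassifier accepts sparse input; on older versions the stacked matrix would probably have to be converted with X.toarray() first. The idea is to fit on the stacked features and then sum feature_importances_ per block, to see how much the topic features get used at all.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 200, 50, 5  # hypothetical sizes

# Synthetic binary labels (stand-in for the real classes).
y = rng.randint(0, 2, size=n_samples)

# Stand-in for the tfidf block: sparse, non-negative, pure noise here.
mask = rng.rand(n_samples, n_tfidf) < 0.1
X_tfidf = sparse.csr_matrix(rng.rand(n_samples, n_tfidf) * mask)

# Stand-in for the LDA block: each row sums to 1. Topic 0 is made
# more likely for class 1 so that the block carries some signal.
alpha = np.ones((n_samples, n_topics))
alpha[y == 1, 0] = 10.0
X_lda = np.vstack([rng.dirichlet(a) for a in alpha])

# Append the topic features to the tfidf features, column-wise.
X = sparse.hstack([X_tfidf, sparse.csr_matrix(X_lda)], format="csr")

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Importance mass assigned to each feature block.
importances = clf.feature_importances_
tfidf_share = importances[:n_tfidf].sum()
lda_share = importances[n_tfidf:].sum()
print("tfidf share: %.3f, lda share: %.3f" % (tfidf_share, lda_share))
```

If the LDA share stays close to zero on the real data, that would suggest the topic features add little beyond what the tfidf features already provide, at least for this kind of model.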
>
> On Fri, Sep 14, 2012 at 5:36 PM, Andreas Müller
> <amuel...@ais.uni-bonn.de> wrote:
>
>     I'd be interested in the outcome.
>     Let us know when you get it to work :)
>
>
>     ----- Original Message -----
>     From: "Philipp Singer" <kill...@gmail.com>
>     To: scikit-learn-general@lists.sourceforge.net
>     Sent: Friday, 14 September 2012 14:00:48
>     Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>
>     On 14.09.2012 14:53, Andreas Müller wrote:
>      > Hi Philipp.
>
>     Hey Andreas!
>      > First, you should ensure that the features all have approximately
>      > the same scale.
>      > For example, they should all be between zero and one - if the LDA
>      > features are much smaller than the other ones, then they will
>      > probably not be weighted much.
>
>     The LDA features of each sample sum to 1, because they describe the
>     probability that the sample belongs to each of the topics (500 in
>     this case). So basically, they are all between 0 and 1.
>      >
>      > Which LDA package did you use?
>
>     We used Mallet's LDA implementation, because from experience they have
>     the most established smoothing processes. http://mallet.cs.umass.edu/
>
>     By the way, if we train on the LDA features alone we get reasonable
>     results, a bit worse than pure TFIDF.
>      >
>      > I am not very experienced with this kind of model, but maybe it
>      > would be helpful to look at some univariate statistics, like
>      > ``feature_selection.chi2``, to see
>      > if the LDA features are actually helpful.
>
>     Yeah, this would be something I could look into. I have already tried
>     to do feature selection with chi2 but have not actually looked at the
>     specific statistics.
>      >
>      > Cheers,
>      > Andy
>
>     Regards,
>     Philipp
>      >
>      >
>      > ----- Original Message -----
>      > From: "Philipp Singer" <kill...@gmail.com>
>      > To: scikit-learn-general@lists.sourceforge.net
>      > Sent: Friday, 14 September 2012 13:47:30
>      > Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>      >
>      > Hey there!
>      >
>      > I have seen a few research papers in the past that combined tfidf
>      > based features with LDA topic model features and could increase
>      > their accuracy to a useful extent.
>      >
>      > I now wanted to do the same. As a simple step I just appended the
>      > topic features to the existing tfidf features of each train and
>      > test sample and ran my standard LinearSVC on the result - oh btw
>      > thanks that the confusion with dense and sparse is now resolved
>      > in 0.12 ;)
>      >
>      > The problem now is that the overall results are virtually the
>      > same as before. Some classes perform better and some worse.
>      >
>      > I am not exactly sure if this is a data problem, or comes from my
>      > lack of understanding of such feature extension techniques.
>      >
>      > Is it possible that the huge number of tfidf features somehow
>      > overrules the rather small number of topic features? Do I maybe
>      > have to do some feature modification - because tfidf and LDA
>      > features are of different nature?
>      >
>      > Maybe it is also due to the classifier and I need something else?
>      >
>      > Would be happy if someone could shed a little light on my problems ;)
>      >
>      > Regards,
>      > Philipp
>      >
>      >
>      > ------------------------------------------------------------------------------
>      > Got visibility?
>      > Most devs has no idea what their production app looks like.
>      > Find out how fast your code is with AppDynamics Lite.
>      > http://ad.doubleclick.net/clk;262219671;13503038;y?
>      > http://info.appdynamics.com/FreeJavaPerformanceDownload.html
>      > _______________________________________________
>      > Scikit-learn-general mailing list
>      > Scikit-learn-general@lists.sourceforge.net
>     <mailto:Scikit-learn-general@lists.sourceforge.net>
>      > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
> --
> ----------------------------------------------------------------------
> #include <stdio.h>
> double d[]={9299037773.178347,2226415.983937417,307.0};
> main(){d[2]--?d[0]*=4,d[1]*=5,main():printf((char*)d);}
> ----------------------------------------------------------------------
>
>
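
Regarding the chi2 suggestion quoted above, a minimal sketch of how the per-feature statistics could be inspected. Again this runs on synthetic stand-in data, not on the data discussed in this thread, and X_tfidf, X_lda, n_tfidf and n_topics are hypothetical names. It relies on feature_selection.chi2 requiring non-negative features, which holds for both tfidf weights and topic proportions.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_selection import chi2

rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 300, 40, 5  # hypothetical sizes

y = rng.randint(0, 2, size=n_samples)

# Stand-in tfidf block: sparse, non-negative noise.
mask = rng.rand(n_samples, n_tfidf) < 0.1
X_tfidf = sparse.csr_matrix(rng.rand(n_samples, n_tfidf) * mask)

# Stand-in LDA block: rows sum to 1, topic 0 favoured for class 1.
alpha = np.ones((n_samples, n_topics))
alpha[y == 1, 0] = 10.0
X_lda = np.vstack([rng.dirichlet(a) for a in alpha])

X = sparse.hstack([X_tfidf, sparse.csr_matrix(X_lda)], format="csr")

# One chi2 statistic and one p-value per column of X.
scores, pvalues = chi2(X, y)

# Columns with index >= n_tfidf belong to the LDA block.
top = np.argsort(scores)[::-1][:10]
n_lda_in_top = int((top >= n_tfidf).sum())
print("LDA features among the 10 highest chi2 scores:", n_lda_in_top)
print("mean chi2, tfidf block:", scores[:n_tfidf].mean())
print("mean chi2, LDA block:  ", scores[n_tfidf:].mean())
```

On real data, comparing where the topic columns rank against the tfidf columns gives a rough, univariate indication of whether the LDA features carry any class information of their own.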

