2012/9/14 Andreas Müller <amuel...@ais.uni-bonn.de>:
> Hi Philipp.
> First, you should ensure that the features all have approximately the
> same scale.
> For example they should all be between zero and one - if the LDA features
> are much smaller than the other ones, then they will probably not be
> weighted much.

I totally agree - I had such an issue in my research as well
(combining word presence features with SVD embeddings).
I followed Blitzer et al. 2006 and normalized** both feature groups
separately - e.g. you could normalize the word presence features such that
the L1 norm of each row equals 1 and do the same for the SVD embeddings.
In my work I had the impression, though, that L1 or L2 normalization was
inferior to simply scaling the embeddings by a constant alpha such that
their average L2 norm is 1.[1]

** normalization here means row level normalization - similar to
document length normalization in TF-IDF.
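
A minimal sketch of the two options above, in case it helps. The toy
matrices and all variable names are made up for illustration, and leaving
the word features untouched in option B is my assumption, not something
prescribed by the paper:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

rng = np.random.RandomState(0)

# Toy stand-ins (assumptions): 4 documents with 6 sparse word presence
# features and 3 dense, small-valued embedding features.
X_words = sparse.csr_matrix(np.array([
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
], dtype=float))
X_dense = rng.rand(4, 3) * 0.01

# Option A: row level L1 normalization of each feature group separately.
X_words_l1 = normalize(X_words, norm="l1")
X_dense_l1 = normalize(X_dense, norm="l1")
X_combined_a = sparse.hstack(
    [X_words_l1, sparse.csr_matrix(X_dense_l1)]).tocsr()

# Option B: scale the dense group by a single constant alpha such that
# the average row L2 norm becomes 1 (word features left as they are).
alpha = 1.0 / np.linalg.norm(X_dense, axis=1).mean()
X_dense_scaled = alpha * X_dense
X_combined_b = sparse.hstack(
    [X_words, sparse.csr_matrix(X_dense_scaled)]).tocsr()
```

Either combined matrix can then be passed to LinearSVC as usual. Note that
alpha should be computed on the training data only and reused for the test
data.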

HTH,
 Peter

Blitzer et al. 2006, Domain Adaptation using Structural Correspondence
Learning, http://john.blitzer.com/papers/emnlp06.pdf

[1] This is also described here:
http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use
>
> Which LDA package did you use?
>
> I am not very experienced with this kind of model, but maybe it would be
> helpful to look at some univariate statistics, like
> ``feature_selection.chi2``, to see if the LDA features are actually
> helpful.
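
A rough sketch of such a check, assuming the topic features are
non-negative (as topic proportions are), which chi2 requires. The data
here is synthetic and the names are hypothetical - one column depends on
the class label and one is pure noise:

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.RandomState(0)

# Synthetic labels: 20 samples of class 0, then 20 samples of class 1.
y = np.array([0] * 20 + [1] * 20)

# Hypothetical topic features: column 0 is larger for class 1,
# column 1 is uniform noise unrelated to the label.
informative = np.where(y == 1, 0.8, 0.2) + 0.05 * rng.rand(40)
noise = rng.rand(40)
X_topics = np.column_stack([informative, noise])

# One chi-squared score and p-value per feature column.
scores, p_values = chi2(X_topics, y)
for name, score, p in zip(["informative", "noise"], scores, p_values):
    print(name, score, p)
```

Topic columns whose scores are close to those of a noise feature are
unlikely to add much on top of the tfidf features.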
>
> Cheers,
> Andy
>
>
> ----- Original Message -----
> From: "Philipp Singer" <kill...@gmail.com>
> To: scikit-learn-general@lists.sourceforge.net
> Sent: Friday, 14 September 2012 13:47:30
> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>
> Hey there!
>
> I have seen a few research papers in the past that combined tfidf
> based features with LDA topic model features and increased
> their accuracy by a useful margin.
>
> I now wanted to do the same. As a simple step I just appended the topic
> features to the existing tfidf features of each train and test sample
> and ran my standard LinearSVC on the result - oh btw thanks that the
> confusion with dense and sparse is now resolved in 0.12 ;)
>
> The problem now is that the overall results are practically the same.
> Some classes perform better and some worse.
>
> I am not exactly sure if this is a data problem, or comes from my lack
> of understanding of such feature extension techniques.
>
> Is it possible that the huge number of tfidf features somehow overrules
> the rather small number of topic features? Do I maybe have to do some
> feature modification - because tfidf and LDA features are of a different
> nature?
>
> Maybe it is also due to the classifier and I need something else?
>
> Would be happy if someone could shed a little light on my problems ;)
>
> Regards,
> Philipp
>
> ------------------------------------------------------------------------------
> Got visibility?
> Most devs has no idea what their production app looks like.
> Find out how fast your code is with AppDynamics Lite.
> http://ad.doubleclick.net/clk;262219671;13503038;y?
> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



-- 
Peter Prettenhofer
