2012/9/14 Philipp Singer <kill...@gmail.com>:
> Hey!
>
> On 14.09.2012 15:10, Peter Prettenhofer wrote:
>>
>> I totally agree - I had such an issue in my research as well
>> (combining word presence features with SVD embeddings).
>> I followed Blitzer et al. 2006 and normalized** both feature groups
>> separately - e.g. you could normalize word presence features such that
>> the L1 norm equals 1 and do the same for the SVD embeddings.
>
> Isn't the normalization already part of the tfidf transformation?
> So basically the word presence tfidf features are already L2 normalized,
> but maybe I misunderstand this completely.

I forgot that your LDA embedding is already L1 normalized (i.e. sums to 1).
So both of your feature groups are already normalized: tf-idf with the L2
norm and LDA with the L1 norm.

>
>> In my work I had the impression, though, that L1|L2 normalization was
>> inferior to simply scaling the embeddings by a constant alpha such that
>> the average L2 norm is 1.[1]
>
> Ah, I see. How exactly would I do that? Isn't that the same thing the
> normalization technique in scikit-learn is doing?

It's as simple as computing the mean L2 norm of the samples and dividing
the feature matrix by that number. Scaler works per feature and Normalizer
per sample; this computes one normalization constant for the whole matrix.
Since the LDA embedding has intrinsic semantics (each document is generated
from a topic distribution), I don't think you should do this - please
forget my comment.

>>
>> ** normalization here means row-level normalization - similar to
>> document length normalization in TF/IDF.
>>
>> HTH,
>> Peter
>
> Regards,
> Philipp
>>
>> Blitzer et al. 2006, Domain Adaptation using Structural Correspondence
>> Learning, http://john.blitzer.com/papers/emnlp06.pdf
>>
>> [1] This is also described here:
>> http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
Peter Prettenhofer
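
For reference, here is a minimal sketch of the constant scaling discussed in this thread (divide the whole matrix by its mean row L2 norm), next to plain per-sample L2 normalization for contrast. It assumes a dense NumPy feature matrix. The function names and the toy data are made up for illustration and are not part of scikit-learn.

```python
import numpy as np


def scale_to_unit_mean_l2_norm(X):
    """Divide X by one constant so that the mean row L2 norm becomes 1."""
    X = np.asarray(X, dtype=float)
    row_norms = np.linalg.norm(X, axis=1)  # L2 norm of each sample (row)
    alpha = row_norms.mean()               # one constant for the whole matrix
    if alpha == 0.0:
        return X.copy(), alpha             # all-zero matrix: nothing to scale
    return X / alpha, alpha


def normalize_rows_l2(X):
    """Per-sample L2 normalization, for contrast: every row gets norm 1."""
    X = np.asarray(X, dtype=float)
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    row_norms[row_norms == 0.0] = 1.0      # leave all-zero rows untouched
    return X / row_norms


# Hypothetical toy embedding: 3 samples, 2 features.
X = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]])

X_scaled, alpha = scale_to_unit_mean_l2_norm(X)
X_rows = normalize_rows_l2(X)

print(alpha)                              # mean row norm of the original matrix
print(np.linalg.norm(X_scaled, axis=1))   # relative lengths kept, mean is 1
print(np.linalg.norm(X_rows, axis=1))     # every row has norm 1
```

The constant scaling keeps the relative lengths of the samples, whereas per-sample normalization discards them. As noted in the reply, neither is advisable for an LDA embedding whose rows are meant to remain topic distributions.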