Hey!

On 14.09.2012 15:10, Peter Prettenhofer wrote:
>
> I totally agree - I had such an issue in my research as well
> (combining word presence features with SVD embeddings).
> I followed Blitzer et al. 2006 and normalized** both feature groups
> separately - e.g. you could normalize the word presence features such
> that the L1 norm equals 1, and do the same for the SVD embeddings.
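The per-group normalization Peter describes could be sketched roughly like this (a minimal numpy sketch with illustrative toy data; the variable names are assumptions, not from the thread):

```python
import numpy as np

# Two feature groups for the same documents (toy, illustrative data):
word_presence = np.array([[1.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0]])
svd_embed = np.array([[0.5, -1.5],
                      [2.0,  2.0]])

def l1_normalize_rows(M):
    # Divide each row by its L1 norm so each row's absolute values sum to 1.
    norms = np.abs(M).sum(axis=1, keepdims=True)
    return M / norms

# Normalize each feature group separately, then concatenate.
X = np.hstack([l1_normalize_rows(word_presence),
               l1_normalize_rows(svd_embed)])
```

The same effect could presumably be had with `sklearn.preprocessing.Normalizer(norm='l1')` applied to each group before stacking.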
Isn't the normalization already part of the tf-idf transformation? So basically the word presence tf-idf features are already L2-normalized - but maybe I misunderstand this completely.

> In my work I had the impression, though, that L1/L2 normalization was
> inferior to simply scaling the embeddings by a constant alpha such that
> the average L2 norm is 1. [1]

Ah, I see. How exactly would I do that? Isn't that the same thing the normalization technique in scikit-learn is doing?

> ** Normalization here means row-level normalization - similar to
> document length normalization in TF/IDF.
>
> HTH,
> Peter

Regards,
Philipp

> Blitzer et al. 2006, Domain Adaptation using Structural Correspondence
> Learning, http://john.blitzer.com/papers/emnlp06.pdf
>
> [1] This is also described here:
> http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
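For what it's worth, the constant-alpha scaling Peter mentions is different from per-row normalization: it divides the whole embedding matrix by one scalar, so rows keep their relative lengths while the *average* row L2 norm becomes 1. A minimal numpy sketch (toy data, names are illustrative):

```python
import numpy as np

# Toy embedding matrix: one row per document (illustrative data).
X = np.array([[1.0, 2.0, 2.0],
              [0.0, 3.0, 4.0]])

# One scalar alpha for the whole matrix, chosen so the average row
# L2 norm equals 1. Unlike per-row L1/L2 normalization, rows with
# larger norms stay larger after scaling.
alpha = 1.0 / np.linalg.norm(X, axis=1).mean()
X_scaled = alpha * X

print(np.linalg.norm(X_scaled, axis=1).mean())  # -> 1.0
```

`sklearn.preprocessing.Normalizer`, by contrast, rescales every row to norm 1 individually, which is why the two are not the same thing.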