Hey!

Am 14.09.2012 15:10, schrieb Peter Prettenhofer:
>
> I totally agree - I had such an issue in my research as well
> (combining word presence features with SVD embeddings).
> I followed Blitzer et al. 2006 and normalized** both feature groups
> separately - e.g. you could normalize the word presence features such
> that the L1 norm equals 1, and do the same for the SVD embeddings.

Isn't the normalization already part of the tf-idf transformation?
So basically the word presence tf-idf features are already L2-normalized -
but maybe I misunderstand this completely.
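
Just so I'm sure I follow - is the idea something like this? (A quick
sketch of per-group row-level normalization; X_words and X_svd are
made-up stand-ins for the two feature blocks:)

    import numpy as np
    import scipy.sparse as sp
    from sklearn.preprocessing import normalize

    rng = np.random.RandomState(0)
    X_words = sp.rand(10, 100, density=0.1, random_state=0)  # fake word presence block
    X_svd = rng.randn(10, 5)                                 # fake SVD embedding block

    # row-level L1 normalization, done separately for each feature group
    X_words_n = normalize(X_words, norm='l1')
    X_svd_n = normalize(X_svd, norm='l1')

    # concatenate the two normalized groups into one design matrix
    X = sp.hstack([X_words_n, sp.csr_matrix(X_svd_n)])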

> In my work I had the impression, though, that L1/L2 normalization was
> inferior to simply scaling the embeddings by a constant alpha such that
> the average L2 norm is 1. [1]

Ah, I see. How exactly would I do that? And isn't that the same thing the
normalization in scikit-learn does?
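
My naive guess at what you mean (just a sketch, X_svd again a placeholder
for the embedding block):

    import numpy as np

    rng = np.random.RandomState(0)
    X_svd = rng.randn(10, 5)  # made-up stand-in for the SVD embeddings

    # one global scalar for the whole group: divide by the mean row L2 norm,
    # so the *average* norm becomes 1 but relative row lengths are preserved
    row_norms = np.sqrt((X_svd ** 2).sum(axis=1))
    alpha = 1.0 / row_norms.mean()
    X_svd_scaled = alpha * X_svd

If that's right, then I guess the difference to Normalizer is that
Normalizer rescales every single row to norm exactly 1, while the constant
alpha keeps the relative row magnitudes within the group - is that the
point?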
>
> ** normalization here means row-level normalization - similar to
> document length normalization in tf-idf.
>
> HTH,
>   Peter

Regards,
Philipp
>
> Blitzer et al. 2006, Domain Adaptation using Structural Correspondence
> Learning, http://john.blitzer.com/papers/emnlp06.pdf
>
> [1] This is also described here:
> http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use

