2012/9/14 Philipp Singer <kill...@gmail.com>:
> Hey!
>
> On 14.09.2012 15:10, Peter Prettenhofer wrote:
>>
>> I totally agree - I had such an issue in my research as well
>> (combining word presence features with SVD embeddings).
>> I followed Blitzer et al. 2006 and normalized** both feature groups
>> separately - e.g. you could normalize the word presence features such
>> that their L1 norm equals 1 and do the same for the SVD embeddings.
>
> Isn't the normalization already part of the tfidf transformation?
> So basically the word presence tfidf features are already L2 normalized,
> but maybe I misunderstand this completely.

I forgot that your LDA embedding is already L1 normalized (i.e. sums to 1).
So both of your feature groups are already normalized; tf/idf is L2
and LDA is L1.
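
A quick way to check this on your own matrices is sketched below. The two
arrays are made-up toy values standing in for the tf/idf rows and the LDA
topic proportions; in practice they would come from your own vectorizer and
topic model, so treat the names and numbers as assumptions.

```python
import numpy as np

# Toy stand-ins (made-up values): rows are documents.
tfidf = np.array([[0.6, 0.8, 0.0],
                  [1.0, 0.0, 0.0]])   # assumed to be L2-normalized rows
topics = np.array([[0.7, 0.3],
                   [0.1, 0.9]])       # assumed topic proportions per document

# Row-wise L2 norm of the tf/idf group and L1 norm of the topic group.
l2 = np.linalg.norm(tfidf, ord=2, axis=1)
l1 = np.abs(topics).sum(axis=1)

print(np.allclose(l2, 1.0))  # True if every tf/idf row has unit L2 norm
print(np.allclose(l1, 1.0))  # True if every topic row sums to 1

# Concatenate the two feature groups column-wise, one row per document.
combined = np.hstack([tfidf, topics])
```

If both checks pass, the groups can be stacked as they are; whether any
further rescaling helps is something to verify on your own data.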

>
>> In my work I had the impression, though, that L1|L2 normalization was
>> inferior to simply scaling the embeddings by a constant alpha such that
>> the average L2 norm is 1.[1]
>
> Ah, I see. How exactly would I do that? Isn't that the same thing the
> normalization technique in scikit-learn is doing?

It's as simple as computing the mean L2 norm of the rows and dividing the
feature matrix by that number.
Scaler does this per feature, Normalizer per sample - this computes
a single normalization constant for the whole feature matrix.
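
A minimal NumPy sketch of what I mean, assuming a dense embedding matrix
with one row per sample. The function name and the toy values are made up
for illustration; this is not a scikit-learn API.

```python
import numpy as np

def scale_to_unit_mean_l2(X):
    """Divide the whole matrix by the mean L2 norm of its rows.

    Hypothetical helper: returns the scaled matrix and the mean norm used.
    """
    X = np.asarray(X, dtype=float)
    mean_norm = np.linalg.norm(X, axis=1).mean()
    if mean_norm == 0.0:
        # All-zero matrix: nothing to scale.
        return X.copy(), 1.0
    return X / mean_norm, mean_norm

# Toy embedding matrix (made-up values): 3 samples, 2 features.
emb = np.array([[3.0, 4.0],
                [6.0, 8.0],
                [0.0, 5.0]])

scaled, mean_norm = scale_to_unit_mean_l2(emb)

# For contrast: per-sample normalization gives every row unit L2 norm.
per_sample = emb / np.linalg.norm(emb, axis=1, keepdims=True)

print(np.linalg.norm(scaled, axis=1).mean())  # average row norm after scaling
print(np.linalg.norm(per_sample, axis=1))     # individual row norms
```

The difference to per-sample normalization is that the relative lengths of
the rows are preserved; only the overall scale of the group changes.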

Since the LDA embedding has intrinsic semantics (each document is generated
from a topic distribution), I don't think you should do this - please
forget my comment.

>>
>> ** normalization here means row-level normalization - similar to
>> document length normalization in TF/IDF.
>>
>> HTH,
>>   Peter
>
> Regards,
> Philipp
>>
>> Blitzer et al. 2006, Domain Adaptation using Structural Correspondence
>> Learning, http://john.blitzer.com/papers/emnlp06.pdf
>>
>> [1] This is also described here:
>> http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



-- 
Peter Prettenhofer

