Thank you Guillaume, that is helpful.

Cheers

Sole

On Tue, 24 Sep 2019 at 14:04, Guillaume Lemaître <g.lemaitr...@gmail.com>
wrote:

> One example where I saw it used was Scale-Invariant Feature Transform
> (SIFT). Normalizing each vector to have a unit length will compensate for
> affine changes in illumination between samples.
> The use case given in scikit-learn would be something similar but with
> text processing:
>
> "Scaling inputs to unit norms is a common operation for text
> classification or clustering for instance. For instance the dot product of
> two l2-normalized TF-IDF vectors is the cosine similarity of the vectors
> and is the base similarity metric for the Vector Space Model commonly used
> by the Information Retrieval community."
>
> So basically, you cancel out a transformation, which lets you compare
> samples with one another.
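
A minimal sketch of the point above, assuming scikit-learn and NumPy are available: TfidfVectorizer l2-normalizes rows by default, so a plain dot product of two rows already equals their cosine similarity.

```python
# Sketch: dot product of two l2-normalized TF-IDF vectors == cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "the dog sat on the log"]
tfidf = TfidfVectorizer().fit_transform(docs)  # rows are l2-normalized by default

dot = tfidf[0].multiply(tfidf[1]).sum()         # plain dot product of rows 0 and 1
cos = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
assert np.isclose(dot, cos)
```

(The documents here are arbitrary toy examples, just to show the identity.)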
>
> On Tue, 24 Sep 2019 at 14:04, Sole Galli <solegal...@gmail.com> wrote:
>
>> Sorry, ignore my question, I've got it now.
>>
>> It calculates the norm of each observation vector (across variables), and
>> that norm varies from observation to observation. That is why it needs to
>> be re-calculated for every sample, and therefore is not stored.
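
A small sketch of that conclusion, assuming scikit-learn and NumPy: Normalizer's fit() stores no statistics; transform() simply divides each row by that row's own norm, so there is nothing to carry over from train to test.

```python
# Sketch: Normalizer is stateless; each row is scaled by its own l2 norm.
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

norm = Normalizer(norm="l2").fit(X)  # fit() learns nothing from the data
Xt = norm.transform(X)

# Every row now has unit l2 norm, e.g. [3, 4] / 5 -> [0.6, 0.8]
assert np.allclose(np.linalg.norm(Xt, axis=1), 1.0)
assert np.allclose(Xt[0], [0.6, 0.8])
```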
>>
>> I would appreciate some articles / links describing successful
>> applications of this technique and why it adds value in ML. Would you be
>> able to point me to any?
>>
>> Cheers
>>
>> Sole
>>
>> On Tue, 24 Sep 2019 at 12:39, Sole Galli <solegal...@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> Quick question with respect to the Normalizer().
>>>
>>> My understanding is that this transformer divides each sample vector
>>> (row) by its Euclidean (l2) or Manhattan (l1) norm.
>>>
>>> From the sklearn docs, I understand that the Normalizer() does not learn
>>> the norms from the train set and store them. Rather, it normalises the
>>> data according to the norms the data set itself presents, which may or
>>> may not be the same in test and train.
>>>
>>> Am I understanding this correctly?
>>>
>>> If so, what is the reason for not storing these parameters in the
>>> Normalizer and using them to scale future data?
>>>
>>> If I am not getting it right, what am I missing?
>>>
>>> Many thanks, and I would appreciate any article on this that you could
>>> share.
>>>
>>> Cheers
>>>
>>> Sole
>>>
>>>
>>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>
