Thank you Guillaume, that is helpful.

Cheers
Sole

On Tue, 24 Sep 2019 at 14:04, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:

> One example where I saw it used is the Scale-Invariant Feature Transform
> (SIFT). Normalizing each vector to unit length compensates for affine
> changes in illumination between samples. The use case given in
> scikit-learn is similar, but for text processing:
>
> "Scaling inputs to unit norms is a common operation for text
> classification or clustering for instance. For instance the dot product of
> two l2-normalized TF-IDF vectors is the cosine similarity of the vectors
> and is the base similarity metric for the Vector Space Model commonly used
> by the Information Retrieval community."
>
> So basically, you cancel out a per-sample scaling, which allows you to
> compare samples with each other.
>
> On Tue, 24 Sep 2019 at 14:04, Sole Galli <solegal...@gmail.com> wrote:
>
>> Sorry, ignore my question, I got it right now.
>>
>> It is calculating the norm of the observation vector (across variables),
>> and that norm varies from observation to observation, which is why it
>> needs to be recalculated and therefore not stored.
>>
>> I would appreciate some articles / links about successful applications
>> of this technique and why it adds value to ML. Would you be able to
>> point me to any?
>>
>> Cheers
>>
>> Sole
>>
>> On Tue, 24 Sep 2019 at 12:39, Sole Galli <solegal...@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> Quick question with respect to the Normalizer().
>>>
>>> My understanding is that this transformer divides each row vector by
>>> the vector's Euclidean (l2) or Manhattan (l1) norm.
>>>
>>> From the sklearn docs, I understand that the Normalizer() does not
>>> learn the norms from the train set and store them. Rather, it
>>> normalises the data according to the norms the data set itself
>>> presents, which may or may not be the same in train and test.
>>>
>>> Am I understanding this correctly?
>>>
>>> If so, what is the reason not to store these parameters in the
>>> Normalizer and use them to scale future data?
>>>
>>> If not, what am I missing?
>>>
>>> Many thanks, and I would appreciate it if you have an article on this
>>> to share.
>>>
>>> Cheers
>>>
>>> Sole
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
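A minimal sketch of the behaviour discussed above, using scikit-learn's Normalizer (the toy arrays are made up for illustration): fit() learns nothing from the training data, each transformed row is divided by its own norm, and the dot product of two l2-normalized rows is their cosine similarity.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Toy data: rows are observations, columns are variables.
X_train = np.array([[4.0, 1.0, 2.0, 2.0],
                    [1.0, 3.0, 9.0, 3.0]])
X_test = np.array([[5.0, 7.0, 5.0, 1.0]])

scaler = Normalizer(norm="l2")  # norm="l1" would divide by Manhattan norm

# fit() is a no-op: no statistics from X_train are stored, so
# transforming X_test uses only X_test's own row norms.
scaler.fit(X_train)
X_train_t = scaler.transform(X_train)
X_test_t = scaler.transform(X_test)

# Every transformed row has unit Euclidean length.
row_norms = np.linalg.norm(X_train_t, axis=1)

# Dot product of l2-normalized rows equals the cosine similarity
# of the original rows.
cos_sim = X_train_t[0] @ X_train_t[1]
cos_manual = (X_train[0] @ X_train[1]) / (
    np.linalg.norm(X_train[0]) * np.linalg.norm(X_train[1])
)
```

This is why there is nothing to store: the divisor is a property of each individual sample, not of the training set.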
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn