One example where I saw it used was the Scale-Invariant Feature Transform (SIFT): normalizing each descriptor vector to unit length compensates for affine changes in illumination between samples. The use case given in the scikit-learn docs is similar, but for text processing:
"Scaling inputs to unit norms is a common operation for text classification or clustering for instance. For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community." So basically, you cancel a transform and it allows you to compare samples between each other. On Tue, 24 Sep 2019 at 14:04, Sole Galli <solegal...@gmail.com> wrote: > Sorry, ignore my question, I got it right now. > > It is calculating the norm of the observation vector (across variables), > and its distance varies obs per obs, that is why it needs to be > re-calculated, and therefore not stored. > > I would appreciate some articles / links with successful implementations > of this technique and why it adds value to ML. Would you be able to point > me to any? > > Cheers > > Sole > > > > > > On Tue, 24 Sep 2019 at 12:39, Sole Galli <solegal...@gmail.com> wrote: > >> Hello team, >> >> Quick question respect to the Normalizer(). >> >> My understanding is that this transformer divides the values (rows) of a >> vector by the vector euclidean (l2) or manhattan distances (l1). >> >> From the sklearn docs, I understand that the Normalizer() does not learn >> the distances from the train set and stores them. It rathers normalises the >> data according to distance the data set presents, which could be or not, >> the same in test and train. >> >> Am I understanding this correctly? >> >> If so, what is the reason not to store these parameters in the Normalizer >> and use them to scale future data? >> >> If not getting it right, what am I missing? >> >> Many thanks and I will appreciate if you have an article on this to share. >> >> Cheers >> >> Sole >> >> >> _______________________________________________ > scikit-learn mailing list > scikit-learn@python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/