(normalize(X) * normalize(X)).sum(axis=1) works fine here. But I was unaware of these quirks in Python's implementation of pow:
Numpy is consistent in returning nan when a negative float is raised to a non-integer power, and a real result when the power is an integer (or an equivalent float). Squaring is an integer power, so the square of a negative float comes back positive. I assume this follows C conventions? Python's ** can look strange next to that:

Numpy:

>>> np.array(-.6) ** 2.1
nan
>>> np.array(-.6+0j) ** 2.1
(0.32532987876940411+0.10570608538524294j)

Python 3.6.2, without parentheses, returns a negative real whose magnitude equals the norm of the complex power:

>>> -.6 ** 2.1
-0.3420720779420435
>>> (-.6 + 0j) ** 2.1
(0.3253298787694041+0.10570608538524294j)
>>> (((-.6 + 0j) ** 2.1).real ** 2 + ((-.6 + 0j) ** 2.1).imag ** 2) ** .5
0.3420720779420434

Putting the LHS in parentheses performs the complex power in Python:

>>> (-.6) ** 2.1
(0.3253298787694041+0.10570608538524294j)

At https://docs.python.org/3/reference/expressions.html:

    Raising a negative number to a fractional power results in a complex
    <https://docs.python.org/3/library/functions.html#complex> number. (In
    earlier versions it raised a ValueError
    <https://docs.python.org/3/library/exceptions.html#ValueError>.)

By "in earlier versions" it means Python 2. The same page explains why this only happens when the LHS is parenthesised: ** binds more tightly than a unary minus on its left, so -.6 ** 2.1 is parsed as -(.6 ** 2.1). That is operator precedence rather than a CPython bug, and it is also why -0.53806371**2 comes out negative while np.square(-0.53806371) is positive.

On 8 October 2017 at 16:08, Christopher Pfeifer <chrispfeifer8...@gmail.com> wrote:

> I am attempting to validate the output of an L2 normalization function:
>
> *data_l2 = preprocessing.normalize(data, norm='l2') *    # raw data
> is below at end of this email
>
> output:
>
> array([[ 0.57649683,  0.53806371,  0.61492995],
>        [-0.53806371, -0.57649683, -0.61492995],
>        [ 0.3359268 ,  0.90089461, -0.2748492 ],
>        [ 0.6676851 , -0.39566524, -0.63059148],
>        [-0.70710678,  0.        ,  0.70710678],
>        [-0.63116874,  0.45083482,  0.63116874]])
>
>
> Each row being a set of three features of an observation
>
>
> I am under the belief that the sum of the 'squared' values of an instance
> (row) should be virtually equal to 1 (normalized).
>
>
> *Problem - 1:*
>
> the np.square() function is returning the absolute value of the sum of the
> three features, even when the sum of the squares is clearly negative.
>
> np.square(-0.53806371) returns 0.28951255601896408 however,
> (-0.53806371**2) returns -0.2895125560189641
>
> The correct square of -0.53806371 is -0.2895125560189641 (a negative
> number), even my 10 year old calculator gets it right.
>
> I can find nothing in the numpy documentation that indicates np.square()
> always returns the absolute value, instead of the correctly signed value.
>
> *Question:*
>
> Is there a way to force np.square() to return the correctly signed square
> value not the absolute value?
>
>
> *Problem - 2:*
>
> For some of the observations (rows), the sum of the squared values (which
> should be virtually 1), are nowhere near 1.
>
>
> print 0.57649683**2 + 0.53806371**2 + 0.61492995**2 row 1
>
> 0.9999999944260154 (this is virtually 1)
>
>
> print -0.63116874**2 + 0.45083482**2 + 0.63116874**2 row 6
>
> 0.203252034924 (*this is nowhere near 1*)
>
>
> sum of the 'squared' values of an instance (row) should be virtually equal to
> 1.
>
>
> *Question:*
>
> Is the preprocessing.normalize(data, norm='l2') messing up, or is my raw data
> being fed into the normalization routine to unrealistic (I made it up of both
> positive and negative numbers.
>
>
> *Raw Data*
>
> array([[ 1.5,  1.4,  1.6],
>        [-1.4, -1.5, -1.6],
>        [ 2.2,  5.9, -1.8],
>        [ 5.4, -3.2, -5.1],
>        [-1.4,  0. ,  1.4],
>        [-1.4,  1. ,  1.4]])
>
> Thanks: Chris
>
>
> P.S.: Not a real world problem, just trying to understand the functionality
> of scikit-learn. Have only been working with the package for two weeks.
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
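
For anyone who wants to check both points from the quoted message without installing anything, here is a minimal sketch in plain Python. It is an illustration under assumptions, not scikit-learn's code: l2_normalize is a hypothetical helper standing in for preprocessing.normalize(data, norm='l2'), and the data is the raw array from the message above. It shows that the row sums of squares are about 1 once each value is squared with its sign included, and that Python parses -x ** 2 as -(x ** 2).

```python
import math

# Raw data copied from the quoted message.
data = [
    [1.5, 1.4, 1.6],
    [-1.4, -1.5, -1.6],
    [2.2, 5.9, -1.8],
    [5.4, -3.2, -5.1],
    [-1.4, 0.0, 1.4],
    [-1.4, 1.0, 1.4],
]


def l2_normalize(row):
    """Hypothetical helper: divide each entry by the row's Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


normalized = [l2_normalize(row) for row in data]

# x * x squares the signed value, so no term can be negative.
row_sums = [sum(x * x for x in row) for row in normalized]

# ** binds more tightly than a unary minus on its left.
unparenthesised = -0.6 ** 2    # parsed as -(0.6 ** 2), a negative float
parenthesised = (-0.6) ** 2    # squares the negative number, a positive float
fractional = (-0.6) ** 2.1     # negative base, fractional power: complex in Python 3

for row, total in zip(normalized, row_sums):
    print(row, total)
print(unparenthesised, parenthesised, fractional)
```

Typing the literals from row 6 as -0.63116874**2 + ... negates the first square, which is where the 0.2032... comes from; writing (-0.63116874)**2 instead gives a total close to 1.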