Re: [Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1

2015-12-19 Thread Andy Ray Terrel
A simple fix would certainly by pass the check in random.choice, but I
don't know how to get that. So let's focus on the summation.

I believe you are hitting an instability in summing small numbers as a
power to 10th order would produce. Here is an example:

mymatrix = np.random.rand(1024,1024).astype('float16')*1e-7
row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
sums = row_normalized.sum(axis=1)
len(sums[sums != 1]) # ->  108

One can use things like Kahan summation and you will need to collect the
size of the error and truncate all numbers in mymatrix under that error.
I'm not quite sure how to quickly implement such a thing in numpy without a
loop.

On Fri, Dec 18, 2015 at 7:00 PM, Nathaniel Smith  wrote:

> On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario 
> wrote:
> > Hi,
> >
> > I have a matrix whose entries I must raise to a certain power and then
> normalize by row. After I do that, when I pass some rows to
> numpy.random.choice, I get a ValueError: probabilities do not sum to 1.
> >
> > I understand that floating point is not perfect, and my matrix is so
> large that I cannot use np.longdouble because I will run out of RAM.
> >
> > As an example on a smaller matrix:
> >
> > np.power(mymatrix, 10, out=mymatrix)
> > row_normalized = np.apply_along_axis(lambda x: x / np.sum(x), 1,
> mymatrix)
>
> I'm sorry I don't have a solution to your actual problem off the top
> of my head, but it's probably helpful in general to know that a better
> way to write this would be just
>
>   row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
>
> apply_along_axis is slow and can almost always be replaced by a
> broadcasting expression like this.
>
> > sums = row_normalized.sum(axis=1)
> > sums[np.where(sums != 1)]
>
> And here you can just write
>
>   sums[sums != 1]
>
> i.e. the call to where() isn't doing anything useful.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1

2015-12-18 Thread Nathaniel Smith
On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario  wrote:
> Hi,
>
> I have a matrix whose entries I must raise to a certain power and then 
> normalize by row. After I do that, when I pass some rows to 
> numpy.random.choice, I get a ValueError: probabilities do not sum to 1.
>
> I understand that floating point is not perfect, and my matrix is so large 
> that I cannot use np.longdouble because I will run out of RAM.
>
> As an example on a smaller matrix:
>
> np.power(mymatrix, 10, out=mymatrix)
> row_normalized = np.apply_along_axis(lambda x: x / np.sum(x), 1, mymatrix)

I'm sorry I don't have a solution to your actual problem off the top
of my head, but it's probably helpful in general to know that a better
way to write this would be just

  row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)

apply_along_axis is slow and can almost always be replaced by a
broadcasting expression like this.

> sums = row_normalized.sum(axis=1)
> sums[np.where(sums != 1)]

And here you can just write

  sums[sums != 1]

i.e. the call to where() isn't doing anything useful.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1

2015-12-18 Thread Ryan R. Rosario
Hi,

I have a matrix whose entries I must raise to a certain power and then 
normalize by row. After I do that, when I pass some rows to 
numpy.random.choice, I get a ValueError: probabilities do not sum to 1.

I understand that floating point is not perfect, and my matrix is so large that 
I cannot use np.longdouble because I will run out of RAM.

As an example on a smaller matrix:

np.power(mymatrix, 10, out=mymatrix)
row_normalized = np.apply_along_axis(lambda x: x / np.sum(x), 1, mymatrix)
sums = row_normalized.sum(axis=1)
sums[np.where(sums != 1)]

array([ 0.9994,  0.9994,  1.0012, ...,  0.9994,
 0.9994,  0.9994], dtype=float32)

np.random.choice(range(row_normalized.shape[0]), 1, p=row_normalized[0, :])
…
ValueError: probabilities do not sum to 1


I also tried the normalize function in sklearn.preprocessing and have the same 
problem.

Is there a way to avoid this problem without having to make manual adjustments 
to get the row sums to = 1?

— Ryan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion