Re: [Numpy-discussion] A minor clarification on why count_nonzero is faster for boolean arrays

2015-12-17 Thread Benjamin Root
Would it make sense at all to bring that optimization to np.sum()? I
know that I have np.sum() all over the place instead of count_nonzero,
partly because it is a MATLAB-ism and partly because it is easier to write.
I had no clue that there was a performance difference.
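
For boolean input the two calls return the same count, so the swap is
mechanical. A minimal sketch, assuming an array shaped like the one in the
quoted message below; the variable names and the timing harness are
illustrative and not from the thread, and the measured times will vary by
machine and NumPy version:

```python
import timeit

import numpy as np

# Illustrative data: a 100x5 boolean array, similar to the quoted example.
rng = np.random.default_rng(0)
a_bool = rng.integers(0, 2, size=(100, 5)).astype(bool)

# For a boolean array, summing counts the True entries,
# which is exactly what count_nonzero reports.
n_sum = int(np.sum(a_bool))
n_cnz = int(np.count_nonzero(a_bool))
assert n_sum == n_cnz

# Rough timing comparison; absolute numbers depend on the environment.
t_sum = timeit.timeit(lambda: np.sum(a_bool), number=10_000)
t_cnz = timeit.timeit(lambda: np.count_nonzero(a_bool), number=10_000)
print(f"np.sum:           {t_sum / 10_000 * 1e6:.2f} us per call")
print(f"np.count_nonzero: {t_cnz / 10_000 * 1e6:.2f} us per call")
```

Note that the equivalence only holds when every element is 0/1 (or
False/True). For a general integer array, np.sum adds the values while
count_nonzero counts the nonzero entries.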

Cheers!
Ben Root


On Thu, Dec 17, 2015 at 1:37 PM, CJ Carey wrote:

> I believe this line is the reason:
>
> https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110
>
> On Thu, Dec 17, 2015 at 11:52 AM, Raghav R V wrote:
>
>> I was just playing with `count_nonzero` and found it to be significantly
>> faster for boolean arrays than for integer arrays.
>>
>>
>> >>> a = np.random.randint(0, 2, (100, 5))
>> >>> a_bool = a.astype(bool)
>>
>> >>> %timeit np.sum(a)
>> 10 loops, best of 3: 5.64 µs per loop
>>
>> >>> %timeit np.count_nonzero(a)
>> 100 loops, best of 3: 1.42 µs per loop
>>
>> >>> %timeit np.count_nonzero(a_bool)
>> 100 loops, best of 3: 279 ns per loop (but why?)
>>
>> I tried looking into the code and dug my way through to this line.
>> I am unable to dig further.
>>
>> I know this is probably a trivial question, but I was wondering whether
>> anyone could provide insight into why this is so.
>>
>> Thanks
>>
>> R
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A minor clarification on why count_nonzero is faster for boolean arrays

2015-12-17 Thread CJ Carey
I believe this line is the reason:
https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110



Re: [Numpy-discussion] A minor clarification on why count_nonzero is faster for boolean arrays

2015-12-17 Thread Jaime Fernández del Río
On Thu, Dec 17, 2015 at 7:37 PM, CJ Carey wrote:

> I believe this line is the reason:
>
> https://github.com/numpy/numpy/blob/c0e48cfbbdef9cca954b0c4edd0052e1ec8a30aa/numpy/core/src/multiarray/item_selection.c#L2110
>

The magic actually happens in count_nonzero_bytes_384, a few lines before
that (line 1986).
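
As the name suggests, the routine works on 384 bits (48 bytes, i.e. six
64-bit words) at a time. A rough Python sketch of the underlying idea, as I
understand it, follows; this is not the actual C code, and the helper names
count_true_bytes_48 and count_true are made up for illustration. It assumes
every byte is exactly 0 or 1, which is how NumPy stores booleans:

```python
import numpy as np

MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound in Python


def count_true_bytes_48(chunk: bytes) -> int:
    """Count the 1-bytes in a 48-byte chunk that holds only 0/1 bytes."""
    assert len(chunk) == 48
    # View the 48 bytes as six 64-bit words.
    words = [int.from_bytes(chunk[i:i + 8], "little") for i in range(0, 48, 8)]
    # Adding the words sums each byte lane independently: a lane holds at
    # most 6, so no carry can spill into the neighbouring lane.
    total = sum(words)
    # Multiplying by 0x0101...01 accumulates all eight lanes into the top
    # byte (at most 48, so it still fits); the shift extracts it.
    return ((total * 0x0101010101010101) & MASK64) >> 56


def count_true(a_bool) -> int:
    """Count True entries by processing the raw bytes in 48-byte chunks."""
    raw = np.ascontiguousarray(a_bool, dtype=bool).tobytes()
    n_full = len(raw) - len(raw) % 48
    count = sum(count_true_bytes_48(raw[i:i + 48]) for i in range(0, n_full, 48))
    # Leftover bytes that do not fill a whole chunk are summed one by one.
    return count + sum(raw[n_full:])


rng = np.random.default_rng(0)
a_bool = rng.integers(0, 2, size=(100, 5)).astype(bool)
assert count_true(a_bool) == np.count_nonzero(a_bool)
```

The point is that a boolean array can be counted with a handful of word-sized
additions per 48 elements instead of one comparison per element, which is
not possible for general integer arrays.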

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans
for world domination.


Re: [Numpy-discussion] A minor clarification on why count_nonzero is faster for boolean arrays

2015-12-17 Thread Raghav R V
Thanks a lot, everyone!

Time and again I am amazed by how optimized NumPy is! Hats off to you guys!

R
