Re: [Numpy-discussion] question about in-place operations

2012-05-24 Thread Francesc Alted
On 5/22/12 9:08 PM, Massimo Di Pierro wrote:
> This problem is linear so probably RAM I/O bound. I do not think I
> would benefit much from multiple cores. But I will give it a try. In
> the short term this is good enough for me.

Yeah, this is what common sense seems to indicate, that RAM I/O-bound 
problems do not benefit from using multiple cores.  But reality is 
different:

 >>> import numpy as np
 >>> a = np.arange(1e8)
 >>> c = 1.0
 >>> time a*c
CPU times: user 0.22 s, sys: 0.20 s, total: 0.43 s
Wall time: 0.43 s
array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
         9.99999970e+07,   9.99999980e+07,   9.99999990e+07])

Using numexpr with 1 thread:

 >>> import numexpr as ne
 >>> ne.set_num_threads(1)
8
 >>> time ne.evaluate("a*c")
CPU times: user 0.20 s, sys: 0.25 s, total: 0.45 s
Wall time: 0.45 s
array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
         9.99999970e+07,   9.99999980e+07,   9.99999990e+07])

while using 8 threads (the machine has 8 physical cores):

 >>> ne.set_num_threads(8)
1
 >>> time ne.evaluate("a*c")
CPU times: user 0.39 s, sys: 0.68 s, total: 1.07 s
Wall time: 0.14 s
array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
         9.99999970e+07,   9.99999980e+07,   9.99999990e+07])

which is 3x faster than using a single thread (look at the wall time figures).

To be clear, this is purely because several cores can move data between 
memory and the CPU faster than a single core can.  I have seen this 
behavior lots of times; for example, in slide 21 of this presentation:

http://pydata.org/pycon2012/numexpr-cython/slides.pdf

one can see how using several cores can speed up not only a polynomial 
computation, but also the simple expression "y = x", which is 
essentially a memory copy.
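
For anyone who wants to reproduce the effect outside IPython (the "time"
magic above is IPython-only), here is a minimal sketch; the exact figures
will of course vary with the machine:

import time
import numpy as np
import numexpr as ne

a = np.arange(1e8)
c = 1.0

for nthreads in (1, 8):
    ne.set_num_threads(nthreads)
    t0 = time.time()
    ne.evaluate("a*c")   # same expression as in the session above
    print("%d thread(s): %.2f s" % (nthreads, time.time() - t0))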

Another example where this effect can be seen is the Blosc compressor.  
For example, in:

http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

the first point on each of the plots means that Blosc is at compression 
level 0, that is, it does not compress at all and basically copies 
data from the origin to the destination buffers.  Still, one can see 
that using several threads can accelerate this copy well beyond memcpy 
speed.
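
The same effect can be sketched with the python-blosc bindings (assuming
the blosc package is installed; clevel=0 disables compression, so each
call is essentially a multithreaded memory copy):

import time
import numpy as np
import blosc

data = np.arange(1e7).tostring()   # ~80 MB of raw bytes to copy

for nthreads in (1, 8):
    blosc.set_nthreads(nthreads)
    t0 = time.time()
    blosc.compress(data, typesize=8, clevel=0)   # level 0: pure copy
    print("%d thread(s): %.3f s" % (nthreads, time.time() - t0))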

So, definitely, several cores can make your memory-I/O-bound 
computations go faster.

-- 
Francesc Alted



Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo Di Pierro
This problem is linear so probably RAM I/O bound. I do not think I  
would benefit much from multiple cores. But I will give it a try. In  
the short term this is good enough for me.


On May 22, 2012, at 1:57 PM, Francesc Alted wrote:

> On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:
>> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>>> For now I will be doing this:
>>>
>>> import numpy
>>> import time
>>>
>>> a=numpy.zeros(2000000)
>>> b=numpy.zeros(2000000)
>>> c=1.0
>>>
>>> # naive solution
>>> t0 = time.time()
>>> for i in xrange(len(a)):
>>>  a[i] += c*b[i]
>>> print time.time()-t0
>>>
>>> # possible solution
>>> n=100000
>>> t0 = time.time()
>>> for i in xrange(0,len(a),n):
>>>  a[i:i+n] += c*b[i:i+n]
>>> print time.time()-t0
>>>
>>> the second "possible" solution appears 1000x faster than the
>>> former in my tests and uses little extra memory. It is only 2x  
>>> slower than b*=c.
>>>
>>> Any reason not to do it?
>> No, this is perfectly fine, you just manually did what numexpr does.
>
> Yeah.  You basically re-discovered the blocking technique.  For a more
> general example on how to apply the blocking technique with NumPy see
> the section "CPU vs Memory Benchmark" in:
>
> https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus
>
> Of course numexpr has less overhead (and can use multiple cores) than
> using plain NumPy.
>
> -- 
> Francesc Alted
>


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo Di Pierro
Thank you Dag,

I will look into it. Is there any documentation about ufuncs?
Is this the file core/src/umath/ufunc_object.c?

Massimo


On May 22, 2012, at 1:47 PM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> For now I will be doing this:
>>
>> import numpy
>> import time
>>
>> a=numpy.zeros(2000000)
>> b=numpy.zeros(2000000)
>> c=1.0
>>
>> # naive solution
>> t0 = time.time()
>> for i in xrange(len(a)):
>>     a[i] += c*b[i]
>> print time.time()-t0
>>
>> # possible solution
>> n=100000
>> t0 = time.time()
>> for i in xrange(0,len(a),n):
>>     a[i:i+n] += c*b[i:i+n]
>> print time.time()-t0
>>
>> the second "possible" solution appears 1000x faster than the former  
>> in my tests and uses little extra memory. It is only 2x slower than  
>> b*=c.
>>
>> Any reason not to do it?
>
> No, this is perfectly fine, you just manually did what numexpr does.
>
>
> On 05/22/2012 04:47 PM, Massimo DiPierro wrote:
>> Thank you. I will look into numexpr.
>>
>> Anyway, I do not need arbitrary expressions. If there were  
>> something like
>>
>> numpy.add_scaled(a,scale,b)
>>
>> with support for scale in int, float, complex, this would be
>> sufficient for me.
>
> But of course, few need *arbitrary* expressions -- it's just that the
> ones they want are not already compiled.
>
> It's the last 5% of functionality that's different for everybody...
>
> (But the example you mention could make a nice ufunc; so an  
> alternative
> for you would be to look at the C implementation of np.add and try to
> submit a pull request for numpy.add_scaled)
>
> Dag


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Francesc Alted
On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:
> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> For now I will be doing this:
>>
>> import numpy
>> import time
>>
>> a=numpy.zeros(2000000)
>> b=numpy.zeros(2000000)
>> c=1.0
>>
>> # naive solution
>> t0 = time.time()
>> for i in xrange(len(a)):
>>   a[i] += c*b[i]
>> print time.time()-t0
>>
>> # possible solution
>> n=100000
>> t0 = time.time()
>> for i in xrange(0,len(a),n):
>>   a[i:i+n] += c*b[i:i+n]
>> print time.time()-t0
>>
>> the second "possible" solution appears 1000x faster than the former in my 
>> tests and uses little extra memory. It is only 2x slower than b*=c.
>>
>> Any reason not to do it?
> No, this is perfectly fine, you just manually did what numexpr does.

Yeah.  You basically re-discovered the blocking technique.  For a more 
general example on how to apply the blocking technique with NumPy see 
the section "CPU vs Memory Benchmark" in:

https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus

Of course numexpr has less overhead (and can use multiple cores) than 
using plain NumPy.
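
For instance, a generic helper that applies the same blocking idea might
look like this (just a sketch: the function name and block size are
arbitrary, and the flat views assume contiguous arrays):

import numpy as np

def blocked_add_scaled(a, c, b, blocksize=100000):
    # a += c*b computed block by block, so the temporary c*b[i:i+blocksize]
    # never grows beyond blocksize elements
    af = a.reshape(-1)   # flat view, no copy for contiguous arrays
    bf = b.reshape(-1)
    for i in range(0, af.size, blocksize):
        af[i:i+blocksize] += c * bf[i:i+blocksize]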

-- 
Francesc Alted



Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Dag Sverre Seljebotn
On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
> For now I will be doing this:
>
> import numpy
> import time
>
> a=numpy.zeros(2000000)
> b=numpy.zeros(2000000)
> c=1.0
>
> # naive solution
> t0 = time.time()
> for i in xrange(len(a)):
>  a[i] += c*b[i]
> print time.time()-t0
>
> # possible solution
> n=100000
> t0 = time.time()
> for i in xrange(0,len(a),n):
>  a[i:i+n] += c*b[i:i+n]
> print time.time()-t0
>
> the second "possible" solution appears 1000x faster than the former in my 
> tests and uses little extra memory. It is only 2x slower than b*=c.
>
> Any reason not to do it?

No, this is perfectly fine, you just manually did what numexpr does.


On 05/22/2012 04:47 PM, Massimo DiPierro wrote:
 > Thank you. I will look into numexpr.
 >
 > Anyway, I do not need arbitrary expressions. If there were something like
 >
 > numpy.add_scaled(a,scale,b)
 >
 > with support for scale in int, float, complex, this would be 
 > sufficient for me.

But of course, few need *arbitrary* expressions -- it's just that the 
ones they want are not already compiled.

It's the last 5% of functionality that's different for everybody...

(But the example you mention could make a nice ufunc; so an alternative 
for you would be to look at the C implementation of np.add and try to 
submit a pull request for numpy.add_scaled)

Dag
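
In the meantime, something with the semantics of that hypothetical
numpy.add_scaled can be pieced together from existing ufuncs and their
out= arguments, at the cost of one reusable scratch buffer (a sketch,
not an actual NumPy API):

import numpy as np

def add_scaled(a, scale, b, scratch=None):
    # a += scale*b; the only temporary is `scratch`, which can be
    # allocated once and reused across many calls
    # (assumes the dtype of `scale` can be cast to b's dtype)
    if scratch is None:
        scratch = np.empty_like(b)
    np.multiply(b, scale, out=scratch)   # scratch = scale*b
    np.add(a, scratch, out=a)            # a += scratch
    return a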


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo DiPierro
Thank you, this does it.



On May 22, 2012, at 9:59 AM, Robert Kern wrote:

> On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro
>  wrote:
>> Thank you. I will look into numexpr.
>> 
>> Anyway, I do not need arbitrary expressions. If there were something like
>> 
>> numpy.add_scaled(a,scale,b)
>> 
>> with support for scale in int, float, complex, this would be sufficient for 
>> me.
> 
> BLAS has the xAXPY functions, which will do this for float and complex.
> 
> import numpy as np
> from scipy.linalg import fblas
> 
> def add_scaled_inplace(a, scale, b):
>     if np.issubdtype(a.dtype, complex):
>         fblas.zaxpy(b, a, a=scale)
>     else:
>         fblas.daxpy(b, a, a=scale)
> 
> -- 
> Robert Kern


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Robert Kern
On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro
 wrote:
> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something like
>
> numpy.add_scaled(a,scale,b)
>
> with support for scale in int, float, complex, this would be sufficient for 
> me.

BLAS has the xAXPY functions, which will do this for float and complex.

import numpy as np
from scipy.linalg import fblas

def add_scaled_inplace(a, scale, b):
    if np.issubdtype(a.dtype, complex):
        fblas.zaxpy(b, a, a=scale)
    else:
        fblas.daxpy(b, a, a=scale)
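
A usage sketch follows; note the module location is an assumption about
later SciPy versions, where fblas was folded into scipy.linalg.blas (the
y argument of axpy is updated in place when it is contiguous):

import numpy as np
from scipy.linalg.blas import daxpy   # scipy.linalg.fblas in older SciPy

a = np.zeros(1000000)
b = np.ones(1000000)
daxpy(b, a, a=2.0)    # a += 2.0*b, computed in place on a's buffer
print(a[:3])          # [ 2.  2.  2.]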

-- 
Robert Kern


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo DiPierro
For now I will be doing this:

import numpy
import time

a=numpy.zeros(2000000)
b=numpy.zeros(2000000)
c=1.0

# naive solution
t0 = time.time()
for i in xrange(len(a)):
    a[i] += c*b[i]
print time.time()-t0

# possible solution
n=100000
t0 = time.time()
for i in xrange(0,len(a),n):
    a[i:i+n] += c*b[i:i+n]
print time.time()-t0

the second "possible" solution appears 1000x faster than the former in my tests 
and uses little extra memory. It is only 2x slower than b*=c.

Any reason not to do it?

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
>> hello everybody,
>> 
>> first of all, thanks to the developers of numpy, which is very useful. I am 
>> building software that uses numpy+pyopencl for lattice QCD computations. 
>> One problem that I am facing is that I need to perform most operations on 
>> arrays in place, and I must avoid creating temporary arrays (because my 
>> arrays are many gigabytes large).
>> 
>> One typical operation is this
>> 
>> a[i] += const * b[i]
>> 
>> What is the efficient way to do this when a and b are arbitrary arrays?  const 
>> is usually a complex number.
>> a and b have the same shape but are not necessarily one-dimensional.
> 
> I don't think NumPy supports this; if you can't modify b[i] in-place, I 
> think your only option will be one of numexpr/Theano/Cython.
> 
> Dag


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo DiPierro
Thank you. I will look into numexpr. 

Anyway, I do not need arbitrary expressions. If there were something like

numpy.add_scaled(a,scale,b)

with support for scale in int, float, complex, this would be sufficient for me.

Massimo

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
>> hello everybody,
>> 
>> first of all, thanks to the developers of numpy, which is very useful. I am 
>> building software that uses numpy+pyopencl for lattice QCD computations. 
>> One problem that I am facing is that I need to perform most operations on 
>> arrays in place, and I must avoid creating temporary arrays (because my 
>> arrays are many gigabytes large).
>> 
>> One typical operation is this
>> 
>> a[i] += const * b[i]
>> 
>> What is the efficient way to do this when a and b are arbitrary arrays?  const 
>> is usually a complex number.
>> a and b have the same shape but are not necessarily one-dimensional.
> 
> I don't think NumPy supports this; if you can't modify b[i] in-place, I 
> think your only option will be one of numexpr/Theano/Cython.
> 
> Dag


Re: [Numpy-discussion] question about in-place operations

2012-05-22 Thread Dag Sverre Seljebotn
On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
> hello everybody,
>
> first of all, thanks to the developers of numpy, which is very useful. I am 
> building software that uses numpy+pyopencl for lattice QCD computations. 
> One problem that I am facing is that I need to perform most operations on 
> arrays in place, and I must avoid creating temporary arrays (because my arrays 
> are many gigabytes large).
>
> One typical operation is this
>
> a[i] += const * b[i]
>
> What is the efficient way to do this when a and b are arbitrary arrays?  const 
> is usually a complex number.
> a and b have the same shape but are not necessarily one-dimensional.

I don't think NumPy supports this; if you can't modify b[i] in-place, I 
think your only option will be one of numexpr/Theano/Cython.

Dag
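
For the numexpr route specifically, the evaluation can be pointed at an
existing array, so no gigabyte-sized temporary is created. A minimal
sketch, assuming numexpr is installed and the dtypes match:

import numpy as np
import numexpr as ne

a = np.zeros(1000000)
b = np.ones(1000000)
c = 2.0
ne.evaluate("a + c*b", out=a)   # result is written directly into a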


[Numpy-discussion] question about in-place operations

2012-05-22 Thread Massimo DiPierro
hello everybody,

first of all, thanks to the developers of numpy, which is very useful. I am 
building software that uses numpy+pyopencl for lattice QCD computations. One 
problem that I am facing is that I need to perform most operations on arrays in 
place, and I must avoid creating temporary arrays (because my arrays are many 
gigabytes large).

One typical operation is this

a[i] += const * b[i]

What is the efficient way to do this when a and b are arbitrary arrays?  const is 
usually a complex number.
a and b have the same shape but are not necessarily one-dimensional.

Massimo
