Re: [Numpy-discussion] question about in-place operations
On 5/22/12 9:08 PM, Massimo Di Pierro wrote:

> This problem is linear so probably RAM I/O bound. I do not think I
> would benefit much from multiple cores. But I will give it a try. In
> the short term this is good enough for me.

Yeah, this is what common sense seems to indicate: that RAM-I/O-bound problems do not benefit from using multiple cores. But reality is different:

>>> import numpy as np
>>> a = np.arange(1e8)
>>> c = 1.0
>>> time a*c
CPU times: user 0.22 s, sys: 0.20 s, total: 0.43 s
Wall time: 0.43 s
array([ 0.e+00, 1.e+00, 2.e+00, ..., 9.9970e+07, 9.9980e+07, 9.9990e+07])

Using numexpr with 1 thread:

>>> import numexpr as ne
>>> ne.set_num_threads(1)
8
>>> time ne.evaluate("a*c")
CPU times: user 0.20 s, sys: 0.25 s, total: 0.45 s
Wall time: 0.45 s
array([ 0.e+00, 1.e+00, 2.e+00, ..., 9.9970e+07, 9.9980e+07, 9.9990e+07])

while using 8 threads (the machine has 8 physical cores):

>>> ne.set_num_threads(8)
1
>>> time ne.evaluate("a*c")
CPU times: user 0.39 s, sys: 0.68 s, total: 1.07 s
Wall time: 0.14 s
array([ 0.e+00, 1.e+00, 2.e+00, ..., 9.9970e+07, 9.9980e+07, 9.9990e+07])

which is 3x faster than using a single thread (look at the wall-time figures). It has to be clear that this is purely due to the fact that several cores can move data between memory and the CPU faster than one single core can. I have seen this behavior lots of times; for example, in slide 21 of this presentation:

http://pydata.org/pycon2012/numexpr-cython/slides.pdf

one can see how using several cores can speed up not only a polynomial computation, but also the simple expression "y = x", which is essentially a memory copy.

Another example where this effect can be seen is the Blosc compressor. For example, in:

http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

the first point on each of the plots corresponds to Blosc at compression level 0, that is, it does not compress at all and basically copies data from origin to destination buffers. Still, one can see that using several threads can accelerate this copy well beyond memcpy speed.

So, definitely, several cores can make your memory-I/O-bound computations go faster.

-- Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
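A related point for the original in-place question: numexpr's `evaluate` accepts an `out` argument, so the multithreaded, blocked evaluation shown above can also write its result directly into a preallocated array instead of returning a new one. A minimal sketch (assuming numexpr is installed; the array sizes here are tiny, just to illustrate):

```python
import numpy as np
import numexpr as ne

a = np.arange(10.0)
b = np.ones(10)
c = 2.5

# Evaluate "a + c*b" with numexpr's threaded, blocked engine and store
# the result back into `a`, avoiding a full-size temporary for c*b.
ne.evaluate("a + c*b", out=a)
```

For the many-gigabyte arrays mentioned earlier in the thread, this keeps the extra memory footprint at numexpr's internal block size rather than the array size.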
Re: [Numpy-discussion] question about in-place operations
This problem is linear so probably RAM I/O bound. I do not think I would benefit much from multiple cores. But I will give it a try. In the short term this is good enough for me.

On May 22, 2012, at 1:57 PM, Francesc Alted wrote:

> On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:
>> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>>> For now I will be doing this:
>>>
>>> import numpy
>>> import time
>>>
>>> a = numpy.zeros(200)
>>> b = numpy.zeros(200)
>>> c = 1.0
>>>
>>> # naive solution
>>> t0 = time.time()
>>> for i in xrange(len(a)):
>>>     a[i] += c*b[i]
>>> print time.time()-t0
>>>
>>> # possible solution
>>> n = 10
>>> t0 = time.time()
>>> for i in xrange(0, len(a), n):
>>>     a[i:i+n] += c*b[i:i+n]
>>> print time.time()-t0
>>>
>>> the second "possible" solution appears 1000x faster than the
>>> former in my tests and uses little extra memory. It is only 2x
>>> slower than b *= c.
>>>
>>> Any reason not to do it?
>> No, this is perfectly fine, you just manually did what numexpr does.
>
> Yeah. You basically re-discovered the blocking technique. For a more
> general example of how to apply the blocking technique with NumPy, see
> the section "CPU vs Memory Benchmark" in:
>
> https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus
>
> Of course numexpr has less overhead (and can use multiple cores) than
> plain NumPy.
>
> -- Francesc Alted
Re: [Numpy-discussion] question about in-place operations
Thank you Dag, I will look into it. Is there any documentation about ufuncs? Is this the file core/src/umath/ufunc_object.c?

Massimo

On May 22, 2012, at 1:47 PM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> For now I will be doing this:
>>
>> import numpy
>> import time
>>
>> a = numpy.zeros(200)
>> b = numpy.zeros(200)
>> c = 1.0
>>
>> # naive solution
>> t0 = time.time()
>> for i in xrange(len(a)):
>>     a[i] += c*b[i]
>> print time.time()-t0
>>
>> # possible solution
>> n = 10
>> t0 = time.time()
>> for i in xrange(0, len(a), n):
>>     a[i:i+n] += c*b[i:i+n]
>> print time.time()-t0
>>
>> the second "possible" solution appears 1000x faster than the former
>> in my tests and uses little extra memory. It is only 2x slower than
>> b *= c.
>>
>> Any reason not to do it?
>
> No, this is perfectly fine, you just manually did what numexpr does.
>
>
> On 05/22/2012 04:47 PM, Massimo DiPierro wrote:
>> Thank you. I will look into numexpr.
>>
>> Anyway, I do not need arbitrary expressions. If there were
>> something like
>>
>> numpy.add_scaled(a, scale, b)
>>
>> with support for scale in int, float, complex, this would be
>> sufficient for me.
>
> But of course, few need *arbitrary* expressions -- it's just that the
> ones they want are not already compiled.
>
> It's the last 5% of functionality that's different for everybody...
>
> (But the example you mention could make a nice ufunc; an alternative
> for you would be to look at the C implementation of np.add and try to
> submit a pull request for numpy.add_scaled.)
>
> Dag
Re: [Numpy-discussion] question about in-place operations
On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> For now I will be doing this:
>>
>> import numpy
>> import time
>>
>> a = numpy.zeros(200)
>> b = numpy.zeros(200)
>> c = 1.0
>>
>> # naive solution
>> t0 = time.time()
>> for i in xrange(len(a)):
>>     a[i] += c*b[i]
>> print time.time()-t0
>>
>> # possible solution
>> n = 10
>> t0 = time.time()
>> for i in xrange(0, len(a), n):
>>     a[i:i+n] += c*b[i:i+n]
>> print time.time()-t0
>>
>> the second "possible" solution appears 1000x faster than the former in my
>> tests and uses little extra memory. It is only 2x slower than b *= c.
>>
>> Any reason not to do it?
>
> No, this is perfectly fine, you just manually did what numexpr does.

Yeah. You basically re-discovered the blocking technique. For a more general example of how to apply the blocking technique with NumPy, see the section "CPU vs Memory Benchmark" in:

https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus

Of course numexpr has less overhead (and can use multiple cores) than plain NumPy.

-- Francesc Alted
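The blocking technique discussed above can be wrapped in a small helper so the block size is chosen once instead of being hard-coded at each call site. This is an illustrative sketch, not numexpr's actual implementation; the function name and default block size are invented for the example:

```python
import numpy as np

def add_scaled_blocked(a, c, b, blocksize=4096):
    """In-place a += c*b, processed in cache-sized blocks.

    Each iteration allocates a temporary of at most `blocksize`
    elements (for c*b on the slice) instead of one the size of the
    whole array, which is the point of the blocking technique.
    """
    n = a.shape[0]
    for i in range(0, n, blocksize):
        a[i:i+blocksize] += c * b[i:i+blocksize]

a = np.zeros(10000)
b = np.ones(10000)
add_scaled_blocked(a, 3.0, b)  # a is updated in place
```

For multi-dimensional arrays (as in the original question), the same loop applied along the first axis still bounds the temporary at one block of rows.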
Re: [Numpy-discussion] question about in-place operations
On 05/22/2012 04:54 PM, Massimo DiPierro wrote:

> For now I will be doing this:
>
> import numpy
> import time
>
> a = numpy.zeros(200)
> b = numpy.zeros(200)
> c = 1.0
>
> # naive solution
> t0 = time.time()
> for i in xrange(len(a)):
>     a[i] += c*b[i]
> print time.time()-t0
>
> # possible solution
> n = 10
> t0 = time.time()
> for i in xrange(0, len(a), n):
>     a[i:i+n] += c*b[i:i+n]
> print time.time()-t0
>
> the second "possible" solution appears 1000x faster than the former in my
> tests and uses little extra memory. It is only 2x slower than b *= c.
>
> Any reason not to do it?

No, this is perfectly fine, you just manually did what numexpr does.

On 05/22/2012 04:47 PM, Massimo DiPierro wrote:

> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something like
>
> numpy.add_scaled(a, scale, b)
>
> with support for scale in int, float, complex, this would be sufficient for me.

But of course, few need *arbitrary* expressions -- it's just that the ones they want are not already compiled.

It's the last 5% of functionality that's different for everybody...

(But the example you mention could make a nice ufunc; an alternative for you would be to look at the C implementation of np.add and try to submit a pull request for numpy.add_scaled.)

Dag
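Short of writing a new C ufunc, the existing ufunc machinery already permits a temporary-free `add_scaled` by routing the intermediate product into a preallocated scratch buffer with the `out=` argument. A hedged sketch (the helper name and scratch-buffer convention are invented for illustration; `scale` must be representable in a's dtype):

```python
import numpy as np

def add_scaled(a, scale, b, scratch=None):
    """In-place a += scale*b without a full-size throwaway temporary.

    `scratch` is a reusable buffer of the same shape/dtype as a;
    allocate it once and pass it in to avoid per-call allocation.
    """
    if scratch is None:
        scratch = np.empty_like(a)
    np.multiply(b, scale, out=scratch)  # scratch = scale*b, no new array
    np.add(a, scratch, out=a)           # a += scratch, in place
    return a

a = np.zeros(5)
b = np.ones(5)
buf = np.empty_like(a)
add_scaled(a, 2.0, b, buf)
```

This still touches memory twice per element, unlike a fused C ufunc or numexpr, but it keeps peak memory at one extra array that can be reused across calls.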
Re: [Numpy-discussion] question about in-place operations
Thank you, this does it.

On May 22, 2012, at 9:59 AM, Robert Kern wrote:

> On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro wrote:
>> Thank you. I will look into numexpr.
>>
>> Anyway, I do not need arbitrary expressions. If there were something like
>>
>> numpy.add_scaled(a, scale, b)
>>
>> with support for scale in int, float, complex, this would be sufficient
>> for me.
>
> BLAS has the xAXPY functions, which will do this for float and complex.
>
> import numpy as np
> from scipy.linalg import fblas
>
> def add_scaled_inplace(a, scale, b):
>     if np.issubdtype(a.dtype, complex):
>         fblas.zaxpy(b, a, a=scale)
>     else:
>         fblas.daxpy(b, a, a=scale)
>
> --
> Robert Kern
Re: [Numpy-discussion] question about in-place operations
On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro wrote:

> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something like
>
> numpy.add_scaled(a, scale, b)
>
> with support for scale in int, float, complex, this would be sufficient
> for me.

BLAS has the xAXPY functions, which will do this for float and complex.

import numpy as np
from scipy.linalg import fblas

def add_scaled_inplace(a, scale, b):
    if np.issubdtype(a.dtype, complex):
        fblas.zaxpy(b, a, a=scale)
    else:
        fblas.daxpy(b, a, a=scale)

--
Robert Kern
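In later SciPy versions, `scipy.linalg.fblas` was folded into `scipy.linalg.blas`, so a present-day version of Robert's helper would look roughly like the sketch below. One caveat worth hedging: axpy updates its second argument in place only when that array is contiguous and of the routine's dtype; otherwise it returns a fresh array, so the return value is used here to be safe:

```python
import numpy as np
from scipy.linalg import blas

def add_scaled_inplace(a, scale, b):
    """a += scale*b via BLAS axpy, which computes y := a*x + y.

    Dispatches on a's dtype to the complex (zaxpy) or real (daxpy)
    routine; the update is in place when `a` is contiguous and of
    matching dtype, but the result is returned either way.
    """
    if np.issubdtype(a.dtype, np.complexfloating):
        return blas.zaxpy(b, a, a=scale)
    else:
        return blas.daxpy(b, a, a=scale)

x = np.ones(4)
y = np.zeros(4)
y = add_scaled_inplace(y, 2.0, x)
```

Note that in the BLAS naming the scalar parameter is itself called `a` (the "a" in axpy), which is why the keyword `a=scale` appears in both the original and this version.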
Re: [Numpy-discussion] question about in-place operations
For now I will be doing this:

import numpy
import time

a = numpy.zeros(200)
b = numpy.zeros(200)
c = 1.0

# naive solution
t0 = time.time()
for i in xrange(len(a)):
    a[i] += c*b[i]
print time.time()-t0

# possible solution
n = 10
t0 = time.time()
for i in xrange(0, len(a), n):
    a[i:i+n] += c*b[i:i+n]
print time.time()-t0

the second "possible" solution appears 1000x faster than the former in my tests and uses little extra memory. It is only 2x slower than b *= c.

Any reason not to do it?

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
>> hello everybody,
>>
>> first of all thanks to the developers of numpy, which is very useful. I am
>> building software that uses numpy+pyopencl for lattice QCD computations.
>> One problem that I am facing is that I need to perform most operations on
>> arrays in place, and I must avoid creating temporary arrays (because my
>> arrays are many gigabytes large).
>>
>> One typical operation is this:
>>
>> a[i] += const * b[i]
>>
>> What is the efficient way to do this when a and b are arbitrary arrays?
>> const is usually a complex number.
>> a and b have the same shape but are not necessarily one-dimensional.
>
> I don't think NumPy supports this; if you can't modify b[i] in-place, I
> think your only option will be one of numexpr/Theano/Cython.
>
> Dag
Re: [Numpy-discussion] question about in-place operations
Thank you. I will look into numexpr.

Anyway, I do not need arbitrary expressions. If there were something like

numpy.add_scaled(a, scale, b)

with support for scale in int, float, complex, this would be sufficient for me.

Massimo

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:

> On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
>> hello everybody,
>>
>> first of all thanks to the developers of numpy, which is very useful. I am
>> building software that uses numpy+pyopencl for lattice QCD computations.
>> One problem that I am facing is that I need to perform most operations on
>> arrays in place, and I must avoid creating temporary arrays (because my
>> arrays are many gigabytes large).
>>
>> One typical operation is this:
>>
>> a[i] += const * b[i]
>>
>> What is the efficient way to do this when a and b are arbitrary arrays?
>> const is usually a complex number.
>> a and b have the same shape but are not necessarily one-dimensional.
>
> I don't think NumPy supports this; if you can't modify b[i] in-place, I
> think your only option will be one of numexpr/Theano/Cython.
>
> Dag
Re: [Numpy-discussion] question about in-place operations
On 05/22/2012 04:25 PM, Massimo DiPierro wrote:

> hello everybody,
>
> first of all thanks to the developers of numpy, which is very useful. I am
> building software that uses numpy+pyopencl for lattice QCD computations.
> One problem that I am facing is that I need to perform most operations on
> arrays in place, and I must avoid creating temporary arrays (because my
> arrays are many gigabytes large).
>
> One typical operation is this:
>
> a[i] += const * b[i]
>
> What is the efficient way to do this when a and b are arbitrary arrays?
> const is usually a complex number.
> a and b have the same shape but are not necessarily one-dimensional.

I don't think NumPy supports this; if you can't modify b[i] in-place, I think your only option will be one of numexpr/Theano/Cython.

Dag
[Numpy-discussion] question about in-place operations
hello everybody,

first of all thanks to the developers of numpy, which is very useful. I am building software that uses numpy+pyopencl for lattice QCD computations. One problem that I am facing is that I need to perform most operations on arrays in place, and I must avoid creating temporary arrays (because my arrays are many gigabytes large).

One typical operation is this:

a[i] += const * b[i]

What is the efficient way to do this when a and b are arbitrary arrays? const is usually a complex number.
a and b have the same shape but are not necessarily one-dimensional.

Massimo