Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Nathaniel Smith
On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota r...@virtualmaterials.com wrote:
 I finally decided to track down the problem and I started by getting
 Python 2.6 from source and profiling it in one of my cases. By far the
 biggest bottleneck came out to be PyString_FromFormatV which is a
 function to assemble a string for a Python error caused by a failure to
 find an attribute when multiarray calls PyObject_GetAttrString. This
 function seems to get called way too often from NumPy. The real
 bottleneck of trying to find the attribute when it does not exist is not
 that it fails to find it, but that it builds a string to set a Python
 error. In other words, something as simple as a[0] < 3.5 internally
 results in a call to set a Python error.

 I downloaded NumPy code (for Python 2.6) and tracked down all the calls
 like this,

   ret = PyObject_GetAttrString(obj, "__array_priority__");

 and changed to
  if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
  PyTuple_CheckExact(obj) ||
  PyFloat_CheckExact(obj) ||
  PyInt_CheckExact(obj) ||
  PyString_CheckExact(obj) ||
  PyUnicode_CheckExact(obj)){
  //Avoid expensive calls when I am sure the attribute
  //does not exist
  ret = NULL;
  }
  else{
  ret = PyObject_GetAttrString(obj, "__array_priority__");

 ( I think I found about 7 spots )

If the problem is the exception construction, then maybe this would
work about as well?

if (PyObject_HasAttrString(obj, "__array_priority__")) {
    ret = PyObject_GetAttrString(obj, "__array_priority__");
} else {
    ret = NULL;
}

If so then it would be an easier and more reliable way to accomplish this.

 I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
 also resulted in Python errors being set thus unnecessarily slower code.

 With this change, something like this,
  for i in xrange(1000000):
      if a[1] < 35.0:
          pass

 went down from 0.8 seconds to 0.38 seconds.

Huh, why is PyObject_GetBuffer even getting called in this case?

 A bogus test like this,
 for i in xrange(1000000):
     a = array([1., 2., 3.])

 went down from 8.5 seconds to 2.5 seconds.

I can see why we'd call PyObject_GetBuffer in this case, but not why
it would take 2/3rds of the total run-time...

 - The core of my problems I think boil down to things like this
 s = a[0]
 assigning a float64 into s as opposed to a native float ?
 Is there any way to hack code to change it to extract a native float
 instead ? (probably crazy talk, but I thought I'd ask :) ).
 I'd prefer to not use s = a.item(0) because I would have to change too
 much code and it is not even that much faster. For example,
  for i in xrange(1000000):
      if a.item(1) < 35.0:
          pass
 is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)

I'm confused here -- first you say that your problems would be fixed
if a[0] gave you a native float, but then you say that a.item(0)
(which is basically a[0] that gives a native float) is still too slow?
(OTOH at 40% speedup is pretty good, even if it is just a
microbenchmark :-).) Array scalars are definitely pretty slow:

In [9]: timeit a[0]
100 loops, best of 3: 151 ns per loop

In [10]: timeit a.item(0)
1000 loops, best of 3: 169 ns per loop

In [11]: timeit a[0] < 35.0
100 loops, best of 3: 989 ns per loop

In [12]: timeit a.item(0) < 35.0
100 loops, best of 3: 233 ns per loop
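For anyone who wants to reproduce this kind of measurement outside of
IPython, a minimal script along these lines should do it (a sketch; the
exact numbers will of course differ by machine and numpy version):

import timeit
import numpy as np

a = np.array([1.0, 2.0, 3.0])

for stmt in ("a[0]", "a.item(0)", "a[0] < 35.0", "a.item(0) < 35.0"):
    # timeit.timeit returns total seconds for `number` runs; convert to
    # nanoseconds per call so it is comparable to the %timeit output above.
    t = timeit.timeit(stmt, setup="from __main__ import a", number=1000000)
    print("%-20s %6.0f ns" % (stmt, t * 1000.0))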

It is probably possible to make numpy scalars faster... I'm not even
sure why they go through the ufunc machinery, like Travis said, since
they don't even follow the ufunc rules:

In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
Out[3]: array([2, 4, 6])

In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like python integer
Out[4]: [1, 2, 3, 1, 2, 3]

But you may want to experiment a bit more to make sure this is
actually the problem. IME guesses about speed problems are almost
always wrong (even when I take this rule into account and only guess
when I'm *really* sure).

-n


Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread josef . pktd
On Mon, Dec 3, 2012 at 6:14 AM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota r...@virtualmaterials.com wrote:
 I finally decided to track down the problem and I started by getting
 Python 2.6 from source and profiling it in one of my cases. By far the
 biggest bottleneck came out to be PyString_FromFormatV which is a
 function to assemble a string for a Python error caused by a failure to
 find an attribute when multiarray calls PyObject_GetAttrString. This
 function seems to get called way too often from NumPy. The real
 bottleneck of trying to find the attribute when it does not exist is not
 that it fails to find it, but that it builds a string to set a Python
 error. In other words, something as simple as a[0] < 3.5 internally
 results in a call to set a Python error.

 I downloaded NumPy code (for Python 2.6) and tracked down all the calls
 like this,

   ret = PyObject_GetAttrString(obj, "__array_priority__");

 and changed to
  if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
  PyTuple_CheckExact(obj) ||
  PyFloat_CheckExact(obj) ||
  PyInt_CheckExact(obj) ||
  PyString_CheckExact(obj) ||
  PyUnicode_CheckExact(obj)){
  //Avoid expensive calls when I am sure the attribute
  //does not exist
  ret = NULL;
  }
  else{
  ret = PyObject_GetAttrString(obj, "__array_priority__");

 ( I think I found about 7 spots )

 If the problem is the exception construction, then maybe this would
 work about as well?

 if (PyObject_HasAttrString(obj, "__array_priority__")) {
     ret = PyObject_GetAttrString(obj, "__array_priority__");
 } else {
     ret = NULL;
 }

 If so then it would be an easier and more reliable way to accomplish this.

 I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
 also resulted in Python errors being set thus unnecessarily slower code.

 With this change, something like this,
  for i in xrange(1000000):
      if a[1] < 35.0:
          pass

 went down from 0.8 seconds to 0.38 seconds.

 Huh, why is PyObject_GetBuffer even getting called in this case?

 A bogus test like this,
 for i in xrange(1000000):
     a = array([1., 2., 3.])

 went down from 8.5 seconds to 2.5 seconds.

 I can see why we'd call PyObject_GetBuffer in this case, but not why
 it would take 2/3rds of the total run-time...

 - The core of my problems I think boil down to things like this
 s = a[0]
 assigning a float64 into s as opposed to a native float ?
 Is there any way to hack code to change it to extract a native float
 instead ? (probably crazy talk, but I thought I'd ask :) ).
 I'd prefer to not use s = a.item(0) because I would have to change too
 much code and it is not even that much faster. For example,
  for i in xrange(1000000):
      if a.item(1) < 35.0:
          pass
 is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)

 I'm confused here -- first you say that your problems would be fixed
 if a[0] gave you a native float, but then you say that a.item(0)
 (which is basically a[0] that gives a native float) is still too slow?
 (OTOH at 40% speedup is pretty good, even if it is just a
 microbenchmark :-).) Array scalars are definitely pretty slow:

 In [9]: timeit a[0]
 100 loops, best of 3: 151 ns per loop

 In [10]: timeit a.item(0)
 1000 loops, best of 3: 169 ns per loop

 In [11]: timeit a[0] < 35.0
 100 loops, best of 3: 989 ns per loop

 In [12]: timeit a.item(0) < 35.0
 100 loops, best of 3: 233 ns per loop

 It is probably possible to make numpy scalars faster... I'm not even
 sure why they go through the ufunc machinery, like Travis said, since
 they don't even follow the ufunc rules:

 In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
 Out[3]: array([2, 4, 6])

 In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like python integer
 Out[4]: [1, 2, 3, 1, 2, 3]

I thought it still behaves like a numpy animal

>>> np.array(-2)[()] ** [1, 2, 3]
array([-2,  4, -8])
>>> np.array(-2)[()] ** 0.5
nan

>>> np.array(-2).item() ** [1, 2, 3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'list'
>>> np.array(-2).item() ** 0.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative number cannot be raised to a fractional power


>>> np.array(0)[()] ** (-1)
inf
>>> np.array(0).item() ** (-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: 0.0 cannot be raised to a negative power

and similar

I often try to avoid Python scalars because of this kind of surprising
behavior, and work defensively or fix bugs by switching to np.power(...)
(for example in the distributions).
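As a concrete illustration of that last point, a defensive rewrite of the
fractional-power case above could look like this (a sketch; np.power keeps
the numpy behaviour even when the value has already leaked out as a plain
Python float):

import numpy as np

x = np.array(-2.).item()   # plain Python float extracted from an array

try:
    print(x ** 0.5)        # Python semantics: raises ValueError
except ValueError as e:
    print("float **:", e)

# numpy semantics: returns nan (possibly with an invalid-value warning),
# matching what the array scalar does in the session above.
print("np.power:", np.power(x, 0.5))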

Josef


 But you may want to experiment a bit more to make sure this is
 actually the problem. IME guesses about speed problems are almost
 always wrong (even when I take this rule into account and only guess

[Numpy-discussion] scalars and strange casting

2012-12-03 Thread josef . pktd
A followup on the previous thread on scalar speed.

operations with numpy scalars

I can *maybe* understand this

>>> np.array(2)[()] * [0.5, 1]
[0.5, 1, 0.5, 1]

but don't understand this

>>> np.array(2.+0.1j)[()] * [0.5, 1]
__main__:1: ComplexWarning: Casting complex values to real discards
the imaginary part
[0.5, 1, 0.5, 1]


The difference in behavior compared to the other operators (+, -, /, **)
looks, at least, like an inconsistency to me.


Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.array(2.+0.1j)[()] * [0.5, 1]
__main__:1: ComplexWarning: Casting complex values to real discards
the imaginary part
[0.5, 1, 0.5, 1]
>>> np.array(2.+0.1j)[()] ** [0.5, 1]
array([ 1.41465516+0.0353443j,  2.+0.1j  ])
>>> np.array(2.+0.1j)[()] + [0.5, 1]
array([ 2.5+0.1j,  3.0+0.1j])
>>> np.array(2.+0.1j)[()] / [0.5, 1]
array([ 4.+0.2j,  2.+0.1j])


>>> np.array(2)[()] * [0.5, 1]
[0.5, 1, 0.5, 1]
>>> np.array(2)[()] / [0.5, 1]
array([ 4.,  2.])
>>> np.array(2)[()] ** [0.5, 1]
array([ 1.41421356,  2.])
>>> np.array(2)[()] - [0.5, 1]
array([ 1.5,  1. ])
>>> np.__version__
'1.5.1'

or
>>> np.array(-2.+0.1j)[()] * [0.5, 1]
[]
>>> np.multiply(np.array(-2.+0.1j)[()], [0.5, 1])
array([-1.+0.05j, -2.+0.1j ])

>>> np.array([-2.+0.1j])[0] * [0.5, 1]
[]
>>> np.multiply(np.array([-2.+0.1j])[0], [0.5, 1])
array([-1.+0.05j, -2.+0.1j ])

Josef
defensive programming = don't use python, use numpy arrays,
or at least remember which kind of animals you have
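One way to follow that rule in practice (a sketch, reusing the values from
the session above): convert the sequence operand to an array explicitly, so
the operation stays on the ufunc path no matter what kind of scalar ends up
on the left-hand side.

import numpy as np

c = np.array(2. + 0.1j)[()]     # complex array scalar
other = [0.5, 1]

# Relying on the scalar's "*" can silently fall back to list repetition
# (with only a ComplexWarning), as shown above.  Forcing the right-hand
# side to an array keeps everything in numpy:
print(c * np.asarray(other))    # array([ 1.+0.05j,  2.+0.1j ])
print(np.multiply(c, other))    # same result, as in the examples above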


Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Raul Cota
Thanks Christoph.

It seemed to work. Will do profile runs today/tomorrow and see what come 
out.


Raul



On 02/12/2012 7:33 PM, Christoph Gohlke wrote:
 On 12/2/2012 5:28 PM, Raul Cota wrote:
 Hello,

 First a quick summary of my problem and at the end I include the basic
 changes I am suggesting to the source (they may benefit others)

 I am ages behind the times and I am still using Numeric in Python 2.2.3.
 The main reason why it has taken so long to upgrade is because NumPy
 kills performance on several of my tests.

 I am sorry if this topic has been discussed before. I tried parsing the
 mailing list and also google and all I found were comments related to
 the fact that such is life when you use NumPy for small arrays.

 In my case I have several thousands of lines of code where data
 structures rely heavily on Numeric arrays but it is unpredictable if the
 problem at hand will result in large or small arrays. Furthermore, once
 the vectorized operations complete, the values could be assigned into
 scalars and just do simple math or loops. I am fairly sure the core of
 my problems is that the 'float64' objects start propagating all over the
 program data structures (not in arrays) and they are considerably slower
 for just about everything when compared to the native python float.

 Conclusion, it is not practical for me to do a massive re-structuring of
 code to improve speed on simple things like a[0] < 4 (assuming a is
 an array) which is about 10 times slower than b < 4 (assuming b is a
 float)


 I finally decided to track down the problem and I started by getting
 Python 2.6 from source and profiling it in one of my cases. By far the
 biggest bottleneck came out to be PyString_FromFormatV which is a
 function to assemble a string for a Python error caused by a failure to
 find an attribute when multiarray calls PyObject_GetAttrString. This
 function seems to get called way too often from NumPy. The real
 bottleneck of trying to find the attribute when it does not exist is not
 that it fails to find it, but that it builds a string to set a Python
 error. In other words, something as simple as a[0] < 3.5 internally
 results in a call to set a Python error.

 I downloaded NumPy code (for Python 2.6) and tracked down all the calls
 like this,

 ret = PyObject_GetAttrString(obj, "__array_priority__");

 and changed to
if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
PyTuple_CheckExact(obj) ||
PyFloat_CheckExact(obj) ||
PyInt_CheckExact(obj) ||
PyString_CheckExact(obj) ||
PyUnicode_CheckExact(obj)){
//Avoid expensive calls when I am sure the attribute
//does not exist
ret = NULL;
}
else{
 ret = PyObject_GetAttrString(obj, "__array_priority__");



 ( I think I found about 7 spots )


 I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
 also resulted in Python errors being set thus unnecessarily slower code.


 With this change, something like this,
 for i in xrange(1000000):
     if a[1] < 35.0:
         pass

 went down from 0.8 seconds to 0.38 seconds.

 A bogus test like this,
  for i in xrange(1000000):
      a = array([1., 2., 3.])

 went down from 8.5 seconds to 2.5 seconds.



 Altogether, these simple changes got me half way to the speed I used to
 get in Numeric and I could not see any slow down in any of my cases that
 benefit from heavy array manipulation. I am out of ideas on how to
 improve further though.

 Few questions:
 - Is there any interest for me to provide the exact details of the code
 I changed ?

 - I managed to compile NumPy through setup.py but I am not sure how to
 force it to generate pdb files from my Visual Studio Compiler. I need
 the pdb files such that I can run my profiler on NumPy. Anybody has any
 experience with this ? (Visual Studio)

 Change the compiler and linker flags in
 Python\Lib\distutils\msvc9compiler.py to:

 self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
 self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']

 Then rebuild numpy.

 Christoph
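An alternative to editing the stdlib file in place, for anyone who prefers
to keep the change local, would be a small wrapper that monkeypatches
distutils before building (hypothetical and untested; it assumes the same
MSVCCompiler attribute names Christoph lists above):

# build_with_pdb.py (hypothetical): run numpy's setup.py from the same
# process after applying this patch, e.g. via execfile('setup.py').
from distutils import msvc9compiler

_orig_initialize = msvc9compiler.MSVCCompiler.initialize

def initialize(self, plat_name=None):
    _orig_initialize(self, plat_name)
    # Same flags as above: keep optimization but emit .pdb debug information.
    self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
    self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']

msvc9compiler.MSVCCompiler.initialize = initialize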



 - The core of my problems I think boil down to things like this
 s = a[0]
 assigning a float64 into s as opposed to a native float ?
 Is there any way to hack code to change it to extract a native float
 instead ? (probably crazy talk, but I thought I'd ask :) ).
 I'd prefer to not use s = a.item(0) because I would have to change too
 much code and it is not even that much faster. For example,
 for i in xrange(1000000):
     if a.item(1) < 35.0:
         pass
 is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)


 I apologize again if this topic has already been discussed.


 Regards,

 Raul



Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Raul Cota
On 02/12/2012 8:31 PM, Travis Oliphant wrote:
 Raul,

 This is *fantastic work*. While many optimizations were done 6 years ago 
 as people started to convert their code, that kind of report has trailed off 
 in the last few years.   I have not seen this kind of speed-comparison for 
 some time --- but I think it's definitely beneficial.

I'll clean it up a bit as a macro and comment.


 NumPy still has quite a bit that can be optimized.   I think your example is 
 really great.Perhaps it's worth making a C-API macro out of the short-cut 
 to the attribute string so it can be used by others.It would be 
 interesting to see where your other slow-downs are. I would be interested 
 to see if the slow-math of float64 is hurting you.It would be possible, 
 for example, to do a simple subclass of the ndarray that overloads 
 a[integer] to be the same as array.item(integer).  The latter syntax 
 returns python objects (i.e. floats) instead of array scalars.

 Also, it would not be too difficult to add fast-math paths for int64, 
 float32, and float64 scalars (so they don't go through ufuncs but do 
 scalar-math like the float and int objects in Python).

Thanks. I'll dig a bit more into the code.
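A minimal sketch of the subclass idea described above (the class name is
made up here, and only plain integer indexing is special-cased):

import numpy as np

class FloatItemArray(np.ndarray):
    # Hypothetical subclass: a[i] with a plain integer returns a native
    # Python scalar via .item(i) instead of a numpy array scalar.
    def __getitem__(self, index):
        if isinstance(index, int):
            return self.item(index)
        # Slices, tuples, boolean masks, etc. keep ndarray behaviour.
        return np.ndarray.__getitem__(self, index)

a = np.array([1.0, 2.0, 3.0]).view(FloatItemArray)
print(type(a[0]))    # native float, not numpy.float64
print(a[1:])         # still an array for everything else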



 A related thing we've been working on lately which might help you is Numba 
 which can speed up functions that have code like a[0] < 4:
 http://numba.pydata.org.

 Numba will translate the expression a[0] < 4 to a machine-code address-lookup 
 and math operation which is *much* faster when a is a NumPy array.
 Presently this requires you to wrap your function call in a decorator:

 from numba import autojit

 @autojit
 def function_to_speed_up(...):
   pass

 In the near future (2-4 weeks), numba will grow the experimental ability to 
 basically replace all your function calls with @autojit versions in a Python 
 function.I would love to see something like this work:

 python -m numba filename.py

 To get an effective autojit on all the filename.py functions (and optionally 
 on all python modules it imports).The autojit works out of the box today 
 --- you can get Numba from PyPI (or inside of the completely free Anaconda 
 CE) to try it out.

This looks very interesting. Will check it out.
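A fuller (hypothetical, untested) sketch of that decorator pattern, applied
to the kind of loop discussed earlier in this thread and assuming the
autojit API described above:

from numba import autojit
import numpy as np

@autojit
def count_below(a, threshold):
    # The a[i] < threshold comparison compiles to a direct element load
    # and float compare instead of going through numpy array scalars.
    n = 0
    for i in range(a.shape[0]):
        if a[i] < threshold:
            n += 1
    return n

a = np.array([1.0, 2.0, 3.0])
print(count_below(a, 35.0))   # 3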


 Best,

 -Travis




 On Dec 2, 2012, at 7:28 PM, Raul Cota wrote:

 Hello,

 First a quick summary of my problem and at the end I include the basic
 changes I am suggesting to the source (they may benefit others)

  I am ages behind the times and I am still using Numeric in Python 2.2.3.
 The main reason why it has taken so long to upgrade is because NumPy
 kills performance on several of my tests.

 I am sorry if this topic has been discussed before. I tried parsing the
 mailing list and also google and all I found were comments related to
 the fact that such is life when you use NumPy for small arrays.

 In my case I have several thousands of lines of code where data
 structures rely heavily on Numeric arrays but it is unpredictable if the
 problem at hand will result in large or small arrays. Furthermore, once
 the vectorized operations complete, the values could be assigned into
 scalars and just do simple math or loops. I am fairly sure the core of
 my problems is that the 'float64' objects start propagating all over the
 program data structures (not in arrays) and they are considerably slower
 for just about everything when compared to the native python float.

 Conclusion, it is not practical for me to do a massive re-structuring of
  code to improve speed on simple things like a[0] < 4 (assuming a is
  an array) which is about 10 times slower than b < 4 (assuming b is a
 float)


 I finally decided to track down the problem and I started by getting
 Python 2.6 from source and profiling it in one of my cases. By far the
 biggest bottleneck came out to be PyString_FromFormatV which is a
 function to assemble a string for a Python error caused by a failure to
 find an attribute when multiarray calls PyObject_GetAttrString. This
 function seems to get called way too often from NumPy. The real
 bottleneck of trying to find the attribute when it does not exist is not
 that it fails to find it, but that it builds a string to set a Python
  error. In other words, something as simple as a[0] < 3.5 internally
  results in a call to set a Python error.

 I downloaded NumPy code (for Python 2.6) and tracked down all the calls
 like this,

    ret = PyObject_GetAttrString(obj, "__array_priority__");

 and changed to
  if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
  PyTuple_CheckExact(obj) ||
  PyFloat_CheckExact(obj) ||
  PyInt_CheckExact(obj) ||
  PyString_CheckExact(obj) ||
  PyUnicode_CheckExact(obj)){
  //Avoid expensive calls when I am sure the attribute
  //does not exist
  ret = NULL;
  }
  else{
  ret = PyObject_GetAttrString(obj, "__array_priority__");



 ( I think I found about 7 spots )


 I also noticed 

Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Raul Cota
On 03/12/2012 4:14 AM, Nathaniel Smith wrote:
 On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota r...@virtualmaterials.com wrote:
 I finally decided to track down the problem and I started by getting
 Python 2.6 from source and profiling it in one of my cases. By far the
 biggest bottleneck came out to be PyString_FromFormatV which is a
 function to assemble a string for a Python error caused by a failure to
 find an attribute when multiarray calls PyObject_GetAttrString. This
 function seems to get called way too often from NumPy. The real
 bottleneck of trying to find the attribute when it does not exist is not
 that it fails to find it, but that it builds a string to set a Python
 error. In other words, something as simple as a[0] < 3.5 internally
 results in a call to set a Python error.

 I downloaded NumPy code (for Python 2.6) and tracked down all the calls
 like this,

ret = PyObject_GetAttrString(obj, "__array_priority__");

 and changed to
   if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
   PyTuple_CheckExact(obj) ||
   PyFloat_CheckExact(obj) ||
   PyInt_CheckExact(obj) ||
   PyString_CheckExact(obj) ||
   PyUnicode_CheckExact(obj)){
   //Avoid expensive calls when I am sure the attribute
   //does not exist
   ret = NULL;
   }
   else{
   ret = PyObject_GetAttrString(obj, "__array_priority__");

 ( I think I found about 7 spots )
 If the problem is the exception construction, then maybe this would
 work about as well?

 if (PyObject_HasAttrString(obj, "__array_priority__")) {
     ret = PyObject_GetAttrString(obj, "__array_priority__");
 } else {
     ret = NULL;
 }

 If so then it would be an easier and more reliable way to accomplish this.

I did think of that one, but at least in Python 2.6 the implementation is
just a wrapper around PyObject_GetAttrString that clears the error:


int
PyObject_HasAttrString(PyObject *v, const char *name)
{
    PyObject *res = PyObject_GetAttrString(v, name);
    if (res != NULL) {
        Py_DECREF(res);
        return 1;
    }
    PyErr_Clear();
    return 0;
}


so it is just as bad when it fails and a waste when it succeeds (it will 
end up finding it twice).
In my opinion, Python's source code should offer a version of 
PyObject_GetAttrString that does not raise an error but that is a 
completely different topic.


 I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
 also resulted in Python errors being set thus unnecessarily slower code.

 With this change, something like this,
   for i in xrange(1000000):
       if a[1] < 35.0:
           pass

 went down from 0.8 seconds to 0.38 seconds.
 Huh, why is PyObject_GetBuffer even getting called in this case?

Sorry for being misleading in an already long and confusing email.
PyObject_GetBuffer is not getting called when doing an 'if' comparison.
This call showed up in my profiler as a time-consuming task that raised
Python errors unnecessarily (not nearly as often as PyObject_GetAttrString),
but since I was already there I decided to look into it as well.


The point I was trying to make was that I did both changes (avoiding 
PyObject_GetBuffer, PyObject_GetAttrString) when I came up with the times.


 A bogus test like this,
 for i in xrange(1000000):
     a = array([1., 2., 3.])

 went down from 8.5 seconds to 2.5 seconds.
 I can see why we'd call PyObject_GetBuffer in this case, but not why
 it would take 2/3rds of the total run-time...

Same scenario. This total time includes both changes (avoiding 
PyObject_GetBuffer, PyObject_GetAttrString).
If memory serves, I believe PyObject_GetBuffer gets called once for
every nine calls to PyObject_GetAttrString in this scenario.


 - The core of my problems I think boil down to things like this
 s = a[0]
 assigning a float64 into s as opposed to a native float ?
 Is there any way to hack code to change it to extract a native float
 instead ? (probably crazy talk, but I thought I'd ask :) ).
 I'd prefer to not use s = a.item(0) because I would have to change too
 much code and it is not even that much faster. For example,
   for i in xrange(1000000):
       if a.item(1) < 35.0:
           pass
 is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)
 I'm confused here -- first you say that your problems would be fixed
 if a[0] gave you a native float, but then you say that a.item(0)
 (which is basically a[0] that gives a native float) is still too slow?

Don't get me wrong. I am confused too when it gets beyond my suggested
changes :). My theory for saying that a.item(1) is not the same as
a[1] returning a float was that perhaps the overhead of the dot operator
is too big.
At the end of the day, I do want to profile NumPy and find out if there 
is anything I can do to speed things up.

To bring things more into context, I don't really care to speed up a 
bogus loop with if statements.
My bottom line is,
- I am 

Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Chris Barker - NOAA Federal
Raul,

Thanks for doing this work -- both the profiling and actual
suggestions for how to improve the code -- whoo hoo!

In general, it seems that numpy performance for scalars and very small
arrays (i.e. (2,), (3,), maybe (3,3), the kind of thing that you'd use
to hold a coordinate point or the like, not small as in "fits in
cache") is pretty slow. In principle, a basic array scalar operation
could be as fast as a numpy native numeric type, and it would be great
if small array operations were, too.

It may be that the route to those performance improvements is
special-case code, which is ugly, but I think could really be worth it
for the common types and operations.

I'm really out of my depth for suggesting (or contributing) actual
solutions, but +1 for the idea!

-Chris

NOTE: Here's a example of what I'm talking about -- say you are
scaling an (x,y) point by a (s_x, s_y) scale factor:

def numpy_version(point, scale):
return point * scale


def tuple_version(point, scale):
return (point[0] * scale[0], point[1] * scale[1])



In [36]: point_arr, scale_arr
Out[36]: (array([ 3.,  5.]), array([ 2.,  3.]))

In [37]: timeit tuple_version(point, scale)
100 loops, best of 3: 397 ns per loop

In [38]: timeit numpy_version(point_arr, scale_arr)
10 loops, best of 3: 2.32 us per loop

It would be great if numpy could get closer to tuple performance for
this sort of thing...
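A standalone way to reproduce that comparison with the plain timeit module
(a sketch; the numbers will vary, but the ratio should be similar):

import timeit
import numpy as np

point, scale = (3.0, 5.0), (2.0, 3.0)
point_arr, scale_arr = np.array(point), np.array(scale)

def numpy_version(point, scale):
    return point * scale

def tuple_version(point, scale):
    return (point[0] * scale[0], point[1] * scale[1])

# Total seconds for 100000 calls of each version.
print(timeit.timeit(lambda: tuple_version(point, scale), number=100000))
print(timeit.timeit(lambda: numpy_version(point_arr, scale_arr), number=100000))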


-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Apparently Non-deterministic behaviour of complex-array instantiation values

2012-12-03 Thread Pauli Virtanen
On 03.12.2012 22:10, Karl Kappler wrote:
[clip]
 I.e.. the imaginary part is initialized to a different value. From
 reading up on forums I think I understand that when an array is
 allocated without specific values, it will be given random values which
 are very small, i.e. ~1e-316 or so. But it would seem that sometimes
 initialization is done to a finite quantity. I know I can try to
 initialize the array using np.zeros() instead of np.ndarray(), but it is
 the principle I am concerned about.

The memory is not initialized in any way [*] if you get the array from
np.empty(...) or np.ndarray(...). It contains whatever happens to be
at that location. It just happens that typical memory content, when
viewed as floating point, often looks like that.

[*] Except that the OS zeroes new memory pages given to the process.
Processes however reuse the pages they are given.
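A quick way to see the difference (a sketch; the garbage values will vary
from run to run):

import numpy as np

# np.empty / np.ndarray hand back whatever bytes happen to be in the
# allocation, so the contents are arbitrary and can change between runs.
a = np.empty(3, dtype=complex)
print(a)

# np.zeros guarantees the initial contents.
b = np.zeros(3, dtype=complex)
print(b)      # [ 0.+0.j  0.+0.j  0.+0.j]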

-- 
Pauli Virtanen



Re: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

2012-12-03 Thread Raul Cota
Chris,

thanks for the feedback,

fyi,
the minor changes I talked about have different performance enhancements 
depending on scenario,

e.g,

1) Array * Array
point = array( [2.0, 3.0])
scale = array( [2.4, 0.9] )

retVal = point * scale
#The line above runs 1.1 times faster with my new code (but it runs 3 
times faster in Numeric in Python 2.2)
#i.e. pretty meaningless but still far from old Numeric

2) Array * Tuple (item by item)
point = array( [2.0, 3.0])
scale =  (2.4, 0.9 )

retVal = point[0] * scale[0], point[1] * scale[1]
#The line above runs 1.8 times faster with my new code (but it runs 6.8 
times faster in Numeric in Python 2.2)
#i.e. pretty decent speed up but quite far from old Numeric


I am not saying that I would ever do something exactly like (2) in my
code, nor am I saying that the changes in NumPy vs. Numeric are not
beneficial. My point is that performance on small problems is fairly far
from what it used to be in Numeric, particularly when dealing with
scalars, and that is problematic, at least to me.
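To make the scalar issue concrete, this is the distinction the whole thread
keeps coming back to (a sketch):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
s = a[0]         # numpy.float64 array scalar; this is what propagates
f = a.item(0)    # native Python float

print(type(s), type(f))
# Every later operation on s (s * 2.4, s < 4.0, ...) goes through numpy's
# scalar machinery, which is where the slowdown shows up.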


I am currently looking around to see if there are practical ways to 
speed things up without slowing anything else down. Will keep you posted.


regards,

Raul


On 03/12/2012 12:49 PM, Chris Barker - NOAA Federal wrote:
 Raul,

 Thanks for doing this work -- both the profiling and actual
 suggestions for how to improve the code -- whoo hoo!

 In general, it seems that numpy performance for scalars and very small
 arrays (i.e. (2,), (3,), maybe (3,3), the kind of thing that you'd use
 to hold a coordinate point or the like, not small as in "fits in
 cache") is pretty slow. In principle, a basic array scalar operation
 could be as fast as a numpy native numeric type, and it would be great
 if small array operations were, too.

 It may be that the route to those performance improvements is
 special-case code, which is ugly, but I think could really be worth it
 for the common types and operations.

 I'm really out of my depth for suggesting (or contributing) actual
 solutions, but +1 for the idea!

 -Chris

 NOTE: Here's a example of what I'm talking about -- say you are
 scaling an (x,y) point by a (s_x, s_y) scale factor:

 def numpy_version(point, scale):
  return point * scale


 def tuple_version(point, scale):
  return (point[0] * scale[0], point[1] * scale[1])



 In [36]: point_arr, scale_arr
 Out[36]: (array([ 3.,  5.]), array([ 2.,  3.]))

 In [37]: timeit tuple_version(point, scale)
 100 loops, best of 3: 397 ns per loop

 In [38]: timeit numpy_version(point_arr, scale_arr)
 10 loops, best of 3: 2.32 us per loop

 It would be great if numpy could get closer to tuple performance for
 this sort of thing...


 -Chris





[Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch

2012-12-03 Thread Ondřej Čertík
Hi,

I started to work on the release again and noticed weird failures at Travis-CI:

https://github.com/numpy/numpy/pull/2782

The first commit (8a18fc7) should not trigger this failure:

==
FAIL: test_iterator.test_iter_array_cast
--
Traceback (most recent call last):
  File 
/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py,
line 197, in runTest
self.test(*self.arg)
  File 
/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py,
line 836, in test_iter_array_cast
assert_equal(i.operands[0].strides, (-96,8,-32))
  File 
/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py,
line 252, in assert_equal
assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose)
  File 
/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py,
line 314, in assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
item=0

 ACTUAL: 96
 DESIRED: -96

So I pushed a whitespace commit into the PR (516b478) yet it has the
same failure. So it's there, it's
not some random fluke at Travis. I created this testing PR:

https://github.com/numpy/numpy/pull/2783

to try to nail it down. But I can't see what could have caused this,
because the release branch was passing all tests
last time I worked on it.

Any ideas?

Btw, I managed to reproduce the SPARC64 bug:

https://github.com/numpy/numpy/issues/2668

so that's good. Now I just need to debug it.

Ondrej

P.S. My thesis was finally approved by the grad school today,
doing some final changes took more time than expected, but
I think that I am done now.


Re: [Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch

2012-12-03 Thread Nathaniel Smith
On 4 Dec 2012 02:27, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Hi,

 I started to work on the release again and noticed weird failures at
Travis-CI:
[…]
   File
/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py,

The problem is that Travis started installing numpy in all python
virtualenvs by default, and our Travis build script just runs setup.py
install, which is too dumb to notice that there is a numpy already
installed and just overwrites it. The file mentioned above doesn't even
exist in 1.7, it's left over from the 1.6 install.

I did a PR to fix this in master a few days ago; you'll want to back-port
that. (Sorry for the lack of a link, I'm on my phone.)

 P.S. My thesis was finally approved by the grad school today,
 doing some final changes took more time than expected, but
 I think that I am done now.

Congratulations Dr. Čertík!

-n