Re: [Numpy-discussion] Status of NumPy and Python 3.3
On Mon, Jul 30, 2012 at 5:00 PM, Ronan Lamy ronan.l...@gmail.com wrote:
On Monday, July 30, 2012 at 11:07 -0700, Ondřej Čertík wrote:
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy ronan.l...@gmail.com wrote:
On Monday, July 30, 2012 at 17:10 +0100, Ronan Lamy wrote:
On Monday, July 30, 2012 at 04:57 +0100, Ronan Lamy wrote:
On Monday, July 30, 2012 at 02:00 +0100, Ronan Lamy wrote:

Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!

And the cause of these errors is that running the test suite somehow corrupts Python's internal cache of bytes objects, causing the following:

    >>> b'\x01XXX'[0:1]
    b'\xbb'

The culprit is test_pickle_string_overwrite() in test_regression.py. The test actually tries to check for that kind of problem, but on Python 3, it only manages to trigger it without detecting it. Here's a simple way to reproduce the issue:

    >>> a = numpy.array([1], 'b')
    >>> b = pickle.loads(pickle.dumps(a))
    >>> b[0] = 77
    >>> b'\x01 '[0:1]
    b'M'

Actually, this problem is probably quite old: I can see it in 1.6.1 w/ Python 3.2.3. 3.3 only makes it more visible. I'll open an issue on GitHub ASAP.

https://github.com/numpy/numpy/issues/370

Thanks Ronan, nice work! Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.)

Pauli found out how to fix the code, so I'll try to send a PR tonight.

So this PR is now in and the issue is fixed. As for the unicode swapping issues, I finally understand what is going on, and I posted my current understanding to the Python tracker issue (http://bugs.python.org/issue15540), which was recently created for this same issue: http://bugs.python.org/msg167280. It was determined that it is not a bug in Python, so the issue is closed now. Finally, I have submitted a reworked version of my patch here: https://github.com/numpy/numpy/pull/372. It implements things in a clean way.
Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
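The reproduction quoted above can be turned into a small self-check. This is only a sketch, assuming a NumPy build in which issue 370 is already fixed; the function name pickle_roundtrip_is_isolated is made up for illustration and is not part of NumPy or its test suite.

```python
import pickle

import numpy as np


def pickle_roundtrip_is_isolated():
    """Return True if writing to an unpickled array leaves other objects alone.

    Hypothetical helper mirroring the reproduction steps from the thread.
    """
    a = np.array([1], dtype='b')
    b = pickle.loads(pickle.dumps(a))
    b[0] = 77  # write into the unpickled copy only

    # The source array must keep its value, and a freshly sliced one-byte
    # bytes object must still be b'\x01' (the buggy builds returned b'M').
    return bool(a[0] == 1) and b'\x01 '[0:1] == b'\x01'


print(pickle_roundtrip_is_isolated())
```

On an affected build the second comparison is the one expected to fail, since that is the symptom reported in the thread.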
[Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Hello everyone,

I'm trying to determine the 2 greatest values, in a 3-d array, along one axis. Here is an approach:

    # --
    # procedure to determine greatest 2 values for 3rd dimension of 3-d array ...
    import numpy, numpy.ma
    xcnt, ycnt, zcnt = 2,3,4  # actual case is (1024, 1024, 8)
    p0 = numpy.empty ((xcnt,ycnt,zcnt))
    for z in range (zcnt) : p0[:,:,z] = z*z
    zaxis = 2  # max values to be determined for 3rd axis
    p0max = numpy.max (p0, axis=zaxis)  # max values for zaxis
    maxindices = numpy.argmax (p0, axis=zaxis)  # indices of max values
    p1 = p0.copy()  # work array to scan for 2nd highest values
    j, i = numpy.meshgrid (numpy.arange (ycnt), numpy.arange (xcnt))
    p1[i,j,maxindices] = numpy.NaN  # flag all max values
    p1 = numpy.ma.masked_where (numpy.isnan (p1), p1)  # hide all max values
    p1max = numpy.max (p1, axis=zaxis)  # 2nd highest values for zaxis
    # additional code to analyze p0max and p1max goes here
    # --

I would appreciate feedback on a simpler approach -- e.g., one that does not require masked arrays and/or use of magic values like NaN.

Thanks,
-- jv
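One way to avoid both masked arrays and NaN flags is to sort along the axis and keep the last two slices. The sketch below is not from the thread; it reuses the variable names of the script above and assumes that a tied maximum should count as the second value too, which is also what masking a single argmax position gives.

```python
import numpy as np

xcnt, ycnt, zcnt = 2, 3, 4  # small stand-in for the (1024, 1024, 8) case
p0 = np.empty((xcnt, ycnt, zcnt))
for z in range(zcnt):
    p0[:, :, z] = z * z

zaxis = 2
# np.sort returns an ascending copy, so the two largest values along
# zaxis end up in the last two positions of that axis.
top2 = np.sort(p0, axis=zaxis)[:, :, -2:]
p1max = top2[:, :, 0]  # 2nd highest values for zaxis
p0max = top2[:, :, 1]  # highest values for zaxis

print(p0max)
print(p1max)
```

For a short axis such as 8 elements a full sort is cheap; np.partition(p0, -2, axis=zaxis) could be swapped in for np.sort if only the top two positions need to be ordered.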
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Here goes a simple 1D implementation. It shouldn't be difficult to generalize to more dimensions, as all the functions support an axis argument:

    >>> a = np.array([1, 2, 3, 5, 2])
    >>> a.max()  # This is the maximum value
    5
    >>> mask = np.zeros_like(a)
    >>> mask[np.argmax(a)] = 1
    >>> a = np.ma.masked_array(a, mask=mask)
    >>> a.max()  # Second maximum value
    3

I am using a masked array, so the structure of the array remains (i.e., you can still use it in multi-dimensional arrays). I could have deleted the value, but then that wouldn't be useful for your case.

On Fri, Aug 3, 2012 at 4:18 PM, Jim Vickroy jim.vick...@noaa.gov wrote:
Hello everyone, I'm trying to determine the 2 greatest values, in a 3-d array, along one axis. [...]
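The 1D masking idea can be carried over to several dimensions without loops. The following is a sketch under assumptions: it relies on np.put_along_axis, which exists only in NumPy releases much newer than this thread (1.15 and later), and the sample array is invented for illustration.

```python
import numpy as np

# Invented sample data, shape (2, 2, 3); the last axis is the one scanned.
a = np.array([[[1, 5, 3], [7, 2, 6]],
              [[4, 4, 0], [9, 8, 9]]])
axis = 2

first = a.max(axis=axis)

# Build a boolean mask that hides exactly one maximum per 1-D slice.
idx = np.expand_dims(a.argmax(axis=axis), axis)
mask = np.zeros(a.shape, dtype=bool)
np.put_along_axis(mask, idx, True, axis=axis)

# The maximum of what remains is the second greatest value.
second = np.ma.masked_array(a, mask=mask).max(axis=axis)

print(first)
print(np.ma.getdata(second))
```

Because only the first occurrence of the maximum is masked, a slice such as [9, 8, 9] reports 9 as its second value as well.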
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Here is the 3D implementation: http://pastebin.com/ERVLWhbS

I was only able to do it using a double nested loop, but I am sure someone more clever than me can do a slicing trick to overcome it. Otherwise, I hope this is fast enough for your purpose.

David.

On Fri, Aug 3, 2012 at 4:41 PM, Daπid davidmen...@gmail.com wrote:
Here goes a 1D simple implementation. [...]
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
On 3 August 2012 11:18, Jim Vickroy jim.vick...@noaa.gov wrote:
[...] I would appreciate feedback on a simpler approach -- e.g., one that does not require masked arrays and or use of magic values like NaN.

Here's a way that only uses argsort and fancy indexing:

    >>> a = np.random.randint(10, size=(3,3,3))
    >>> print a
    [[[0 3 8]
      [4 2 8]
      [8 6 3]]

     [[0 6 7]
      [0 3 9]
      [0 9 1]]

     [[7 9 7]
      [5 2 9]
      [9 3 3]]]
    >>> am = a.argsort(axis=2)
    >>> maxs = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-1]]
    >>> print maxs
    [[8 8 8]
     [7 9 9]
     [9 9 9]]
    >>> seconds = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-2]]
    >>> print seconds
    [[3 4 6]
     [6 3 1]
     [7 5 3]]

And to double check:

    >>> i, j = 0, 1
    >>> l = a[i, j,:]
    >>> print l
    [4 2 8]
    >>> print np.max(a[i,j,:]), maxs[i,j]
    8 8
    >>> print l[np.argsort(l)][-2], seconds[i,j]
    4 4

Good luck.

Angus.
--
AJC McMorland
Post-doctoral research fellow
Neurobiology, University of Pittsburgh
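For readers on a recent NumPy, the broadcast index arrays in the argsort approach can be replaced by np.take_along_axis. This is a sketch, not part of the original reply: take_along_axis was added in NumPy 1.15, well after this thread, and the array below is simply the random sample printed in the message, typed in by hand.

```python
import numpy as np

# The sample array shown in the message above.
a = np.array([[[0, 3, 8], [4, 2, 8], [8, 6, 3]],
              [[0, 6, 7], [0, 3, 9], [0, 9, 1]],
              [[7, 9, 7], [5, 2, 9], [9, 3, 3]]])

order = a.argsort(axis=2)

# Slicing with -1: and -2:-1 keeps the index arrays 3-D, as
# take_along_axis requires; the trailing [:, :, 0] drops the extra axis.
maxs = np.take_along_axis(a, order[:, :, -1:], axis=2)[:, :, 0]
seconds = np.take_along_axis(a, order[:, :, -2:-1], axis=2)[:, :, 0]

print(maxs)
print(seconds)
```

The results match the maxs and seconds arrays printed in the message.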
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Thanks for each of the improved solutions. The one using argsort took a little while for me to understand. I have a long way to go to fully utilize fancy indexing!

-- jv

On 8/3/2012 10:02 AM, Angus McMorland wrote:
Here's a way that only uses argsort and fancy indexing: [...]
Re: [Numpy-discussion] Licensing question
On 08/02/2012 10:44 PM, Damon McDougall wrote:
Hi, I have a question about the licence for NumPy's codebase. I am currently writing a library and I'd like to release under some BSD-type licence. Unfortunately, my choice to link against MIT's FFTW library (released under the GPL) means that, in its current state, this is not possible. I'm an avid NumPy user and thought to myself that, since NumPy's licence is BSD, I'd be able to use some of the source code (with due credit, of course) instead of FFTW. Is this possible? I mean, can I redistribute *PART* of NumPy's codebase? Namely, the fftpack.c file? I was under the impression that I could only redistribute BSD source code as a whole and then I read the licence more carefully and it states that I can modify the source to suit my needs. I consider 'redistributing a single file and ignoring the other files' as a 'modification' under the BSD definition, but maybe I'm thinking too wishfully here. Any information on this matter would be greatly appreciated since I am a total code licence noob. Thank you.
P.S. Yes, I know I could just release under the GPL, but I don't want to turn people off of packaging my work into a useful product licensed under BSD, or even make money from it.

Not related to licensing, but here's another port of FFTPACK to C by Martin Reinecke, licensed under BSD. The README has the links to the original Fortran sources that this is based on.

https://github.com/dagss/libfftpack

Dag
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Hi,

On Fri, Aug 3, 2012 at 7:02 PM, Angus McMorland amcm...@gmail.com wrote:
Here's a way that only uses argsort and fancy indexing: [...]

Here the np.indices function may help a little bit, like:

    In []: a = randint(10, size=(3, 2, 4))
    In []: a
    Out[]:
    array([[[1, 9, 6, 6],
            [0, 3, 4, 2]],
           [[4, 2, 4, 4],
            [5, 9, 4, 4]],
           [[6, 1, 4, 3],
            [5, 4, 5, 5]]])
    In []: ndx = indices(a.shape)
    In []: # largest
    In []: a[a.argsort(0), ndx[1], ndx[2]][-1]
    Out[]:
    array([[6, 9, 6, 6],
           [5, 9, 5, 5]])
    In []: # second largest
    In []: a[a.argsort(0), ndx[1], ndx[2]][-2]
    Out[]:
    array([[4, 2, 4, 4],
           [5, 4, 4, 4]])

My 2 cents,
-eat
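Note that the session above ranks values along axis 0 rather than axis 2. As a cross-check, the fancy-indexed result should equal a plain sort along that axis. The sketch below is an illustration only; it uses explicit np. prefixes instead of the star-imported names in the session, and the array is copied by hand from the output shown.

```python
import numpy as np

# The array printed in the session above, shape (3, 2, 4).
a = np.array([[[1, 9, 6, 6], [0, 3, 4, 2]],
              [[4, 2, 4, 4], [5, 9, 4, 4]],
              [[6, 1, 4, 3], [5, 4, 5, 5]]])

ndx = np.indices(a.shape)

# argsort supplies the index along axis 0; ndx[1] and ndx[2] hold the
# other two coordinates fixed, so each column is reordered independently.
by_rank = a[a.argsort(0), ndx[1], ndx[2]]

# Same values as sorting along axis 0 directly.
same_as_sort = np.array_equal(by_rank, np.sort(a, axis=0))

largest = by_rank[-1]
second_largest = by_rank[-2]
print(same_as_sort)
print(largest)
print(second_largest)
```

When only the values are needed, np.sort along the chosen axis is the shorter route; the argsort form is useful when the positions matter too.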
[Numpy-discussion] Unicode revisited
Hey all,

Ondrej has been working hard with feedback from many others on improving Unicode support in NumPy (especially for Python 3.3). Looking at what Python has done in Python 3.3 (PEP 393) and chatting on the Python issue tracker with the author of that PEP has made me wonder if we aren't doing the wrong thing in NumPy quite often.

Basically, NumPy only supports UTF-32 in its Unicode representation. All bytes in NumPy arrays should be either UTF-32LE or UTF-32BE. This is all pretty easy to understand as long as you stick with NumPy arrays only.

The difficulty starts when you start to interact with the unicode array scalar (which is the same data-structure exactly as a Python unicode object with a different type-name --- numpy.unicode_). However, I overlooked the encoding argument to the standard unicode constructor, which might have simplified what we are doing. If I understand things correctly, now, all we need to do is to decode the UTF-32LE or UTF-32BE raw bytes in the array (depending on the dtype) into a unicode object. This is easily accomplished with numpy.unicode_(bytes object, 'utf_32_be' or 'utf_32_le'). There is also an encoding equivalent to go from the Python unicode object to the bytes representation in the NumPy array.

I think this is what we should be doing in most of the places and it should considerably simplify the Unicode code in NumPy --- eliminating possibly the ucsnarrow.c file.

Am I missing something?

Thanks,
-Travis
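The decode and encode round trip described above can be tried from Python without touching the C code. This is a sketch under assumptions: it uses the standard bytes.decode and str.encode methods on Python 3 in place of the numpy.unicode_ constructor named in the message, and the sample string fills the dtype width exactly so that no NUL padding shows up in the decoded text.

```python
import numpy as np

text = 'caf\u00e9'  # four code points, matching the U4 width below

arr_le = np.array([text], dtype='<U4')  # little-endian UTF-32 storage
arr_be = arr_le.astype('>U4')           # same text, big-endian storage

raw_le = arr_le.tobytes()  # 16 raw bytes, 4 per code point
raw_be = arr_be.tobytes()

# Decoding: raw array bytes -> Python str, codec chosen from the dtype.
decoded_le = raw_le.decode('utf_32_le')
decoded_be = raw_be.decode('utf_32_be')

# Encoding: Python str -> the bytes NumPy stores for that byte order.
encoded_le = text.encode('utf_32_le')
encoded_be = text.encode('utf_32_be')

print(decoded_le == text, decoded_be == text)
print(encoded_le == raw_le, encoded_be == raw_be)
```

A string shorter than the dtype width would decode with trailing NUL characters, which a real implementation would have to strip.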
Re: [Numpy-discussion] Unicode revisited
On Fri, Aug 3, 2012 at 7:03 PM, Travis Oliphant tra...@continuum.io wrote:
[...] I think this is what we should be doing in most of the places and it should considerably simplify the Unicode code in NumPy --- eliminating possibly the ucsnarrow.c file. Am I missing something?

I can't comment on the rest, but I'd be happy to see the end of the ucsnarrow.c file. It needs more work to be properly generalized and if there is a way to avoid that, so much the better.

Chuck
Re: [Numpy-discussion] Unicode revisited
On Fri, Aug 3, 2012 at 6:03 PM, Travis Oliphant tra...@continuum.io wrote:
[...] I think this is what we should be doing in most of the places and it should considerably simplify the Unicode code in NumPy --- eliminating possibly the ucsnarrow.c file. Am I missing something?

I guess we'll try and see. :) Would it make sense to merge https://github.com/numpy/numpy/pull/372 now, because it will make NumPy work on Python 3.3 (and it seems to me that the implementation is reasonable)? And then I'll work on trying to use your new approach for 2.7, 3.2 and 3.3.

Ondrej