Re: [Numpy-discussion] bug in numpy.mean() ?
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:

> I know, I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
>
> 2-D data file here: http://dl.dropbox.com/u/139035/data.npy
>
> Then:
>
> In [3]: data.mean()
> Out[3]: 3067.024383998
> In [4]: data.max()
> Out[4]: 3052.4343
> In [5]: data.shape
> Out[5]: (1000, 1000)
> In [6]: data.min()
> Out[6]: 3040.498
> In [7]: data.dtype
> Out[7]: dtype('float32')
>
> A mean value calculated in a loop over the data gives me 3045.747251076416.
>
> I first thought I still misunderstood how data.mean() works, per axis and so on, but I did the same with a flattened version with the same results. Am I really so tired that I can't see what I am doing wrong here?
>
> For completeness, the data was read by an osgeo.gdal dataset method called ReadAsArray(). My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.

I get the same result:

In [1]: import numpy
In [2]: data = numpy.load('data.npy')
In [3]: data.mean()
Out[3]: 3067.024383998
In [4]: data.max()
Out[4]: 3052.4343
In [5]: data.min()
Out[5]: 3040.498
In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running python as a 32- vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological?

This is fun too:

In [12]: data.mean()
Out[12]: 3067.024383998
In [13]: (data/3000).mean()*3000
Out[13]: 3020.807437501
In [15]: (data/2).mean()*2
Out[15]: 3067.024383998
In [16]: (data/200).mean()*200
Out[16]: 3013.67541

Zach
Re: [Numpy-discussion] bug in numpy.mean() ?
You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding 32-bit float precision, and you need to either increase the precision of the accumulator in np.mean() or change the input dtype:

a.mean(dtype=np.float32)   # the default, and it lacks precision
3067.024383998
a.mean(dtype=np.float64)
3045.747251076416
a.mean(dtype=np.float128)
3045.7472510764160156
b = a.astype(np.float128)
b.mean()
3045.7472510764160156

Otherwise you are left to use some alternative approach to calculate the mean.

Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is quite a gotcha for floating-point imaging-type tasks -- good to know!

Zach
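For concreteness, a minimal sketch of the accumulator effect described above (the synthetic array here stands in for the data.npy from the thread; the exact amount of drift will vary):

    import numpy as np

    # A million float32 values near 3045: once the running float32 sum grows
    # past ~16 million, adding another ~3000 loses low-order bits, so the
    # naive mean drifts away from the true value.
    data = (3045.0 + np.random.randn(1000, 1000)).astype(np.float32)

    print(data.mean())                     # float32 accumulator: noticeably off
    print(data.mean(dtype=np.float64))     # 64-bit accumulator: accurate
    print(data.astype(np.float64).mean())  # equivalent: upcast first, then average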
Re: [Numpy-discussion] Addressing arrays
a[x,y,:]

Read the slicing part of the tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial (section 1.6)
And the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

On Jan 30, 2012, at 10:25 AM, Ted To wrote:

> Hi,
>
> Is there some straightforward way to access an array by values across a subset of its dimensions? For example, if I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y?
>
> Thanks,
> Ted
Re: [Numpy-discussion] Addressing arrays
Ted, can you clarify what you're asking for? Maybe give a trivial example of an array and the desired output? I'm pretty sure this is a slicing question though:

> If I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y?

Given that element values are scalars in this case, and indices are (x,y,z) triples, it seems likely that looking for values of z given an (x,y) pair is a slicing-by-index question, no? For indexing-by-value, fancy indexing with boolean masks is usually the way to go... Again, Ted (or Chao), if you can describe your indexing needs in a bit more detail, it's often easy to find a compact slicing and/or fancy-indexing strategy that works well and reasonably efficiently.

Zach

On Jan 30, 2012, at 10:33 AM, Chao YUE wrote:

> He is not asking for slicing. He is asking how to index an array by element value, not by element index.
>
> 2012/1/30 Zachary Pincus zachary.pin...@yale.edu
>
>> a[x,y,:]
>>
>> Read the slicing part of the tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial (section 1.6)
>> And the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
>>
>> On Jan 30, 2012, at 10:25 AM, Ted To wrote:
>>
>>> Hi,
>>> Is there some straightforward way to access an array by values across a subset of its dimensions? For example, if I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y?
>>> Thanks,
>>> Ted
>
> --
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
Re: [Numpy-discussion] Addressing arrays
> Thanks! That works great if I only want to search over one index but I can't quite figure out what to do with more than a single index. So suppose I have a labeled, multidimensional array with labels 'month', 'year' and 'quantity'. a[['month','year']] gives me an array of indices but a[['month','year']]==(1,1960) produces False. I'm sure I simply don't know the proper syntax and I apologize for that -- I'm kind of new to numpy.

I think that your best bet is to form the boolean masks independently and then logical-and them together:

mask = (a['month'] == 1) & (a['year'] == 1960)
jan_60 = a[mask]

Someone might have more insight here. Though I should note that if you have large data and are doing lots of queries like this, a more database-ish approach might be better. Something like sqlite's python bindings, or PyTables. Alternately, if your data are all time-series based things, pandas might be worth looking at. But the above approach should be just fine for non-huge datasets...

Zach
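To make the mask approach concrete, here is a small self-contained sketch; the field names match the thread, but the sample records are invented for illustration:

    import numpy as np

    # A structured (record) array standing in for the labeled data.
    a = np.array([(1, 1959, 10.0), (1, 1960, 12.5), (2, 1960, 11.1)],
                 dtype=[('month', int), ('year', int), ('quantity', float)])

    # Build one boolean mask per field and combine them element-wise with &.
    mask = (a['month'] == 1) & (a['year'] == 1960)
    jan_60 = a[mask]
    print(jan_60['quantity'])   # -> [ 12.5]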
Re: [Numpy-discussion] all elements equal
How about the following?

exact: numpy.all(a == a[0])
inexact: numpy.allclose(a, a[0])

On Mar 5, 2012, at 2:19 PM, Keith Goodman wrote:

> On Mon, Mar 5, 2012 at 11:14 AM, Neal Becker ndbeck...@gmail.com wrote:
>> What is a simple, efficient way to determine if all elements in an array (in my case, 1D) are equal? How about close?
>
> For the exactly equal case, how about:
>
> I[1] a = np.array([1,1,1,1])
> I[2] np.unique(a).size
> O[2] 1  # All equal
>
> I[3] a = np.array([1,1,1,2])
> I[4] np.unique(a).size
> O[4] 2  # All not equal
Re: [Numpy-discussion] Viewing a float64 array with a float32 array
> Hi,
> Is it possible to have a view of a float64 array that is itself float32? So that:
>
> A = np.arange(5, dtype='d')
> A.view(dtype='f')
>
> would return a size 5 float32 array looking at A's data?

I think not. The memory layout of a 32-bit IEEE float is not a subset of that of a 64-bit float -- see e.g. the first table in:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html

This would work for int8/int16 or other int types (so long as the number doesn't exceed the range of the smaller type), but AFAIK not floats. Note how the subset relationship works for the int8/int16 case, but not float32/float64:

str(numpy.array(100, dtype=numpy.int8).data)
'd'
str(numpy.array(100, dtype=numpy.int16).data)
'd\x00'
str(numpy.array(-100, dtype=numpy.int16).data)
'\x9c\xff'
str(numpy.array(-100, dtype=numpy.int8).data)
'\x9c'
str(numpy.array(100, dtype=numpy.float32).data)
'\x00\x00\xc8B'
str(numpy.array(100, dtype=numpy.float64).data)
'\x00\x00\x00\x00\x00\x00Y@'

Zach
Re: [Numpy-discussion] Viewing a float64 array with a float32 array
> I'm not sure if you are referring to rounding errors but that's OK with me. I was thinking something along the lines of changing how numpy looks at the data of A's view by modifying say the stride attribute, etc.

Yes, so was I. As you can see in my example with ints below, you could skip every other byte of the int16 array to look at an int8 array. This is because the memory layout of an int8 is a proper subset of int16. (Modulo endian concerns, of course...) But looking at the link I provided, you can see that taking the first 32 bits of a float64 (or the last 32, or any 32) does not yield something that can be interpreted as a float32. So there's no subset relationship, and you can't do the strides trick.

To be extra clear, look at the memory layout of a float that's expressible without rounding error:

str(numpy.array(128, dtype=numpy.float64).data)
'\x00\x00\x00\x00\x00\x00`@'
str(numpy.array(128, dtype=numpy.float32).data)
'\x00\x00\x00C'

There's obviously no stride trick whereby one will look like the other.

Zach

On Wed, Mar 21, 2012, at 11:19, Zachary Pincus wrote:

>> Hi,
>> Is it possible to have a view of a float64 array that is itself float32? So that:
>> A = np.arange(5, dtype='d')
>> A.view(dtype='f')
>> would return a size 5 float32 array looking at A's data?
>
> I think not. The memory layout of a 32-bit IEEE float is not a subset of that of a 64-bit float -- see e.g. the first table in:
> http://steve.hollasch.net/cgindex/coding/ieeefloat.html
>
> This would work for int8/int16 or other int types (so long as the number doesn't exceed the range of the smaller type), but AFAIK not floats. Note how the subset relationship works for the int8/int16 case, but not float32/float64:
>
> str(numpy.array(100, dtype=numpy.int8).data)
> 'd'
> str(numpy.array(100, dtype=numpy.int16).data)
> 'd\x00'
> str(numpy.array(-100, dtype=numpy.int16).data)
> '\x9c\xff'
> str(numpy.array(-100, dtype=numpy.int8).data)
> '\x9c'
> str(numpy.array(100, dtype=numpy.float32).data)
> '\x00\x00\xc8B'
> str(numpy.array(100, dtype=numpy.float64).data)
> '\x00\x00\x00\x00\x00\x00Y@'
>
> Zach
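A small sketch of the byte-skipping idea for ints versus the impossibility for floats; this assumes a little-endian machine (the usual case on x86), since byte order determines which byte of each int16 is the "subset" int8:

    import numpy as np

    # int16 -> int8: skip every other byte (stride of 2) to view the low
    # bytes, which are valid int8 values while the numbers stay in range.
    a16 = np.arange(5, dtype=np.int16)
    a8 = np.ndarray(a16.shape, dtype=np.int8, buffer=a16, strides=(2,))
    print(a8)                        # [0 1 2 3 4] -- same values, shared memory

    # float64 -> float32: no 4-byte subset of a float64 encodes the same
    # number, so reinterpreting the bits just produces garbage values.
    f64 = np.arange(5, dtype=np.float64)
    print(f64.view(np.float32))      # 10 meaningless float32 values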
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
> That all sounds like no option -- sad. Cython is no solution because all I want is to leave Python syntax behind in favor of strong OOP design patterns.

What about ctypes? For straight numerical work, sometimes all one needs to hand across the python-to-C/C++/Fortran boundary is a pointer to some memory and the size of the memory region. So I will often just write a very thin C interface to whatever I'm using and then call into it with ctypes.

So you could design your algorithm in C++ with all the strong OOP design patterns you want, and then write a minimal C interface on top with one or two functions that receive pointers to memory. Then allocate numpy arrays in python with whatever memory layout you need, and use the ctypes attribute to find the pointer, data size, etc. that you need to pass over.

Zach
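A rough sketch of that pattern; the library name (libmyalgo.so) and the exported function are hypothetical, standing in for whatever thin C wrapper you write over the C++ core (e.g. extern "C" void process(double *data, size_t n)):

    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libmyalgo.so")
    lib.process.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
    lib.process.restype = None

    # Allocate the data on the Python side with a known, contiguous layout.
    a = np.ascontiguousarray(np.random.rand(1000), dtype=np.float64)

    # a.ctypes exposes the raw pointer (plus shape/strides if needed).
    lib.process(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), a.size)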
Re: [Numpy-discussion] Slices from an index list
> Here's one way you could do it:
>
> In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14]
> In [44]: jumps = where(diff(indices) != 1)[0] + 1
> In [45]: starts = hstack((0, jumps))
> In [46]: ends = hstack((jumps, len(indices)))
> In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)]
> In [48]: slices
> Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 12, None)]

If you're only going to use the slices to divide up the list, you could use numpy.split and skip creating the slice objects:

indices = [0,1,2,3,5,7,8,9,10,12,13,14]
jumps = numpy.where(numpy.diff(indices) != 1)[0] + 1
numpy.split(indices, jumps)

giving:

[array([0, 1, 2, 3]), array([5]), array([ 7,  8,  9, 10]), array([12, 13, 14])]

Zach

(btw, Warren, the method to calculate the jumps is cute. I'll have to remember that.)
[Numpy-discussion] Fancy-indexing reorders output in corner cases?
Hello all,

The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? It's easiest to show via example... note that using [0,1] to pull two columns out of the array gives the same shape as using :2 in the simple case, but when there's additional slicing happening, the shapes get transposed or something.

In [2]: numpy.version.version  # latest git version
Out[2]: '1.7.0.dev-3d4'
In [3]: d = numpy.empty((10, 9, 8, 7))
In [4]: d[:,:,:,[0,1]].shape
Out[4]: (10, 9, 8, 2)
In [5]: d[:,:,:,:2].shape
Out[5]: (10, 9, 8, 2)
In [6]: d[:,0,:,[0,1]].shape
Out[6]: (2, 10, 8)
In [7]: d[:,0,:,:2].shape
Out[7]: (10, 8, 2)
In [8]: d[0,:,:,[0,1]].shape
Out[8]: (2, 9, 8)
In [9]: d[0,:,:,:2].shape
Out[9]: (9, 8, 2)

Oddly, this error can appear/disappear depending on the position of the other axis sliced:

In [14]: d = numpy.empty((10, 9, 8))
In [15]: d[:,:,[0,1]].shape
Out[15]: (10, 9, 2)
In [16]: d[:,0,[0,1]].shape
Out[16]: (10, 2)
In [17]: d[0,:,[0,1]].shape
Out[17]: (2, 9)

This cannot be the expected behavior, right?

Zach
Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?
> On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus zachary.pin...@yale.edu wrote:
>> The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? It's easiest to show via example... note that using [0,1] to pull two columns out of the array gives the same shape as using :2 in the simple case, but when there's additional slicing happening, the shapes get transposed or something.
>
> When fancy indexing and slicing is mixed, the resulting shape is essentially unpredictable.

Aah, right -- this does come up on the list not infrequently, doesn't it. I'd always thought it was more exotic usages that raised these issues. Good to know.

> The correct way to do it is to only use fancy indexing, i.e. generate the indices of the sliced dimension as well.

Excellent -- thanks!

> Stéfan
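A sketch of the "use fancy indices for every axis" advice, using np.ix_ to build the open mesh (the array shape matches the example above; the variable names are just for illustration):

    import numpy as np

    d = np.empty((10, 9, 8, 7))

    # Mixed slice + fancy index: the fancy axis jumps to the front.
    print(d[:, 0, :, [0, 1]].shape)          # (2, 10, 8) -- surprising

    # All-fancy indexing via np.ix_ keeps the axes in the expected order.
    i = np.arange(d.shape[0])
    k = np.arange(d.shape[2])
    sub = d[np.ix_(i, [0], k, [0, 1])]       # shape (10, 1, 8, 2)
    print(sub[:, 0].shape)                   # (10, 8, 2), as intended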
Re: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x
> There is a fine line here. We do need to make people clean up lax code in order to improve numpy, but hopefully we can keep the cleanups reasonable.

Oh agreed. Somehow, though, I was surprised by this, even though I keep tabs on the numpy lists -- at no point did it become clear that big changes in how arrays get constructed and typecast are ahead that may require code fixes. That was my main point, but probably a PEBCAK issue more than anything.

Zach
Re: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x
> On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus zachary.pin...@yale.edu wrote:
>>> There is a fine line here. We do need to make people clean up lax code in order to improve numpy, but hopefully we can keep the cleanups reasonable.
>>
>> Oh agreed. Somehow, though, I was surprised by this, even though I keep tabs on the numpy lists -- at no point did it become clear that big changes in how arrays get constructed and typecast are ahead that may require code fixes. That was my main point, but probably a PEBCAK issue more than anything.
>
> It was fairly extensively discussed when introduced, http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again at some later point.

> Those are the not-yet-finalized changes in 1.7; Zachary (I think) is talking about problems upgrading from ~1.5 to 1.6.

Yes, unless I'm wrong I experienced these problems going from 1.5.something to 1.6.1. I didn't take notes as it was in the middle of a deadline crunch, so I just fixed the code and moved on (long, stupid story about why the upgrade happened right before a deadline...). It's just that the issues mentioned above seem to have hit me too and I wanted to mention that. But unhelpfully, I think, without code, and now I've hijacked this thread! Sorry.

Zach
Re: [Numpy-discussion] [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)
> On 10.10.2012 15:42, Nathaniel Smith wrote:
>> This PR submitted a few months ago adds a substantial new API to numpy, so it'd be great to get more review. No-one's replied yet, though... Any thoughts, anyone? Is it useful, could it be better...?
>
> Fast neighbor search is what scipy.spatial.cKDTree is designed for. There is a brand new version in SciPy master.

It's not neighbor *search*, it's applying a function over an (arbitrarily chosen and weighted) moving neighborhood in an nd array.

It would be useful for the author of the PR to post a detailed comparison of this functionality with scipy.ndimage.generic_filter, which appears to have very similar functionality.

Zach
Re: [Numpy-discussion] [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)
Hi Tim,

It's tricky to find functionality sometimes, because as you've seen numpy and especially scipy are spread over very diverse domains with very diverse terminology! Usually someone on one or the other of the lists can help folks find some functionality, if not by name then by description...

In any case, though, I hope you'll keep your code around and accessible to people in standalone form. Despite being a bit slower than the ndimage stuff, it seems like you've got a better set of boundary conditions, and some other niceties that may appeal to others too.

Zach

On Sun, Oct 14, 2012 at 8:24 PM, Zachary Pincus zachary.pin...@yale.edu wrote:

>> It would be useful for the author of the PR to post a detailed comparison of this functionality with scipy.ndimage.generic_filter, which appears to have very similar functionality.
>
> I'll be durned. I created neighbor because I didn't find what I wanted, and to find now that I just didn't look in the right place is, well... Let's just say that I went for a long run last night.
>
> Searching for ndimage, I found that it has been around a long, long time: first in numarray, then moved to scipy. Really I could only nitpick about minor differences -- kinda like a primary political campaign. On the face of it, though, generic_filter looks better. First off, it is written in C, so it will likely be faster and use memory more efficiently. I didn't look at optimizing neighbor at all, and at least my part of it is pure Python. Of course, for all of the small differences, I like my choices better. :-)
>
> I would like to make a mild suggestion. Emphasis on mild. Maybe ndimage, in all or in part, should be brought into (back into?) NumPy and renamed.
>
> About the PR: given that the neighbor functionality exists already, I will close the PR later today. Move along, nothing to see here...
>
> Side note: I wrote arraypad with the future idea that it would become easyish to include that functionality in other places, for example in neighbor. A Don't Repeat Yourself idea. Previously I had only seen Fortran pad capabilities in some of the Fast Fourier Transform functions. The generic_filter function includes several padding functions -- written in C. This means that if arraypad needs to be more efficient, we have C code on which to base a better arraypad.
>
> Another side note: the measurement functions in ndimage are called zonal functions in the GIS field.
>
> Kindest regards,
> Tim
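For readers landing here via search, a minimal sketch of scipy.ndimage.generic_filter doing the kind of weighted-neighborhood calculation discussed in this thread (the footprint and the function are illustrative, not taken from the PR):

    import numpy as np
    from scipy import ndimage

    a = np.random.random((100, 100))

    # Select which neighbors participate with a boolean footprint
    # (here: the 8 surrounding cells, excluding the center).
    footprint = np.array([[1, 1, 1],
                          [1, 0, 1],
                          [1, 1, 1]], dtype=bool)

    def neighbor_mean(values):
        # 'values' is the 1-D array of neighborhood samples picked out
        # by the footprint, padded according to 'mode' at the edges.
        return values.mean()

    result = ndimage.generic_filter(a, neighbor_mean, footprint=footprint,
                                    mode='constant', cval=0.0)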
Re: [Numpy-discussion] Manipulate neighboring points in 2D array
> I have a 2D array, let's say: `np.random.random((100,100))`, and I want to do a simple manipulation on each point's neighbors, like divide their values by 3. So for each array value x and its neighbors n:
>
>     n n n        n/3 n/3 n/3
>     n x n   ->   n/3  x  n/3
>     n n n        n/3 n/3 n/3
>
> I searched a bit, and found scipy's ndimage filters, but if I'm not wrong, there is no such function. Of course me being wrong is quite possible, as I did not comprehend the whole ndimage module, but I tried generic filter for example and browsed other functions. Is there a better way to make the above manipulation, instead of using a for loop over every array element?

I am not sure I understand the above manipulation... typically neighborhood operators take an array element and its neighborhood and then give a single output that becomes the value of the new array at that point. That is, a 3x3 neighborhood filter would act as a function F(R^{3x3}) -> R. It appears that what you're talking about above is a function F(R^{3x3}) -> R^{3x3}. But how is this output to map onto the original array positions? Is the function to be applied to non-overlapping neighborhoods? Is it to be applied to all neighborhoods and then summed at each position to give the output array?

If you can describe the problem in a bit more detail, with perhaps some sample input and output for what you desire (and/or with some pseudocode describing how it would work in a looping-over-each-element approach), I'm sure folks can figure out how best to do this in numpy.

Zach
Re: [Numpy-discussion] Manipulate neighboring points in 2D array
> You are right. I needed a generic filter -- to update the current point, and not the neighbors as I wrote. The initial code is a slow loop over 2D python lists, which I'm trying to convert to numpy and make useful. In that loop there is an inner loop for calculating neighbors' properties, which confused me yesterday and misled me to search for something that probably does not make sense. It's clear now :)

It's possible that some generic filter operations can be cast in terms of pure-numpy operations, or composed out of existing filters available in scipy.ndimage. If you can describe the filter operation you wish to perform, perhaps someone can make some suggestions. Alternately, scipy.ndimage.generic_filter can take an arbitrary python function. Though it's not really fast...

Zach
Re: [Numpy-discussion] sampling arrays
> Thank you,
> No. If the location (space, time or depth) of choice is not available, then the function I was looking for should give an interpolated value at that choice.
> with best regards,
> Sudheer

scipy.ndimage.map_coordinates may be exactly what you want.

----- Original Message -----

> From: Henry Gomersall h...@cantab.net
> To: Discussion of Numerical Python numpy-discussion@scipy.org
> Sent: Sunday, 16 June 2013 2:49 PM
> Subject: Re: [Numpy-discussion] sampling arrays
>
> On Sun, 2013-06-16 at 14:48 +0800, Sudheer Joseph wrote:
>> Is it possible to sample a 4D numpy array at given dimensions without writing loops? i.e. a smart python way?
>
> It's not clear how what you want to do is different from simply indexing the array...?
>
> Henry
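A brief sketch of map_coordinates doing exactly that kind of interpolated sampling; the (time, depth, lat, lon) interpretation of the axes is an assumption for illustration:

    import numpy as np
    from scipy import ndimage

    field = np.random.random((10, 20, 30, 40))   # e.g. (time, depth, lat, lon)

    # One column per sample point, coordinates in (fractional) index units.
    coords = np.array([[2.5,  7.1],    # time indices
                       [3.3, 10.0],    # depth indices
                       [15.7, 4.2],    # lat indices
                       [8.0, 22.9]])   # lon indices

    values = ndimage.map_coordinates(field, coords, order=1)  # linear interpolation
    print(values)   # one interpolated value per sample point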
Re: [Numpy-discussion] Relative speed
> And as you pointed out, most of the time for non-trivial datasets the numpy operations will be faster. (I'm daunted by the notion of trying to do linear algebra on lists of tuples, assuming that's the relevant set of operations given the comparison to the matrix class.)

Note the important and pretty common exception of building up a list one element (or row of elements) at a time. Here, python lists usually rule, unless the final size is known in advance.
Re: [Numpy-discussion] str/bytes object from arr.data?
As str objects are supposed to be immutable, I think anything official that makes a string from a numpy array is supposed to copy the data. But I think you can use ctypes to wrap a pointer and a length as a python string.

Zach

On Sep 27, 2010, at 8:28 AM, Francesc Alted wrote:

> Hi,
>
> Anybody knows a way to get a str object (bytes for Python >= 2.6) out of a buffer object (i.e. the .data attribute of ndarrays) without copying data? I need this to avoid creating a new function that deals with buffer objects (instead of reusing the one for str/bytes objects that I already have). So, a matter of laziness :-)
>
> Thanks,
> -- Francesc Alted
Re: [Numpy-discussion] point line distance
Here's a good list of basic geometry algorithms:
http://www.softsurfer.com/algorithms.htm

Zach

On Oct 6, 2010, at 5:08 PM, Renato Fabbri wrote:

> suppose you have a line defined by two points, and a point. you want the distance.
>
> what are the easiest possibilities? i am doing it, but it's nasty'n ugly
>
> --
> GNU/Linux User #479299
> skype: fabbri.renato
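For reference, one compact numpy version of the standard formula (this assumes 2-D points and the infinite line through the two points, not the segment):

    import numpy as np

    def point_line_distance(p, a, b):
        """Distance from point p to the line through points a and b (2-D)."""
        p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
        d = b - a
        v = p - a
        # |cross product| is twice the area of triangle (a, b, p); dividing by
        # the base length |b - a| gives the height, i.e. the distance.
        return abs(d[0] * v[1] - d[1] * v[0]) / np.linalg.norm(d)

    print(point_line_distance([1, 1], [0, 0], [2, 0]))   # 1.0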
Re: [Numpy-discussion] function name as parameter
> I'm trying to write an implementation of the amoeba function from numerical recipes and need to be able to pass a function name and parameter list to be called from within the amoeba function. Simply passing the name as a string doesn't work since python doesn't know it is a function and throws a typeerror. Is there something similar to IDL's 'call_function' routine in python/numpy, or a pythonic/numpy means of passing function names?

Just pass the function itself! For example:

def foo():
    print 6

def call_function_repeatedly(func, count):
    for i in range(count):
        func()

call_function_repeatedly(foo, 2)  # calls foo twice

bar = foo
bar()  # still calls foo... we've just assigned the function to a different name

In python, functions (and classes, and everything else) are first-class objects and can be assigned to variables, passed around, etc., etc., just as anything else.

However, note that scipy.optimize.fmin implements the Nelder-Mead simplex algorithm, which is (I think) the same as the amoeba optimizer. Also you might be interested in the openopt package, which implements more optimizers a bit more consistently than scipy.optimize.

Zach
Re: [Numpy-discussion] function name as parameter
> Hi Robert,
>
> So in a big data analysis framework that is essentially written in C++, exposed to python with SWIG, plus dedicated python modules, the user performs an analysis choosing some given modules by name, as in:
>
> myOpt = "foo"
> my_analyse.perform(use_optimizer=myOpt)
>
> The attribute use_optimizer is then checked to perform the right calls/instantiations of python but also C++ objects, and the actual number crunching is in the C++ part.
>
> But then I realize that I need to tweak this optimizer's state, and the optimizer object is accessible from a library pyOpt that has been swigified from a C++ library. Then I access the object by calling:
>
> optObject = eval("pyOpt.%s(some_args)" % myOpt)
>
> where myOpt would be "foo" in this particular analysis. This is because what the attribute use_optimizer expects is also the object name in the library, of course. It goes without saying that I could do:
>
> if myOpt == "foo":
>     optObject = pyOpt.foo(some_args)
> else:
>     ...
>
> and if you guys tell me it is way safer, I will do that instead of the use of eval that I liked because of its compactness.

Well, optObject = getattr(pyOpt, myOpt) is a bit nicer and marginally more likely to return sane error messages than optObject = eval("pyOpt.%s" % myOpt). Then you could do result = optObject(some_args)...

But why not have the user just pass in the relevant optObject from the pyOpt namespace (or some restricted namespace that just has the relevant functions exposed)? E.g.:

my_analysis.perform(optimizer=pyOpt.Amoeba)

rather than:

my_analysis.perform(optimizer='Amoeba')

This lets users do introspection on the pyOpt namespace to see what functions they can choose from, which is rather friendlier in an interactive environment.

Zach
Re: [Numpy-discussion] short circuit != ?
This is silly: the structure of the python language prevents meaningful short-circuiting in the case of np.any(a != b). While it's true that np.any itself may short-circuit, the 'a != b' statement itself will be evaluated in its entirety before the result (a boolean array) is passed to np.any. This is just how python (like most other languages) works.

This means that the big-O complexity isn't reduced any: all elements of a and b have to be examined. If any short-circuits, then at least all the elements don't need to be examined twice!

If performance actually matters here (unlikely?), a quick bit of cython code could do this easily enough in the simple case where a and b are the same shape (so no broadcasting required). Note that the below hard-codes the dtype also.

import cython
cimport numpy
import numpy

ctypedef numpy.int64_t TYPE_T
TYPE = numpy.int64

@cython.boundscheck(False)
def equal(a, b):
    cdef:
        numpy.ndarray[TYPE_T, ndim=1, negative_indices=False] a_flat = a.astype(TYPE).flatten()
        numpy.ndarray[TYPE_T, ndim=1, negative_indices=False] b_flat = b.astype(TYPE).flatten()
        unsigned int i, l
    assert a_flat.shape[0] == b_flat.shape[0]
    l = a_flat.shape[0]
    for i in range(l):
        if a_flat[i] != b_flat[i]:
            return False
    return True

Zach

On Oct 27, 2010, at 9:44 AM, Skipper Seabold wrote:

> On Wed, Oct 27, 2010 at 9:37 AM, Johann Cohen-Tanugi co...@lpta.in2p3.fr wrote:
>> On 10/27/2010 03:31 PM, Neal Becker wrote:
>>> Johann Cohen-Tanugi wrote:
>>>> how about np.any(a!=b) ??
>>>>
>>>> On 10/27/2010 12:25 PM, Neal Becker wrote:
>>>>> Is there a way to get a short circuit != ? That is, compare 2 arrays but stop as soon as the first element comparison fails? I'm assuming that np.all(a != b) will _not_ do this, but will first compare all elements.
>>>
>>> I don't think that will do short circuit, will it? I think that will compare each element, returning a bool array, then short-circuit eval that bool array.
>>
>> In [3]: a=np.array([2,3,
>> In [4]: b=np.array([2,5,
>> In [5]: np.any(a!=b)
>> Out[5]: True
>>
>> it does not return a bool array, it seems. I do not see how you would broadcast the notion of any along axes maybe?
>
> Not definitive by any means, but:
>
> In [28]: a = np.arange(1,1000)
> In [29]: b = np.arange(1,1000)
> In [30]: timeit np.any(a!=b)
> 10 loops, best of 3: 37.8 ms per loop
> In [31]: a[0] = 10.
> In [32]: timeit np.any(a!=b)
> 10 loops, best of 3: 24.7 ms per loop
> In [33]: a[0] = 1
> In [34]: a[-1] = 1
> In [35]: timeit np.any(a!=b)
> 10 loops, best of 3: 37.7 ms per loop
>
> It seems to at least take less time when the difference is at the beginning, though I'm sure there could be exceptions.
>
> Skipper
Re: [Numpy-discussion] The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
> Help! I'm having a problem in searching through the *elements* of a 2D array. I have a loop over a numpy array:
>
> n, m = G.shape
> print n, m
> for i in xrange(n):
>     for j in xrange(m):
>         print type(G), type(G[i,j]), type(float(G[i,j]))
>         g = float(abs(G[i,j]))
>         if g > cut:
>             print i, j, G[i,j]
>
> However, I'm getting the error message:
>
> The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>
> despite the fact that I'm not computing a truth value of an array. I'd like to just compare to G[i,j] or abs(G[i,j]), which should be a *single* value, and *not* an array. I even call 'float' here to make sure that I cast this to a normal python float. But I still get this error message.

Which line is raising the error? My guess is it's the only truth-value-testing line: "if g > cut". (I'm not sure what else could error here in that way...) It looks like g is probably a scalar, but your code isn't showing where cut comes from, nor have you printed out its type... is it an array?

Zach
Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__
> But wouldn't the performance hit only come when I use it in this way? __getattr__ is only called if the named attribute is *not* found (I guess it falls off the end of the case statement, or is the result of the attribute hash table miss). That's why I said that __getattr__ would perhaps work better. So do you want me to try out an implementation and supply a patch? If so, where should I send the patch?
>
> Ian

Note that there are various extant projects that I think attempt to provide similar functionality to what you're wanting (unless I badly misread your original email, in which case apologies):

http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

You might want to check these out (or pitch in with those efforts) before starting up your own variant. (Though if your idea has more modest goals, then it might be a good complement to these more extensive/intrusive solutions.)

Zach
Re: [Numpy-discussion] N dimensional dichotomy optimization
On Nov 23, 2010, at 10:57 AM, Gael Varoquaux wrote:

> On Tue, Nov 23, 2010 at 04:33:00PM +0100, Sebastian Walter wrote:
>> At first glance it looks as if a relaxation is simply not possible: either there are additional rows or not. But with some technical transformations it is possible to reformulate the problem into a form that allows the relaxation of the integer constraint in a natural way. Maybe this is also possible in your case?
>
> Well, given that it is a cross-validation score that I am optimizing, there is no simple algorithm giving this score, so it's not obvious at all that there is a possible relaxation. A road to follow would be to find an oracle giving empirical risk after estimation of the penalized problem, and try to relax this oracle. That's two steps further than I am (I apologize if the above paragraph is incomprehensible, I am getting too deep into the technicalities of my problem). Otherwise, well, let me know if you find a working solution ;). Nelder-Mead seems to be working fine, so far. It will take a few weeks (or more) to have a real insight on what works and what doesn't.

Jumping in a little late, but it seems that simulated annealing might be a decent method here: take random steps (drawing from a distribution of integer step sizes), reject steps that fall outside the fitting range, and accept steps according to the standard annealing formula. Something with a global optimum but spikes along the way is pretty well-suited to SA in general, and it's also an easy algorithm to make work on a lattice. If you're in high dimensions, there are also bolt-on methods for biasing the steps toward good directions as opposed to just taking isotropic random steps. Again, pretty easy to think of discrete implementations of this...

Zach
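A rough sketch of that lattice simulated-annealing idea; the objective function, step distribution and cooling schedule below are placeholders for illustration, not anything from the thread:

    import numpy as np

    def anneal_lattice(objective, x0, lo, hi, steps=5000, t0=1.0, t1=1e-3):
        """Minimize objective over integer points clipped to [lo, hi]."""
        x = np.array(x0, dtype=int)
        f = objective(x)
        best_x, best_f = x.copy(), f
        for k in range(steps):
            t = t0 * (t1 / t0) ** (k / float(steps))        # geometric cooling
            cand = np.clip(x + np.random.randint(-2, 3, size=x.shape), lo, hi)
            fc = objective(cand)
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if fc < f or np.random.rand() < np.exp(-(fc - f) / t):
                x, f = cand, fc
                if f < best_f:
                    best_x, best_f = x.copy(), f
        return best_x, best_f

    # Toy objective with a bumpy integer landscape:
    print(anneal_lattice(lambda v: ((v - 7) ** 2).sum() + 3 * (v % 3).sum(),
                         x0=[0, 0], lo=0, hi=50))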
Re: [Numpy-discussion] Threshold
> mask = numpy.zeros(medical_image.shape, dtype=uint16)
> mask[numpy.logical_and(medical_image >= lower, medical_image <= upper)] = 255
>
> Where lower and upper are the threshold bounds. Here I'm marking the array positions where medical_image is between the threshold bounds with 255, and where it isn't with 0. The question is: Is there a better way to do that?

This will give you a True/False boolean mask:

mask = numpy.logical_and(medical_image >= lower, medical_image <= upper)

And this a 0/255 mask:

mask = 255*numpy.logical_and(medical_image >= lower, medical_image <= upper)

You can make the code a bit more terse/idiomatic by using the bitwise operators, which do logical operations on boolean arrays:

mask = 255*((medical_image >= lower) & (medical_image <= upper))

Though this is a bit annoying, as the bitwise ops (& | ^ ~) have higher precedence than the comparison ops (< <= > >= == !=), so you need to parenthesize carefully, as above.

Zach

On Dec 2, 2010, at 7:35 AM, totonixs...@gmail.com wrote:

> Hi all,
>
> I'm developing a medical software named InVesalius [1]; it is free software. It uses numpy arrays to store the medical images (CT and MRI) and the mask; the mask is used to mark the region of interest and to create 3D surfaces. Those arrays generally have 512x512 elements. The mask is created based on a threshold, with lower and upper bounds, this way:
>
> mask = numpy.zeros(medical_image.shape, dtype=uint16)
> mask[numpy.logical_and(medical_image >= lower, medical_image <= upper)] = 255
>
> Where lower and upper are the threshold bounds. Here I'm marking the array positions where medical_image is between the threshold bounds with 255, and where it isn't with 0. The question is: Is there a better way to do that?
>
> Thanks!
>
> [1] - svn.softwarepublico.gov.br/trac/invesalius
Re: [Numpy-discussion] Arrays with aliased elements?
def repeat(arr, num):
    arr = numpy.asarray(arr)
    return numpy.ndarray(arr.shape+(num,), dtype=arr.dtype,
                         buffer=arr, strides=arr.strides+(0,))

There are limits to what these sort of stride tricks can accomplish, but repeating as above, or similar, is feasible.

On Jan 1, 2011, at 8:42 PM, Enzo Michelangeli wrote:

> Is there any way, not involving compilation of C code, to define ndarrays where some rows or columns share the same data buffers? For example, something built by a hypothetical variant of the np.repeat() function, such that, if a = array([2,3]), calling:
>
> b = np.aliasedrepeat(x, [1, 2], axis=0)
>
> would return in b:
>
> array([[2, 3],
>        [2, 3],
>        [2, 3]])
>
> ...with the understanding that the three rows would actually share the same data, so setting e.g.:
>
> b[0,1] = 5
>
> ...would change b into:
>
> array([[2, 5],
>        [2, 5],
>        [2, 5]])
>
> In other words, something with a behaviour similar to a list of lists:
>
> a = [2,3]
> b = [a,a,a]
> b
> [[2, 3], [2, 3], [2, 3]]
> b[0][1] = 5
> b
> [[2, 5], [2, 5], [2, 5]]
>
> This would save memory (and time spent in unnecessary copying) in some applications with large arrays, and would allow one to cope with the current inability of weave.blitz to understand broadcasting rules, e.g. for calculating outer products (I mentioned this in a previous thread).
>
> Enzo
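A quick usage sketch of the stride-0 helper above, showing the aliasing the original poster asked for (the trailing transpose turns the repeated axis into rows; the helper itself is copied from the post):

    import numpy as np

    def repeat(arr, num):
        arr = np.asarray(arr)
        return np.ndarray(arr.shape + (num,), dtype=arr.dtype,
                          buffer=arr, strides=arr.strides + (0,))

    a = np.array([2, 3])
    b = repeat(a, 3).T    # shape (3, 2); all three rows alias a's two elements
    b[0, 1] = 5
    print(b)              # [[2 5] [2 5] [2 5]]
    print(a)              # [2 5] -- the shared buffer was modified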
Re: [Numpy-discussion] Variable in an array name?
> Is it possible to use a variable in an array name? I am looping through a bunch of calculations, and need to have each array as a separate entity. I'm pretty new to python and numpy, so forgive my ignorance. I'm sure there is a simple answer, but I can't seem to find it.
>
> Let's say I have a variable 'i':
>
> i = 5
>
> I would like my array to have the name array5. I know how I could do this manually, but not in a loop where i is redefined several times.

There are ways to do this, but what you likely actually want is just to put several arrays in a python list and then index into the list, instead of constructing numbered names. e.g.:

array_list = []
for whatever:
    array_list.append(numpy.array(whatever))

for array in array_list:
    do_something(array)

given_array = array_list[i]
Re: [Numpy-discussion] Variable in an array name?
> Thank you very much for the prompt response. I have already done what you have suggested, but there are a few cases where I do need to have an array named with a variable (looping through large numbers of unrelated files and calculations that need to be dumped into different analyses). It would be extraordinarily helpful if someone could post a solution to this problem, regardless of the inefficiency of the method. Thanks a ton for any additional help.

You could store arrays associated with string names, or other identifiers (as opposed to integer indices), in a python dict. Global and local namespaces are also just dicts that you can grab with globals() and locals(), if you really want to look up variable names algorithmically, but I promise you that this is really not what you want to be doing.

Zach
Re: [Numpy-discussion] NOOB Alert: Looping Through Text Files...
> textlist = ["test1.txt", "test2.txt", "test3.txt"]
> for i in textlist:
>     text_file = open(textlist, "a")
>     text_file.write("\nI suck at Python and need help")
>     text_file.close()
>
> But, this doesn't work. It gives me the error:
>
> coercing to Unicode: need string or buffer, list found

Yeah, it's probably the wrong list; still, easy enough to answer...

You want this:

text_file = open(i, "a")

to open the individual file, as opposed to:

text_file = open(textlist, "a")

which tries to open the whole list of filenames. Python cannot figure out how to turn the list into a single (unicode) filename, hence the error.

As for your wanting to write files with new names:

for txtfile in txtlist:
    f = open(txtfile, 'r')
    data = f.read()
    f.close()
    fnew = open(get_new_name(txtfile), 'w')
    fnew.write(data)
    fnew.write('\nHelp is on the way.')
    fnew.close()

where get_new_name() or whatever is defined appropriately to transform the old name ('test1.txt', say) into 'test1_appended.txt'...

Zach
Re: [Numpy-discussion] create a numpy array of images
> I have been using python for a while now and I have a requirement of creating a numpy array of microscopic tiff images (this data is 3d, meaning there are 100 z slices of 512 X 512 pixels). How can I create an array of images?

> It's quite straightforward to create a 3-d array to hold this kind of data:
>
> image_block = np.empty((100, 512, 512), dtype=??)
>
> now you can load it up by using some lib (PIL, or ???) to load the tif images, and then:
>
> for i, img in enumerate(images):
>     image_block[i,:,:] = img

Notice that since PIL 1.1.6, PIL Image objects support the numpy interface: http://effbot.org/zone/pil-changes-116.htm

For even longer than this, PIL has been somewhat broken with regard to 16-bit images (very common in microscopy); you may run into strange byte-ordering issues that scramble the data on reading or writing. Also, PIL's numpy interface is somewhat broken in similar ways. (Numerous people have provided patches to PIL, but these are never incorporated into any releases, as far as I can tell.)

So try PIL, but if the images come out all wrong, you might want to check out the scikits.image package, which has hooks for various other image read/write tools.

Zach
Re: [Numpy-discussion] how to do this efficiently?
> In a 1-d array, find the first point where all subsequent points have values less than a threshold, T.

Maybe something like:

last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
first_lower = last_greater + 1

There's probably a better way to do it, without the arange, though...
Re: [Numpy-discussion] how to do this efficiently?
> > In a 1-d array, find the first point where all subsequent points have values less than a threshold, T.
>
> Maybe something like:
>
> last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
> first_lower = last_greater + 1
>
> There's probably a better way to do it, without the arange, though...

> I'm trying to find the first point in a power spectrum such that all subsequent points are below some level. I've started with (db is my power spectrum in dB, already reversed):
>
> mag = np.maximum.accumulate(db) - db[-1]
>
> Now all I need is to find the first point such that mag < -50. How to do this efficiently?

Right -- that's what I showed above. Find the last point in mag that is >= -50, and by definition the next point is the first point such that the remainder of mag is < -50.
Re: [Numpy-discussion] how to do this efficiently?
On Feb 9, 2011, at 10:58 AM, Neal Becker wrote:

> Zachary Pincus wrote:
>
>>>> In a 1-d array, find the first point where all subsequent points have values less than a threshold, T.
>>>
>>> Maybe something like:
>>>
>>> last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
>>> first_lower = last_greater + 1
>>>
>>> There's probably a better way to do it, without the arange, though...
>>
>>> I'm trying to find the first point in a power spectrum such that all subsequent points are below some level. I've started with (db is my power spectrum in dB, already reversed):
>>>
>>> mag = np.maximum.accumulate(db) - db[-1]
>>>
>>> Now all I need is to find the first point such that mag < -50. How to do this efficiently?
>>
>> Right -- that's what I showed above. Find the last point in mag that is >= -50, and by definition the next point is the first point such that the remainder of mag is < -50.
>
> But where is numpy's 'find_first' function? I can't seem to find it, so I had to make my own in C++.

As before, the line below does what you said you need, though not maximally efficiently. (Try it in an interpreter...) There may be another way in numpy that doesn't rely on constructing the index array, but this is the first thing that came to mind.

last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]

Let's unpack that dense line a bit:

mask = arr >= T
indices = numpy.arange(arr.shape[0])
above_threshold_indices = indices[mask]
last_above_threshold_index = above_threshold_indices[-1]

Does this make sense?
Re: [Numpy-discussion] how to do this efficiently?
>> As before, the line below does what you said you need, though not maximally efficiently. (Try it in an interpreter...) There may be another way in numpy that doesn't rely on constructing the index array, but this is the first thing that came to mind.
>>
>> last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
>>
>> Let's unpack that dense line a bit:
>>
>> mask = arr >= T
>> indices = numpy.arange(arr.shape[0])
>> above_threshold_indices = indices[mask]
>> last_above_threshold_index = above_threshold_indices[-1]
>>
>> Does this make sense?
>
> This assumes monotonicity. Is that allowed?

The twice-stated problem was:

> In a 1-d array, find the first point where all subsequent points have values less than a threshold, T.

So that should do the trick... Though Alan's argmax solution is definitely a better one than indexing an indices array. Same logic and result though, just more compact.
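For completeness, a compact sketch of the argmax variant referred to above (it assumes at least one element meets the threshold, since argmax of an all-False mask misleadingly returns 0):

    import numpy as np

    arr = np.array([5, 9, 3, 8, 2, 1, 0])
    T = 4

    above = arr >= T
    # argmax of the reversed boolean mask finds the last True in the original
    # array; everything after that index is below the threshold.
    last_greater = len(arr) - 1 - np.argmax(above[::-1])
    first_lower = last_greater + 1            # here: index 4
    assert above.any() and (arr[first_lower:] < T).all()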
Re: [Numpy-discussion] how to do this efficiently?
>> This assumes monotonicity. Is that allowed?
>
> The twice-stated problem was:

[Note to avert email miscommunications] BTW, I wasn't trying to snipe at you with that comment, Josef... I just meant to say that this solution solves the problem as Neal posed it, though that might not be the exact problem he has.
Re: [Numpy-discussion] how to do this efficiently?
>>> In a 1-d array, find the first point where all subsequent points have values less than a threshold.
>>
>> This doesn't imply monotonicity. Suppose we have a sine curve, and I want to find the last trough. Or a business cycle and I want to find the last recession. Unless my english deteriorated recently.

Not sure that I follow? I read the statement as: find the smallest non-negative index i such that array[j] < t for all i <= j < len(array). The various solutions proposed work for this, except for the corner case where array.min() >= t.

I'm not sure where monotonicity comes into play, unless one interprets the "all subsequent points" clause as "some number of subsequent points" or something...

For those sorts of problems, I usually wind up smoothing the array [optional], then using scipy.ndimage.maximum to calculate peak values within a given window, and then finding the points that equal the peak values -- this works pretty robustly for peak/trough detection in arbitrary dimension.
Re: [Numpy-discussion] [OT] any image io module that works with python3?
Here's a ctypes interface to FreeImage that I wrote a while back and that has since been cleaned up (and maintained) by the scikits.image folk:

https://github.com/stefanv/scikits.image/blob/master/scikits/image/io/_plugins/freeimage_plugin.py

If it doesn't work out of the box on python 3, then it should be pretty simple to fix.

Zach

On Mar 12, 2011, at 4:40 AM, Christoph Gohlke wrote:

> On 3/12/2011 1:08 AM, Nadav Horesh wrote:
>> Having numpy, scipy, and matplotlib working reasonably with python3, a major piece of code I miss for a major python3 migration is image IO. I found that pylab's imread works fine for png images, but I need to read all the other image formats as well as png and jpeg output. Any hints (including advice on how to easily construct my own module) are appreciated.
>> Nadav.
>
> On Windows, PIL (private port at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pil), PythonMagick http://www.imagemagick.org/download/python/, and pygame 1.9.2pre http://www.pygame.org are working reasonably well for image IO. Also the FreeImage library http://freeimage.sourceforge.net/ is easy to use with ctypes http://docs.python.org/py3k/library/ctypes.html.
>
> Christoph
Re: [Numpy-discussion] strange behavior of np.minimum and np.maximum
> a, b, c = np.array([10]), np.array([2]), np.array([7])
> min_val = np.minimum(a, b, c)
> min_val
> array([2])
> max_val = np.maximum(a, b, c)
> max_val
> array([10])
> min_val
> array([10])
>
> (I'm using numpy 1.4, and I observed the same behavior with numpy 2.0.0.dev8600 on another machine.) I'm quite surprised by this behavior (it took me quite a long time to figure out what was happening in a script of mine that wasn't giving what I expected, because np.maximum was changing the output of np.minimum). Is it a bug, or am I missing something?

Read the documentation for numpy.minimum and numpy.maximum: they give you element-wise minimum values from two arrays passed as arguments. E.g.:

numpy.minimum([1,2,3], [3,2,1])
array([1, 2, 1])

The optional third parameter to numpy.minimum is an "out" array -- an array to place the results into instead of making a new array for that purpose. (This can save time / memory in various cases.)

This should therefore be enough to explain the above behavior. (That is, min_val and max_val wind up being just other names for the array 'c', which gets modified in-place by the numpy.minimum and numpy.maximum calls.)

If you want the minimum value of a sequence of arbitrary length, use the python min() function. If you have a numpy array already and you want the minimum (global, or along a particular axis), use the min() method of the array, or numpy.min(arr).

Zach
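A short sketch contrasting the accidental out= usage with what was probably intended:

    import numpy as np

    a, b, c = np.array([10]), np.array([2]), np.array([7])

    # Element-wise minimum of three arrays: chain the binary ufunc, or reduce.
    min_val = np.minimum(np.minimum(a, b), c)     # array([2])
    min_val = np.minimum.reduce([a, b, c])        # same result

    # The third positional argument is the output array, so this writes into c:
    np.minimum(a, b, c)
    print(c)                                      # array([2]) -- c was overwritten

    # For the overall minimum of a single array, use arr.min() or np.min(arr).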
Re: [Numpy-discussion] reading tiff images
On Apr 26, 2011, at 2:31 PM, Daniel Lepage wrote:

> You need PIL no matter what; scipy.misc.imread, scipy.ndimage.imread, and scikits.image.io.imread all call PIL.

scikits.image.io also has a ctypes wrapper for the freeimage library. I prefer these (well, I wrote them), though apparently there are some 64-bit issues (crashes?). I haven't been working on a 64-bit system so I haven't been able to address them, but I will be soon. It's a very thin wrapper around a simple image IO library, so there's lots of room to add and extend as need be...

All of the PIL wrappers are kluges around serious flaws in how PIL reads images, particularly non-8-bit images and in particular non-native-endian 16-bit images.

Zach

> Theoretically there's no difference between any of them, although in actuality some use "import Image" and others use "from PIL import Image"; one of these may fail depending on how you installed PIL. (I'm not sure which is supposed to be standard -- the PIL docs use both interchangeably, and I think the latest version of PIL on pypi sets it up so that both will work.)
>
> I'd use whichever tool you're already importing -- if you're using ndimage anyway, just use ndimage.imread rather than adding more imports.
>
> Note that using PIL directly is easy, but does require adding an extra step; OTOH, if you're familiar with PIL, you can use some of its transformations from the start, e.g.:
>
> def imread(fname, mode='RGBA'):
>     return np.asarray(Image.open(fname).convert(mode))
>
> to ensure that you always get 4-channel images, even for images that were initially RGB or grayscale.
>
> HTH,
> Dan
>
> On Tue, Apr 26, 2011 at 2:00 PM, Mathew Yeates mat.yea...@gmail.com wrote:
>> Hi,
>> What is the current method of using ndimage on a Tiff file? I've seen different methods using ndimage itself, scipy.misc and PIL.
>> Mathew
Re: [Numpy-discussion] fast grayscale conversion
You could try:

src_mono = src_rgb.astype(float).sum(axis=-1) / 3.

But that speed does seem slow. Here are the relevant timings on my machine (a recent MacBook Pro) for a 3.1-megapixel-size array:

In [16]: a = numpy.empty((2048, 1536, 3), dtype=numpy.uint8)
In [17]: timeit numpy.dot(a.astype(float), numpy.ones(3)/3.)
10 loops, best of 3: 116 ms per loop
In [18]: timeit a.astype(float).sum(axis=-1)/3.
10 loops, best of 3: 85.3 ms per loop
In [19]: timeit a.astype(float)
10 loops, best of 3: 23.3 ms per loop

On Jun 20, 2011, at 4:15 PM, Alex Flint wrote:

> At the moment I'm using numpy.dot to convert a WxHx3 RGB image to a grayscale image:
>
> src_mono = np.dot(src_rgb.astype(np.float), np.ones(3)/3.);
>
> This seems quite slow though (several seconds for a 3 megapixel image) -- is there a more specialized routine better suited to this?
>
> Cheers,
> Alex
[Numpy-discussion] poor performance of sum with sub-machine-word integer types
Hello all, As a result of the fast greyscale conversion thread, I noticed an anomaly with numpy.ndarray.sum(): summing along certain axes is much slower with sum() than doing it explicitly, but only with integer dtypes and when the size of the dtype is less than the machine word. I checked in 32-bit and 64-bit modes and in both cases the speed difference only went away once the dtype reached the machine word size. See below... Is this something to do with numpy or something inexorable about machine / memory architecture? Zach Timings -- 64-bit mode: -- In [2]: i = numpy.ones((1024,1024,4), numpy.int8) In [3]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 2.57 ms per loop In [5]: i = numpy.ones((1024,1024,4), numpy.int16) In [6]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 4.75 ms per loop In [8]: i = numpy.ones((1024,1024,4), numpy.int32) In [9]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 6.37 ms per loop In [11]: i = numpy.ones((1024,1024,4), numpy.int64) In [12]: timeit i.sum(axis=-1) 100 loops, best of 3: 16.6 ms per loop In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 15.1 ms per loop Timings -- 32-bit mode: -- In [2]: i = numpy.ones((1024,1024,4), numpy.int8) In [3]: timeit i.sum(axis=-1) 10 loops, best of 3: 138 ms per loop In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 3.68 ms per loop In [5]: i = numpy.ones((1024,1024,4), numpy.int16) In [6]: timeit i.sum(axis=-1) 10 loops, best of 3: 140 ms per loop In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 4.17 ms per loop In [8]: i = numpy.ones((1024,1024,4), numpy.int32) In [9]: timeit i.sum(axis=-1) 10 loops, best of 3: 22.4 ms per loop In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 12.2 ms per loop In [11]: i = numpy.ones((1024,1024,4), numpy.int64) In [12]: timeit i.sum(axis=-1) 10 loops, best of 3: 29.2 ms per loop In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 10 loops, best of 3: 23.8 ms per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] poor performance of sum with sub-machine-word integer types
On Jun 21, 2011, at 1:16 PM, Charles R Harris wrote: It's because of the type conversion sum uses by default for greater precision. Aah, makes sense. Thanks for the detailed explanations and timings! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
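If a lower-precision accumulator is acceptable, one workaround (a sketch; timings not re-measured here) is to give sum() an explicit dtype so it skips the upcast:

import numpy

i = numpy.ones((1024, 1024, 4), numpy.int8)
s_default = i.sum(axis=-1)                      # upcast accumulator, as discussed above
s_int8    = i.sum(axis=-1, dtype=numpy.int8)    # keep the input dtype -- watch for overflow
s_int32   = i.sum(axis=-1, dtype=numpy.int32)   # compromise: wider than the input, machine-native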
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
Hello Keith, While I also echo Johann's points about the arbitrariness and non-utility of benchmarking, I'll briefly comment on just a few tests to help out with getting things into idiomatic python/numpy: Tests 1 and 2 are fairly pointless (empty for loop and empty procedure) and won't actually influence the running time of well-written non-pathological code. Test 3: #Test 3 - Add 20 scalar ints nrep = 200 * scale_factor for i in range(nrep): a = i + 1 well, python looping is slow... one doesn't do such loops in idiomatic code if the underlying intent can be re-cast into array operations in numpy. But here the test is on such a simple operation that it's not clear how to recast in a way that would remain reasonable. Ideally you'd test something like: i = numpy.arange(20) for j in range(scale_factor): a = i + 1 but that sort of changes what the test is testing. Finally, test 21: #Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar for i in range(nrep): b = scipy.ndimage.filters.median_filter(a, size=(5, 5)) timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep) A median filter is definitely NOT a boxcar filter! You want uniform_filter: In [4]: a = numpy.empty((1000,1000)) In [5]: timeit scipy.ndimage.filters.median_filter(a, size=(5, 5)) 10 loops, best of 3: 93.2 ms per loop In [6]: timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5)) 10 loops, best of 3: 27.7 ms per loop Zach On Sep 26, 2011, at 10:19 AM, Keith Hughitt wrote: Hi all, Myself and several colleagues have recently started work on a Python library for solar physics, in order to provide an alternative to the current mainstay for solar physics, which is written in IDL. One of the first steps we have taken is to create a Python port of a popular benchmark for IDL (time_test3) which measures performance for a variety of (primarily matrix) operations. In our initial attempt, however, Python performs significantly poorer than IDL for several of the tests. I have attached a graph which shows the results for one machine: the x-axis is the test # being compared, and the y-axis is the time it took to complete the test, in milliseconds. While it is possible that this is simply due to limitations in Python/Numpy, I suspect that this is due at least in part to our lack of familiarity with NumPy and SciPy. So my question is, does anyone see any places where we are doing things very inefficiently in Python? In order to try and ensure a fair comparison between IDL and Python there are some things (e.g. the style of timing and output) which we have deliberately chosen to do a certain way. In other cases, however, it is likely that we just didn't know a better method. Any feedback or suggestions people have would be greatly appreciated. Unfortunately, due to the proprietary nature of IDL, we cannot share the original version of time_test3, but hopefully the comments in time_test3.py will be clear enough. Thanks! Keith sunpy_time_test3_idl_python_2011-09-26.png___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
I think the remaining delta between the integer and float boxcar smoothing is that the integer version (test 21) still uses median_filter(), while the float one (test 22) is using uniform_filter(), which is a boxcar. Other than that and the slow roll() implementation in numpy, things look pretty solid, yes? Zach On Sep 29, 2011, at 12:11 PM, Keith Hughitt wrote: Thank you all for the comments and suggestions. First off, I would like to say that I entirely agree with people's suggestions about lack of objectiveness in the test design, and the caveat about optimizing early. The main reason we put together the Python version of the benchmark was as a quick sanity check to make sure that there are no major show-stoppers before we began work on the library. We also wanted to put together something to show other people who are firmly in the IDL camp that this is a viable option. We did in fact put together another short test-suite (test_testr.py and time_testr.pro) which consists of operations that are frequently used by us, but it also is testing a very small portion of the kinds of things our library will eventually do. That said, I made a few small changes to the original benchmark, based on people's feedback, and put together a new plot. The changes made include: 1. Using xrange instead of range 2. Using uniform filter instead of median filter 3. Fixed a typo for tests 2 and 3 which resulted in slower Python results Again, note that some of the tests are testing non-numpy functionality. Several of the results still stand out, but overall the results are much more reasonable than before. Cheers, Keith time_test3_idl_vs_python.png___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] wanted: decent matplotlib alternative
I keep meaning to use matplotlib as well, but every time I try I also get really turned off by the matlabish interface in the examples. I get that it's a selling point for matlab refugees, but I find it counterintuitive in the same way Christoph seems to. I'm glad to hear the OO interface isn't as clunky as it looks on some of the doc pages, though. This is good news. Can anyone point out any good tutorials/docs on using matplotlib idiomatically via its OO interface? Zach On Oct 13, 2011, at 3:21 PM, Joe Kington wrote: Have a look at Chaco: http://code.enthought.com/chaco/ If you're wanting a more pythonic api, it's a good choice. Personally, I still prefer matplotlib. You don't ever need to touch the state machine interface. The OO interface is slightly un-pythonic, but it's hardly clunky. I think you're referring to one of the webpage examples of it which avoids _any_ convenience functions. You can still use the convenience functions without having to rely on the state machine in any way. E.g.: import matplotlib.pyplot as plt fig, axes = plt.subplots(ncols=4) for ax in axes: ax.plot(range(10)) plt.show() All in all, matplotlib deliberately tries to mimic matlab for a lot of the conventions. This is mostly to make it easier to switch if you're already familiar with matlab. To each his own, but for better or worse, matplotlib is the most widely used plotting library for python. It's worth getting a bit more familiar with, if nothing else just to see past some of the rough edges. -Joe On Thu, Oct 13, 2011 at 1:58 PM, Christoph Groth c...@falma.de wrote: Hello, Is it just me who thinks that matplotlib is ugly and a pain to use? So far I haven't found a decent alternative usable from within python. (I haven't tried all the packages out there.) I'm mostly interested in 2d plots. Who is happy enough with a numpy-compatible plotting package to recommend it? Thanks, Christoph A few things I can't stand about matplotlib: * It works as a state machine. (There is an OO-API, too, but it's ugly and cumbersome to use, and most examples use the state machine mode or, even worse, a mixture of OO and global state.) * It uses inches by default. (I propose to switch to nails: 1 nail = 3 digits = 2 1/4 inches = 1/16 yard.) * subplot(211) (ugh!) * Concepts are named in a confusing way. (ax = subplot(112) anyone?) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
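For anyone wanting to avoid pyplot entirely, a minimal sketch of the pure OO route (the Agg backend and the file name here are arbitrary choices, not from the thread):

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure(figsize=(6, 4))
canvas = FigureCanvasAgg(fig)     # attach a backend so the figure can be rendered
ax = fig.add_subplot(1, 1, 1)
ax.plot(range(10))
ax.set_xlabel('x')
fig.savefig('plot.png')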
Re: [Numpy-discussion] NumPy nogil API
As an example, it'd be nice to have scipy.ndimage available without the GIL: http://docs.scipy.org/doc/scipy/reference/ndimage.html Now, this *can* easily be done as the core is written in C++. I'm just pointing out that some people may wish more for calling scipy.ndimage inside their prange than for some parts of NumPy. Not exactly to your larger point wrt the GIL, but I *think* some skimage (née scikits.image) folks are trying to rewrite most of ndimage's functionality in cython. I don't know what the status of this effort is though... Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] array copy-to-self and views
Hello folks, I recently was trying to write code to modify an array in-place (so as not to invalidate any references to that array) via the standard python idiom for lists, e.g.: a[:] = numpy.flipud(a) Now, flipud returns a view on 'a', so assigning that to 'a[:]' provides pretty strange results as the buffer that is being read (the view) is simultaneously modified. Here is an example: In [2]: a = numpy.arange(10).reshape((5,2)) In [3]: a Out[3]: array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) In [4]: numpy.flipud(a) Out[4]: array([[8, 9], [6, 7], [4, 5], [2, 3], [0, 1]]) In [5]: a[:] = numpy.flipud(a) In [6]: a Out[6]: array([[8, 9], [6, 7], [4, 5], [6, 7], [8, 9]]) A question, then: Does this represent a bug? Or perhaps there is a better idiom for modifying an array in-place than 'a[:] = ...'? Or is it incumbent on the user to ensure that any time an array is directly modified, the modifying array is not a view of the original array? Thanks for any thoughts, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
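One way to keep every existing reference valid while avoiding the overlapping read/write (a sketch, not from the original message) is to force a copy of the view before assigning back:

import numpy

a = numpy.arange(10).reshape((5, 2))
b = a                           # a second reference to the same array

a[:] = numpy.flipud(a).copy()   # materialize the flipped data first, then write it back in-place
# or, equivalently:
# a[:] = a[::-1].copy()

# b now shows the flipped data, and id(b) == id(a) still holds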
Re: [Numpy-discussion] array copy-to-self and views
Zachary Pincus wrote: Hello folks, I recently was trying to write code to modify an array in-place (so as not to invalidate any references to that array) I'm not sure what this means exactly. Say one wants to keep two different variables referencing a single in-memory list, as so: a = [1,2,3] b = a Now, if 'b' and 'a' go to live in different places (different class instances or whatever) but we want 'b' and 'a' to always refer to the same in-memory object, so that 'id(a) == id(b)', we need to make sure to not assign a brand new list to either one. That is, if we do something like 'a = [i + 1 for i in a]' then 'id(a) != id(b)'. However, we can do 'a[:] = [i + 1 for i in a]' to modify a in-place. This is not super-common, but it's also not an uncommon python idiom. I was in my email simply pointing out that naïvely translating that idiom to the numpy case can cause unexpected behavior in the case of views. I think that this is unquestionably a bug -- isn't the point of views that the user shouldn't need to care if a particular array object is a view or not? Given the lack of methods to query whether an array is a view, or what it might be a view on, this seems like a reasonable perspective... I mean, if certain operations produce completely different results when one of the operands is a view, that *seems* like a bug. It might not be worth fixing, but I can't see how that behavior would be considered a feature. However, I do think there's a legitimate question about whether it would be worth fixing -- there could be a lot of complicated checks to catch these kinds of corner cases. via the standard python idiom for lists, e.g.: a[:] = numpy.flipud(a) Now, flipud returns a view on 'a', so assigning that to 'a[:]' provides pretty strange results as the buffer that is being read (the view) is simultaneously modified. yes, weird. So why not just: a = numpy.flipud(a) Since flipud returns a view, the new a will still be using the same data array. Does this satisfy your need above? Nope -- though 'a' and 'numpy.flipud(a)' share the same data, the actual ndarray instances are different. This means that any other references to the 'a' array (made via 'b = a' or whatever) now refer to the old 'a', not the flipped one. The only other option for sharing arrays is to encapsulate them as attributes of *another* object, which itself won't change. That seems a bit clumsy. It's too bad that to do this you need to know that flipud created a view, rather than a copy of the data, as if it were a copy, you would need to do the a[:] trick to make sure a kept the same data, but that's the price we pay for the flexibility and power of numpy -- the alternative is to have EVERYTHING create a copy, but there would be a substantial performance hit for that. Well, Anne's email suggests another alternative -- each time a view is created, keep track of the original array from whence it came, and then only make a copy when collisions like the above would take place. And actually, I suspect that views already need to keep a reference to their original array in order to keep that array from being deleted before the view is. But I don't know the guts of numpy well enough to say for sure. NOTE: the docstring doesn't make it clear that a view is created: help(numpy.flipud) Help on function flipud in module numpy.lib.twodim_base: flipud(m) returns an array with the columns preserved and rows flipped in the up/down direction. Works on the first dimension of m. 
NOTE2: Maybe these kinds of functions should have an optional flag that specified whether you want a view or a copy -- I'd have expected a copy in this case! Well, it seems like in most cases one does not need to care whether one is looking at a view or an array. The only time that comes to mind is when you're attempting to modify the array in-place, e.g. a[something] = something else Even if the maybe-bug above were easily fixable (again, not sure about that), you might *still* want to be able to figure out if a were a view before such a modification. Whether this needs a runtime 'is_view' method, or just consistent documentation about what returns a view, isn't clear to me. Certainly the latter couldn't hurt. QUESTION: How do you tell if two arrays are views on the same data: is checking if they have the same .base reliable? a = numpy.array((1,2,3,4)) b = a.view() a.base is b.base False No, I guess not. Maybe .base should return self if it's the originator of the data. Is there a reliable way? I usually just test by changing a value in one to see if it changes in the other, but that's one heck of kludge! a.__array_interface__['data'][0] == b.__array_interface__['data'] [0] True seems to work, but that's pretty ugly! Good question. As I mentioned above, I assume that this information is tracked internally to prevent
Re: [Numpy-discussion] array copy-to-self and views
A question, then: Does this represent a bug? Or perhaps there is a better idiom for modifying an array in-place than 'a[:] = ...'? Or is incumbent on the user to ensure that any time an array is directly modified, that the modifying array is not a view of the original array? Yes, it is and has always been incumbent on the user to ensure that any time an array is directly modified in-place that the modifying array is not a view of the original array. Fair enough. Now, how does a user ensure this -- say someone like me, who has been using numpy (et alia) for a couple of years, but clearly not long enough to have an 'intuitive' feel for every time something might be a view (a feeling that must seem quite natural to long-time numpy users, who may have forgotten precisely how long it takes to develop that level of intuition)? Documentation of what returns views helps, for sure. Would any other 'training' mechanisms help? Say a function that (despite Tim's pretty reasonable 'don't do that' warning) will return true when two arrays have overlapping memory? Or an 'inplace_modify' function that takes the time to make that check? Perhaps I'm the first to have views bite me in this precise way. However, if there are common failure-modes with views, I hope it's not too unreasonable to ask about ways that those common problems might be addressed. (Other than just saying train for ten years, and you too will have numpy-fu, my son.) Giving newbies tools to deal with common problems with admittedly dangerous constructs might be useful. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
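Newer numpy releases do provide a helper of roughly this kind, numpy.may_share_memory; it is a conservative, bounds-based check (a False answer is definitive, a True answer only means the memory ranges overlap). A sketch:

import numpy

a = numpy.arange(10)
b = a.view()
c = a[2:5]
d = a.copy()

numpy.may_share_memory(a, b)   # True
numpy.may_share_memory(a, c)   # True
numpy.may_share_memory(a, d)   # False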
[Numpy-discussion] linalg.eigh orders eigenvalues/eigenvectors differently than linalg.eig
Hello all, It seems that the 'eigh' routine from numpy.linalg does not follow the same convention as numpy.linalg.eig in terms of the order of the returned eigenvalues. (And thus eigenvectors as well...) Specifically, eig returns eigenvalues in order from largest to smallest, while eigh returns them from smallest to largest. Example: a = numpy.array([[21, 28, 35],[28, 38, 48],[35, 48, 61]]) numpy.linalg.eigh(a) (array([ -1.02825542e-14, 7.04131679e-01, 1.19295868e+02]), array([[ 0.40824829, -0.81314396, -0.41488581], [-0.81649658, -0.12200588, -0.56431188], [ 0.40824829, 0.56913221, -0.71373795]])) numpy.linalg.eig(a) (array([ 1.19295868e+02, 7.04131679e-01, 4.62814557e-15]), array([[-0.41488581, -0.81314396, 0.40824829], [-0.56431188, -0.12200588, -0.81649658], [-0.71373795, 0.56913221, 0.40824829]])) Is this a bug? If it is, though, fixing it now might break code that depends on this 'wrong' order. (This is also present in scipy.linalg.) If not a bug, or not-fixable-now, then at least some documentation as to the convention regarding ordering of eigenvalues is probably worthwhile... Any thoughts? Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
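Whatever ordering the routines return, it is easy to impose one explicitly; a sketch (descending, to match the eig() output shown above):

import numpy

a = numpy.array([[21, 28, 35], [28, 38, 48], [35, 48, 61]])
w, v = numpy.linalg.eigh(a)
order = numpy.argsort(w)[::-1]   # indices of eigenvalues, largest first
w = w[order]
v = v[:, order]                  # reorder the eigenvector columns to match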
[Numpy-discussion] Distributing prebuilt numpy and other extensions
Hello folks, I've developed some command-line tools for biologists using python/ numpy and some custom C and Fortran extensions, and I'm trying to figure out how to easily distribute them... For people using linux, I figure a source distribution is no problem at all. (Right?) On the other hand, for Mac users (whose computers by default don't have the dev tools, and even then would need to get a fortran compiler elsewhere) I'd like to figure out something a bit easier. I'd like to somehow provide an installer (double-clickable or python script) that does a version check and then installs an appropriate version of prebuilt binaries for numpy and my C and Fortran extensions. Is this possible within the bounds of the python or numpy distutils? Would setuptools be a better way to go? Preferably it would be a dead easy, one-step thing... Or is this whole idea problematic, and better to stick with source distribution in all cases? Thanks for any advice, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Greek Letters
I have found that the python 'unicode name' escape sequence, combined with the canonical list of unicode names ( http://unicode.org/Public/ UNIDATA/NamesList.txt ), is a good way of getting the symbols you want and still keeping the python code legible. From the above list, we see that the symbol name we want is GREEK SMALL LETTER CHI, so: chi = u'\N{GREEK SMALL LETTER CHI}' will do the trick. For chi^2, use: chi2 = u'\N{GREEK SMALL LETTER CHI}\N{SUPERSCRIPT TWO}' Note that to print these characters, we usually need to encode them somehow. My terminal supports UTF-8, so the following works for me: import codecs print codecs.encode(chi2, 'utf8') giving (if your mail reader supports utf8 and mine encodes it properly...): χ² Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine On Feb 20, 2007, at 3:56 PM, Mark Janikas wrote: Hello all, I was wondering how I could print the chi-squared symbol in python. I have been looking at the Unicode docs, but I figured I would ask for assistance here while I delve into it. Thanks for any help in advance. Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] what goes wrong with cos(), sin()
Your results are indeed around zero. numpy.allclose(0, 1.22460635382e-016) True It's not exactly zero because floating point math is in general not exact. You'll need to check out a reference about doing floating point operations numerically for more details, but in general you should not expect exact results due to the limited precision of any fixed-width digital representation of floats. A corollary: in general, do not test two floating-point values for equality -- use something like numpy.allclose. (Exception -- equality is expected if the exact sequence of operations to generate two numbers were identical.) Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine On Feb 21, 2007, at 10:11 AM, WolfgangZillig wrote: Hi, I'm quite new to numpy/scipy so please excuse if my problem is too obvious. example code: import numpy as n print n.sin(n.pi) print n.cos(n.pi/2.0) results in: 1.22460635382e-016 6.12303176911e-017 I've expected something around 0. Can anybody explain what I am doing wrong here? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
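numpy.allclose also takes explicit tolerances when the defaults are not appropriate; a quick sketch:

import numpy as n

n.allclose(n.sin(n.pi), 0)       # True: |sin(pi)| ~ 1e-16 is within the default atol of 1e-8
n.allclose(0, 1e-5, atol=1e-8)   # False: 1e-5 is larger than the requested absolute tolerance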
Re: [Numpy-discussion] the neighbourhood of each element of an array
Scipy's ndimage module has a function that takes a generic callback and calls it with the values of each neighborhood (of a given size, and optionally with a particular mask footprint) centered on each array element. That function handles boundary conditions, etc nicely. Unfortunately, I'm not sure if it works with masked arrays, and I think it hands a ravel'd set of pixels back to the callback function. You could probably hack masking in there by passing it the mask concatenated to the array, and then deal with the mask explicitly. Zach On Feb 23, 2007, at 11:39 AM, Bryan Cole wrote: On Fri, 2007-02-23 at 17:38 +0100, [EMAIL PROTECTED] wrote: Hi, Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines? d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol- d)] \ for i in range(d,Nrow-d)]) how about something like er = Nrow - d ec = Ncol - d y = array([x[i:er+i, j:ec+j] for j in arange(-d,d) for i in arange(-d,d)]) now you're looping over a small array and combining slices of the big array (as opposed to looping over the big array and combining slices from a small one). This should be faster for large Nrow, Ncol. BC What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach? Ciao, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
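The function in question is, I believe, scipy.ndimage.generic_filter; a minimal sketch of the unmasked case (the callback sees each neighbourhood as a ravelled 1-D array, as noted above):

import numpy, scipy.ndimage

x = numpy.arange(25, dtype=float).reshape(5, 5)

def neighbourhood_range(values):
    # 'values' is the ravelled 3x3 neighbourhood centred on one element
    return values.max() - values.min()

y = scipy.ndimage.generic_filter(x, neighbourhood_range, size=3, mode='reflect')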
[Numpy-discussion] Questions about cross-compiling extensions for mac-ppc and mac-intel
Hi folks, I've been doing a lot of web-reading on the subject, but have not been completely able to synthesize all of the disparate bits of advice about building python extensions as Mac-PPC and Mac-Intel fat binaries, so I'm turning to the wisdom of this list for a few questions. My general goal is to make a double-clickable Mac installer of a set of tools built around numpy, numpy's distutils, a very hacked-up version of PIL, and some fortran code too. To this end, I need to figure out how to get the numpy distutils to cross-compile, generating PPC code and Intel code in separate builds -- and/or generating a universal binary all in one go. (I'd like to distribute a universal version of numpy, but I think that my own code needs to be built/distributed separately for each architecture due to endian- ness issues.) Is there explicit support in distutils for this, or is it a matter of setting the proper environment variables to entice gcc and gfortran to generate code for a specific architecture? One problem is that PIL is a tricky beast, even in the neutered form that I'm using it. It does a compile-time check for the endian-ness of the system, and a compile-time search for the zlib to use, both of which are problematic. To address the former, I'd like to be able to (say) include something like 'config_endian --big' on the 'python setup.py' command-line, and have that information trickle down to the PIL config script (a few subpackages deep). Is this easy or possible? To address the latter, I think I need to have the PIL extensions dynamically link against '/Developer/SDKs/MacOSX10.4u.sdk/usr/lib/ libz.dylib' which is the fat-binary version of the library, using the headers from '/Developer/SDKs/MacOSX10.4u.sdk/usr/include/zlib.h '. Right now, PIL is using system_info from numpy.distutils to find the valid library paths on which libz and its headers might live. This is nice and more or less platform-neutral, which I like. How best should I convince/configure numpy.distutils.system_info to put '/ Developer/SDKs/MacOSX10.4u.sdk/usr/{lib,include}' on the output to get_include_dirs() and get_lib_dirs()? Thanks for any advice or counsel, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] import numpy segmentation fault
If I recall correctly, there's a bug in numpy 1.0.1 on Linux-x86-64 that causes this segfault. This is fixed in the latest SVN version of numpy, so if you can grab that, it should work. I can't find the trac ticket, but I ran into this some weeks ago. Zach On Mar 14, 2007, at 1:36 PM, Cory Davis wrote: Hi there, I have just installed numpy-1.0.1 from source, which seemed to go fine. However when I try to import numpy I get a segmentation fault. A have a 64 bit machine running RedHat Enterprise Linux and Python 2.34 Any clues greatly appreciated. Cheers, Cory. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nd_image.affine_transform edge effects
Hello all, By the way, ringing at sharp edges is an intrinsic feature of higher- order spline interpolation, right? I believe this kind of interpolant is really intended for smooth (band-limited) data. I'm not sure why the pre-filtering makes a difference though; I don't yet understand well enough what the pre-filter actually does. I'm not sure what people normally do in computer graphics, since I'm working more with natural band-limited images, but in the case of Stefan's rotate_artifacts example, wouldn't it be appropriate to use the 1st- or 0th-order spline instead of the 2nd order? If your real-life data are smooth enough, however, then in theory the ringing with higher orders should go away. James is indeed correct. This whole thing is my mistake based on a mis-interpretation of terminology. I've more carefully re-read the paper on which the ndimage spline code is based ('Splines: A Perfect Fit for Signal/Image Processing' by Michael Unser; it's on citeseer). I now (hopefully) understand that the spline pre-filter, while explicitly analogous to a traditional anti-aliasing pre-filter, is actually computing the spline coefficients via a filtering-type operation. While a traditional anti-aliasing filter should blur an image (as a band-limiting step), the fact that the anti-aliasing pre- filter does not is of no concern since the filtered output isn't a band-limited set of pixels, but a set of coefficients for a band- limited spline. The actual transform operators then use these coefficients to (properly) compute pixel values at different locations. I just assumed that the pre-filtering was broken (even on natural images with smooth variations) because images pre-filtered with the spline filter didn't look like traditional pre-filtered images ought. IMO, ticket 213 should be closed as PEBCAK (I'll do that forthwith); further I'm sorry to have caused Peter to be further bothered about this non-issue. Zach On Mar 22, 2007, at 8:13 PM, James Turner wrote: By the way, ringing at sharp edges is an intrinsic feature of higher- order spline interpolation, right? I believe this kind of interpolant is really intended for smooth (band-limited) data. I'm not sure why the pre-filtering makes a difference though; I don't yet understand well enough what the pre-filter actually does. I'm not sure what people normally do in computer graphics, since I'm working more with natural band-limited images, but in the case of Stefan's rotate_artifacts example, wouldn't it be appropriate to use the 1st- or 0th-order spline instead of the 2nd order? If your real-life data are smooth enough, however, then in theory the ringing with higher orders should go away. Sorry if I'm stating the obvious and missing the real point! I just wanted to make sure this hasn't been overlooked... Likewise, sorry I didn't mention this before if it does turn out to be relevant. Let me know if you want references to explain what I said above. James. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
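In ndimage terms the two stages are separable, which is one way to see that the pre-filter is a coefficient computation rather than a blur; a sketch (the array and parameters here are arbitrary, not from the thread):

import numpy, scipy.ndimage

a = numpy.random.rand(64, 64)

coeffs = scipy.ndimage.spline_filter(a, order=3)   # spline coefficients -- not a smoothed image
r1 = scipy.ndimage.rotate(a, 30, order=3)          # prefilter=True by default: coefficients computed internally
r2 = scipy.ndimage.rotate(coeffs, 30, order=3, prefilter=False)   # reuse the precomputed coefficients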
Re: [Numpy-discussion] nd_image.affine_transform edge effects
Hello all, On Mar 23, 2007, at 3:04 AM, Stefan van der Walt wrote: On Thu, Mar 22, 2007 at 11:20:37PM -0700, Zachary Pincus wrote: The actual transform operators then use these coefficients to (properly) compute pixel values at different locations. I just assumed that the pre-filtering was broken (even on natural images with smooth variations) because images pre-filtered with the spline filter didn't look like traditional pre-filtered images ought. IMO, ticket 213 should be closed as PEBCAK (I'll do that forthwith); further I'm sorry to have caused Peter to be further bothered about this non-issue. The ticket is more concerned with the usability of ndimage. Currently, the principle of least surprise is violated. If you use the default arguments of ndimage.rotate (except for axes, which should still be fixed to be (1,0) by default) and rotate Lena by 30 degrees, you get something with green splotches on (attached to the ticket). IMO, we should either change the default parameters or update the documentation before closing the ticket. Hmm, this is worrisome. There really shouldn't be ringing on continuous-tone images like Lena -- right? (And at no step in an image like that should gaussian filtering be necessary if you're doing spline interpolation -- also right?) I assumed the problem was a non-issue after some testing with a few natural images didn't introduce any artifacts like those, but now with the Lena example I'm not sure. This is frustrating, to be sure, mostly because of my limited knowledge on the subject -- just enough to be dangerous. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nd_image.affine_transform edge effects
Hello folks, Hmm, this is worrisome. There really shouldn't be ringing on continuous-tone images like Lena -- right? (And at no step in an image like that should gaussian filtering be necessary if you're doing spline interpolation -- also right?) That's hard to say. Just because it's mainly a continuous-tone image doesn't necessarily mean it is well sampled everywhere. This depends both on the subject and the camera optics. Unlike the data I usually work with, I think everyday digital photographs (probably a photo scan in the case of Lena) do not generally have the detector sampling frequency matched to the optical resolution of the image. If that's true, the presence of aliasing in interpolated images depends on the structure of the subject and whether the scene has edges or high- frequency patterns in it. Based on my reading of the two excellent Unser papers (both the one that ndimage's spline code is based on, and the one that Travis linked to), it seems like a major point of using splines for interpolation is *better* behavior in the case of non-band-limited data than the traditional 'anti-aliasing [e.g. lowpass] filter' + 'resampling' + 'reconstruction [e.g. lowpass] filter' procedure. That is, based on my (very limited) understanding (I understand the basics of the sampling theorem and can follow the Unser paper's update thereof, but not much more than that), in the spline case the spline fitting step *replaces* the anti-aliasing filter in the above procedure as the method for dealing with non-band-limited data. And the claim is that it in fact works better in many cases. So it seems that something is wrong if the spline interpolation tools in ndimage only work properly in the band-limited case, or require lowpass filtering. Stefan's rotated Lena example is indeed a bit bizarre on zooming in! However, the artefacts are clearly localized to distinct edges, so I suspect this is indeed some kind of aliasing. Moreover, it looks like Lena has been decimated (reduced in size) prior to the rotation. That is definitely a good way to get artefacts, unless an anti-aliasing filter is applied before shrinking the image. My impression is that this image is probably somewhat undersampled (to understand exactly what that means, read up on the Sampling Theorem). Agreed, it looks like aliasing. Nevertheless, any resampling procedure is supposed to deal with this internally, right? Either by lowpass filtering (traditional case), or by spline fitting (spline case as described by Unser and understood by me) -- it shouldn't be letting aliasing bubble through, correct? The first was on Stefan's artificial data which had sharp edges, and got very nasty ringing artifacts even with 3rd order splines. From your recollection, is this expected behavior based on splines and the nature of Stefan's image, or more likely to be a bug? Your question was aimed at Travis, so I don't want to discourage him from answering it :-), but looking at this in more detail, I do think the amplitude of the artefacts here is greater than I might expect due to ringing with a quadratic b-spline kernel, which I think has minima with amplitudes 10% of the central peak. There has to be SOME oscillation, but in Stefan's rotate_artifacts example it seems to be at the level of ~100%. Also, it is not present on one of the inner edges for some reason. So I do wonder if the algorithm in nd_image is making this worse than it needs to be. Good observation! 
Here are cubic spline fits to a step and a delta function, which both have quite small amounts of ringing compared to what's observed -- at least on the color images. Maybe 10% ringing in each color channel adds up perceptually more than it does mathematically? import numpy, Gnuplot, scipy.interpolate # step x = numpy.arange(-10, 10) y = numpy.where(x < 0, 1, 0) tck = scipy.interpolate.splrep(x, y, s=0, k=3) nx = numpy.linspace(-10, 9, 200, True) ny = scipy.interpolate.splev(nx, tck) d = Gnuplot.Data(numpy.transpose([x, y]), with='points') nd = Gnuplot.Data(numpy.transpose([nx,ny]), with='lines') g = Gnuplot.Gnuplot() g.plot(d, nd) ny.min() # -0.107490563074 -- largest ringing is ~10% of step size # delta x = numpy.arange(-10, 10) y = numpy.where(x == 0, 1, 0) tck = scipy.interpolate.splrep(x, y, s=0, k=3) nx = numpy.linspace(-10, 9, 200, True) ny = scipy.interpolate.splev(nx, tck) d = Gnuplot.Data(numpy.transpose([x, y]), with='points') nd = Gnuplot.Data(numpy.transpose([nx,ny]), with='lines') g = Gnuplot.Gnuplot() g.plot(d, nd) ny.min() # -0.136449610107 -- largest ringing is ~13% of impulse size Now let's use ndimage to rotate a step function by 45 degrees, or zoom in on it: import numpy, scipy.ndimage o = numpy.ones((100, 50), dtype=float) z = numpy.zeros((100, 50), dtype=float) a = numpy.concatenate([o, z], axis=1) b = scipy.ndimage.rotate(a, 45, order=3) b.max() # 1.08832325055 b.min() # -0.0883232505527 c =
Re: [Numpy-discussion] nd_image.affine_transform edge effects
Thanks for the information and the paper link, James. I certainly appreciate the perspective, and now see why the anti-aliasing and reconstruction filtering might best be left to clients of a resampling procedure. Hopefully at least some of the kinks in the spline interpolation (to date: obligate mirror boundary conditions, extra edge values, integer overflow) can be smoothed out. I can't guarantee that I'll have the time or expertise to deal with ni_interpolation.c, but I'll try to give it a shot some time here. Zach On Mar 26, 2007, at 1:16 AM, James Turner wrote: PS... (sorry for all the posts, for anyone who isn't interested...) Agreed, it looks like aliasing. Nevertheless, any resampling procedure is supposed to deal with this internally, right? Either by lowpass filtering (traditional case), or by spline fitting (spline case as described by Unser and understood by me) -- it shouldn't be letting aliasing bubble through, correct? In the general case, I don't think it is appropriate for the resampling procedure to use low-pass filtering internally to avoid artefacts, except perhaps when downsampling. It probably makes sense for computer graphics work, but there are cases where the input data are band limited to begin with and any degradation in resolution is unacceptable. Where needed, I think low-pass filtering should either be the responsibility of the main program or an option. It's not even possible for the resampling procedure to prevent artefacts in every case, since the aliasing in a badly undersampled image cannot be removed post factum (this is for undersampled photos rather than artificial graphics, which I think are fundamentally different because everything is defined on the grid, although I haven't sat down and proved it mathematically). I'm also not sure how the procedure could decide on the level of smoothing needed for a given dataset without external information. Of course intermediate-order splines will probably keep everyone happy, being reasonably robust against ringing effects without causing much smoothing or interpolation error :-). By the way, I think you and Stefan might be interested in a medical imaging paper by Lehmann et al. (1999), which gives a very nice overview of the properties of different interpolation kernels: http://ieeexplore.ieee.org/Xplore/login.jsp?url=/ iel5/42/17698/00816070.pdf?arnumber=816070 For what it's worth, I'd agree with both of you that the numeric overflow should be documented if not fixed. It sounds like Stefan has figured out a solution for it though. If you make sense of the code in ni_interpolation.c, Stefan, I'd be very interested in how to make it calculate one less value at the edges :-). Cheers, James. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New Operators in Python
Hi folks, Sorry to rain on this parade, but unicode variable names and/or other syntactic elements have already been rejected for Python 3: http://www.python.org/dev/peps/pep-3099/ Python 3000 source code won't use non-ASCII Unicode characters for anything except string literals or comments. To tell the truth, in my code (for an n=1 example), I use elementwise (or broadcasted) multiplication about an order of magnitude more than matrix multiplication. Now, there's plenty of linear algebra in my work, but it's usually boxed off enough from the rest that converting everything to matrices, or using the 'dot' function (etc.), is just fine. Personally, I even prefer the current way that numpy does things -- I've always *really* disliked matlab's special syntax for elementwise multiplication. Now, clearly there are cases where this is handy, but at least in looking over my code, I find that those cases are quite rare, really. So, in large part, I really don't feel a strong need for new operators in numpy. (And in the rest of python too! The @ decorator pie-syntax is quite enough for line-noise characters, in my personal opinion. And I think that python-dev pretty well agrees too, based on the raging flame wars about adding even that.) Now, this is of course just my single opinion, but folks should recall that even C++, which rarely met a feature that it didn't like, drew the line at adding extra syntactic operators to the language that existed only to be overridden/implemented by users. (Which is precisely what's being proposed here.) Anyhow, feel free to disagree with me -- I'm no expert here. I'm only mentioning this as a public service to make it clear that most of what's being proposed in this thread is, for better or worse, 100% dead-in-the-water for Python 3, and the rest will have a fairly significant uphill battle. Zach On Mar 26, 2007, at 2:42 AM, dmitrey wrote: The unicode keyboards sailing everywhere is just a matter of time And python 2-symbol operators soon will look obsolete, this will increase migrating from python to Sun fortress etc. I took a look at their unicode syntax for math formulas http://research.sun.com/projects/plrg/faq/NAS-CG.pdf it looks (vs Python) (or even MATLAB) like Pentium 4 vs Intel 386 processors. It just allows copy-paste from articles of many formulas, including ro, 'beta' and other non-ascii symbols Also, implementing unicode will allow separate operators for (for example) MATLAB cross() equivalent (vector multiplication of vectors). WBR, D. René Bastian wrote: Hello, I am interest both in numarray type multiplication and matrix type multiplication. But I am not shure that I can buy an Unicode keyboard. May be it would be possible to implement a class with user definissable (?) signs. My choice : a * b - numarray type multi a !* b - matrix ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix indexing question
Since matrices are an iterable Python object, we *expect* to iterate over the contained objects. (Arrays.) I am not sure why this is not evident to all, but it is surely the sticking point in this discussion. A matrix is not a container of matrices. That it acts like one is surprsing. Surprises are bad unless they have a clear justification. Perhaps a clear justification exists, but it has not been offered in this discussion. I think that a clear justification has been offered several times on the list recently, though maybe not all in this thread. Matrices in numpy seem to exist as a construct to facilitate linear algebra. One such property is that row- and column-vector slices of matrices are (1xN) or (Nx1) matrices, because otherwise common linear algebra things -- like how you multiply a row-vector or a column vector by a matrix, and whether and when it needs to be transposed -- do not translate properly from linear algebra notation like in textbooks and papers. Once the matrix class is committed to providing row- and column- vector slices as other matrices, it makes no sense to have iteration over the matrix provide a different data type than slicing. Basically, my rule of thumb has been to *only* use matrices when I'm copying linear algebra operations out of textbooks and papers, and I want the notations to align. Doing other, non-linear-algebra operations with matrices -- like iterating over their elements -- is not really worth it. There's a meta question, of course: should things be changed to make it worth it to do pythonic tasks with matrices? Should there be special elementwise vs. matrix-operation operators? Should numpy work a lot more like matlab? That discussion is on-going in another thread, I think. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
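A quick illustration of the slicing behaviour being discussed (matrix slices stay 2-D, array slices lose a dimension):

import numpy

M = numpy.matrix([[1, 2], [3, 4]])
M[0].shape       # (1, 2) -- a 1x2 matrix, usable directly as a row vector
M[:, 0].shape    # (2, 1) -- a column vector

A = numpy.array([[1, 2], [3, 4]])
A[0].shape       # (2,)  -- a 1-D array; row/column orientation is gone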
Re: [Numpy-discussion] matrix indexing question
Exactly: that was one other thing I found artificial. Surely the points will then be wanted as arrays. So my view is that we still do not have a use case for wanting matrices yielded when iterating across rows of a matrix. It's pretty clear from my perspective: 1-D slices of matrices *must* be matrices. I gave an intuitive make-it-fit-with-known-notation explanation, and Charles gave a theoretically-grounded explanation. There was no objection to this from any quarter. Given that 1-D slices of matrices must be matrices for deep mathematical reasons as well as notational compatibility with the rest of linear algebra -- e.g. that m[0] yields a matrix if m is a matrix -- it almost certainly would violate the principle of least surprise for iteration over m (intuitively understood to be choosing m[0] then m[1] and so forth) to yield anything other than a matrix. This can't possibly be what you're asking for, right? You aren't suggesting that m[0] and list(iter(m))[0] should be different types? There are many clear and definite use cases for m[0] being a matrix, by the way, in addition to the theoretical arguments -- these aren't hard to come by. Whether or not there are actual good use-cases for iterating over a matrix, if you believe that m[0] should be a matrix, it's madness to suggest that list(iter(m))[0] should be otherwise. My opinion? Don't iterate over matrices. Use matrices for linear algebra. There's no iteration construct in linear algebra. The behavior you find so puzzling is a necessary by-product of the fundamental behavior of the matrix class -- which has been explained and which you offered no resistance to. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix indexing question
Hello all, I suspect my previous email did not contain the full chain of my reasoning, because I thought that some parts were basically obvious. My point was merely that given some pretty basic fundamental tenets of numpy, Alan's suggestions quickly lead to *serious issues* far worse than the original problem. Thanks Chris for making this more explicit; here I will do so further. Let A be an array of arbitrary shape, and M be an m-by-n matrix. (1) A[i] equals by definition A[i,...] This is a pretty fundamental part of numpy, right? That property (and its relatives) along with the broadcasting behavior, are in large part what make numpy so very expressive compared to matlab, etc. I can write code in numpy to do multidimensional operations over arbitrary-shaped and arbitrary-dimensioned data that look *exactly* like the corresponding 1D operations. It's amazing, really, how numpy's implicit indexing (this property), broadcasting (sort of the inverse of this, really, where shape tuples are extended as necessary with implicit 1's, instead of slices being extended with :'s as necessary), and fancy indexing, allow complex operations to be written compactly, clearly, and in a fully-general manner. I thought that point (1) was obvious, as it's a pretty fundamental part of numpy (and Numeric and Numarray before that). It's in absolutely no way a trivial convenience. Of course, broadcasting and ufuncs, etc, contribute a lot more to expressiveness than this property, but it's part of the family. (2) Thus, M[i] equals by definition M[i,:]. By the mathematical definition of matrices and for pragmatic reasons, M[i,:] is another matrix. Here the trouble starts. Should matrices, not being arbitrary in their dimension, even have the ability to be indexed as M[i]? Or should they be completely different beasts than arrays? Which violates the principle of least surprise more? Now, I think we all more or less agree that M[i,:] has to be a matrix, because otherwise the matrix class is pretty worthless for linear algebra, and doesn't operate according to the mathematical definition of a matrix. Indeed, this is one of the basic properties of the numpy matrix -- that, and that the multiplication operator is defined as matrix-multiplication. Remove that and there's almost no point in *having* a separate matrix class. Want a use-case? Do the singular value decomposition, and get the product of the second-smallest singular vectors as Vt[:,-2] * U[-2,:]. (Things vaguely akin to this come up a lot.) You can copy the operation from any linear algebra text if the row- and column-vectors are treated as above. Otherwise you have to wonder whether that's supposed to be an outer or inner product, and use the corresponding operator -- and at that point, why was I using matrix classes anyway? (3) If M[i] = M[i,:], and M[i,:] is a matrix, then the iteration behavior of matrices *has* to yield other matrices, unless Alan is willing to suggest that list(iter(m))[i] should have a different type than M[i] -- a clear absurdity. This is the point that I was trying to make previously, having mistakenly assumed that point (1) was clear to all, and that point (2) had been clearly established. So, if we trace the reasoning of Alan's suggestion, coupled with these above properties, we get just that: (a) Iteration over M should yield arrays. (b) Ergo M[i] should be an array. (c) Ergo M[i,:] should be an array. 
(d) Ergo the matrix class should be identical to the array class except for overloaded multiplication, and the only way to get a 'row' or 'column' vector from a matrix that operates as a proper row or column vector should is M[i:i+1,:]. Whicy is a lot of typing to get the 'i-th' vector from the matrix. I don't think these changes are worth it -- it basically discards the utility of the matrix class for linear algebra in order to make the matrix class more useful for general purpose data storage (a role already filled by the array type). Here is another set of changes which would make matrices fully consistent with their linear-algebra roots: (a) M[i,:] should be a matrix. (b) M[i] is an error. (c) Iteration over M is an error. That's kind of lame though, right? Because then matrices are completely inconsistent with their numpy roots. So we are left with the current behavior: (a) M[i,:] is a matrix. (b) M[i] is a matrix. (c) Iteration over M yields matrices. My view is that, fundamentally, they are linear algebra constructs. However, I also think it would be far more harmful to remove basic behavior common to the rest of numpy (e.g. that M[i] = M[i,:]), and behavior that comes along with that (iterability). Hence my suggestion: treat matrices like linear algebra constructs -- don't iterate over them (unless you know what you're doing). Don't index them like M[i] (unless you know what you're doing). Maybe it's just a little
Re: [Numpy-discussion] Buffer interface PEP
Hi, I have a specific question and then a general question, and some minor issues for clarification. Specifically, regarding the arguments to getbufferproc: 166 format 167address of a format-string (following extended struct 168syntax) indicating what is in each element of 169of memory. The number of elements is len / itemsize, 170where itemsize is the number of bytes implied by the format. 171NULL if not needed in which case format is B for 172unsigned bytes. The memory for this string must not 173be freed by the consumer --- it is managed by the exporter. Is this saying that either NULL or a pointer to B can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think. The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly- constructed strides array. Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not null, before properly deciding that the data isn't usable them. Similarly, the suboffsets can be either all negative or NULL to indicate the same thing. Might it be more appropriate to specify only one canonical behavior in these cases? Otherwise clients which don't do all the checks on the data might not properly interoperate with providers which format these values in the alternate manner. Also, some typos, and places additional clarification could help: 253 PYBUF_STRIDES (strides and isptr) Should 'isptr' be 'suboffsets'? 75 of a larger array can be described without copying the data. T Dangling 'T'. 279 Get the buffer and optional information variables about the buffer. 280 Return an object-specific view object (which may be simply a 281 borrowed reference to the object itself). This phrasing (and similar phrasing elsewhere) is somewhat opaque to me. What's an object-specific view object? 287 Call this function to tell obj that you are done with your view Similarly, the 'view' concept and terminology should be defined more clearly in the preamble. 333 The struct string-syntax is missing some characters to fully 334 implement data-format descriptions already available elsewhere (in 335 ctypes and NumPy for example). Here are the proposed additions: Is the following table just the additions? If so, it might be good to show the full spec, and flag the specific additions. If not, then the additions should be flagged. 341 't' bit (number before states how many bits) vs. 372 According to the struct-module, a number can preceed a character 373 code to specify how many of that type there are. The I'm confused -- could this be phrased more clearly? Does '5t' refer to a field 5-bits wide, or 5-one bit fields? Is 't' allowed? If so, is it equivalent to or different from '5t'? 378 Functions should be added to ctypes to create a ctypes object from 379 a struct description, and add long-double, and ucs-2 to ctypes. Very cool. In general, the logic of the 'locking mechanism' should be described at a high level at some point. 
It's described in nitty-gritty details, but at least I would have appreciated a bit more of a discussion about the general how and why -- this would be helpful to clients trying to use the locking mechanism properly. Thanks to Travis and everyone else involved for your work on this cogent and sorely-needed PEP. Zach On Mar 27, 2007, at 12:42 PM, Travis Oliphant wrote: Hi all, I'm having a good discussion with Carl Banks and Greg Ewing over on python-dev about the buffer interface. We are converging on a pretty good design, IMHO. If anybody wants to participate in the discussion, please read the PEP (there are a few things that still need to change) and add your two cents over on python-dev (you can read it through the gmane gateway and participate without signing up). The PEP is stored in numpy/doc/pep_buffer.txt on the SVN tree for NumPy Best regards, -Travis ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] A unique function...
Hello, There's unique and unique1d, but these don't output the number of occurrences. There's also bincount, which outputs the number of each element, but includes zeros for non-present elements and so could be problematic for certain data. Zach On Mar 27, 2007, at 2:28 PM, Pierre GM wrote: All, I could swear that I ran once into a numpy (or scipy) function that output the unique values of a 1d ndarray along with the number of occurrences of each element. If I'm not completely mistaken, it's a private function initially in Fortran. Does this ring a bell to anyone ? Where could I find this function ? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
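For reference, a small sketch of one way to get unique values together with their counts without bincount's zero-padding problem (the helper name is made up; much newer numpy also grew numpy.unique(..., return_counts=True)):

import numpy

def unique_counts(arr):
    # sort, then find the start of each run of equal values
    arr = numpy.asarray(arr).ravel()
    sorted_arr = arr[arr.argsort()]
    change = numpy.concatenate(([True], sorted_arr[1:] != sorted_arr[:-1]))
    values = sorted_arr[change]
    run_starts = numpy.flatnonzero(change)
    counts = numpy.diff(numpy.concatenate((run_starts, [arr.size])))
    return values, counts

values, counts = unique_counts([3, 1, 3, 3, 2, 1])
# values -> array([1, 2, 3]); counts -> array([2, 1, 3])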
Re: [Numpy-discussion] Buffer interface PEP
Is this saying that either NULL or a pointer to B can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think. Yes, the wording could be more clear. I'm trying to make it easy for exporters to change to the new buffer interface. The main idea I really want to see is that if the caller just passes NULL instead of an address then it means they are assuming the data will be unsigned bytes. It is up to the exporter to either allow this or raise an error. The exporter should always be explicit if an argument for returning the format is provided (I may have thought differently a few days ago). Understood -- I'm for the exporters being as explicit as possible if the argument is provided. The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly-constructed strides array. Here, I'm trying to be easy on the exporter and the consumer. If the data is contiguous, then neither the exporter nor the consumer will likely care about the strides. Allowing this to be NULL is like the current array protocol convention which allows this to be None. See below. My comments here aren't suggesting that NULL should be disallowed. I'm basically wondering whether it is a good idea to allow NULL and something else to represent the same information. (E.g. as above, an exporter could choose to show C-contiguous data with a NULL returned to the client, or with a trivial strides array). Otherwise two different exporters exporting identical data could provide different representations, which the clients would need to be able to handle. I'm not sure that this is a recipe for perfect interoperability. Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not null, before properly deciding that the data isn't usable by them. Not really. A client that cannot deal with strides will simply not pass an address to a stride array to the buffer protocol (that argument will be NULL). If the exporter cannot provide memory without stride information, then an error will be raised. This doesn't really address my question, which I obscured with a poorly-chosen example. The PEP says (or at least that's how I read it) that if the client *does* provide an address for the stride array, then for un-strided arrays, the exporter may either choose to fill in NULL at that address, or provide a strides array. Might it be easier for clients if the PEP required that NULL be returned if the array is C-contiguous? Or at least strongly suggested that? (I understand that there might be cases where a naive exporter thinks it is dealing with a strided array but it really is contiguous, and the exporter shouldn't be required to do that detection.) The use-case isn't too strong here, but I think it's clear in the suboffsets case (see below). Similarly, the suboffsets can be either all negative or NULL to indicate the same thing. I think it's much easier to check if suboffsets is NULL rather than checking all the entries to see if they are -1 for the very common case (i.e. 
the NumPy case) of no dereferencing. Also, if you can't deal with suboffsets you would just not provide an address for them. My point exactly! As written, the PEP allows an exporter to either return NULL, or an array of all negative numbers (in the case that the client requested that information), forcing a fully-conforming client to make *both* checks in order to decide what to do. Especially in this case, it would make sense to require a NULL be returned in the case of no suboffsets. This makes things easier for both clients that can deal with both suboffsets or non-offsets (e.g. they can branch on NULL, not on NULL or all-are-negative), and also for clients that can *only* deal with suboffsets. Now, in these two cases, the use-case is pretty narrow, I agree. Basically it makes things easier for savvy clients that can deal with different data types, by not forcing them to make two checks (strides == NULL or strides array is trivial; suboffsets == NULL or suboffsets are all negative) when one would do. Again, this PEP allows the same information to be passed in two very different ways, when it really doesn't seem like that ambiguity makes life that much easier for exporters. Maybe I'm wrong about this last point, though. Then there comes the trade-off -- should savvy clients
Re: [Numpy-discussion] matrix indexing question
Looks promising! On 3/29/07, Bill Baxter [EMAIL PROTECTED] wrote: On 3/30/07, Timothy Hochberg [EMAIL PROTECTED] wrote: Note, however that you can't (for instance) multiply column vector with a row vector: (c)(r) Traceback (most recent call last): ... TypeError: Cannot matrix multiply columns with anything That should be allowed. (N,1)*(1,M) is just an (N,M) matrix with entries C[i,j] = A[i,0]*B[0,j] I thought about that a little, and while I agree that it could be allowed, I'm not sure that it should be allowed. It's a trade off between a bit of what I would guess is little used functionality with some enhanced error checking (I would guess that usually row*column signals a mistake). However, I don't care much one way or the other; it's not hard to allow. I'm also for allowing it; from the perspective of one writing code that looks like typical (if potentially technically incorrect) notation, being able to write direct analogs to a'b (to get a scalar) or ab' (to get an MxN matrix) and have everything work would be quite useful. Especially in various reconstruction tasks (e.g. using svd to approximate a given matrix with a lower-rank one), the outer product of a row and a column vector comes up often enough that I would be loath to have it raise an error. I kind of like the idea of using call for multiply, though. If it doesn't turn out to have any major down sides it could be a good way to give ndarray a concise syntax for dot. We'll see how it goes down this time. I've proposed using call before, since I've thought the matrix situation was kind of silly for what seems like ages now, but it always sinks without a ripple. The call syntax is nice, if a bit opaque to someone looking at the code for the first time. On the other hand, it's explicit in a way that overloaded multiplication just isn't. I like it (for what little that's worth). Zach -- //=][=\\ [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
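As a concrete sketch of the column-times-row case under discussion (plain arrays and dot, purely illustrative):

import numpy

c = numpy.arange(3).reshape(3, 1)    # column vector, shape (3, 1)
r = numpy.arange(4).reshape(1, 4)    # row vector, shape (1, 4)

outer = numpy.dot(c, r)              # shape (3, 4): outer[i, j] = c[i, 0] * r[0, j]
inner = numpy.dot(r[:, :3], c)       # (1, 3) times (3, 1) -> (1, 1), the a'b scalar case

# numpy.outer gives the same thing; it flattens its inputs first
assert (outer == numpy.outer(c, r)).all()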
[Numpy-discussion] argsort and take along arbitrary axes
Hello all, I've got a few questions that came up as I tried to calculate various statistics about an image time-series. For example, I have an array of shape (t,x,y) representing t frames of a time-lapse of resolution (x,y). Now, say I want to both argsort and sort this time-series, pixel-wise. (For example.) In 1-d it's easy: indices = a.argsort() sorted = a[indices] I would have thought that doing this on my 3-d array would work similarly: indices = a.argsort(axis=0) sorted = a.take(indices, axis=0) Unfortunately, this gives a ValueError of dimensions too large. Now, I know that 'a.sort(axis=0)' works fine for the given example, but I'm curious about how to do this sort of indexing operation in the general case. Thanks for any insight, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort and take along arbitrary axes
I've got a few questions that came up as I tried to calculate various statistics about an image time-series. For example, I have an array of shape (t,x,y) representing t frames of a time-lapse of resolution (x,y). Now, say I want to both argsort and sort this time-series, pixel-wise. (For example.) In 1-d it's easy: indices = a.argsort() sorted = a[indices] I would have thought that doing this on my 3-d array would work similarly: indices = a.argsort(axis=0) sorted = a.take(indices, axis=0) Unfortunately, this gives a ValueError of dimensions too large. Now, I know that 'a.sort(axis=0)' works fine for the given example, but I'm curious about how to do this sort of indexing operation in the general case. Unfortunately, argsort doesn't work transparently with take or fancy indexing for multidimensional arrays. I am thinking of adding a function argtake for this, and also for the results returned by argmax and argmin, but at the moment you have to fill in the values of the other indices and use fancy indexing. For now, it is probably simpler, prettier, and faster to just sort the array. Thanks Charles. Unfortunately, the argsort/sort business was, as I mentioned, just an example of the kind of 'take' operation that I am trying to figure out how to do. There are other operations that will have similarly-formatted 'indices' arrays (as above) that aren't generated from argsort... As such, how do I fill in the values of the other indices and use fancy indexing? Even after reading the numpy book about that, and reading the docstring for numpy.take, I'm still vague on this. Would I use numpy.indices to get a list of index arrays, and then swap in (at the correct position in this list) the result of argsort (or the other operations), and use that for fancy indexing? Is there an easier/faster way? Thanks again, Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
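For the record, here is one way to fill in the other indices so that an argsort-style index array can be used with fancy indexing along axis 0 of a (t, x, y) array (a sketch; much newer numpy also provides numpy.take_along_axis for exactly this):

import numpy

t, x, y = 4, 3, 2
a = numpy.random.rand(t, x, y)

indices = a.argsort(axis=0)              # shape (t, x, y)

# index arrays for the remaining axes; broadcasting expands them to (t, x, y)
j = numpy.arange(x).reshape(1, x, 1)
k = numpy.arange(y).reshape(1, 1, y)

sorted_a = a[indices, j, k]              # same result as numpy.sort(a, axis=0)
assert (sorted_a == numpy.sort(a, axis=0)).all()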
Re: [Numpy-discussion] very large matrices.
Hello Dave, I don't know if this will be useful to your research, but it may be worth pointing out in general. As you know PCA (and perhaps some other spectral algorithms?) use eigenvalues of matrices that can be factored out as A'A (where ' means transpose). For example, in the PCA case, if A is a centered data matrix (i.e. the mean value of each data point has been subtracted off), and if the data elements are row-vectors, then A'A is the covariance matrix. PCA then examines the (nonzero) eigenvectors of this covariance matrix A'A. Now, it's possible to determine the eigenvalues/eigenvectors for A'A from the matrix AA', which in many useful cases is much smaller than A'A. For example, imagine that you have 100 data points, each in 10,000 dimensions. (This is common in imaging applications.) A'A is 10,000x10,000, but AA' is only 100x100. We can get the eigenvalues/vectors of A'A from those of AA', as described below. (This works in part because in such cases, the larger matrix is rank-deficient, so there will only be e.g. 100 nonzero eigenvalues anyway.) If you're lucky enough to have this kind of structure in your problem, feel free to use the following code which exploits that (and explains in a bit more detail how it works). Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine

def _symm_eig(a):
    """Return the eigenvalues and eigenvectors of the symmetric matrix a'*a.

    If a has more columns than rows, then that matrix will be rank-deficient,
    and the non-zero eigenvalues and eigenvectors can be more easily extracted
    from the matrix a*a', from the properties of the SVD: if a of shape (m,n)
    has SVD u*s*v', then:
      a'*a = v*s'*s*v'
      a*a' = u*s*s'*u'
    That is, v contains the eigenvectors of a'*a, with s'*s the eigenvalues,
    according to the eigen-decomposition theorem.

    Now, let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m)
    and s_hat * s = I(n,n). With that, we can solve for u or v in terms of
    the other:
      v = a'*u*s_hat'
      u = a*v*s_hat
    """
    m, n = a.shape
    if m >= n:
        # just return the eigenvalues and eigenvectors of a'a
        vals, vecs = _eigh(numpy.dot(a.transpose(), a))
        vals = numpy.where(vals < 0, 0, vals)
        return vals, vecs
    else:
        # figure out the eigenvalues and vectors based on aa', which is smaller
        sst_diag, u = _eigh(numpy.dot(a, a.transpose()))
        # in case due to numerical instabilities we have sst_diag < 0 anywhere,
        # peg them to zero
        sst_diag = numpy.where(sst_diag < 0, 0, sst_diag)
        # now get the inverse square root of the diagonal, which will form the
        # main diagonal of s_hat
        err = numpy.seterr(divide='ignore', invalid='ignore')
        s_hat_diag = 1/numpy.sqrt(sst_diag)
        numpy.seterr(**err)
        s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0)
        # s_hat_diag is a list of length m, a'u is (n,m), so we can just use
        # numpy's broadcasting instead of matrix multiplication, and only create
        # the upper mxm block of a'u, since that's all we'll use anyway...
        v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag
        return sst_diag, v

def _eigh(m):
    """Return eigenvalues and corresponding eigenvectors of the hermitian array
    m, ordered by eigenvalues in decreasing order.

    Note that the numpy.linalg.eigh makes no order guarantees.
    """
    values, vectors = numpy.linalg.eigh(m)
    order = numpy.flipud(values.argsort())
    return values[order], vectors[:,order]

On May 13, 2007, at 8:35 PM, Dave P. Novakovic wrote: There are definitely elements of spectral graph theory in my research too. 
I'll summarise: We are interested in seeing what each eigenvector from svd can represent in a semantic space. In addition to this we'll be testing it against some algorithms like concept indexing (uses a bipartitional k-meansish method for dim reduction) also testing against Orthogonal Locality Preserving indexing, which uses the laplacian of a similarity matrix to calculate projections of a document (or term) into a manifold. These methods have been implemented and tested for document classification, I'm interested in seeing their applicability to modelling semantics with a system known as Hyperspace Analogue to Language (HAL). I was hoping to do svd to my HAL built out of reuters, but that was way too big. instead i'm trying with the traces idea i mentioned before (ie contextually grepping a keyword out of the docs to build a space around it.) Cheers Dave On 5/14/07, Charles R Harris [EMAIL PROTECTED] wrote: On 5/13/07, Dave P. Novakovic [EMAIL PROTECTED] wrote: Are you trying some sort of principal components analysis? PCA is indeed one part of the research I'm doing. I had the impression you were trying to build a linear space in which to embed a model, like atmospheric folk do when they try to invert
[Numpy-discussion] Change to get_printoptions?
Hello all, I just recently updated to the SVN version of numpy to test my code against it, and found that a small change made to numpy.get_printoptions (it now returns a dictionary instead of a list) breaks my code. Here's the changeset: http://projects.scipy.org/scipy/numpy/changeset/3877 I'm not really looking forward to needing to detect numpy versions just so I can do the right thing with get_printoptions, but I do agree that the new version of the function is more sensible. My question is if there's any particular policy about backwards- incompatible python api changes, or if I need to be aware of their possibility at every point release. (Either is fine -- I'm happy for numpy to be better at the cost of incompatibility, but I'd like to know if changes like these are the rule or exception.) Also, in terms of compatibility checking, has anyone written a little function to check if numpy is within a particular version range? Specifically, one that handles numpy's built from SVN as well as from release tarballs. Thanks, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
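For what it's worth, a rough sketch of a version-range check that tolerates SVN-style version strings like '1.0.4.dev3896' (the parsing rule is an assumption about how those strings are formed):

import numpy

def numpy_version_tuple():
    # '1.0.4.dev3896' -> (1, 0, 4); '1.6.1' -> (1, 6, 1)
    parts = []
    for piece in numpy.__version__.split('.'):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)

def numpy_at_least(*required):
    return numpy_version_tuple() >= tuple(required)

if numpy_at_least(1, 1):
    # guard version-dependent code (such as the get_printoptions change) here
    pass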
Re: [Numpy-discussion] getting numPy happening for sciPy
On Jul 27, 2007, at 2:42 AM, Nils Wagner wrote: I cannot reproduce the problem concerning #401. It is Mac specific problem. Am I missing something ? I can't reproduce this problem either. I just yesterday built scipy from SVN on two different OS X 10.4.10 boxes, one using the fortran compiler from hpc.sourceforge.net (not the latest 2007 release, but the december 2006 one), and the other using the compiler from r.research.att.com/tools. Everything else was similar, and everything worked fine with regard to ticket 401. On the other hand, when I tried to compile scipy using the latest (2007-05) gfortran from hpc.sourceforge.net, I got bizarre link errors about MACOSX_DEPLOYMENT_TARGET being set incorrectly. (See previous email here http://projects.scipy.org/pipermail/scipy-user/2007-June/012542.html ). Interestingly, with the earlier version of gfortran from hpc.sourceforge.net, and with the r.research.att.com/tools version, this problem does not arise. Anyhow, my point is that there are still odd linker errors (as in ticket 401) lurking that may or may not have anything to do with scipy per se, but might have to do with odd and perhaps buggy builds of gfortran. Feh -- I wish Apple would just start including a fortran compiler with the rest of their dev tools. The situation otherwise is not good. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Vector magnitude?
Hello, 'len' is a (pretty basic) python builtin function for getting the length of anything with a list-like interface. (Or more generally, getting the size of anything that is sized, e.g. a set or dictionary.) Numpy arrays offer a list-like interface allowing you to iterate along their first dimension, etc. (*) Thus, len(numpy_array) is equivalent to numpy_array.shape[0], which is the number of elements along the first dimension of the array. Zach (*) For example, this is useful if you've got numerous data vectors packed into an array along the first dimension, and want to iterate across the different vectors. a = numpy.ones((number_of_data_elements, size_of_data_element)) for element in a: # element is a 1-D array with a length of 'size_of_data_element' Note further that this works even if your data elements are multi- dimensional; i.e. the above works the same if: element_shape = (x,y,z) a = numpy.ones((number_of_data_elements,)+element_shape) for element in a: # element is a 3-D array with a shape of (x,y,z) On Sep 5, 2007, at 2:40 PM, Robert Dailey wrote: Thanks for your response. I was not able to find len() in the numpy documentation at the following link: http://www.scipy.org/doc/numpy_api_docs/namespace_index.html Perhaps I'm looking in the wrong location? On 9/5/07, Matthieu Brucher [EMAIL PROTECTED] wrote: 2007/9/5, Robert Dailey [EMAIL PROTECTED]: Hi, I have two questions: 1) Is there any way in numpy to represent vectors? Currently I'm using 'array' for vectors. A vector is an array with one dimension, it's OK. You could use a matrix of dimension 1xn or nx1 as well. 2) Is there a way to calculate the magnitude (length) of a vector in numpy? Yes, len(a) Matthieu ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
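And for the original magnitude question (as opposed to the element count that len() gives), a couple of equivalent one-liners, just as a sketch:

import numpy

v = numpy.array([3.0, 4.0])
mag = numpy.sqrt((v ** 2).sum())   # 5.0
mag2 = numpy.linalg.norm(v)        # same result via the linalg helper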
[Numpy-discussion] Find first occurrence?
Hello all, On several occasions, I've had the need to find only the first occurrence of a value in an unsorted numpy array. I usually use numpy.where(arr==val)[0] or similar, and don't worry about the fact that I'm iterating across the entire array. However, sometimes the arrays are pretty big and the find operation is in an inner loop, so I was wondering if there's already a C extension function somewhere in numpy or scipy that does a fast find first operation, or anything similar. (Easy enough to write my own, and maybe given the issues inherent in comparing float equality, etc., it doesn't belong in the core numpy anyway...) Thanks, Zach Pincus Program in Biomedical Informatics and Department of Biochemistry ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
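In the meantime, a pure-numpy sketch (the helper is illustrative; it still scans the whole array, since nothing in numpy proper short-circuits -- a truly early-exiting version needs C or Cython):

import numpy

def find_first(arr, val):
    # index (into the flattened array) of the first occurrence of val, or None
    mask = (arr == val).ravel()
    idx = mask.argmax()        # position of the first True, or 0 if there are none
    if mask[idx]:
        return idx
    return None

a = numpy.array([5, 3, 7, 3, 9])
find_first(a, 3)   # -> 1
find_first(a, 4)   # -> None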
Re: [Numpy-discussion] display numpy array as image
Thanks for the suggestions, everyone! All very informative and most helpful. For what it's worth, here's my application: I'm building a tool for image processing which needs some manual input in a few places (e.g. user draws a few lines). The images are greyscale images with 12-14 bits of dynamic range (from a microscope), so I need to have some basic brightness/contrast/gamma controls, as well as allowing basic drawing on the image to get the needed user input. It looks like GL or wx will be best suited here, I think? (I presume that python/numpy/ [GL|wx] can keep up with things like dragging a slider to change brightness/contrast/other LUT changes, as long as I code reasonably.) Anyhow, thanks for all the input, Zach On Nov 29, 2007, at 9:03 PM, Joe Harrington wrote: If you want to explore the array interactively, blink images, mess with colormaps using the mouse, rescale the image values, mark regions, add labels, look at dynamic plots of rows and columns, etc., get the ds9 image viewer and the xpa programs that come with it that allow it to communicate with other programs: ftp://sao-ftp.harvard.edu/pub/rd/ds9 http://hea-www.harvard.edu/RD/ds9/index.html Then get the Python numdisplay package, which uses xpa. You have to get numdisplay from inside the stsci_python package: http://www.stsci.edu/resources/software_hardware/pyraf/stsci_python/ current/download Just grab the numdisplay directory from within that. Older versions of numdisplay are standalone but don't work perfectly. Beware, there are outdated web sites about numdisplay on the stsci site. Don't google! Run ds9 before you load numdisplay. Then you can send your python arrays to a real interactive data viewer at will. There are even mechanisms to define physical coordinates mapped from the image coordinates. --jh-- ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] display numpy array as image
Hi Stéfan, Thanks -- I hadn't realized matplotlib's user-interaction abilities were that sophisticated! I'll definitely give that route a shot. Zach On Dec 3, 2007, at 9:46 AM, Stefan van der Walt wrote: Hi Zach Attached is some code for removing radial distortion from images. It shows how to draw lines based on user input using matplotlib. It is not suited for a big application, but useful for demonstrations. Try it on http://mentat.za.net/results/window.jpg Regards Stéfan On Thu, Nov 29, 2007 at 11:59:05PM -0500, Zachary Pincus wrote: Thanks for the suggestions, everyone! All very informative and most helpful. For what it's worth, here's my application: I'm building a tool for image processing which needs some manual input in a few places (e.g. user draws a few lines). The images are greyscale images with 12-14 bits of dynamic range (from a microscope), so I need to have some basic brightness/contrast/gamma controls, as well as allowing basic drawing on the image to get the needed user input. It looks like GL or wx will be best suited here, I think? (I presume that python/ numpy/ [GL|wx] can keep up with things like dragging a slider to change brightness/contrast/other LUT changes, as long as I code reasonably.) Anyhow, thanks for all the input, Zachradial.py ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] how do I list all combinations
Hi all, I use numpy's own ndindex() for tasks like these: In: numpy.ndindex? Type: type Base Class: <type 'type'> String Form: <class 'numpy.lib.index_tricks.ndindex'> Namespace: Interactive File: /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/lib/index_tricks.py Docstring: Pass in a sequence of integers corresponding to the number of dimensions in the counter. This iterator will then return an N-dimensional counter. Example: for index in ndindex(3,2,1): ... print index (0, 0, 0) (0, 1, 0) (1, 0, 0) (1, 1, 0) (2, 0, 0) (2, 1, 0) Here's a combination function using numpy.ndindex:

def combinations(*lists):
    lens = [len(l) for l in lists]
    for indices in numpy.ndindex(*lens):
        yield [l[i] for l, i in zip(lists, indices)]

r1 = ['dog', 'cat']
r2 = [1, 2]
list(combinations(r1, r2))
Out: [['dog', 1], ['dog', 2], ['cat', 1], ['cat', 2]]

Or you could use 'map' and 'operator.getitem', which might be faster(?):

def combinations(*lists):
    lens = [len(l) for l in lists]
    for indices in numpy.ndindex(*lens):
        yield map(operator.getitem, lists, indices)

In the python cookbook, there are numerous similar recipes for generating permutations, combinations, and the like. Zach Pincus Program in Biomedical Informatics and Department of Biochemistry Stanford University School of Medicine On Dec 26, 2007, at 5:48 PM, Timothy Hochberg wrote: Here's a baroque way to do it using generated code:

def cg_combinations(seqs):
    n = len(seqs)
    chunks = ["def f(%s):" % ', '.join('s%s' % i for i in range(n))]
    for i in reversed(range(n)):
        chunks.append(" " * (n - i) + "for x%s in s%s:" % (i, i))
    chunks.append(" " * (n + 1) + "yield (" + ', '.join('x%s' % i for i in range(n)) + ")")
    code = '\n'.join(chunks)
    exec code
    return f(*seqs)

It should be reasonably fast, if non-obvious. I've included a version with some timing and testing against Chucks version below. Enjoy. (Also, it can be simplified slightly, but I wanted to generate in the same order as Chuck for comparison purposes). ==

def count(seqs):
    counter = [0 for i in seqs]
    maxdigit = [len(i) - 1 for i in seqs]
    yield counter
    while True:
        i = 0
        while i < len(counter) and counter[i] >= maxdigit[i]:
            counter[i] = 0
            i += 1
        if i < len(counter):
            counter[i] += 1
            yield counter
        else:
            return

def count_combinations(seqs):
    for c in count(seqs):
        yield tuple(s[i] for (s, i) in zip(seqs, c))

def cg_combinations(seqs):
    n = len(seqs)
    chunks = ["def f(%s):" % ', '.join('s%s' % i for i in range(n))]
    for i in reversed(range(n)):
        chunks.append(" " * (n - i) + "for x%s in s%s:" % (i, i))
    chunks.append(" " * (n + 1) + "yield (" + ', '.join('x%s' % i for i in range(n)) + ")")
    code = '\n'.join(chunks)
    exec code
    return f(*seqs)

a = "abcde"*10
b = range(99)
c = [x**2 for x in range(33)]
seqs = [a, b, c]

if __name__ == "__main__":
    assert list(count_combinations(seqs)) == list(cg_combinations(seqs))
    import timeit
    test = timeit.Timer('list(count_combinations(seqs))',
                        'from __main__ import count_combinations, seqs')
    t1 = test.timeit(number=10)
    test = timeit.Timer('list(cg_combinations(seqs))',
                        'from __main__ import cg_combinations, seqs')
    t2 = test.timeit(number=10)
    print t1, t2

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nasty bug using pre-initialized arrays
Hello all, That's well and good. But NumPy should *never* automatically -- and silently -- chop the imaginary part off your complex array elements, particularly if you are just doing an innocent assignment! Doing something drastic like silently throwing half your data away can lead to all kinds of bugs in code written by somebody who is unaware of this behavior (i.e. most people)! It sounds to me like the right thing is to throw an exception instead of downcasting a data object. I'm not sure that I agree! I'd rather not have to litter my code with casting operations every time I wanted to down-convert data types (creating yet more temporary arrays...) via assignment. e.g.: A[i] = calculate(B).astype(A.dtype) vs. A[i] = calculate(B) Further, writing functions to operate on generic user-provided output arrays (or arrays of user-provided dtype; both of which are common e.g. in scipy.ndimage) becomes more bug-prone, as every assignment would need to be modified as above. This change would break a lot of the image-processing code I've written (where one often does computations in float or even double, and then re-scales and down-converts the result to integer for display), for example. I guess that this could be rolled in via the geterr/seterr mechanism, and could by default just print warnings. I agree that silent truncation can be troublesome, but not having to spell every conversion out in explicit ( and error-prone) detail is also pretty useful. (And arguably more pythonic.) Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nasty bug using pre-initialized arrays
There are two related hierarchies of datatypes: different kinds (integer, floating point, complex floating point) and different precisions within a given kind (int8, int16, int32, int64). The term downcasting should probably be reserved for the latter only. It seems to me that Zach and Scott are defending downcasting of precisions within a given kind. It does not necessarily follow that the behavior we choose for dtypes within a given kind should be the behavior when we are dealing with dtypes across different kinds. We can keep the precision downcasting behavior that you want while raising an error when one attempts to assign a complex number into a floating point array. Actually, my points were explicitly about cross-kind conversions! A lot of image-processing code features heavy use of converting integer images to float for intermediate calculations, and then rescaling and down-converting back to the original type for display, etc. In fact, scipy.ndimage makes use of this, allowing for operations with output into user-specified arrays, or arrays of user-specified dtype, while (I believe) carrying out some of the intermediate calculations at higher precision. As such, pretty much any code that takes a user-specified array and assigns to it the result of a (potentially) mixed-mode calculation would need to be changed from: A[i] = calculate(B) to A[i] = calculate(B).astype(A.dtype) with the proliferation of temp arrays that that entails. I think, but am not sure, that there is a lot of code out there that does this, intentionally, which would be broken by this change. Explicit is indeed better than implicit, but in this case, finding all of the places where mixed-mode conversions happen and tracking them down could be a pain on the same scale as const chasing in C++, where fixing the error in one place makes it reappear elsewhere in the chain of usage, leading to long, painful, and often totally pointless debugging sessions. In the specific case of converting from complex to anything else, I can see the desire for a warning to prevent data loss. But converting from float to integer is a lot more routine and is used a lot in the kind of work that I do, at least. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nasty bug using pre-initialized arrays
For large arrays, it makes sense to do automatic conversions, as is also the case in functions taking output arrays, because the typecast can be pushed down into C where it is time and space efficient, whereas explicitly converting the array uses up temporary space. However, I can imagine an explicit typecast function, something like a[...] = typecast(b) that would replace the current behavior. I think the typecast function could be implemented by returning a view of b with a castable flag set to true, that should supply enough information for the assignment operator to do its job. This might be a good addition for Numpy 1.1. While that seems like an ok idea, I'm still not sure what's wrong with raising an exception when there will be information loss. The exception is already raised with standard python complex objects. I can think of many times in my code where explicit looping is a necessity, so pre-allocating the array is the only way to go. The issue Charles is dealing with here is how to *suppress* the proposed exception in cases (as the several that I described) where the information loss is explicitly desired. With what's currently in numpy now, you would have to do something like this: A[...] = B.astype(A.dtype) to set a portion of A to B, unless you are *certain* that A and B are of compatible types. This is ugly and also bug-prone, seeing as how there's some violation of the don't-repeat-yourself principle. (I.e. A is mentioned twice, and to change the code to use a different array, you need to change the variable name A twice.) Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates and destroys a temporary array the same size as B. It's equivalent to: temp = B.astype(A.dtype) A[...] = temp Which is no good if B is very large. Currently, the type conversion in 'A[...] = B' cases is handled implicitly, deep down in the C code where it is very very fast, and no temporary array is made. Charles suggests a 'typecast' operator that would set a flag on the array B so that trying to convert it would *not* raise an exception, allowing for the fast, C-level conversion. (This assumes your proposed change in which by default such conversions would raise exceptions.) This 'typecast' operation wouldn't do anything but set a flag, so it doesn't create a temporary array or do any extra work. But still, every time that you are not *certain* what the type of a result from a given operation is, any code that looks like: A[i] = calculate(...) will need to look like this instead: A[i] = typecast(calculate(...)) I agree with others that such assignments aren't highly common, but there will be broken code from this. And as Charles demonstrates, getting the details right of how to implement such a change is non-trivial. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
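For reference, much later numpy releases added an explicit spelling of this fast, no-temporary assignment -- numpy.copyto with a casting argument -- which is one way the 'typecast' idea could look in practice (a sketch; numpy.copyto did not exist at the time of this thread):

import numpy

A = numpy.zeros(5, dtype=numpy.int16)
B = numpy.linspace(0, 1000.7, 5)          # float64 result of some calculation

A[...] = B                                # implicit downcast, done in C, no temporary

numpy.copyto(A, B, casting='unsafe')      # explicit: allow the lossy cast
try:
    numpy.copyto(A, B, casting='safe')    # explicit: refuse it
except TypeError:
    pass                                  # numpy reports the cast as not allowed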
Re: [Numpy-discussion] CASTABLE flag
Hello all, In order to help make things regarding this casting issue more explicit, let me present the following table of potential down-casts. (Also, for the record, nobody is proposing automatic up-casting of any kind. The proposals on the table focus on preventing some or all implicit down-casting.) (1) complex - anything else: Data is lost wholesale. (2) large float - smaller float (also large complex - smaller complex): Precision is lost. (3) float - integer: Precision is lost, to a much greater degree than (2). (4) large integer - smaller integer (also signed/unsigned conversions): Data gets lost/mangled/wrapped around. The original requests for exceptions to be raised focused on case (1), where it is most clear that loss-of-data is happening in a way that is unlikely to be intentional. Robert Kern suggested that exceptions might be raised for cases (1) and (3), which are cross-kind casts, but not for within-kind conversions like (2) and (4). However, I personally don't think that down-converting from float to int is any more or less fraught than converting from int32 to int8: if you need a warning/exception in one case, you'd need it in the rest. Moreover, there's the principle of least surprise, which would suggest that complex rules for when exceptions get raised based on the kind of conversion being made is just asking for trouble. So, as a poll, if you are in favor of exceptions instead of automatic down-conversion, where do you draw the line? What causes an error? Robert seemed to be in favor of (1) and (3). Anne seemed to think that only (1) was problematic enough to worry about. I am personally cool toward the exceptions, but I think that case (4) is just as bad as case (3) in terms of data-loss, though I agree that case (1) seems the worst (and I don't really find any of them particularly bad, though case (1) is something of an oddity for newcomers, I agree.) Finally, if people really want these sort of warnings, I would suggest that they be rolled into the get/setoptions mechanism, so that there's a fine-grained mechanism for turning them to off/warn/ raise exception. Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
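To make the four cases concrete, a quick illustrative session using astype (so the behavior doesn't depend on assignment semantics; exact warning behavior varies by numpy version):

import numpy

c = numpy.array([1.5 + 2.5j])
print c.astype(numpy.float64)        # (1) imaginary part dropped (newer numpy warns)

f = numpy.array([1.0 + 1e-10])
print f.astype(numpy.float32)        # (2) precision lost: the 1e-10 part is gone

g = numpy.array([3.7, -3.7])
print g.astype(numpy.int32)          # (3) truncated toward zero: 3 and -3

h = numpy.array([500], dtype=numpy.int32)
print h.astype(numpy.int8)           # (4) wraps around to -12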
[Numpy-discussion] Unexpected integer overflow handling
Hello all, On my (older) version of numpy (1.0.4.dev3896), I found several oddities in the handling of assignment of long-integer values to integer arrays: In : numpy.array([2**31], dtype=numpy.int8) --- ValueErrorTraceback (most recent call last) /Users/zpincus/ipython console ValueError: setting an array element with a sequence. While this might be reasonable to be an error condition, the precise error raised seems not quite right! But not all overflow errors are caught in this way: In : numpy.array([2**31-1], dtype=numpy.int8) Out: array([-1], dtype=int8) As above, numpy is quite happy allowing overflows; it just breaks when doing a python long-int to int conversion. The conversion from numpy-long-int to int does the right thing though (if by right thing you mean allows silent overflow, which is a matter of discussion elsewhere right now...): In : numpy.array(numpy.array([2**31], dtype=numpy.int64), dtype=numpy.int8) Out: array([0], dtype=int8) At least on item assignment, the overflow exception is less odd: In : a = numpy.empty(shape=(1,), dtype=numpy.int8) In : a[0] = 2**31 --- OverflowError Traceback (most recent call last) /Users/zpincus/ipython console OverflowError: long int too large to convert to int Things work right with array element assignment: In : a[0] = numpy.array([2**31], dtype=numpy.int64)[0] But break again with array scalars, and with the strange ValueError again! In : a[0] = numpy.array(2**31, dtype=numpy.int64) --- ValueErrorTraceback (most recent call last) /Users/zpincus/ipython console ValueError: setting an array element with a sequence. Note that non-long-int-to-int array scalar conversions work: In : a[0] = numpy.array(2**31-1, dtype=numpy.int64) Is this still the case for the current version of numpy? Best, Zach ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] finding eigenvectors etc
Hi all, How are you using the values? How significant are the differences? i am using these eigenvectors to do PCA on a set of images(of faces).I sort the eigenvectors in descending order of their eigenvalues and this is multiplied with the (orig data of some images viz a matrix)to obtain a facespace. I've dealt with similar issues a lot -- performing PCA on data where the dimensionality of the data is much greater than the number of data points. (Like images.) In this case, the maximum number of non-trivial eigenvectors of the covariance matrix of the data is min(dimension_of_data, number_of_data_points), so one always runs into the zero-eigenvalue problem; the matrix is thus always ill-conditioned, but that's not a problem in these cases. Nevertheless, if you've got (say) 100 images that are each 100x100 pixels, to do PCA in the naive way you need to make a 1x1 covariance matrix and then decompose it into 10 eigenvectors and values just to get out the 100 non-trivial ones. That's a lot of computation wasted calculating noise! Fortunately, there are better ways. One is to perform the SVD on the 100x1 data matrix. Let the centered (mean-subtracted) data matrix be D, then the SVD provides matrices U, S, and V'. IIRC, the eigenvalues of D'D (the covariance matrix of interest) are then packed along the first dimension of V', and the eigenvalues are the square of the values in S. But! There's an even faster way (from my testing). The trick is that instead of calculating the 1x1 outer covariance matrix D'D, or doing the SVD on D, one can calculate the 100x100 inner covariance matrix DD' and perform the eigen-decomposition thereon and then trivially transform those eigenvalues and vectors to the ones of the D'D matrix. This computation is often substantially faster than the SVD. Here's how it works: Let D, our re-centered data matrix, be of shape (n, k) -- that is, n data points in k dimensions. We know that D has a singular value decomposition D = USV' (no need to calculate the SVD though; just enough to know it exists). From this, we can rewrite the covariance matrices: D'D = VS'SV' DD' = USS'U' Now, from the SVD, we know that S'S and SS' are diagonal matrices, and V and U (and V' and U') form orthogonal bases. One way to write the eigen-decomposition of a matrix is A = QLQ', where Q is orthogonal and L is diagonal. Since the eigen-decomposition is unique (up to a permutation of the columns of Q and L), we know that V must therefore contain the eigenvectors of D'D in its columns, and U must contain the eigenvectors of DD' in its columns. This is the origin of the SVD recipe for that I gave above. Further, let S_hat, of shape (n, k) be the elementwise reciprocal of S (i.e. SS_hat = I of shape (m, n) and S_hatS = I of shape (n, m), where I is the identity matrix). Then, we can solve for U or V in terms of the other: V = D'US_hat' U = DVS_hat So, to get the eigenvectors and eigenvalues of D'D, we just calculate DD' and then apply the symmetric eigen-decomposition (symmetric version is faster, and DD' is symmetric) to get eigenvectors U and eigenvalues L. We know that L=SS', so S_hat = 1/sqrt(L) (where the sqrt is taken elementwise, of course). So, the eigenvectors we're looking for are: V = D'US_hat Then, the principal components (eigenvectors) are in the columns of V (packed along the second dimension of V). Fortunately, I've packaged this all up into a python module for PCA that takes care of this all. It's attached. Zach Pincus Postdoctoral Fellow, Lab of Dr. 
Frank Slack Molecular, Cellular and Developmental Biology Yale University # Copyright 2007 Zachary Pincus # # This is free software; you can redistribute it and/or modify # it under the terms of the Python License version 2.4 as published by the # Python Software Foundation. import numpy def pca(data, algorithm='eig'): pca(data) - mean, pcs, norm_pcs, variances, positions, norm_positions Perform Principal Components Analysis on a set of n data points in k dimensions. The data array must be of shape (n, k). This function returns the mean data point, the principal components (of shape (p, k), where p is the number pf principal components: p=min(n,k)), the normalized principal components, where each component is normalized by the data's standard deviation along that component (shape (p, k)), the variance each component represents (shape (p,)), the position of each data point along each component (shape (n, p)), and the position of each data point along each normalized component (shape (n, p)). The optional algorithm parameter can be either 'svd' to perform PCA with the singular value decomposition, or 'eig' to use a symmetric eigenvalue decomposition. Empirically, eig is faster on the datasets I have tested. data = numpy.asarray(data) mean = data.mean(axis = 0) centered = data
Re: [Numpy-discussion] ctypes and numpy
Check out this thread: http://www.mail-archive.com/numpy-discuss...@lists.sourceforge.net/msg01154.html In short, it can be done, but it can be tricky to make sure you don't leak memory. A better option if possible is to pre-allocate the array with numpy and pass that buffer into the C code -- but that's not always possible... Zach On Nov 6, 2009, at 10:18 PM, Chris Colbert wrote: Are you bound to using ctypes as the only option? This could be done quite easily in Cython... On Fri, Nov 6, 2009 at 1:57 PM, Trevor Clarke tre...@notcows.com wrote: I've found some info on ctypes and numpy but it mostly tells me how to pass numpy objects to C funcs. I've got a C func which returns a c_void_p to a buffer and I'd like to access the data using numpy without creating a copy of the data. Could someone point me in the right direction? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] neighborhood iterator
Hi, Having just written some cython code to iterate a neighborhood across an array, I have some ideas about features that would be useful for a general frame. Specifically, being able to pass in a footprint boolean array to define the neighborhood is really useful in many contexts. Also useful is the ability to query the offset of the current pixel from the center of the neighborhood. (These two features, plus very efficient handling of boundary conditions by breaking the image into regions where the conditions are and are not required, make the image iterators in ITK really nice to use.) Zach On Nov 23, 2009, at 9:12 AM, Nadav Horesh wrote: Thank you, this is a start. I seems that there are more issues to resolve. I am trying to make a general frame that would enable one to write filters using this iterator. Nadav -Original Message- From: numpy-discussion-boun...@scipy.org on behalf of David Warde- Farley Sent: Mon 23-Nov-09 03:21 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] neighborhood iterator On 22-Nov-09, at 12:50 PM, Nadav Horesh wrote: I wonder if the neighbourhood iterator can be used as a more efficient replacement for ndimage.generic_filter. Is there a way to use it from cython? Yes, using the NumPy C API, called like any other C function is from Cython. Something like: ## import numpy as np cimport numpy as np cdef extern from numpy/arrayobject.h: object PyArray_NeighborhoodIterNew(object iter, np.npy_intp bounds, int mode, object, np.ndarray fill_value) int PyArrayNeighborhoodIter_Next(object iter) int PyArrayNeighborhoodIter_Reset(object iter) ## should do the trick. Note that you'll need to call np.import_array() before using any of these functions to initialize the C API, I think. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion winmail.dat___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
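As a point of comparison for the footprint idea, scipy.ndimage's generic_filter already accepts a boolean footprint that selects which neighbors get passed to the callback; a small sketch:

import numpy
from scipy import ndimage

image = numpy.random.rand(64, 64)

# a plus-shaped neighborhood: only the True offsets are handed to the callback
footprint = numpy.array([[0, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0]], dtype=bool)

def local_range(values):
    # values is a 1-D array of the footprint-selected neighbors
    return values.max() - values.min()

filtered = ndimage.generic_filter(image, local_range, footprint=footprint,
                                  mode='nearest')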
[Numpy-discussion] convert strides/shape/offset into nd index?
Hi all, I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...) Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrely-strided configurations.) Is there a method that doesn't involve sorting? Thanks, Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] convert strides/shape/offset into nd index?
Not to be a downer, but this problem is technically NP-complete. The so-called knapsack problem is to find a subset of a collection of numbers that adds up to the specified number, and it is NP-complete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2). Ha ha, right -- that is the knapsack problem isn't it. Oh well... I'll just require fortran- or C-style strided arrays, for which case it is easy to unravel offsets into indices. Thanks everyone! Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
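For the C- or Fortran-strided case, the unraveling can be done with numpy.unravel_index once the byte offset is divided by the itemsize; a sketch (the order argument for Fortran layout only exists in newer numpy):

import numpy

a = numpy.arange(60, dtype=numpy.uint8).reshape(3, 4, 5)   # itemsize 1, C order

byte_offset = 37
element_offset = byte_offset // a.itemsize
idx = numpy.unravel_index(element_offset, a.shape)         # -> (1, 3, 2)
assert a[idx] == a.ravel()[element_offset]

b = numpy.asfortranarray(a)
idx_f = numpy.unravel_index(element_offset, b.shape, order='F')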
Re: [Numpy-discussion] u in [u+1]
I'm having some trouble here. I have a list of numpy arrays. I want to know if an array 'u' is in the list. Try: any(numpy.all(u == l) for l in array_list) standard caveats about float comparisons apply; perhaps any(numpy.allclose(u, l) for l in array_list) is more appropriate in certain circumstances. Can of course replace the first 'any' with 'all' or 'sum' to get different kinds of information, but using 'any' is equivalent to the 'in' query that you wanted. Why the 'in' operator below fails is that behind the scenes, 'u not in [u+1]' causes Python to iterate through the list testing each element for equality with u. Except that as the error states, arrays don't support testing for equality because such tests are ambiguous. (cf. many threads about this.) Zach On Feb 5, 2010, at 6:47 AM, Neal Becker wrote: I'm having some trouble here. I have a list of numpy arrays. I want to know if an array 'u' is in the list. As an example, u = np.arange(10) : u not in [u+1] --- ValueErrorTraceback (most recent call last) /home/nbecker/raysat/test/ipython console in module() ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() What would be the way to do this? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Loading bit strings
Is there a good way in NumPy to convert from a bit string to a boolean array? For example, if I have a 2-byte string s='\xfd\x32', I want to get a 16-length boolean array out of it. numpy.unpackbits(numpy.fromstring('\xfd\x32', dtype=numpy.uint8)) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] numpy.array(arr.flat) mutates arr if arr.flags.fortran: bug?
Hello, I assume it is a bug that calling numpy.array() on a flatiter of a fortran-strided array that owns its own data causes that array to be rearranged somehow? Not sure what happens with a fancier-strided array that also owns its own data (because I'm not sure how to create one of those in python). This is from the latest svn version (2.0.0.dev8302) but was also present in a previous version too. Zach In [9]: a = numpy.array([[1,2],[3,4]]).copy('F') In [10]: a Out[10]: array([[1, 2], [3, 4]]) In [11]: list(a.flat) Out[11]: [1, 2, 3, 4] In [12]: a # no problem Out[12]: array([[1, 2], [3, 4]]) In [13]: numpy.array(a.flat) Out[13]: array([1, 2, 3, 4]) In [14]: a # this ain't right! Out[14]: array([[1, 3], [2, 4]]) In [15]: a = numpy.array([[1,2],[3,4]]).copy('C') In [16]: numpy.array(a.flat) Out[16]: array([1, 2, 3, 4]) In [17]: a Out[17]: array([[1, 2], [3, 4]]) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.array(arr.flat) mutates arr if arr.flags.fortran: bug?
You should open a ticket for this. http://projects.scipy.org/numpy/ticket/1439 On Mar 26, 2010, at 11:26 AM, Charles R Harris wrote: On Wed, Mar 24, 2010 at 1:13 PM, Zachary Pincus zachary.pin...@yale.edu wrote: Hello, I assume it is a bug that calling numpy.array() on a flatiter of a fortran-strided array that owns its own data causes that array to be rearranged somehow? Not sure what happens with a fancier-strided array that also owns its own data (because I'm not sure how to create one of those in python). This is from the latest svn version (2.0.0.dev8302) but was also present in a previous version too. Zach In [9]: a = numpy.array([[1,2],[3,4]]).copy('F') In [10]: a Out[10]: array([[1, 2], [3, 4]]) In [11]: list(a.flat) Out[11]: [1, 2, 3, 4] In [12]: a # no problem Out[12]: array([[1, 2], [3, 4]]) In [13]: numpy.array(a.flat) Out[13]: array([1, 2, 3, 4]) In [14]: a # this ain't right! Out[14]: array([[1, 3], [2, 4]]) In [15]: a = numpy.array([[1,2],[3,4]]).copy('C') In [16]: numpy.array(a.flat) Out[16]: array([1, 2, 3, 4]) In [17]: a Out[17]: array([[1, 2], [3, 4]]) You should open a ticket for this. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Replacing NANs
In an array I want to replace all NANs with some number say 100, I found a method nan_to_num but it only replaces with zero. Any solution for this? Indexing with a mask is one approach here: a[numpy.isnan(a)] = 100 also cf. numpy.isfinite as well in case you want the same with infs. Zach On Mar 30, 2010, at 2:59 PM, Vishal Rana wrote: Hi, In an array I want to replace all NANs with some number say 100, I found a method nan_to_num but it only replaces with zero. Any solution for this? Thanks Vishal ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to combine a pair of 1D arrays?
On Wed, Apr 14, 2010 at 10:25, Peter Shinners p...@shinners.org wrote: Is there a way to combine two 1D arrays with the same size into a 2D array? It seems like the internal pointers and strides could be combined. My primary goal is to not make any copies of the data. There is absolutely no way to get around that, I am afraid. You could start with the 2d array... instead of: a = np.array((1,2,3,4)) b = np.array((11,12,13,14)) c = np.magical_fuse(a, b) # what goes here? perhaps something like: c = np.empty((2,4), int) a = c[0] b = c[1] Now a and b are views on c. So if you (say) pass them to, say, some (strides-aware) C routine that fills them in element-by-element, c will get filled in at the same time. Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 2D binning
Hi Can anyone think of a clever (non-looping) solution to the following? I have a list of latitudes, a list of longitudes, and list of data values. All lists are the same length. I want to compute an average of data values for each lat/lon pair. e.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001], then data[1001] = (data[1001] + data[2001])/2 Looping is going to take way too long. As a start, are the equal lat/lon pairs exactly equal (i.e. either not floating-point, or floats that will always compare equal, that is, the floating-point bit-patterns will be guaranteed to be identical) or approximately equal to float tolerance? If you're in the approx-equal case, then look at the KD-tree in scipy for doing near-neighbors queries. If you're in the exact-equal case, you could consider hashing the lat/lon pairs or something. At least then the looping is O(N) and not O(N^2): import collections grouped = collections.defaultdict(list) for lt, ln, da in zip(lat, lon, data): grouped[(lt, ln)].append(da) averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items()) Is that fast enough? Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 2D binning
I guess it's as fast as I'm going to get. I don't really see any other way. BTW, the lat/lons are integers) You could (in c or cython) try a brain-dead hashtable with no collision detection: for lat, long, data in dataset: bin = (lat ^ long) % num_bins hashtable[bin] = update_incremental_mean(hashtable[bin], data) you'll of course want to do some experiments to see if your data are sufficiently sparse and/or you can afford a large enough hashtable array that you won't get spurious hash collisions. Adding error- checking to ensure that there are no collisions would be pretty trivial (just keep a table of the lat/long for each hash value, which you'll need anyway, and check that different lat/long pairs don't get assigned the same bin). Zach -Mathew On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus zachary.pin...@yale.edu wrote: Hi Can anyone think of a clever (non-lopping) solution to the following? A have a list of latitudes, a list of longitudes, and list of data values. All lists are the same length. I want to compute an average of data values for each lat/lon pair. e.g. if lat[1001] lon[1001] = lat[2001] [lon [2001] then data[1001] = (data[1001] + data[2001])/2 Looping is going to take wa to long. As a start, are the equal lat/lon pairs exactly equal (i.e. either not floating-point, or floats that will always compare equal, that is, the floating-point bit-patterns will be guaranteed to be identical) or approximately equal to float tolerance? If you're in the approx-equal case, then look at the KD-tree in scipy for doing near-neighbors queries. If you're in the exact-equal case, you could consider hashing the lat/ lon pairs or something. At least then the looping is O(N) and not O(N^2): import collections grouped = collections.defaultdict(list) for lt, ln, da in zip(lat, lon, data): grouped[(lt, ln)].append(da) averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items()) Is that fast enough? Zach ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
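A pure-Python sketch of the idea, using a sum-and-count table instead of a literal incremental mean and a dict for the collision check (all names here are illustrative):

import numpy

def binned_means(lat, lon, data, num_bins=1 << 20):
    sums = numpy.zeros(num_bins)
    counts = numpy.zeros(num_bins, dtype=int)
    keys = {}                                  # bin -> (lat, lon), to detect collisions
    for lt, ln, d in zip(lat, lon, data):
        b = hash((int(lt), int(ln))) % num_bins
        if b in keys and keys[b] != (lt, ln):
            raise ValueError("hash collision; enlarge num_bins or change the hash")
        keys[b] = (lt, ln)
        sums[b] += d
        counts[b] += 1
    occupied = numpy.flatnonzero(counts)
    return dict(zip((keys[b] for b in occupied), sums[occupied] / counts[occupied]))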