Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility
On Wed, Apr 22, 2009 at 2:24 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Mon, Apr 20, 2009 at 11:06 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> > On Mon, Apr 20, 2009 at 10:13 PM, David Cournapeau da...@ar.media.kyoto-u.ac.jp wrote:
> > > Charles R Harris wrote:
> > > > Here is a link to the start of the old discussion:
> > > > http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization
> > > > You took part in it also.
> > >
> > > Thanks, I remembered we had the discussion, but could not find it. The difference is that I am much more familiar with the technical details and the numpy codebase now :) I know how to control exported symbols on most platforms which matter (I can't test on AIX or HP-UX, unfortunately, but I am perfectly fine with ignoring namespace pollution on those anyway), and I would guess that the only platforms which do not support symbol visibility in one way or another do not support shared libraries anyway (some CRAY stuff, for example).
> > >
> > > Concerning the file size, I don't think anyone would disagree that they are too big, but we don't need to go the Java way of one file per class or function either. One first split which I personally like is API/implementation. For example, for multiarray.c, we would only keep the public PyArray_* functions, and put everything else in another file. The other very big file is arrayobject.c, and this one is already mostly organized in independent parts (buffer protocol, number protocol, etc.).
> > >
> > > Another thing I would like to do is to make the global C API array pointer a 'true' global variable instead of a static one. It took me a while, when I was working on the hashing protocol for dtype, to understand why it was crashing (the array pointer being static, every file has its own copy, so it was never initialized in the hashdescr.c file). I think a true global variable, hidden through a symbol map, is easier to understand and more reliable.
> >
> > I made an experiment along those lines a couple of years ago. There were compilation problems because the needed include files weren't available. No doubt that could be fixed in the build, but at some point I would like to have real include files, not the generated variety. Generated include files are kind of bogus IMHO, as they don't define an interface but rather reflect whatever the function definition happens to be. So as part of any split I would also suggest writing the associated include files. That would also make separate compilation possible, which would make it easier to do test compilations while doing development.
>
> The list of visible symbols has grown ;)

Yes. Except for PyArray_DescrHash, which is a mistake of my own, there is nothing we can do ATM about all the npy_* symbols, because they come from a pure C (static) library. That's one of the rationales in the original email :)

David
___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance issue (again)
Well, this isn't a perfect solution. polyfit is better because it determines rank based on condition values, finds the eigenvalues, etc. But unless it can be vectorized without Python looping, it's too slow for me to use.

Mathew

josef.p...@gmail.com wrote:
> If you remove the mean from x and y (along axis = 1) then can't you just do something like
>     (x*y).sum(1) / (x*x).sum(axis=1)
> I think that's what I said 8 days ago.
> Josef
Re: [Numpy-discussion] performance issue (again)
On Wed, Apr 22, 2009 at 8:48 AM, Mathew Yeates myea...@jpl.nasa.gov wrote:
> well, this isn't a perfect solution. polyfit is better because it determines rank based on condition values. Finds the eigenvalues ... etc. But, unless it can be vectorized without Python looping, it's too slow for me to use

I liked your sheer genius comment better.

Yeah, maybe use polyfit only for those cases where abs((x*y).sum(1) / (x*x).sum(1)) is large? And ignore the slope calculation where (x*x).sum(1) is small.
Re: [Numpy-discussion] performance issue (again)
On Wed, Apr 22, 2009 at 11:48 AM, Mathew Yeates myea...@jpl.nasa.gov wrote:
> well, this isn't a perfect solution. polyfit is better because it determines rank based on condition values. Finds the eigenvalues ... etc. But, unless it can be vectorized without Python looping, it's too slow for me to use

Rank is a property of the design matrix. In your case the design matrix is a vector of ones and the x vector. So the only case where you run into problems is when your three observations of x are the same: then dot(x.T*x) is zero (for the demeaned x) and you can only have one constant. If there is no variation in x, then you don't have three different observations to estimate a slope coefficient.

Just special-case (x*x).sum(1) < 1e-8 or something; in this case yestimate = y.mean.

Eigenvectors with one regressor are pretty useless or trivial, same with rank. For higher-order polynomials this will become more important, but not for a linear polynomial.

Josef
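The loop-free slope computation discussed in this thread can be sketched roughly as follows. This is only a minimal sketch for Python 3 and a current NumPy, not code from the thread: the function name rowwise_slopes and the tol parameter are hypothetical, and treating degenerate rows as slope 0 (so the estimate is the row mean of y) follows the special case suggested above.

```python
import numpy as np

def rowwise_slopes(x, y, tol=1e-8):
    """Least-squares slope and intercept of y against x for every row.

    Hypothetical helper: rows whose demeaned x satisfies
    (x*x).sum(1) < tol get slope 0, so the fit is just the row mean of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean(axis=1, keepdims=True)   # remove the mean along axis 1
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd * xd).sum(axis=1)
    sxy = (xd * yd).sum(axis=1)
    degenerate = sxx < tol                   # all x in the row (nearly) equal
    slope = np.zeros_like(sxx)
    np.divide(sxy, sxx, out=slope, where=~degenerate)
    intercept = y.mean(axis=1) - slope * x.mean(axis=1)
    return slope, intercept

x = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 1.0]])   # second row has no variation in x
y = np.array([[1.0, 3.0, 5.0],
              [2.0, 4.0, 6.0]])
slope, intercept = rowwise_slopes(x, y)
```

For rows with variation in x this should agree with np.polyfit(x[i], y[i], 1); the degenerate row never reaches the division.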
[Numpy-discussion] buggy fortran array reshape ?
After reshaping a Fortran array, the new array doesn't share data with the original array. I would be glad if someone could explain the strange behaviour of this program. Is it a numpy bug?

    #v
    import numpy

    def check_bug(order):
        a = numpy.ndarray((3,2),order=order,dtype=int)
        a[0,0] = 1
        b = a.reshape((6,))
        a[0,0] = 2
        print b[0]

    check_bug('C') # 2, good
    check_bug('F') # 1, wrong ???
    print(numpy.version.version) # 1.2.1
    #^

-- FP
Re: [Numpy-discussion] buggy fortran array reshape ?
On Wed, Apr 22, 2009 at 1:13 PM, Fabrice Pardo fabrice.pa...@lpn.cnrs.fr wrote:
> After reshaping a Fortran array, the new array doesn't share data with original array. I will be glad if someone can explain the strange behaviour of this program. Is it a numpy bug ?
>
>     #v
>     def check_bug(order):
>         a = numpy.ndarray((3,2),order=order,dtype=int)
>         a[0,0] = 1
>         b = a.reshape((6,))
>         a[0,0] = 2
>         print b[0]
>
>     check_bug('C') # 2, good
>     check_bug('F') # 1, wrong ???
>     print(numpy.version.version) # 1.2.1
>     #^

From help:

    Returns:
    reshaped_array : ndarray
        This will be a new view object if possible; otherwise, it will be a copy.

"If possible" and "otherwise" are not very precise. I guess reshape tries to return an array that is contiguous. If you do a reshape in the order of the array, i.e. change your line to

    b = a.reshape((6,), order=order)

then the reshaped array is just a view.

I still find view vs copy very confusing.

Josef
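The view-versus-copy behaviour described here can be checked directly. A minimal sketch, written for Python 3 and a current NumPy rather than the 1.2.1 release in the report; it assumes np.shares_memory is available and uses np.zeros in place of the uninitialized numpy.ndarray call.

```python
import numpy as np

# Fortran-ordered array, as in check_bug('F').
a = np.zeros((3, 2), dtype=int, order="F")

b_default = a.reshape((6,))             # default order='C': needs a copy here
b_fortran = a.reshape((6,), order="F")  # matches the memory layout: a view

a[0, 0] = 2                             # only a view sees this write

shares_default = np.shares_memory(a, b_default)
shares_fortran = np.shares_memory(a, b_fortran)
```

Checking shares_memory (or the base attribute) is a more direct test than writing to one array and reading the other.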
Re: [Numpy-discussion] buggy fortran array reshape ?
josef.p...@gmail.com wrote:
> from help:
>
>     Returns:
>     reshaped_array : ndarray
>         This will be a new view object if possible; otherwise, it will be a copy.
>
> "if possible" and "otherwise" are not very precise. I guess reshape tries to return an array that is contiguous, if you do a reshape in the order of the array, i.e. change your line to
>
>     b = a.reshape((6,), order=order)
>
> then the reshaped array is just a view. I still find view vs copy very confusing.

You are right, the documentation doesn't lie. The simplistic choice made by the current version is a copy. In my example, b is 1D and contiguous, with no 'C' or 'F' difference, so another choice is possible: making a view.

The reshape function is unpredictable and its behaviour is not documented. It cannot be used safely.

Another remark against reshape: the OWNDATA flag is False, even if b is a copy!

-- FP
Re: [Numpy-discussion] buggy fortran array reshape ?
Wed, 22 Apr 2009 20:18:14 +0200, Fabrice Pardo wrote:
[clip]
> The reshape function is unpredictable and its behaviour is not documented. It cannot be used safely.

It is documented and it can be used safely. The manual, however, currently has no section on views that would explain these issues in depth.

If you want to ensure no-copy, assign to shape:

    a.shape = (6,)

> Another remark against reshape: OWNDATA flag is False, even if b is a copy !

Apparently, reshape first copies to a contiguous array and then reshapes. This could be simplified.

-- Pauli Virtanen
Re: [Numpy-discussion] buggy fortran array reshape ?
On Wed, Apr 22, 2009 at 08:18:14PM +0200, Fabrice Pardo wrote:
> It cannot be used safely.

Use:

    b = a.view()
    b.shape = (2, 3)

This will return a view, or raise an exception.

Gaël
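The "view or exception" idiom can be wrapped in a small helper. This is a hedged sketch for Python 3 and a current NumPy: the name reshape_as_view is hypothetical, and suppressing DeprecationWarning is only a precaution in case a given NumPy release warns about in-place shape assignment.

```python
import warnings
import numpy as np

def reshape_as_view(arr, new_shape):
    """Return a view of arr with new_shape, or raise if a copy would be needed.

    Hypothetical helper built on the a.view() / .shape = ... idiom.
    """
    view = arr.view()
    with warnings.catch_warnings():
        # Assumption: some releases may warn about assigning to .shape.
        warnings.simplefilter("ignore", DeprecationWarning)
        view.shape = new_shape
    return view

c_arr = np.zeros((3, 2), dtype=int, order="C")
f_arr = np.zeros((3, 2), dtype=int, order="F")

flat_c = reshape_as_view(c_arr, (6,))    # C layout flattens without copying

try:
    reshape_as_view(f_arr, (6,))         # C-order flattening of F layout
    f_needed_copy = False
except AttributeError:
    f_needed_copy = True                 # NumPy refused to reshape in place
```

Working on arr.view() rather than arr itself leaves the shape of the original array untouched.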
Re: [Numpy-discussion] buggy fortran array reshape ?
Wed, 22 Apr 2009 13:51:45 -0400, josef.pktd wrote:
[clip]
> change your line to
>
>     b = a.reshape((6,), order=order)
>
> then the reshaped array is just a view.

This has the effect that the unravelling is done in Fortran order (when order='F') rather than C-order, which can be confusing at times.

-- Pauli Virtanen
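The difference in unravelling order shows up in a small example. A minimal sketch for Python 3 and a current NumPy; the array values are made up for illustration.

```python
import numpy as np

a = np.asfortranarray([[0, 1],
                       [2, 3],
                       [4, 5]])

flat_c = a.reshape((6,))             # C order: last index varies fastest
flat_f = a.reshape((6,), order="F")  # Fortran order: first index varies fastest

# flat_c reads the rows one after another, flat_f reads the columns.
```

So order='F' avoids the copy for a Fortran-ordered array, but the elements come out column by column.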
Re: [Numpy-discussion] buggy fortran array reshape ?
On Wed, Apr 22, 2009 at 2:37 PM, Pauli Virtanen p...@iki.fi wrote:
> Wed, 22 Apr 2009 20:18:14 +0200, Fabrice Pardo wrote:
> [clip]
> > The reshape function is unpredictable and its behaviour is not documented. It cannot be used safely.
>
> It is documented and it can be used safely. The manual, however, has currently no section on views that would explain these issues in depth.
>
> If you want to ensure no-copy, assign to shape:
>
>     a.shape = (6,)
>
> > Another remark against reshape: OWNDATA flag is False, even if b is a copy !
>
> Apparently, reshape first copies to a contiguous array and then reshapes. This could be simplified.

Is the difference between assigning to the attribute and using the method call explained somewhere?

I recently had a puzzling case where I wanted to create a structured array and tried

    x.view(dtype=...)
    x.astype(..)
    x.dtype = ...

I don't remember exactly, but view and astype didn't create the structured array that I wanted, while the assignment x.dtype = ... worked.

> This has the effect that the unravelling is done in Fortran order (when order='F') rather than C-order, which can be confusing at times.

If he intentionally starts out in Fortran order, he might have a reason to stick to it.

In stats, we are still focused by default on axis=0, and I usually think in terms of columns of random variables. But using a lot of transpose and newaxis, I never know what the memory layout is unless I check the flags, and I'm starting to realize that this requires more attention with numpy.

Josef
Re: [Numpy-discussion] buggy fortran array reshape ?
Wed, 22 Apr 2009 15:12:20 -0400, josef.pktd wrote:
> On Wed, Apr 22, 2009 at 2:37 PM, Pauli Virtanen p...@iki.fi wrote:
> [clip]
> > If you want to ensure no-copy, assign to shape:
> >
> >     a.shape = (6,)
> [clip]
> Is the difference between assigning to the attribute and using the method call explained somewhere?

No. But it certainly should be.

The first place to fix is the attribute docstring, which doesn't even mention that it can be assigned to:

http://docs.scipy.org/numpy/docs/numpy.ndarray.shape/

A similar review should be done for all attribute docstrings.

Second, a separate, more detailed discussion of memory layouts, views, etc. should be written.

-- Pauli Virtanen
[Numpy-discussion] Masking an array with another array
Hello,

Could you please give me some hints about how to mask an array using another array, as in the following example?

    In [14]: a = arange(5)

    In [15]: a
    Out[15]: array([0, 1, 2, 3, 4])

and my secondary array is b:

    In [16]: b = array([2,3])

What I want to do is to mask a with the values of b and get an array of:

    array([False, False, True, True, False], dtype=bool)

That is just a manually created array. I still don't know how to do this programmatically in a Pythonic fashion or with numpy's masked array functions.

Thank you.

Gökhan
Re: [Numpy-discussion] Masking an array with another array
On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:
> Hello, Could you please give me some hints about how to mask an array using another arrays like in the following example.

What about that?

    numpy.logical_or.reduce([a==i for i in b])
Re: [Numpy-discussion] Masking an array with another array
Yes Pierre, I like this kind of one-line elegance in Python a lot. I was thinking that the answer lay somewhere in the masked array operations, but I was proved wrong. Thanks for your input on this small riddle.

Here is another way of doing that. (That's what I thought of initially and what Matthias Michler responded on the matplotlib mailing list.)

    mask = zeros(len(a), dtype=bool)
    for index in xrange(len(a)):  # run through array a
        if a[index] in b:
            mask[index] = True

Ending with a quote about Pythonicness :) "...that something is Pythonic when it has a sense of quality, simplicity, clarity and elegance about it."

Gökhan

On Wed, Apr 22, 2009 at 4:49 PM, Pierre GM pgmdevl...@gmail.com wrote:
> On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:
> > Hello, Could you please give me some hints about how to mask an array using another arrays like in the following example.
>
> What about that ?
>
>     numpy.logical_or.reduce([a==i for i in b])
Re: [Numpy-discussion] Masking an array with another array
On Wed, Apr 22, 2009 at 8:18 PM, Gökhan SEVER gokhanse...@gmail.com wrote:
> Yes Pierre, I like this one line of elegances in Python a lot. I was thinking that the answer lies in somewhere in masked array operations, but I proved wrong. Thanks for your input on this small riddle.
>
> Here is another way of doing that. (That's what I thought of initially and what Matthias Michler responded on matplotlib mailing list.)
>
>     mask = zeros(len(a), dtype=bool)
>     for index in xrange(len(a)):  # run through array a
>         if a[index] in b:
>             mask[index] = True
>
> Ending with a quote about Pythonicness :) ...that something is Pythonic when it has a sense of quality, simplicity, clarity and elegance about it.
>
> Gökhan
>
> On Wed, Apr 22, 2009 at 4:49 PM, Pierre GM pgmdevl...@gmail.com wrote:
> > On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:
> > > Hello, Could you please give me some hints about how to mask an array using another arrays like in the following example.
> >
> > What about that ?
> >
> >     numpy.logical_or.reduce([a==i for i in b])

I prefer broadcasting to list comprehension in numpy:

    >>> a = np.arange(5)
    >>> b = np.array([2,3])
    >>> (a[:,np.newaxis]==b).any(1)
    array([False, False, True, True, False], dtype=bool)

Josef
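The broadcasting step is easier to follow with the intermediate array made explicit. A minimal sketch for Python 3 and a current NumPy; the names table, mask_broadcast and mask_listcomp are introduced only for illustration.

```python
import numpy as np

a = np.arange(5)
b = np.array([2, 3])

# a[:, np.newaxis] has shape (5, 1); comparing it with b (shape (2,))
# broadcasts to a (5, 2) boolean table where table[i, j] is a[i] == b[j].
table = a[:, np.newaxis] == b

# A row of the table contains a True exactly when a[i] matches some b[j].
mask_broadcast = table.any(axis=1)

# The list-comprehension version builds one length-5 mask per element of b
# and ORs them together.
mask_listcomp = np.logical_or.reduce([a == value for value in b])
```

Both variants allocate on the order of len(a) * len(b) booleans, which is why the choice starts to matter once b gets large.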
Re: [Numpy-discussion] Masking an array with another array
On Apr 22, 2009, at 9:03 PM, josef.p...@gmail.com wrote:
> I prefer broad casting to list comprehension in numpy:

Pretty neat! I still don't have the broadcasting reflex. Now, any idea which one is more efficient in terms of speed? In terms of temporaries?
Re: [Numpy-discussion] Masking an array with another array
On Wed, Apr 22, 2009 at 10:45 PM, Pierre GM pgmdevl...@gmail.com wrote:
> On Apr 22, 2009, at 9:03 PM, josef.p...@gmail.com wrote:
> > I prefer broad casting to list comprehension in numpy:
>
> Pretty neat! I still dont have the broadcasting reflex. Now, any idea which one is more efficient in terms of speed? in terms of temporaries?

I used similar broadcasting for working with categorical data series and for creating dummy variables for regression, so I have already played with this for some time.

In this case, I would expect the memory consumption to be essentially the same: you have a list of arrays and I have a 2d array, unless numpy needs an additional conversion to array in np.logical_or.reduce, which seems plausible, but I don't know.

The main point that Sturla convinced me of in the discussion on kendalltau is that if b is large, 500 or 1000, then building the full intermediate boolean array kills both memory and speed performance compared to a Python for loop, and is very bad compared to a Cython loop.

In this example my version is at least twice as fast for len(b) = 4. Your version does not scale very well at all to larger b: yours takes 7 times as long as mine for len(b) = 400, which, I guess, would mean that you have an extra copying step.

I added the for loop and it is always the fastest, even more so for short b. I hope it's correct; I have never used an in-place logical operator.

Josef

    from time import time as time_
    import numpy as np

    a = np.array(range(10)*1000)
    blen = 10#100
    b = np.array([2,3,5,8]*blen)
    print "shape b", b.shape

    # broadcasting
    t = time_()
    for _ in range(100):
        (a[:,np.newaxis]==b).any(1)
    print time_() - t

    # list comprehension + logical_or.reduce
    t = time_()
    for _ in range(100):
        np.logical_or.reduce([a==i for i in b])
    print time_() - t

    # for loop with in-place or
    t = time_()
    for _ in range(100):
        z = a == b[0]
        for ii in range(1,len(b)):
            z |= (a == b[ii])
    print time_() - t

    #shape b (80,)
    #0.11133514
    #0.26632425

    #shape b (80,)
    #0.827999830246
    #5.2650001049

    #shape b (400,)
    #4.6086758
    #28.437362

    #shape b (400,)
    #3.8913242
    #27.5

    #shape b (400,)
    #3.89099979401
    #27.328962
    #3.51599979401 #for loop

    #shape b (40,)
    #0.45396185
    #2.5460381
    #0.35895096 #for loop

    #shape b (4,)
    #0.10867575
    #0.2826485
    #0.0309998989105 #for loop
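For anyone wanting to rerun the comparison today, the same three variants can be timed with timeit. This is a rough sketch for Python 3 and a current NumPy, not the script from the message: the function names and the repetition count are made up, and no timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

a = np.tile(np.arange(10), 1000)          # same data as range(10)*1000
b = np.tile(np.array([2, 3, 5, 8]), 10)   # len(b) == 40

def mask_broadcast(a, b):
    # Builds the full (len(a), len(b)) boolean table, then reduces it.
    return (a[:, np.newaxis] == b).any(axis=1)

def mask_reduce(a, b):
    # One temporary mask per element of b, combined by logical_or.reduce.
    return np.logical_or.reduce([a == value for value in b])

def mask_loop(a, b):
    # Accumulates into a single mask with the in-place |= operator.
    z = a == b[0]
    for value in b[1:]:
        z |= (a == value)
    return z

timings = {
    func.__name__: timeit.timeit(lambda func=func: func(a, b), number=10)
    for func in (mask_broadcast, mask_reduce, mask_loop)
}
```

Checking that the three functions return identical masks before comparing the numbers guards against timing a variant that is fast only because it is wrong.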
Re: [Numpy-discussion] Masking an array with another array
On Wed, Apr 22, 2009 at 04:21:05PM -0500, Gökhan SEVER wrote:
> Could you please give me some hints about how to mask an array using another arrays like in the following example.
>
>     In [14]: a = arange(5)
>
>     In [15]: a
>     Out[15]: array([0, 1, 2, 3, 4])
>
> and my secondary array is b
>
>     In [16]: b = array([2,3])
>
> What I want to do is to mask a with b values and get an array of:
>
>     array([False, False, True, True, False], dtype=bool)

This is an operation on 'sets': you are testing if members of a are 'in' b. Generally, set operations on arrays can be found in numpy.lib.arraysetops. I believe what you are interested in is setmember1d.

HTH,

Gaël
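The set-membership view of the problem can be sketched in one call. Note the assumption: this uses np.isin, the membership test provided by current NumPy releases, rather than the setmember1d function named in the message, so it is a substitution and not something from the thread.

```python
import numpy as np

a = np.arange(5)
b = np.array([2, 3])

# Element-wise test of "is a[i] one of the values in b?"
mask = np.isin(a, b)
```

The result has the same shape as a, so it can be used directly as a boolean index, e.g. a[mask].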
Re: [Numpy-discussion] Masking an array with another array
Ahaa, thanks Gaël. That method is more elegant than the previous suggestions, and the simplest of all.

Although one line of "import this" says:

    There should be one-- and preferably only one --obvious way to do it.

I always find many different ways of implementing ideas in the Python world.

Gökhan

On Thu, Apr 23, 2009 at 12:16 AM, Gael Varoquaux gael.varoqu...@normalesup.org wrote:
> On Wed, Apr 22, 2009 at 04:21:05PM -0500, Gökhan SEVER wrote:
> > Could you please give me some hints about how to mask an array using another arrays like in the following example.
> >
> >     In [14]: a = arange(5)
> >
> >     In [15]: a
> >     Out[15]: array([0, 1, 2, 3, 4])
> >
> > and my secondary array is b
> >
> >     In [16]: b = array([2,3])
> >
> > What I want to do is to mask a with b values and get an array of:
> >
> >     array([False, False, True, True, False], dtype=bool)
>
> This is an operation on 'sets': you are testing if members of a are 'in' b. Generally, set operations on arrays can be found in numpy.lib.arraysetops. I believe what you are interested in is setmember1d.
>
> HTH,
>
> Gaël