Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-22 Thread David Cournapeau
On Wed, Apr 22, 2009 at 2:24 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Mon, Apr 20, 2009 at 11:06 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:


 On Mon, Apr 20, 2009 at 10:13 PM, David Cournapeau
 da...@ar.media.kyoto-u.ac.jp wrote:

 Charles R Harris wrote:

 
  Here is a link to the start of the old discussion
 
  http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization.
  You took part in it also.

 Thanks, I remembered we had the discussion, but could not find it. The
 difference is that I am much more familiar with the technical details and
 the numpy codebase now :) I know how to control exported symbols on most
 platforms that matter (I can't test on AIX or HP-UX, unfortunately, but
 I am perfectly fine with ignoring namespace pollution on those anyway),
 and I would guess that the only platforms which do not support symbol
 visibility in one way or another do not support shared libraries anyway
 (some CRAY stuff, for example).

 Concerning the file size, I don't think anyone would disagree that they
 are too big, but we don't need to go the Java way of one file per
 class/function either. A first split which I personally like is
 API/implementation. For example, for multiarray.c, we would only keep
 the public PyArray_* functions, and put everything else in another file.
 The other very big file is arrayobject.c, and this one is already mostly
 organized in independent parts (buffer protocol, number protocol, etc...)

 Another thing I would like to do is to make the global C API array
 pointer a 'true' global variable instead of a static one. It took me a
 while when I was working on the hashing protocol for dtype to understand
 why it was crashing (the array pointer being static, every file has its
 own copy, so it was never initialized in the hashdescr.c file). I think
 a true global variable, hidden through a symbol map, is easier to
 understand and more reliable.

 I made an experiment along those lines a couple of years ago. There were
 compilation problems because the needed include files weren't available. No
 doubt that could be fixed in the build, but at some point I would like to
 have real include files, not the generated variety. Generated include files
 are kind of bogus IMHO, as they don't define an interface but rather reflect
 whatever the function definition happens to be. So, as part of any split, I
 would also suggest writing the associated include files. That would also
 make separate compilation possible, which would make it easier to do test
 compilations while doing development.

 The list of visible symbols has grown ;)

Yes. Apart from PyArray_DescrHash, which is my own mistake, there is
nothing we can do at the moment about the npy_* symbols, because they
come from a pure C (static) library. That's one of the rationales in
the original email :)

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance issue (again)

2009-04-22 Thread Mathew Yeates
Well, this isn't a perfect solution. polyfit is better because it
determines the rank based on condition values, finds the eigenvalues,
etc. But unless it can be vectorized without Python looping, it's too slow
for me to use.

Mathew

josef.p...@gmail.com wrote:



 If you remove the mean from x and y (along axis = 1) then can't you
 just do something like

 (x*y).sum(1) / (x*x).sum(axis=1)



 I think that's what I said 8 days ago.

 Josef
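A minimal sketch of that vectorized slope, assuming x and y are 2-D arrays of equal shape with one series per row; `rowwise_slope` is a hypothetical helper name, not a numpy function. It removes the row means first, as the quoted suggestion requires, so the intercept drops out:

```python
import numpy as np

def rowwise_slope(x, y):
    """Least-squares slope of y on x, computed for every row at once (hypothetical helper)."""
    # Remove the row means so that the intercept drops out.
    xc = x - x.mean(axis=1)[:, np.newaxis]
    yc = y - y.mean(axis=1)[:, np.newaxis]
    # slope = sum(xc * yc) / sum(xc * xc), row by row, with no Python loop
    return (xc * yc).sum(axis=1) / (xc * xc).sum(axis=1)

x = np.array([[0.0, 1.0, 2.0],
              [1.0, 2.0, 4.0]])
y = 3.0 * x + 1.0      # every row lies exactly on a line with slope 3
print(rowwise_slope(x, y))
```

Rows in which all x values are equal make the denominator zero, so they need special handling.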




Re: [Numpy-discussion] performance issue (again)

2009-04-22 Thread Keith Goodman
On Wed, Apr 22, 2009 at 8:48 AM, Mathew Yeates myea...@jpl.nasa.gov wrote:
 well, this isn't a perfect solution. polyfit  is better because it
 determines rank based on condition values. Finds the eigenvalues ...
 etc. But, unless it can vectorized without Python looping, it's too slow
 for me to use

I liked your sheer genius comment better.

Yeah, maybe use polyfit only for those cases where abs((x*y).sum(1) /
(x*x).sum(1)) is large? And ignore the slope calculation where
(x*x).sum(1) is small.
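One way to sketch this suggestion, assuming the same one-series-per-row layout: compute all slopes at once, skip rows where the centered (x*x).sum(1) is tiny (leaving their slope at 0, an arbitrary choice), and call np.polyfit only for rows whose slope is large. The thresholds `big` and `tiny` and the name `hybrid_slope` are illustrative assumptions:

```python
import numpy as np

def hybrid_slope(x, y, big=1e3, tiny=1e-8):
    """Vectorized slopes, with np.polyfit only for suspicious rows.

    `big` and `tiny` are hypothetical thresholds chosen for illustration.
    """
    xc = x - x.mean(axis=1)[:, np.newaxis]
    yc = y - y.mean(axis=1)[:, np.newaxis]
    sxx = (xc * xc).sum(axis=1)
    sxy = (xc * yc).sum(axis=1)

    slope = np.zeros(x.shape[0])
    ok = sxx > tiny                      # ignore rows with (almost) no spread in x
    slope[ok] = sxy[ok] / sxx[ok]

    # Refit with polyfit only the rows whose vectorized slope is large.
    for i in np.flatnonzero(ok & (np.abs(slope) > big)):
        slope[i] = np.polyfit(x[i], y[i], 1)[0]
    return slope

x = np.array([[0.0, 1.0, 2.0],
              [5.0, 5.0, 5.0]])          # second row has no spread in x
y = np.array([[1.0, 3.0, 5.0],
              [1.0, 2.0, 3.0]])
print(hybrid_slope(x, y))                # [2. 0.]
```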


Re: [Numpy-discussion] performance issue (again)

2009-04-22 Thread josef . pktd
On Wed, Apr 22, 2009 at 11:48 AM, Mathew Yeates myea...@jpl.nasa.gov wrote:

 well, this isn't a perfect solution. polyfit  is better because it
 determines rank based on condition values. Finds the eigenvalues ...
 etc. But, unless it can vectorized without Python looping, it's too slow
 for me to use





Rank is a property of the design matrix.

In your case, the design matrix consists of a vector of ones and the x vector.
So the only case where you run into problems is when your three observations
of x are all the same; then (after removing the mean) dot(x.T, x) is zero, and
you can only estimate a constant. If there is no variation in x, then you
don't have three different observations with which to estimate a slope
coefficient.

Just special-case (x*x).sum(1) < 1e-8 or something; in this case,
yestimate = y.mean()

Eigenvectors with one regressor are pretty useless or trivial; the same goes
for rank.

For higher order polynomials this will become more important, but not for a
linear polynomial.

Josef
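A sketch of that special case, under the same assumptions (one series per row, means removed before forming the sums); the helper name `fitted_values` is hypothetical, and `tol` plays the role of the 1e-8 cutoff. Rows without variation in x get the row mean of y as their estimate:

```python
import numpy as np

def fitted_values(x, y, tol=1e-8):
    """Row-wise linear fit; rows with (almost) constant x fall back to mean(y)."""
    xbar = x.mean(axis=1)[:, np.newaxis]
    ybar = y.mean(axis=1)[:, np.newaxis]
    xc = x - xbar
    sxx = (xc * xc).sum(axis=1)
    sxy = (xc * (y - ybar)).sum(axis=1)
    degenerate = sxx < tol
    # Divide by 1 on degenerate rows to avoid 0/0, then force their slope to 0,
    # so their fitted values reduce to the row mean of y.
    slope = np.where(degenerate, 0.0, sxy / np.where(degenerate, 1.0, sxx))
    return ybar + slope[:, np.newaxis] * xc

x = np.array([[0.0, 1.0, 2.0],
              [5.0, 5.0, 5.0]])   # second row: all x observations identical
y = np.array([[1.0, 3.0, 5.0],
              [1.0, 2.0, 3.0]])
print(fitted_values(x, y))
```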


[Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Fabrice Pardo
After reshaping a Fortran array, the new array doesn't share data
with the original array.
I would be glad if someone could explain the strange behaviour of this
program. Is it a numpy bug?

#v
import numpy

def check_bug(order):
    a = numpy.ndarray((3, 2), order=order, dtype=int)
    a[0, 0] = 1
    b = a.reshape((6,))
    a[0, 0] = 2
    print(b[0])

check_bug('C')  # 2, good
check_bug('F')  # 1, wrong ???
print(numpy.version.version)  # 1.2.1
#^

-- 
FP
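A way to check whether the reshaped array shares memory with the original, sketched with np.shares_memory (available in NumPy 1.11 and later, so not in the 1.2.1 used above); `shares_after_reshape` is a hypothetical helper name:

```python
import numpy as np

def shares_after_reshape(order):
    a = np.zeros((3, 2), order=order, dtype=int)
    b = a.reshape((6,))           # default reshape reads elements in C order
    return np.shares_memory(a, b)

print(shares_after_reshape('C'))  # True: b is a view of a
print(shares_after_reshape('F'))  # False: b is a copy
```

The copy for order='F' is needed because a Fortran-ordered array stores its elements column by column, so the C-order sequence a[0,0], a[0,1], a[1,0], ... that reshape reads by default cannot be described by a single stride over the existing buffer.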



Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread josef . pktd
On Wed, Apr 22, 2009 at 1:13 PM, Fabrice Pardo
fabrice.pa...@lpn.cnrs.fr wrote:

 After reshaping a Fortran array, the new array doesn't share data
 with original array.
 I will be glad if someone can explain the strange behaviour of this
 program. Is it a numpy bug ?

 #v
 def check_bug(order):
    a = numpy.ndarray((3,2),order=order,dtype=int)
    a[0,0] = 1
    b = a.reshape((6,))
    a[0,0] = 2
    print b[0]

 check_bug('C') # 2, good
 check_bug('F') # 1, wrong ???
 print(numpy.version.version) # 1.2.1
 #^


from help:

Returns:
reshaped_array : ndarray
This will be a new view object if possible; otherwise, it will be a copy.


"If possible" and "otherwise" are not very precise.
I guess reshape tries to return an array that is contiguous. If you do
the reshape in the order of the array, i.e.

change your line to
b = a.reshape((6,), order=order)

then the reshaped array is just a view.

I still find view vs copy very confusing.

Josef
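A sketch of this suggestion, assuming a modern NumPy (np.shares_memory is not in 1.2.1): reshaping in the array's own order returns a view, so the later write to a is visible through b.

```python
import numpy as np

a = np.zeros((3, 2), order='F', dtype=int)
b = a.reshape((6,), order='F')   # unravel in the array's own (Fortran) order
a[0, 0] = 2
print(b[0])                      # 2: b is a view, so it sees the change
print(np.shares_memory(a, b))    # True
```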


Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Fabrice Pardo
josef.p...@gmail.com wrote:

 from help:
 
 Returns:
 reshaped_array : ndarray
 This will be a new view object if possible; otherwise, it will be a copy.
 

  if possible and otherwise are not very precise
 I guess reshape tries to return an array that is contiguous, if you do
 a reshape in the order of the array, i.e.

 change your line to
 b = a.reshape((6,), order=order)

 then the reshaped array is just a view.

 I still find view vs copy very confusing.
   
You are right, the documentation doesn't lie.

The current, simplistic choice is to make a copy.
In my example, b is 1D and contiguous, with no 'C' or 'F' difference,
so it would be possible to make another choice: returning a view.

The reshape function is unpredictable and its behaviour is not documented.

It cannot be used safely.

Another remark against reshape:
the OWNDATA flag is False, even if b is a copy!

-- 
FP







Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Pauli Virtanen
Wed, 22 Apr 2009 20:18:14 +0200, Fabrice Pardo wrote:
[clip]
 The reshape function is unpredictable and its behaviour is not
 documented.
 
 It cannot be used safely.

It is documented and it can be used safely. The manual, however, currently
has no section on views that would explain these issues in depth.

If you want to ensure no-copy, assign to shape:

a.shape = (6,)

 Another remark against reshape:
 OWNDATA flag is False, even if b is a copy !

Apparently, reshape first copies to a contiguous array and then reshapes. 
This could be simplified.

-- 
Pauli Virtanen



Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Gael Varoquaux
On Wed, Apr 22, 2009 at 08:18:14PM +0200, Fabrice Pardo wrote:
 It cannot be used safely.

use:

b = a.view()
b.shape = (2, 3)

This will return a view, or raise an exception.

Gaël
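The view-plus-shape idiom can be wrapped in a small helper; `reshape_no_copy` is a hypothetical name, and the sketch assumes NumPy raises AttributeError when the new shape would need a copy (as it does in the releases I am aware of). It works for the C-ordered array, leaves the original untouched, and refuses the Fortran-ordered one:

```python
import numpy as np

def reshape_no_copy(a, shape):
    """Return a reshaped view of `a`, or raise instead of silently copying."""
    b = a.view()        # new array object sharing a's data; a itself keeps its shape
    b.shape = shape     # raises if the data cannot be reinterpreted without a copy
    return b

c = np.zeros((3, 2), order='C')
print(reshape_no_copy(c, (2, 3)).shape, c.shape)   # (2, 3) (3, 2)

f = np.zeros((3, 2), order='F')
raised = False
try:
    reshape_no_copy(f, (2, 3))
except AttributeError as exc:
    raised = True
    print("copy would be required:", exc)
```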


Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Pauli Virtanen
Wed, 22 Apr 2009 13:51:45 -0400, josef.pktd wrote:
[clip]
 change your line to
 b = a.reshape((6,), order=order)
 
 then the reshaped array is just a view.

This has the effect that the unravelling is done in Fortran order (when 
order='F') rather than C-order, which can be confusing at times.

-- 
Pauli Virtanen
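The effect on element order is easy to see on a small array (a sketch; values chosen only for illustration):

```python
import numpy as np

a = np.arange(6).reshape(3, 2)     # [[0, 1], [2, 3], [4, 5]]
print(a.reshape(6))                # C order, row by row:         [0 1 2 3 4 5]
print(a.reshape(6, order='F'))     # Fortran order, column by column: [0 2 4 1 3 5]
```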



Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread josef . pktd
On Wed, Apr 22, 2009 at 2:37 PM, Pauli Virtanen p...@iki.fi wrote:
 Wed, 22 Apr 2009 20:18:14 +0200, Fabrice Pardo wrote:
 [clip]
 The reshape function is unpredictable and its behaviour is not
 documented.

 It cannot be used safely.

 It is documented and it can be used safely. The manual, however, has
 currently no section on views that would explain these issues in depth.

 If you want to ensure no-copy, assign to shape:

 a.shape = (6,)

 Another remark against reshape:
 OWNDATA flag is False, even if b is a copy !

 Apparently, reshape first copies to a contiguous array and then reshapes.
 This could be simplified.


Is the difference between assigning to the attribute and using the
method call explained somewhere?

I recently had a puzzling case where I wanted to create a
structured array, and tried

x.view(dtype=...)
x.astype(..)
x.dtype = ...

I don't remember exactly, but view and astype didn't create the
structured array that I wanted, while the assignment x.dtype = ... worked.

 This has the effect that the unravelling is done in Fortran order (when
 order='F') rather than C-order, which can be confusing at times.

If he intentionally starts out in Fortran order, he might have a
reason to stick to it.
In stats, we are still focused by default on axis=0, and I usually
think in terms of columns of random variables. But using a lot of
transpose and newaxis, I never know what the memory layout is unless I
check the flags, and I'm starting to realize that this requires more
attention with numpy.

Josef
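Checking the flags takes one line per array; a minimal sketch showing that a transpose switches the layout from C to Fortran order without copying:

```python
import numpy as np

a = np.zeros((3, 2))        # C-contiguous by default
t = a.T                     # transpose: a view of a with swapped strides
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])   # True False
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])   # False True
```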


Re: [Numpy-discussion] buggy fortran array reshape ?

2009-04-22 Thread Pauli Virtanen
Wed, 22 Apr 2009 15:12:20 -0400, josef.pktd wrote:
 On Wed, Apr 22, 2009 at 2:37 PM, Pauli Virtanen p...@iki.fi wrote:
[clip]
 If you want to ensure no-copy, assign to shape:

 a.shape = (6,)
[clip]
 Is the difference between assigning to the attribute and using the
 method call explained
 somewhere?

No.

But it certainly should be. The first place to fix is the attribute
docstring, which doesn't even mention that it can be assigned to:

http://docs.scipy.org/numpy/docs/numpy.ndarray.shape/

A similar review should be done for all attribute docstrings. Second, a
separate, more detailed discussion of memory layouts, views, etc.
should be written.

-- 
Pauli Virtanen



[Numpy-discussion] Masking an array with another array

2009-04-22 Thread Gökhan SEVER
Hello,

Could you please give me some hints about how to mask an array using another
array, as in the following example.

In [14]: a = arange(5)

In [15]: a
Out[15]: array([0, 1, 2, 3, 4])

and my secondary array is b

In [16]: b = array([2,3])

What I want to do is to mask a with b values and get an array of:

array([False, False, True, True,  False], dtype=bool)

That is just a manually created array. I still don't know how to do this
programmatically, either in a Pythonic fashion or with numpy's masked array functions.

Thank you.


Gökhan


Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Pierre GM

On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:

 Hello,

 Could you please give me some hints about how to mask an array using  
 another arrays like in the following example.

What about that?
numpy.logical_or.reduce([a==i for i in b])
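A runnable version of this one-liner on the arrays from the question (a sketch assuming `import numpy as np`):

```python
import numpy as np

a = np.arange(5)
b = np.array([2, 3])
# One boolean array per value in b, OR-ed together element-wise.
mask = np.logical_or.reduce([a == i for i in b])
print(mask)        # [False False  True  True False]
```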




Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Gökhan SEVER
Yes, Pierre,

I like this kind of one-line elegance in Python a lot. I was thinking that the
answer lay somewhere in masked array operations, but I was proven wrong.

Thanks for your input on this small riddle.

Here is another way of doing it. (That's what I thought of initially, and
what Matthias Michler suggested on the matplotlib mailing list.)

from numpy import zeros

mask = zeros(len(a), dtype=bool)
for index in range(len(a)):    # run through array a
    if a[index] in b:
        mask[index] = True


Ending with a quote about Pythonicness :)

...that something is Pythonic when it has a sense of quality,
simplicity, clarity and elegance about it.


Gökhan




Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread josef . pktd
On Wed, Apr 22, 2009 at 8:18 PM, Gökhan SEVER gokhanse...@gmail.com wrote:
 Yes Pierre,

 I like this one line of elegances in Python a lot. I was thinking that the
 answer lies in somewhere in masked array operations, but I proved wrong.

 Thanks for your input on this small riddle.

 Here is another way of doing that. (That's what I thought of initially and
 what Matthias Michler responded on matplotlib mailing list.)

 mask = zeros(len(a), dtype=bool)
 for index in xrange(len(a)):        # run through array a
    if a[index] in b:
        mask[index] = True


 Ending with a quote about Pythonicness :)

 ...that something is Pythonic when it has a sense of quality, simplicity,
 clarity and elegance about it.


 Gökhan


 On Wed, Apr 22, 2009 at 4:49 PM, Pierre GM pgmdevl...@gmail.com wrote:

 On Apr 22, 2009, at 5:21 PM, Gökhan SEVER wrote:

  Hello,
 
  Could you please give me some hints about how to mask an array using
  another arrays like in the following example.

 What about that ?
 numpy.logical_or.reduce([a==i for i in b])



I prefer broadcasting to list comprehension in numpy:

>>> a = np.arange(5)
>>> b = np.array([2,3])

>>> (a[:,np.newaxis]==b).any(1)
array([False, False,  True,  True, False], dtype=bool)

Josef
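The intermediate array that the broadcasting version builds has one entry per (a, b) pair, which matters for memory when b is large. A small sketch:

```python
import numpy as np

a = np.arange(5)
b = np.array([2, 3])
eq = a[:, np.newaxis] == b     # shape (5, 2): eq[i, j] is a[i] == b[j]
print(eq.shape)                # (5, 2)
print(eq.any(axis=1))          # [False False  True  True False]
print(eq.nbytes)               # 10: one byte per (a, b) pair
```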


Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Pierre GM

On Apr 22, 2009, at 9:03 PM, josef.p...@gmail.com wrote:

 I prefer broad casting to list comprehension in numpy:

Pretty neat! I still don't have the broadcasting reflex. Now, any idea
which one is more efficient in terms of speed? In terms of temporaries?



Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread josef . pktd
On Wed, Apr 22, 2009 at 10:45 PM, Pierre GM pgmdevl...@gmail.com wrote:

 On Apr 22, 2009, at 9:03 PM, josef.p...@gmail.com wrote:

 I prefer broad casting to list comprehension in numpy:

 Pretty neat! I still dont have the broadcasting reflex. Now, any idea
 which one is more efficient in terms of speed? in terms of temporaries?


I used similar broadcasting for working with categorical data series
and for creating dummy variables for regression, so I have already played
with this for some time.

In this case, I would expect the memory consumption to be
essentially the same: you have a list of arrays and I have a 2d array,
unless numpy needs an additional conversion to an array in
np.logical_or.reduce, which seems plausible, but I don't know.

The main point Sturla convinced me of in the discussion on
kendalltau is that if b is large (500 or 1000 elements), building the full
intermediate boolean array kills both memory and speed performance
compared to a Python for loop, and is very bad compared to a
Cython loop.

In this example, my version is at least twice as fast for len(b) = 4.
Your version does not scale well at all to larger b: yours takes 7
times as long as mine for len(b) = 400, which, I guess, means that
you have an extra copying step.

I added the for loop, and it is always the fastest, even more so for short
b. I hope it's correct; I have never used an in-place logical operator before.

Josef

import numpy as np
from time import time as time_

a = np.array(list(range(10)) * 1000)
blen = 10  # 100
b = np.array([2, 3, 5, 8] * blen)

print("shape b", b.shape)

# broadcasting: builds a len(a) x len(b) boolean array
t = time_()
for _ in range(100):
    (a[:, np.newaxis] == b).any(1)
print(time_() - t)

# list comprehension + logical_or.reduce
t = time_()
for _ in range(100):
    np.logical_or.reduce([a == i for i in b])
print(time_() - t)

# for loop with in-place OR
t = time_()
for _ in range(100):
    z = a == b[0]
    for ii in range(1, len(b)):
        z |= (a == b[ii])
print(time_() - t)

#shape b (80,)
#0.11133514
#0.26632425

#shape b (80,)
#0.827999830246
#5.2650001049

#shape b (400,)
#4.6086758
#28.437362

#shape b (400,)
#3.8913242
#27.5

#shape b (400,)
#3.89099979401
#27.328962
#3.51599979401   #for loop

#shape b (40,)
#0.45396185
#2.5460381
#0.35895096   #for loop

#shape b (4,)
#0.10867575
#0.2826485
#0.0309998989105   #for loop
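To check that the in-place loop is correct, one option (a sketch assuming NumPy 1.17+ for `default_rng`) is to compare all three methods on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=1000)
b = np.array([2, 3, 5, 8])

broadcast = (a[:, np.newaxis] == b).any(axis=1)
reduced = np.logical_or.reduce([a == i for i in b])

# In-place OR loop: each step only creates a temporary of length len(a).
loop = a == b[0]
for value in b[1:]:
    loop |= (a == value)

print(np.array_equal(broadcast, reduced), np.array_equal(broadcast, loop))  # True True
```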


Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Gael Varoquaux
On Wed, Apr 22, 2009 at 04:21:05PM -0500, Gökhan SEVER wrote:
Could you please give me some hints about how to mask an array using
another arrays like in the following example.

In [14]: a = arange(5)

In [15]: a
Out[15]: array([0, 1, 2, 3, 4])

and my secondary array is b

In [16]: b = array([2,3])

What I want to do is to mask a with b values and get an array of:

array([False, False, True, True,  False], dtype=bool)

This is an operation on 'sets': you are testing if members of a are 'in'
b. Generally, set operations on arrays can be found in
numpy.lib.arraysetops. I believe what you are interested in is
setmember1d.

HTH,

Gaël
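setmember1d no longer exists in current NumPy; the same membership test is spelled np.isin (np.in1d in older releases, deprecated since NumPy 2.0). A minimal sketch, assuming NumPy 1.13 or later:

```python
import numpy as np

a = np.arange(5)
b = np.array([2, 3])
# np.isin tests each element of a for membership in b; repeated values are fine.
print(np.isin(a, b))   # [False False  True  True False]
```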


Re: [Numpy-discussion] Masking an array with another array

2009-04-22 Thread Gökhan SEVER
Ahaa,

Thanks Gaël. That method is more elegant than the previous suggestions, and the
simplest of all.

Although one line of "import this" says:

There should be one-- and preferably only one --obvious way to do it.

I always find many different ways of implementing ideas in the Python world.

Gökhan

