Re: [Numpy-discussion] f2py NotImplementedError

2016-07-26 Thread Aronne Merrelli
Hello Andy,

I'm not an expert on f2py, but I've used it in a few cases. I can
recreate a NotImplementedError by copying your commands. The problem
is that you are using -l (lower case L), which I think is supposed to
specify the library name, not a path. The NotImplementedError is not
really what the code should produce in this case, but I don't know
what is going on there.

I think you normally want to use the typical flags, like "-lfoo
-L/path/to/foo" or similar. In any case for this simple example, you
don't really need to use those flags. I can get your code to compile
with this command (I have to specify fcompiler here, otherwise it uses
ifort on my system):

$ gfortran -fPIC -c othermodule.f90 -o othermodule.o
$ f2py --fcompiler=gfortran -m mainmodule -c mainmodule.f90 othermodule.o

This compiles, but then the shared object has no functions in it when
you import it in python. I think that's because your main file is a
program, and I do not think f2py wraps fortran programs, rather only
subroutines/functions. If you change the mainmodule file slightly,
replacing the 'program mains' with:

module mains
  use moderator
contains
  subroutine foo
...
  end subroutine foo
end module mains

Compile it similarly, then it looks to work:

$ python -c "import mainmodule ; mainmodule.mains.foo()"
 here's your 1 st number:2.7182817459106445
 here's your 2 nd number:3.6945281028747559
 here's your 3 rd number:3.3475894927978516
 here's your 4 th number:2.2749228477478027
 here's your 5 th number:1.2367763519287109
 here's your 6 th number:   0.56031775474548340
 here's your 7 th number:   0.21758595108985901
 here's your 8 th number:7.3932491242885590E-002
 here's your 9 th number:2.2329926490783691E-002
 here's your10 th number:6.0699032619595528E-003

On Sun, Jul 24, 2016 at 2:54 PM, andy buzza  wrote:
> Hello all,
>
> I have been trying to compile a relatively simple pair of Fortran files, one
> referencing a subroutine from another file (mainmodule.f90 references
> othermodule.f90).  I have been able to compile them using a Fortran
> compiler, but receive a NotImplementedError when using f2py.
>
> Steps I use for f2py:
>
> $gfortran -shared -o othermodule.so othermodule.f90 -fPIC
>
> $f2py -c -l/path/othermodule -m mainmodule mainmodule.f90
>
> I am running this on linux and wasn't sure how to correct the error.
>
> othermodule.f90
>
>   module moderator
>
> implicit none
> integer*1 :: i
> integer*8 :: fact,len
> real*8,dimension(:),allocatable :: ex
>
> contains
>
>   subroutine submarine(ii,ff,exex)
>
> implicit none
> integer*1 :: ii
> integer*8 :: ff
> real*8 :: exex
>
> exex=exp(real(ii))/ff
>
>   end subroutine submarine
>
>   end module moderator
>
>
> mainmodule.f90
>
>   program mains
>
> use moderator
>
> implicit none
>
> len=10
> allocate(ex(len))
> fact=1
> do i=1,len
>fact=fact*i
>call submarine(i,fact,ex(i))
>if (i==1) then
>   print*,"here's your ",i,"st number: ",ex(i)
>elseif (i==2) then
>   print*,"here's your ",i,"nd number: ",ex(i)
>elseif (i==3) then
>   print*,"here's your ",i,"rd number: ",ex(i)
>else
>   print*,"here's your ",i,"th number: ",ex(i)
>endif
> enddo
> deallocate(ex)
>
>   end program
>
>
> Thanks for the help,
> Andy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Equivalent to IDL's help function

2013-10-07 Thread Aronne Merrelli
There isn't anything quite the same. (I think what you are really asking
for is a way to print the essential info about one variable name, at least
that is how I would use the IDL help procedure). In IPython, I use the
whos magic to do something similar, although it just prints this info for
all variables or all variables of one class, rather than just one variable.
I do not think there is a way to do it for just one variable.

Here are some examples - you can see this works quite well but it will
become unwieldy if your interactive namespace becomes large:

In [1]: x = 1; y = 2; z = 3.3; d = {'one':1, 'two':2, 'three':3}

In [2]: whos
Variable   Type Data/Info
-
d  dict n=3
x  int  1
y  int  2
z  float3.3

In [3]: whos dict
Variable   TypeData/Info

d  dictn=3

In [4]: xa = np.arange(111); ya = np.ones((22,4))

In [5]: whos ndarray
Variable   Type   Data/Info
---
xa ndarray111: 111 elems, type `int64`, 888 bytes
ya ndarray22x4: 88 elems, type `float64`, 704 bytes




On Mon, Oct 7, 2013 at 12:15 PM, Siegfried Gonzi
sgo...@staffmail.ed.ac.ukwrote:

 Hi all

 What is the equivalent to IDL its help function, e.g.

 ==
 IDL a = make_array(23,23,)

 IDL help,a

 will result in:

 A   FLOAT = Array[23, 23]

 or

 IDL a = create_struct('idl',23)

 IDL help,a

 gives:

 A   STRUCT= - Anonymous Array[1]

 ==

 I have been looking for it ever since using numpy. It would make my life
 so much easier.


 Thanks, Siegfried


 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unique() function and avoiding Loop

2013-07-10 Thread Aronne Merrelli
On Fri, Jul 5, 2013 at 4:20 AM, Bakhtiyor Zokhidov 
bakhtiyor_zokhi...@mail.ru wrote:

 Hi everybody,

 I have a problem with sorting out the following function. What I expect is
 that I showed as an example below.

 Two problems are encountered to achieve the result:
 1) The function sometimes can't not sort as expected: I showed an example
 for that below.
 2) I could not do vectorization to avoid loop.


 OR, Is there another way to solve that problem??
 Thanks in advance


 Example:
 data = ['', 12, 12, 423, '1', 423, -32, 12, 721, 345]. Expected
 result:  [0, 12, 12, 423, 0, 423, -32, 12, 721, 345], here, '' and '1'
 are string type I need to replace them by zero


I don't understand your code example, but if your problem is fully
described as above (replace the strings '' or '1' with the integer 0), then
it would seem simplest to just do this with python built in functions
rather than using numpy. The numpy functions work best with arrays, and
your data variable is a python list with a mixture of integers and
strings.

Here is a possible solution:

 data = ['', 12, 12, 423, '1', 423, -32, 12, 721, 345]
 foo = lambda x: 0 if (x == '') or (x == '1') else x
 print map(foo, data)
[0, 12, 12, 423, 0, 423, -32, 12, 721, 345]


Hope that helps,
Aronne


 The result I got: ['', 12, 12, 423, '1', 423, -32, 12, 721, 345]

 import numpy as np

 def func(data):

   x, i = np.unique(data, return_inverse = True)
   f = [ np.where( i == ind )[0] for ind in range(len(x)) ]

   new_data = []
   # Obtain 'data' arguments and give these data to New_data
   for i in range(len(x)):
   if np.size(f[i])  1:
  for j in f[i]:
  if str(data[j])  '':

 new_data.append(data[j])
   else:
 data[j] = 0
   return data

 --
 Bakhtiyor Zokhidov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array manupulation

2013-05-26 Thread Aronne Merrelli
On Sun, May 26, 2013 at 4:30 AM, Sudheer Joseph sudheer.jos...@yahoo.comwrote:

 Dear Brian,
 I even tried below but no luck!
 In [138]: xx=np.zeros(11)
 In [139]: xx
 Out[139]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

 In [147]: xx.shape
 Out[147]: (11,)
 In [140]: xx=np.array(xx)[np.newaxis]
 In [141]: xx.shape
 Out[141]: (1, 11)
 In [142]: xx=xx.T
 In [143]: xx.shape
 Out[143]: (11, 1)
 In [144]: csum.shape
 Out[144]: (11, 5)
 In [145]: np.vstack((xx,csum))
 ---
 ValueErrorTraceback (most recent call last)
 /media/SJOITB/SST_VAL/ipython-input-145-2a0a60f68737 in module()
  1 np.vstack((xx,csum))

 /usr/local/lib/python2.7/dist-packages/numpy-1.7.0-py2.7-linux-x86_64.egg/numpy/core/shape_base.pyc
 in vstack(tup)
 224
 225 
 -- 226 return _nx.concatenate(map(atleast_2d,tup),0)
 227
 228 def hstack(tup):

 ValueError: all the input array dimensions except for the concatenation
 axis must match exactly




You've transposed the arrays, so now you need to stack the other way. So,
you need to use hstack to concatenate arrays with the same column length
(first axis), or vstack to concatenate arrays with the same row length
(second axis). For example:

In [110]: xx1 = np.zeros((1,7)); cc1 = np.ones((3,7))

In [111]: xx2 = np.zeros((7,1)); cc2 = np.ones((7,3))

In [112]: np.vstack((xx1, cc1))
Out[112]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.,  1.,  1.,  1.]])

In [113]: np.hstack((xx2, cc2))
Out[113]:
array([[ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.],
   [ 0.,  1.,  1.,  1.]])


Also, I would highly recommend studying the NumPy for MATLAB users guide:

http://www.scipy.org/NumPy_for_Matlab_Users

These issues (any many more) are discussed there.


Cheers,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing bug

2013-04-05 Thread Aronne Merrelli
On Sun, Mar 31, 2013 at 12:14 AM, Ivan Oseledets
ivan.oseled...@gmail.comwrote:

snip

Oh!  So it is not a bug, it is a feature, which is completely
 incompatible with other array based languages (MATLAB and Fortran). To
 me, I can not find a single explanation why it is so in numpy.
 Taking submatrices from a matrix is a common operation and the syntax
 above is very natural to take submatrices, not a weird diagonal stuff.
 i.e.,

 c = np.random.randn(100,100)
 d = c[[0,3],[2,3]]

 should NOT produce two numbers! (and you can not do it using slices!)

 In MATLAB and Fortran
 c(indi,indj)
 will produce a 2 x 2 matrix.
 How it can be done in numpy (and why the complications?)

 So, please consider this message as a feature request.



There is already a function, ix, in the index_tricks that does this (I
think it is essentially implementing the broadcasting trick that Nathaniel
mentions. For me the index trick is easier, as I often forget the
broadcasting details). Example:

In [14]: c = np.random.randn(100,100)

In [15]: c[[0,3],[2,3]]
Out[15]: array([ 0.99141998, -0.88928917])

In [16]: c[np.ix_([0,3],[2,3])]
Out[16]:
array([[ 0.99141998, -1.98406295],
   [ 0.0729076 , -0.88928917]])


So for me, I think this is superior to MATLAB, as I have often had the case
of wanting the result from the second option. In MATLAB you would need to
extract the 2x2 matrix, and then take its diagonal. This can be wasteful
when the index arrays become large.

Cheers,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] help with f2py

2012-12-23 Thread Aronne Merrelli
Hi,

I'm trying to run f2py and running into some trouble. Starting from
http://www.scipy.org/Cookbook/F2Py, and the very simple 'Wrapping Hermite
Polynomial' example, I can get the pyf file created with no issues. The
system I am using is RedHat linux, and has several Fortran compilers:

$ f2py -c --help-fcompiler
snip
Fortran compilers found:
  --fcompiler=g95  G95 Fortran Compiler (0.92)
  --fcompiler=gnu  GNU Fortran 77 compiler (3.4.6)
  --fcompiler=gnu95GNU Fortran 95 compiler (4.1.2)
  --fcompiler=intelem  Intel Fortran Compiler for 64-bit apps (11.1)

All of these will successfully create the .so file except for g95, but when
I try to import into python I get this ImportError for any of the other
three compilers:

In [5]: import hermite
ImportError: ./hermite.so: undefined symbol: c06ebf_

If I look at the shared object I find that symbol here:

$ nm hermite.so
snip
43c0 T array_from_pyobj
 U c06eaf_
 U c06ebf_


And that about hits my limit of compiler knowledge, as I am pretty much a
novice with these things. Any ideas on what is going wrong here, or
suggestions of things to try?

Thanks,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] __array_wrap__ and dot

2012-08-22 Thread Aronne Merrelli
Hello list,

I posted an issue many months ago [1] about confusing __array_wrap__
behavior in a subclass of np.matrix.
Since that time I wasn't really using the code that used the matrix
subclass, but lately I have starting using the code so I've run into
the issue again. Looking into a little more detail, I think the root
of the problem is how __array_wrap__ interacts with the dot functions.
Operations like M*2 will be processed differently depending on whether
M is a ndarray or matrix subclass; it appears that a matrix will
always call np.dot for that operation, while an ndarray will go
through np.multiply. Since dot acts differently than multiply here,
you get different results between the two subclasses.

I updated the gist to show the problem [2]. The output I get on my
system is pasted at the bottom. It looks like __array_wrap__ gets
called sometimes, depending on the type of one of the arguments. I'm
not seeing any logical pattern to it, which makes it confusing.  My
expectation (perhaps incorrect) is that __array_wrap__ should be
invoked any time dot is called. I guess that depends on whether dot
should be considered a ufunc - I don't know the answer to that
question.

If someone with more knowledge of the internals could take a look at
this, that would be great - I think this is a bug, but I'm not really
sure what is the expected behavior here.

Thanks,
Aronne

[1] http://mail.scipy.org/pipermail/numpy-discussion/2011-December/059665.html
[2] https://gist.github.com/1511354



Output from run_test (see the gist code) using ipython:


In [1]: import matrix_array_wrap_test as mtest; import numpy as np

In [2]: np.__version__
Out[2]: '1.6.1'

In [3]: np.dot
Out[3]: function numpy.core._dotblas.dot

In [4]: mtest.run_test()
aw called? |  creation   |  use

Yes|  o=ArrSubClass(2)   |  o2 = o + 2
Yes|  o=ArrSubClass(2)   |  o2 = o + 2.0
Yes|  o=ArrSubClass(2)   |  o2 = o * 2
Yes|  o=ArrSubClass(2)   |  o2 = o * 2.0
Yes|  o=ArrSubClass(2)   |  o2 = o * o.T
No |  o=ArrSubClass(2)   |  o2 = np.dot(o,2)
No |  o=ArrSubClass(2)   |  o2 = np.dot(o,2.0)
No |  o=ArrSubClass(2)   |  o2 = np.dot(o,o.T)
Yes|  o=ArrSubClass(2)   |  o2 = np.core.multiarray.dot(o,2)
Yes|  o=ArrSubClass(2)   |  o2 =
np.core.multiarray.dot(o,2.0)
No |  o=ArrSubClass(2)   |  o2 =
np.core.multiarray.dot(o,o.T)

Yes|  o=MatSubClass([1, 1])  |  o2 = o + 2
Yes|  o=MatSubClass([1, 1])  |  o2 = o + 2.0
Yes|  o=MatSubClass([1, 1])  |  o2 = o * 2
No |  o=MatSubClass([1, 1])  |  o2 = o * 2.0
No |  o=MatSubClass([1, 1])  |  o2 = o * o.T
Yes|  o=MatSubClass([1, 1])  |  o2 = np.dot(o,2)
No |  o=MatSubClass([1, 1])  |  o2 = np.dot(o,2.0)
No |  o=MatSubClass([1, 1])  |  o2 = np.dot(o,o.T)
Yes|  o=MatSubClass([1, 1])  |  o2 = np.core.multiarray.dot(o,2)
Yes|  o=MatSubClass([1, 1])  |  o2 =
np.core.multiarray.dot(o,2.0)
No |  o=MatSubClass([1, 1])  |  o2 =
np.core.multiarray.dot(o,o.T)

Yes|  o=MatSubClass([1.0, 1.0])  |  o2 = o + 2
Yes|  o=MatSubClass([1.0, 1.0])  |  o2 = o + 2.0
No |  o=MatSubClass([1.0, 1.0])  |  o2 = o * 2
No |  o=MatSubClass([1.0, 1.0])  |  o2 = o * 2.0
No |  o=MatSubClass([1.0, 1.0])  |  o2 = o * o.T
No |  o=MatSubClass([1.0, 1.0])  |  o2 = np.dot(o,2)
No |  o=MatSubClass([1.0, 1.0])  |  o2 = np.dot(o,2.0)
No |  o=MatSubClass([1.0, 1.0])  |  o2 = np.dot(o,o.T)
Yes|  o=MatSubClass([1.0, 1.0])  |  o2 = np.core.multiarray.dot(o,2)
Yes|  o=MatSubClass([1.0, 1.0])  |  o2 =
np.core.multiarray.dot(o,2.0)
No |  o=MatSubClass([1.0, 1.0])  |  o2 =
np.core.multiarray.dot(o,o.T)
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Problems understanding histogram2d

2012-07-20 Thread Aronne Merrelli
On Fri, Jul 20, 2012 at 10:11 AM, Andreas Hilboll li...@hilboll.de wrote:
 Hi,

 I have a problem using histogram2d:

from numpy import linspace, histogram2d
bins_x = linspace(-180., 180., 360)
bins_y = linspace(-90., 90., 180)
data_x = linspace(-179.96875, 179.96875, 5760)
data_y = linspace(-89.96875, 89.96875, 2880)
histogram2d(data_x, data_y, (bins_x, bins_y))

AttributeError: The dimension of bins must be equal to the dimension of
 the sample x.

 I would expect histogram2d to return a 2d array of shape (360,180), which
 is full of 256s. What am I missing here?


It is a joint histogram, so the x and y inputs represent each
dimension of a 2-dimensional sample. So, the x and y arrays must be
the same length. (the documentation does appear to be incorrect here).
The bins do not need to have the same length. Here is your example
adjusted (with many fewer bins so I could print it in the console) -
note since you just have two ramps from linspace as the data, most
of the points are near the diagonal.

In [15]: bins_x = linspace(-180,180,6)
In [16]: bins_y = linspace(-90,90,4)
In [17]: data_x = linspace(-179.96875, 179.96875, 2880)
In [18]: data_y = linspace(-89.96875, 89.96875, 2880)
In [19]: H, x_edges, y_edges = np.histogram2d(data_x, data_y, (bins_x, bins_y))

In [20]: H
Out[20]:
array([[ 576.,0.,0.],
   [ 384.,  192.,0.],
   [   0.,  576.,0.],
   [   0.,  192.,  384.],
   [   0.,0.,  576.]])

In [21]: x_edges
Out[21]: array([-180., -108.,  -36.,   36.,  108.,  180.])

In [22]: y_edges
Out[22]: array([-90., -30.,  30.,  90.])


So, back to that AttributeError - it is clearly unhelpful. Looking
through the code, it looks like the x,y input arrays are joined into a
2D array with a numpy core function 'atleast_2d'. If this function
sees inputs that are not the same length, it actually produces a
2-element numpy object array:

In [57]: data_x.shape, data_y.shape
Out[57]: ((5760,), (2880,))
In [58]: data_xy = atleast_2d([data_x, data_y])
In [59]: data_xy.shape, data_xy.dtype
Out[59]: ((1, 2), dtype('object'))
In [60]: data_xy[0,0].shape, data_xy[0,1].shape
Out[60]: ((5760,), (2880,))

If the x, y array have the same length this looks a lot more logical:

In [62]: data_x.shape, data_y.shape
Out[62]: ((2880,), (2880,))
In [63]: data_xy = atleast_2d([data_x, data_y])
In [64]: data_xy.shape, data_xy.dtype
Out[64]: ((2, 2880), dtype('float64'))

So, that Assertion error comes up histogramdd (which actually does the
work), expects the data array to be [Ndimension, Nsample], and the
number of dimensions is set by the number of bin arrays that were
input (2). Since it sees that [1,2] shaped object array, it treats
that as a 2-element, 1-dimension dataset; thus, at that level, the
AssertionError actually makes sense.


Hope that helps,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 1:06 PM, Moroney, Catherine M (388D)
catherine.m.moro...@jpl.nasa.gov wrote:
 Hello,

 Can somebody give me some hints as to how to code up this function
 in pure python, rather than dropping down to Fortran?

 I will want to compare a 7-element vector (called element) to a large list 
 of similarly-dimensioned
 vectors (called target, and pick out the vector in target that is the 
 closest to element
 (determined by minimizing the Euclidean distance).

 For instance, in (slow) brute force form it would look like:

 element = numpy.array([1, 2, 3, 4, 5, 6, 7])
 target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

 min_length = .0
 min_index  =
 for i in xrange(0, 7):
   distance = (element-target)**2
   distance = numpy.sqrt(distance.sum())
   if (distance  min_length):
      min_length = distance
      min_index  = i


If you are just trying to find the index to the vector in target
that is closest to element, then I think the default broadcasting
would work fine. Here is an example that should work (the broadcasting
is done for the subtraction element-targets):

In [39]: element = np.arange(1,8)
In [40]: targets = np.random.uniform(0,8,(1000,7))
In [41]: distance_squared = ((element-targets)**2).sum(1)
In [42]: min_index = distance_squared.argmin()
In [43]: element
Out[43]: array([1, 2, 3, 4, 5, 6, 7])
In [44]: targets[min_index,:]
Out[44]:
array([ 1.93625981,  2.56137284,  2.23395169,  4.15215253,  3.96478248,
5.21829915,  5.13049489])

Note - depending on the number of vectors in targets, it might be
better to have everything transposed if you are really worried about
the timing; you'd need to try that for your particular case.

Hope that helps,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 5:27 PM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 On Wed, May 2, 2012 at 5:45 PM, Moroney, Catherine M (388D)
 catherine.m.moro...@jpl.nasa.gov wrote:

 Thanks to Perry for some very useful off-list conversation.   I realize
 that
 I wasn't being clear at all in my earlier description of the problem so
 here it is
 in a nutshell:

 Find the best match in an array t(5000, 7) for a single vector e(7).  Now
 scale
 it up so e is (128, 512, 7) and I want to return a (128, 512) array of the
 t-identifiers
 that are the best match for e.  Best match is defined as the minimum
 Euclidean distance.


 It sounds like you want to find the nearest neighbor to a point in a
 high-dimensional space. This sounds like a job for a spacial data structure
 like a KD-tree.  See:

 http://docs.scipy.org/doc/scipy/reference/spatial.html
 http://mloss.org/software/view/143/
 http://www.mrzv.org/software/pyANN/
 etc.

In general this is a good suggestion - I was going to mention it
earlier - but I think for this particular problem it is not better
than the brute force and argmin() NumPy approach. On my laptop, the
KDTree query is about a factor of 7 slower (ignoring the time cost to
create the KDTree)

In [45]: def foo1(element, targets):
distance_squared = ((element-targets)**2).sum(1)
min_index = distance_squared.argmin()
return sqrt(distance_squared[min_index]), min_index
   :

In [46]: def foo2(element, T):
return T.query(element)


In [47]: element = np.arange(1,8)
In [48]: targets = np.random.uniform(0,8,(5000,7))
In [49]: T = scipy.spatial.KDTree(targets)
In [50]: %timeit foo1(element, targets)
1000 loops, best of 3: 401 us per loop
In [51]: %timeit foo2(element, T)
100 loops, best of 3: 2.92 ms per loop

Just to make sure they say the same thing:

In [53]: foo1(element, targets)
Out[53]: (1.8173671152898632, 570)
In [54]: foo2(element, T)
Out[54]: (1.8173671152898632, 570)


I think KDTree is more optimal for larger searches (more than 5000
elements), and fewer dimensions. For example, with 500,000 elements
and 2 dimensions, I get 34 ms for NumPy and 2 ms for the KDtree.

Back to the original question, for 400 us per search, even over
128x512 elements that would be 26 seconds total, I think? That might
not be too bad.

Cheers,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 6:46 PM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 On Wed, May 2, 2012 at 7:25 PM, Aronne Merrelli aronne.merre...@gmail.com
 wrote:

 In general this is a good suggestion - I was going to mention it
 earlier - but I think for this particular problem it is not better
 than the brute force and argmin() NumPy approach. On my laptop, the
 KDTree query is about a factor of 7 slower (ignoring the time cost to
 create the KDTree)


 The cKDTree implementation is more than 4 times faster than the brute-force
 approach:

 T = scipy.spatial.cKDTree(targets)

 In [11]: %timeit foo1(element, targets)   # Brute force
 1000 loops, best of 3: 385 us per loop

 In [12]: %timeit foo2(element, T)         # cKDTree
 1 loops, best of 3: 83.5 us per loop

 In [13]: 385/83.5
 Out[13]: 4.610778443113772

Wow, not sure how I missed that! It even seems to scale better than
linear (some of that 83us is call overhead, I guess):

In [35]: %timeit foo2(element, T)
1 loops, best of 3: 115 us per loop
In [36]: elements = np.random.uniform(0,8,[128,512,7])
In [37]: %timeit foo2(elements.reshape((128*512,7)), T)
1 loops, best of 3: 2.66 s per loop

So only 2.7 seconds to search the whole set. Not bad!

Cheers,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG

2012-04-02 Thread Aronne Merrelli
On Sun, Apr 1, 2012 at 8:28 AM, Kamesh Krishnamurthy kames...@gmail.com wrote:
 Hello all,

 I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
 linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
 I've posted details on Stackoverflow:
 http://stackoverflow.com/q/9955021/974568



If you just call eig() in MATLAB it only returns eigenvalues (not
vectors). I think there might be a shortcut algorithm if you only
want the eigenvalues - or maybe it is faster just due to the smaller
memory requirement. NumPy's eig always computes both. On my Mac OS X
machine I get this result, showing the two are basically equivalent
(this is EPD NumPy, so show_config() shows it is built on MKL):

MATLAB:
 tic; eig(r); toc
Elapsed time is 10.594226 seconds.
 tic; [V,D] = eig(r); toc
Elapsed time is 23.767467 seconds.

NumPy
In [4]: t0=datetime.now(); numpy.linalg.eig(r); print datetime.now()-t0
0:00:25.594435
In [6]: t0=datetime.now(); v,V = numpy.linalg.eig(r); print datetime.now()-t0
0:00:25.485411

If you change the MATLAB call, how does it compare?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Logical indexing and higher-dimensional arrays.

2012-02-08 Thread Aronne Merrelli
On Wed, Feb 8, 2012 at 8:49 AM, Travis Oliphant tra...@continuum.io wrote:


 On Feb 8, 2012, at 8:29 AM, Sturla Molden wrote:

  On 08.02.2012 06:01, Travis Oliphant wrote:
 
  Recall that the shape of the output with fancy indexing is determined
 by broadcasting together the indexing objects and using that as the shape
 of the output:
 
  x[ind1, ind2] will produce an output with the shape of broadcast(ind1,
 ind2) whose elements are selected by the broadcasted tuple.
 
 
 
 
  Thank you for the explanation, Travis.
 
  I think my main confusion is why we broadcast indices at all.
  Broadcasting is something I would expect to happen for binary operations
  between ndarrays, not for indexing.

 We broadcast because of the zip-based semantics of current
 fancy-indexing which is basically an element-by-element operation:
  x[ind1, ind2]  will choose it's elements out of x by taken elements from
 ind1 as the first index and elements out of ind2 as the second.

 As an essentially element-by-element operation, if the shapes of the input
 arrays are not exactly the same, it is natural to broadcast them to the
 same shape.   This is fairly intuitive if you are used to broadcasting in
 NumPy.   The problem is that it does not coincide other intuitions in
 certain cases.

 This behavior actually allows easy-to-access cross-product behavior by
 broadcasting a 1-d ind1 shaped as (4,) with a ind2 shaped as (4,1).  The
 numpy.ix_  function does the basic reshaping needed so that 0:2, 0:2
 matches [0,1],[0,1] indexing:   x[ix_([0,1],[0,1])] is the same as
 x[0:2,0:2].

 There are also some very nice applications where you can select out of a
 3-d volume a depth-surface defined by indexes like so:

arr[ i[:,newaxis], j, depth]

 where arr is a 3-d array,  i and j are 1-d index arrays: i =
 arange(arr.shape[0]) and j = arange(arr.shape[1]), and depth is a 2-d array
 of depths.   The selected result will be 2-d.

 When you understand what is happening, it is consistent and it does make
 sense and it has some very nice applications.I recognize that people
 get confused because their expectations are different, but the current
 behavior can be understood with a fairly simple mental model.

 I could support, however, a move to push this style of indexing to a
 method, and make fancy-indexing use cross-product semantics if:

* we do it in a phased-manner with lot's of deprecation warnings
 and even a tool that helps you change code (the tool would have to play
 your code and make a note of the locations where this style of indexing was
 used --- because static code analysis wouldn't know which objects were
 arrays and which weren't).

* we also make the ndarray object general enough so that
 fancy-indexing could be returned as a view on the array (the same as
 regular indexing)

 This sort of thing would take time, but is not out of the question in my
 mind because I suspect the number of users and use-cases of broadcasted
 fancy-indexing is small.

 -Travis



Speaking for myself, I've used both methods quite often. For the
broadcasting method, it is very useful for image processing scenarios where
you want to extract a small subregion with an arbitrary shape. It is also
extremely useful for this to work both ways, for example if I wanted to do
some processing on that subregion and then put it back in the larger image:

sub_reg_vals = img[ ind1, ind2 ]
do_stuff(sub_reg_vals)
img[ ind1, ind2 ] = sub_reg_vals

Of course this may require awareness of the coordinates versus flat index,
but the ravel/unravel_index functions do this for you. Note that IDL works
in this way, unlike MATLAB. The quick and dirty way I've been doing it in
MATLAB is

sub_reg_vals = diag( img[ind1, ind2] )

Which works but is obviously a bit inefficient.

I have no convincing ideas about which method should be the default, but
I think both methods get a fair bit of use, so as long as there are fast
and well documented methods for doing either, I would be happy. I actually
had no idea that np.ix_ did the other method in NumPy. (Thanks for
mentioning that! I had been using A[ind1,:][:,ind2]). There is no generic
terminology for these operations, so I think that generates a lot of
confusion.

Thanks,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix indexing

2012-02-07 Thread Aronne Merrelli
On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai n...@uark.edu wrote:

 I have two large matrices, say, ABC and DEF, each with a shape of 7000 by
 4500. I have another list, say, elem, containing 850 values from ABC. I am
 interested in finding out the corresponding values in DEF where ABC has
 elem and store them *separately*. The code that I am using is:

 for i in range(len(elem)):
  DEF_distr = DEF[ABC==elem[i]]

 DEF_distr gets used for further processing before it gets cleared from
 memory and the next round of the above loop begins. The loop above
 currently takes about 20 minutes! I think the bottle neck is where elem is
 getting searched repeatedly in ABC. So I am looking for a solution where
 all elem can get processed in a single call and the indices of ABC be
 stored in another variable (separately). I would appreciate if you suggest
 any faster method for getting DEF_distr.


You'll need to mention some details about the contents of ABC/DEF in order
to get the best answer (what range of values, do they have a certain
structure, etc). I made the assumption that ABC and elem have integers (I'm
not sure it makes sense to search for ABC==elem[n] unless they are both
integers), and then used a sort followed by searchsorted. This has a side
effect of reordering the elements in DEF_distr. I don't know if that
matters. You can skip the .copy() calls if you don't care that ABC/DEF are
sorted.

ABC_1D = ABC.copy().ravel()
ABC_1D_sorter = np.argsort(ABC_1D)
ABC_1D = ABC_1D[ABC_1D_sorter]
DEF_1D = DEF.copy().ravel()
DEF_1D = DEF_1D[ABC_1D_sorter]
ind1 = np.searchsorted(ABC_1D, elem, side='left')
ind2 = np.searchsorted(ABC_1D, elem, side='right')
DEF_distr = []
for n in range(len(elem)):
DEF_distr.append( DEF_1D[ind1[n]:ind2[n]] )


I tried this on the big memory workstation, and for the 7Kx4K size I get
about 100 seconds for the simple method and 10 seconds for this more
complicated sort-based method - if you are getting 20 minutes for that,
maybe there is a memory problem, or a different part of the code that is
the bottleneck?

Hope that helps,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Aronne Merrelli
On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant teoliph...@gmail.comwrote:

 Can you determine where the problem is, precisely.In other words, can
 you verify that c is not getting filled in correctly?

 You are no doubt going to get overflow in the summation as you have a
 uint8 parameter.   But, having that overflow be exactly '0' would be
 surprising.

 Can you verify that a and b are getting created correctly?   Also, 'c'
 should be a 2-d array, can you verify that?  Can you take the sum along the
 -1 axis and the 0 axis separately:

 print a.shape
 print b.shape
 print c.shape

 c[100:].sum(axis=0)
 d = c[100:].sum(axis=-1)
 print d[:100]
 print d[-100:]



I am getting the same results as David. It looks like c just stopped
filling in partway through the array. I don't think there is any overflow
issue, since the result of sum() is up-promoted to uint64 when I do that.
Travis, here are the outputs at my end - I cut out many zeros for brevity:

In [7]: print a.shape
(500, 972)
In [8]: print b.shape
(4993210,)
In [9]: print c.shape
(4993210, 972)

In [10]: c[100:].sum(axis=0)
Out[10]:
array([0, 0, 0,  , 0])

In [11]: d = c[100:].sum(axis=-1)

In [12]: print d[:100]
[0 0 0 ... 0 0]

In [13]: print d[-100:]
[0 0 0 ... 0 0 0]

I looked at sparse subsamples with matplotlib - specifically,
imshow(a[::1000, :]) - and the a array looks correct (random values
everywhere), but c is zero past a certain row number. In fact, it looks
like it becomes zero at row 575419 - I think for all rows in c beyond row
574519, the values will be zero. For lower row numbers, I think they are
correctly filled (at least, by the sparse view in matplotlib).

In [15]: a[b[574519], 350:360]
Out[15]: array([143, 155,  11,  30, 212, 149, 110, 164, 165, 120],
dtype=uint8)

In [16]: c[574519, 350:360]
Out[16]: array([143, 155,  11,  30, 212, 149,   0,   0,   0,   0],
dtype=uint8)


I'm using EPD 7.1, numpy 1.6.1, Linux installation (I don't know the kernel
details)

HTH,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] find location of maximum values

2012-01-09 Thread Aronne Merrelli
On Mon, Jan 9, 2012 at 7:59 PM, questions anon questions.a...@gmail.comwrote:

 thank you, I seem to have made some progress (with lots of help)!!
 I still seem to be having trouble with the time. Because it is hourly data
 for a whole month I assume that is where my problem lies.
 When I run the following code I alwayes receive the first timestamp of the
 file. Not sure how to get around this:

 tmax=TSFC.max(axis=0)
 maxindex=tmax.argmax()


You are computing max(axis=0) first. So, tmax is an array containing the
maximum temperature at each lat/lon grid point, over the set of 721 months.
It will be a [106, 193] array.

So the argmax of tmax is an element in a shape [106,193] array (the number
of latitude/number of longitude) not the original three dimension [721,
106, 193] array. Thus when you unravel it you can only get the first time
value.

I re-read your original post but I don't understand what number you need.
Are you trying to get the single max value over the entire array? Or max
value for each month? (a 721 element vector)? or something else?


Cheers,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] output different columns of ndarray in different formats

2011-12-22 Thread Aronne Merrelli
On Thu, Dec 22, 2011 at 10:27 AM, Chao YUE chaoyue...@gmail.com wrote:

 Dear all,

 Just a small question, how can I output different columns of ndarray in
 different formats,
 the manual says, A single format (%10.5f), a sequence of formats, or a
 multi-format string
 but I use

 np.savetxt('new.csv',data,fmt=['%i4','%f6.3'])

 or

 np.savetxt('new.csv',data,fmt=('%i4','%f6.3'))

 give strange results.



I think you've flipped the format codes; try:

fmt = ('%4i', '%6.3f')
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] unexpected behavior of __array_wrap__ in matrix subclass

2011-12-22 Thread Aronne Merrelli
Hello NumPy list,

While experimenting with a subclass of numpy.matrix, I discovered cases
where __array_wrap__ is not called during multiplication. I'm not sure
whether this is a bug or my own misunderstanding of np.matrix 
__array_wrap__; if nothing else I thought it would be helpful to describe
this in case other people run into the same problem.

If the matrix is created from an array of integers, then __array_wrap__ is
called if the matrix is multiplied by an integer. It appears that in all
other cases, __array_wrap__ is not called for multiplication (int times
float scalar, float times float scalar, float matrix times float matrix,
etc). For addition, __array_wrap__ is called for all cases that I checked.

I did find a possible workaround. If you define a __mul__ method in the
matrix subclass, and then just call np.multiply, then __array_wrap__ is
called in all cases I expect it to be called.

I uploaded a example script here: https://gist.github.com/1511354

Hopefully it is not too confusing. I'm basically abusing the python
exception handler to tell whether or not __array_wrap__ is called for any
particular case. The MatSubClass shows the problem, and the
MatSubClassFixed as the __mul__ method defined. Here are the results I see
in my working environment (ipython in EPD 7.1):

In [1]: np.__version__
Out[1]: '1.6.0'
In [2]: execfile('matrix_array_wrap_test.py')
In [3]: run_test()
array_wrap called for o2 = o * 2 after o=MatSubClass([1,1])
array_wrap NOT called for o2 = o * 2.0 after o=MatSubClass([1,1])
array_wrap NOT called for o2 = o * 2 after o=MatSubClass([1.0, 1.0])
array_wrap NOT called for o2 = o * 2.0 after o=MatSubClass([1.0, 1.0])
array_wrap called for o2 = o * 2 after o=MatSubClassFixed([1,1])
array_wrap called for o2 = o * 2.0 after o=MatSubClassFixed([1,1])
array_wrap called for o2 = o * 2 after o=MatSubClassFixed([1.0, 1.0])
array_wrap called for o2 = o * 2.0 after o=MatSubClassFixed([1.0, 1.0])



Thanks,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean problems

2011-12-10 Thread Aronne Merrelli
On Sat, Dec 10, 2011 at 5:47 AM, ferreirafm ferreir...@lim12.fm.usp.brwrote:



 Hi Stéfan,
 Thanks for your replay. Have a look in the arrays at:
 http://ompldr.org/vYm83ZA
 Regards,
 Fred
 --



I can recreate this error if tab is a structured ndarray - what is the
dtype of tab?

If that is correct, I think you could fix this by simplifying things. Since
tab is already an ndarray, you should not need to convert it back into a
python list. By converting the ndarray back to a list you are making an
extra level of wrapping as a python object, which is ultimately why you
get that error about adding numpy.void.

Unfortunately you cannot take directly take a mean of a struct dtype;
structs are generic so they could have fields with strings, or objects,
etc, that would be invalid for a mean calculation. However the following
code fragment should work pretty efficiently. It will make a 1-element
array of the same dtype as tab, and then populate it with the mean value of
all elements where the length is = 15. Note that dtype.fields.keys() gives
you a nice way to iterate over the fields in the struct dtype:

length_mask = tab['length'] = 15
tab_means = np.zeros(1, dtype=tab.dtype)
for k in tab.dtype.fields.keys():
tab_means[k] = np.mean( tab[k][mask] )

In general this would not work if tab has a field that is not a simple
numeric type, such as a str, object, ... But it looks like your arrays are
all numeric from your example above.

Hope that helps,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.matrix subclassing

2011-10-25 Thread Aronne Merrelli
On Mon, Oct 24, 2011 at 5:54 PM, David Voong voong.da...@gmail.com wrote:

 Hi guys,

 I have a question regarding subclassing of the numpy.matrix class.

 I read through the wiki page,
 http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

 and tried to subclass numpy.matrix, I find that if I override the
 __finalize_array__ method I have problems using the sum method and get the
 following error,


 Traceback (most recent call last):
   File test.py, line 61, in module
 print (a * b).sum()
   File /afs/
 cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py,
 line 435, in sum
 return N.ndarray.sum(self, axis, dtype, out)._align(axis)
   File /afs/
 cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py,
 line 370, in _align
 return self[0,0]
   File /afs/
 cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py,
 line 305, in __getitem__
 out = N.ndarray.__getitem__(self, index)
 IndexError: 0-d arrays can only use a single () or a list of newaxes (and a
 single ...) as an index



Hi,

Thanks for asking this - I'm also trying to subclass np.matrix and running
into similar problems; I never generally need to sum my vectors so this
wasn't a problem I had noticed thus far.

Anyway, for np.matrix, there are definitely particular issues beyond what is
described on the array subclassing wiki. I think I have a workaround, based
on struggling with my own subclass. This is really a hack since I'm not sure
how some parts of matrix actually work, so if someone has a better solution
please speak up!

You didn't give details on the actual subclass, but I can recreate the error
with the following minimal example (testing with Numpy 1.6.1 inside EPD
7.1):

class MatSubClass1(np.matrix):
def __new__(cls, input_array):
obj = np.asarray(input_array).view(cls)
return obj
def __array_finalize__(self, obj):
pass
def __array_wrap__(self, out_arr, context=None):
return np.ndarray.__array_wrap__(self, out_arr, context)

In [2]: m1 = MatSubClass1( [[2,0],[1,1]] )
In [3]: m1.sum()
...
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a
single ...) as an index


The problem is that __array_finalize__ of the matrix class that needs to get
called, to preserve dimensions (matrix should always have 2 dimensions). You
can't just add the matrix __array_finalize__ because the initial call
happens when you create the object, in which case obj is a ndarray object,
not a matrix. So, check to see obj is a matrix first before calling it. In
addition, there is some undocumented _getitem attribute inside matrix, and I
do not know what it does. If you just set that attribute during __new__, you
get something that seems to work:

class MatSubClass2(np.matrix):
def __new__(cls, input_array):
obj = np.asarray(input_array).view(cls)
obj._getitem = False
return obj
def __array_finalize__(self, obj):
if isinstance(obj, np.matrix):
np.matrix.__array_finalize__(self, obj)
def __array_wrap__(self, out_arr, context=None):
return np.ndarray.__array_wrap__(self, out_arr, context)

In [4]: m2 = MatSubClass2( [[2,0],[1,1]] )

In [5]: m2.sum(), m2.sum(0), m2.sum(1)
Out[5]: (4, matrix([[3, 1]]), matrix([[2], [2]]))


HTH,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Float128 integer comparison

2011-10-15 Thread Aronne Merrelli
On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett matthew.br...@gmail.comwrote:

 Hi,

 Continuing the exploration of float128 - can anyone explain this behavior?

  np.float64(9223372036854775808.0) == 9223372036854775808L
 True
  np.float128(9223372036854775808.0) == 9223372036854775808L
 False
  int(np.float128(9223372036854775808.0)) == 9223372036854775808L
 True
  np.round(np.float128(9223372036854775808.0)) ==
 np.float128(9223372036854775808.0)
 True


I know little about numpy internals, but while fiddling with this, I noticed
a possible clue:

 np.float128(9223372036854775808.0) == 9223372036854775808L
False
 np.float128(4611686018427387904.0) == 4611686018427387904L
True
 np.float128(9223372036854775808.0) - 9223372036854775808L
Traceback (most recent call last):
  File stdin, line 1, in module
TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long'
 np.float128(4611686018427387904.0) - 4611686018427387904L
0.0


My speculation - 9223372036854775808L is the first integer that is too big
to fit into a signed 64 bit integer. Python is OK with this but that means
it must be containing that value in some more complicated object. Since you
don't get the type error between float64() and long:

 np.float64(9223372036854775808.0) - 9223372036854775808L
0.0

Maybe there are some unimplemented pieces in numpy for dealing with
operations between float128 and python arbitrary longs? I could see the ==
test just producing false in that case, because it defaults back to some
object equality test which isn't actually looking at the numbers.

Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Preventing an ndarray subclass from returning new subclass instances for std(), etc

2011-09-19 Thread Aronne Merrelli
On Sun, Sep 18, 2011 at 3:58 PM, Wes McKinney wesmck...@gmail.com wrote:


 I thought maybe you can intercept 0-dim objects and return self.item()
 in array_finalize, but not dice. This is really weird:

 import numpy as np

 class example(np.ndarray):

def __new__(cls, arr):
return np.array(arr).view(cls)

def __array_finalize__(self, obj):
 print 'foo'
if self.ndim == 0:
return self.item()


 In [6]: foo = example(np.arange(10))
 foo

 In [7]: foo.std()
 foo
 foo
 foo
 foo
 foo
 foo
 foo
 Out[7]: example(2.8722813232690143)
 ___



The documentation on array_wrap/finalize isn't too clear on this point, but
I think the return from __array_finalize__ is actually not used in any way.
So, in this example, I think it the returned self.item() is discarded. The
only important output from __array_finalize__ is whatever modification was
made to the self object. The 'caller' of __array_finalize__ is (I think)
somewhere is the C code, so I can't verify this (I'm not really a C
programmer...).

The series of foos is due to the std calculation, if you change print
'foo' to print 'foo', obj you get this output:

In [71]: a = example([1,2,3])
foo [1 2 3]

In [72]: a.std()
foo [1 2 3]
foo 6.0
foo 2.0
foo [1 2 3]
foo [1 2 3]
foo [-1.  0.  1.]
foo [ 1.  0.  1.]
foo 2.0
foo 0.6667
Out[72]: example(0.816496580927726)


Anyway, back on topic - I'm having similar problems as Keith. It seems like
there isn't consistency on how different built-in functions treat
array_wrap/finalize/etc, or maybe I'm still confused. At the moment it seems
like the only solution is to write a method in the subclass for each case
where the return isn't what you want. In this example, to force the result
of std() to be an ndarray instead of the subclass, you would need to write
something similar to:

import numpy as np
class example(np.ndarray):
   def __new__(cls, arr):
   return np.array(arr).view(cls)
   def std(self, *args, **kwargs):
   return np.array(self).std(*args, **kwargs)


Is this the best way to handle this problem? It could make the subclass
definition quite cluttered, but on the other hand all these extra methods
would be pretty straightforward.


Thanks,
Aronne
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays?

2011-08-17 Thread Aronne Merrelli
On Wed, Aug 17, 2011 at 9:04 AM, Keith Hughitt keith.hugh...@gmail.comwrote:


 Also, when subclassing ndarray and calling obj = data.view(cls) for an
 ndarray data, does this copy the data into the new object by value or
 reference? The method which extracts the 2d slice actually returns a
 subclass of ndarray created using the extracted data, so this is why I ask.



I think it should pass a reference - the following code suggests the
subclass is sharing the same fundamental array object. You can use the .base
attribute of the ndarray object to see if it is a view back to another
ndarray object:

import numpy as np
class TestClass(np.ndarray):
def __new__(cls, inp_array):
return inp_array.view(cls)

In [2]: x = np.ones(5)
In [3]: obj = TestClass(x)
In [4]: id(x), id(obj), id(obj.base)
Out[4]: (23517648, 19708080, 23517648)
In [5]: print x, obj
[ 1.  1.  1.  1.  1.] [ 1.  1.  1.  1.  1.]
In [6]: x[2] = 2
In [7]: print x, obj
[ 1.  1.  2.  1.  1.] [ 1.  1.  2.  1.  1.]


If you change the TestClass.__new__() to: return
np.array(inp_array).view(cls) then you will make a copy of the input array
instead, if that is needed. In that case, it looks like the .base attribute
is a new ndarray, copied from the input array.


Aronne

[PS - also note that .base is set to None, if the ndarray is not a view into
another ndarray; it turns out that None has a valid object number, which
confused me at first - see id(None).]
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] __array_wrap__ / __array_finalize__ in NumPy v1.4+

2011-08-14 Thread Aronne Merrelli
Hello,

I'm attempting to implement a subclass of ndarray, and becoming confused
about the way __array_wrap__ and __array_finalize__ operate. I boiled it
down to a short subclass, which is the example on the website at
http://docs.scipy.org/doc/numpy-1.6.0/user/basics.subclassing.html, with one
added attribute that is a copy of the self array multiplied by 2. The
doubled copy is stored in a plain ndarray. The attachment has the python
code.

The output below is from NumPy 1.3 and 1.6 (1.4 has the same output as 1.6).
The output from 1.3 matches the documentation on the website. In 1.6,
__array_wrap__ and __array_finalize__ are invoked in the opposite order,
__array_finalize__ appears to be getting an empty array, and array_wrap's
argument is no longer an ndarray but rather an instance of the subclass.
This doesn't match the documentation so I am not sure if this is the correct
behavior in newer NumPy. Is this a bug, or the expected behavior in newer
NumPy versions?

Am I just missing something simple? The actual code I am trying to write
uses essentially the same idea - keeping another array, related to the self
array through some calculation, as another object attribute.  Is there a
better way to accomplish this?

Thanks,
Aronne



NumPy version:  1.3.0

object creation
In __array_finalize__:
   self type class 'array_wrap_test.TestClass', values TestClass([0, 1])
   obj  type type 'numpy.ndarray', values array([0, 1])

object + ndarray
In __array_wrap__:
   self type class 'array_wrap_test.TestClass', values TestClass([0, 1])
   arr  type type 'numpy.ndarray', values array([2, 3])
In __array_finalize__:
   self type class 'array_wrap_test.TestClass', values TestClass([2, 3])
   obj  type class 'array_wrap_test.TestClass', values TestClass([0, 1])

obj=  [0 1] [0 2]  ret=  [2 3] [4 6]


NumPy version:  1.6.0

object creation
In __array_finalize__:
   self type class 'array_wrap_test.TestClass', values TestClass([0, 1])
   obj  type type 'numpy.ndarray', values array([0, 1])

object + ndarray
In __array_finalize__:
   self type class 'array_wrap_test.TestClass', values TestClass([
15, 22033837])
   obj  type class 'array_wrap_test.TestClass', values TestClass([0, 1])
In __array_wrap__:
   self type class 'array_wrap_test.TestClass', values TestClass([0, 1])
   arr  type class 'array_wrap_test.TestClass', values TestClass([2, 3])

obj=  [0 1] [0 2]  ret=  [2 3] [  30 44067674]


array_wrap_test.py
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion