Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus

On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:

 I know I know, that's pretty outrageous to even suggest, but please 
 bear with me, I am stumped as you may be:
 
 2-D data file here:
 http://dl.dropbox.com/u/139035/data.npy
 
 Then:
 In [3]: data.mean()
 Out[3]: 3067.024383998
 
 In [4]: data.max()
 Out[4]: 3052.4343
 
 In [5]: data.shape
 Out[5]: (1000, 1000)
 
 In [6]: data.min()
 Out[6]: 3040.498
 
 In [7]: data.dtype
 Out[7]: dtype('float32')
 
 
 A mean value calculated per loop over the data gives me 3045.747251076416
 I first thought I still misunderstand how data.mean() works, per axis 
 and so on, but did the same with a flattened version with the same 
 results.
 
 Am I really soo tired that I can't see what I am doing wrong here?
 For completion, the data was read by a osgeo.gdal dataset method called 
 ReadAsArray()
 My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
 Enthought's EPD.


I get the same result:

In [1]: import numpy

In [2]: data = numpy.load('data.npy')

In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.min()
Out[5]: 3040.498

In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running python as a 
32- vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all 
pretty smooth variation around the (min, max) range. But maybe it's still 
somehow floating-point pathological?

This is fun too:
In [12]: data.mean()
Out[12]: 3067.024383998

In [13]: (data/3000).mean()*3000
Out[13]: 3020.807437501

In [15]: (data/2).mean()*2
Out[15]: 3067.024383998

In [16]: (data/200).mean()*200
Out[16]: 3013.67541


Zach


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
 You have a million 32-bit floating point numbers that are in the 
 thousands. Thus you are exceeding the 32-bit float precision and, if you 
 can, you need to increase the precision of the accumulator in np.mean() or 
 change the input dtype:
 >>> a.mean(dtype=np.float32)  # default and lacks precision
 3067.024383998
 >>> a.mean(dtype=np.float64)
 3045.747251076416
 >>> a.mean(dtype=np.float128)
 3045.7472510764160156
 >>> b = a.astype(np.float128)
 >>> b.mean()
 3045.7472510764160156
 
 Otherwise you are left to using some alternative approach to calculate 
 the mean.
 
 Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, 
and I had just assumed that 64-bit or higher accumulators would be used with 
floating-point arrays too, instead of the array's dtype. This is actually quite 
a bit of a gotcha for floating-point imaging-type tasks -- good to know!
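(For anyone who trips over this later, a minimal illustration -- not from the original 
thread -- of the gotcha and the workaround; note that newer numpy releases also reduce 
the error via pairwise summation:)

import numpy as np

a = np.empty(1000000, dtype=np.float32)
a.fill(3045.75)                     # a million float32 values in the thousands
a.mean()                            # accumulates in float32 (error was large in 1.6-era numpy)
a.mean(dtype=np.float64)            # force a 64-bit accumulator
a.astype(np.float64).mean()         # equivalent fix: upcast the data first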

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Addressing arrays

2012-01-30 Thread Zachary Pincus
a[x,y,:]

Read the slicing part of the tutorial:
http://www.scipy.org/Tentative_NumPy_Tutorial 
(section 1.6)

And the documentation:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html



On Jan 30, 2012, at 10:25 AM, Ted To wrote:

 Hi,
 
 Is there some straightforward way to access an array by values across a
 subset of its dimensions?  For example, if I have a three dimensional
 array a=(x,y,z), can I look at the values of z given particular values
 for x and y?
 
 Thanks,
 Ted
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Addressing arrays

2012-01-30 Thread Zachary Pincus
Ted, can you clarify what you're asking for? Maybe give a trivial example of an 
array and the desired output?

I'm pretty sure this is a slicing question though:
 If I have a three dimensional array a=(x,y,z), can I look at the values of z 
 given particular values for x and y?
Given that element values are scalars in this case, and indices are (x,y,z) 
triples, it seems likely that looking for values of z given an (x,y) pair is 
a slicing-by-index question, no?

For indexing-by-value, fancy indexing with boolean masks is usually the way 
to go... again, Ted (or Chao), if you can describe your indexing needs in a bit 
more detail, it's often easy to find a compact slicing and/or fancy-indexing 
strategy that works well and reasonably efficiently.

Zach



On Jan 30, 2012, at 10:33 AM, Chao YUE wrote:

 he is not asking for slicing. he is asking for how to index array by element 
 value but not element index.
 
 2012/1/30 Zachary Pincus zachary.pin...@yale.edu
 a[x,y,:]
 
 Read the slicing part of the tutorial:
 http://www.scipy.org/Tentative_NumPy_Tutorial
 (section 1.6)
 
 And the documentation:
 http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
 
 
 
 On Jan 30, 2012, at 10:25 AM, Ted To wrote:
 
  Hi,
 
  Is there some straightforward way to access an array by values across a
  subset of its dimensions?  For example, if I have a three dimensional
  array a=(x,y,z), can I look at the values of z given particular values
  for x and y?
 
  Thanks,
  Ted
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 -- 
 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Addressing arrays

2012-01-30 Thread Zachary Pincus
 Thanks!  That works great if I only want to search over one index but I
 can't quite figure out what to do with more than a single index.  So
 suppose I have a labeled, multidimensional array with labels 'month',
 'year' and 'quantity'.  a[['month','year']] gives me an array of indices
 but a[['month','year']]==(1,1960) produces False.  I'm sure I simply
 don't know the proper syntax and I apologize for that -- I'm kind of new
 to numpy.

I think that your best bet is to form the boolean masks independently and then 
logical-and them together:

mask = (a['month'] == 1) & (a['year'] == 1960)
jan_60 = a[mask]

Someone might have more insight here. Though I should note that if you have 
large data and are doing lots of queries like this, a more database-ish 
approach might be better. Something like sqlite's python bindings, or PyTables. 
Alternately, if your data are all time-series based things, PANDAS might be 
worth looking at. 

But the above approach should be just fine for non-huge datasets...
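(Illustrative sketch only, assuming a structured array with 'month', 'year' and 
'quantity' fields like Ted described:)

import numpy as np

a = np.array([(1, 1959, 10.0), (1, 1960, 11.5), (2, 1960, 9.3)],
             dtype=[('month', int), ('year', int), ('quantity', float)])
mask = (a['month'] == 1) & (a['year'] == 1960)
jan_60 = a[mask]            # the rows for January 1960
jan_60['quantity']          # just the quantities for those rows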

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] all elements equal

2012-03-05 Thread Zachary Pincus
How about the following?
exact: numpy.all(a == a[0])
inexact: numpy.allclose(a, a[0])
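(A quick illustration of the difference, not part of the original reply:)

import numpy as np

a = np.array([2.0, 2.0, 2.0 + 1e-12])
np.all(a == a[0])        # False: the last element differs exactly
np.allclose(a, a[0])     # True: equal to within the default tolerances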

On Mar 5, 2012, at 2:19 PM, Keith Goodman wrote:

 On Mon, Mar 5, 2012 at 11:14 AM, Neal Becker ndbeck...@gmail.com wrote:
 What is a simple, efficient way to determine if all elements in an array (in 
 my
 case, 1D) are equal?  How about close?
 
 For the exactly equal case, how about:
 
 I[1] a = np.array([1,1,1,1])
 I[2] np.unique(a).size
 O[2] 1   # All equal
 
 I[3] a = np.array([1,1,1,2])
 I[4] np.unique(a).size
 O[4] 2   # All not equal
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Viewing a float64 array with a float32 array

2012-03-21 Thread Zachary Pincus
 Hi,
 
 Is it possible to have a view of a float64 array that is itself float32?
 So that:
 
 A = np.arange(5, dtype='d')
 A.view(dtype='f')
 
 would return a size 5 float32 array looking at A's data?

I think not. The memory layout of a 32-bit IEEE float is not a subset of that 
of a 64-bit float -- see e.g. the first table in:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html

This would work for int8/int16 or other int types (so long as the number 
doesn't exceed the range of the smaller type), but AFAIK not floats.

Note how the subset relationship works for the int8/int16 case, but not 
float32/float64:

>>> str(numpy.array(100,dtype=numpy.int8).data)
'd'

>>> str(numpy.array(100,dtype=numpy.int16).data)
'd\x00'

>>> str(numpy.array(-100,dtype=numpy.int16).data)
'\x9c\xff'

>>> str(numpy.array(-100,dtype=numpy.int8).data)
'\x9c'

>>> str(numpy.array(100,dtype=numpy.float32).data)
'\x00\x00\xc8B'

>>> str(numpy.array(100,dtype=numpy.float64).data)
'\x00\x00\x00\x00\x00\x00Y@'


Zach



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Viewing a float64 array with a float32 array

2012-03-21 Thread Zachary Pincus
 I'm not sure if you are referring to rounding errors but that's OK with
 me.
 
 I was thinking something along the lines of changing how numpy looks at
 the data of A's view by modifying say the stride attribute, etc.

Yes, so was I. As you can see in my example with ints below, you could skip 
every other byte of the int16 array to look at an int8 array. This is because 
the memory layout of an int8 is a proper subset of int16. (Modulo 
endian-concerns of course...)

But looking at the link I provided, you can see that taking the first 32 bits 
of an float64 (or the last 32 or any 32) does not yield something that can be 
interpreted as a float32. So there's no subset relationship, and you can't do 
the strides-trick.

To be extra clear, look at the memory layout of a float that's expressible 
without rounding error:
>>> str(numpy.array(128,dtype=numpy.float64).data)
'\x00\x00\x00\x00\x00\x00`@'

>>> str(numpy.array(128,dtype=numpy.float32).data)
'\x00\x00\x00C'

There's obviously no stride trick whereby one will look like the other. 

Zach


 
 On Wed, Mar 21, 2012, at 11:19, Zachary Pincus wrote:
 Hi,
 
 Is it possible to have a view of a float64 array that is itself float32?
 So that:
 
 A = np.arange(5, dtype='d')
 A.view(dtype='f')
 
 would return a size 5 float32 array looking at A's data?
 
 I think not. The memory layout of a 32-bit IEEE float is not a subset of
 that of a 64-bit float -- see e.g. the first table in:
 http://steve.hollasch.net/cgindex/coding/ieeefloat.html
 
 This would work for int8/int16 or other int types (so long as the number
 doesn't exceed the range of the smaller type), but AFAIK not floats.
 
 Note how the subset relationship works for the int8/int16 case, but not
 float32/float64:
 
 str(numpy.array(100,dtype=numpy.int8).data)
 'd'
 
 str(numpy.array(100,dtype=numpy.int16).data)
 'd\x00'
 
 str(numpy.array(-100,dtype=numpy.int16).data)
 '\x9c\xff'
 
 str(numpy.array(-100,dtype=numpy.int8).data)
 '\x9c'
 
 str(numpy.array(100,dtype=numpy.float32).data)
 '\x00\x00\xc8B'
 
 str(numpy.array(100,dtype=numpy.float64).data)
 '\x00\x00\x00\x00\x00\x00Y@'
 
 
 Zach
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-09 Thread Zachary Pincus
 That all sounds like no option -- sad.
 Cython is no solution cause, all I want is to leave Python Syntax in
 favor for strong OOP design patterns.

What about ctypes?

For straight numerical work, sometimes all one needs to hand across the 
python-to-C/C++/Fortran boundary is a pointer to some memory and the size of 
the memory region. So I will often just write a very thin C interface to 
whatever I'm using and then call into it with ctypes.

So you could just design your algorithm in C++ with all the strong OOP design 
patterns you want, and then write a minimal C interface on top with one 
or two functions that receive pointers to memory. Then allocate numpy arrays in 
python with whatever memory layout you need, and use the array's ctypes attribute 
to find the pointer, size, etc. that you need to pass over. 
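(A minimal sketch of what I mean; the library name and C function signature here are 
made up for illustration -- say a C routine void scale(double *data, size_t n, double 
factor) compiled into libmycode.so:)

import ctypes
import numpy as np

lib = ctypes.CDLL("./libmycode.so")                      # hypothetical shared library
lib.scale.argtypes = [ctypes.POINTER(ctypes.c_double),   # double *data
                      ctypes.c_size_t,                   # size_t n
                      ctypes.c_double]                   # double factor
lib.scale.restype = None

a = np.ascontiguousarray(np.arange(10), dtype=np.float64)
lib.scale(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), a.size, 2.0)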

Zach

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slices from an index list

2012-04-11 Thread Zachary Pincus
 Here's one way you could do it:
 
 In [43]: indices = [0,1,2,3,5,7,8,9,10,12,13,14]
 
 In [44]: jumps = where(diff(indices) != 1)[0] + 1
 
 In [45]: starts = hstack((0, jumps))
 
 In [46]: ends = hstack((jumps, len(indices)))
 
 In [47]: slices = [slice(start, end) for start, end in zip(starts, ends)]
 
 In [48]: slices
 Out[48]: [slice(0, 4, None), slice(4, 5, None), slice(5, 9, None), slice(9, 
 12, None)]

If you're only going to use the slices to divide up the list, you could use 
numpy.split and skip creating the slice objects:

indices = [0,1,2,3,5,7,8,9,10,12,13,14]
jumps = numpy.where(numpy.diff(indices) != 1)[0] + 1
numpy.split(indices, jumps)

giving:
[array([0, 1, 2, 3]), array([5]), array([ 7,  8,  9, 10]), array([12, 13, 14])]

Zach

(btw, Warren, the method to calculate the jumps is cute. I'll have to remember 
that.)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Zachary Pincus
Hello all,

The below seems to be a bug, but perhaps it's unavoidably part of the indexing 
mechanism?

It's easiest to show via example... note that using [0,1] to pull two columns 
out of the array gives the same shape as using :2 in the simple case, but 
when there's additional slicing happening, the shapes get transposed or 
something.

In [2]: numpy.version.version # latest git version
Out[2]: '1.7.0.dev-3d4'

In [3]: d = numpy.empty((10, 9, 8, 7))

In [4]: d[:,:,:,[0,1]].shape
Out[4]: (10, 9, 8, 2)

In [5]: d[:,:,:,:2].shape
Out[5]: (10, 9, 8, 2)

In [6]: d[:,0,:,[0,1]].shape
Out[6]: (2, 10, 8)

In [7]: d[:,0,:,:2].shape
Out[7]: (10, 8, 2)

In [8]: d[0,:,:,[0,1]].shape
Out[8]: (2, 9, 8)

In [9]: d[0,:,:,:2].shape
Out[9]: (9, 8, 2)

Oddly, this error can appear/disappear depending on the position of the other 
axis sliced:
In [14]: d = numpy.empty((10, 9, 8))

In [15]: d[:,:,[0,1]].shape
Out[15]: (10, 9, 2)

In [16]: d[:,0,[0,1]].shape
Out[16]: (10, 2)

In [17]: d[0,:,[0,1]].shape
Out[17]: (2, 9)

This cannot be the expected behavior, right?
Zach

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Zachary Pincus
 On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus zachary.pin...@yale.edu 
 wrote:
 The below seems to be a bug, but perhaps it's unavoidably part of the 
 indexing mechanism?
 
 It's easiest to show via example... note that using [0,1] to pull two 
 columns out of the array gives the same shape as using :2 in the simple 
 case, but when there's additional slicing happening, the shapes get 
 transposed or something.
 
 When fancy indexing and slicing is mixed, the resulting shape is
 essentially unpredictable.

Aah, right -- this does come up on the list not infrequently, doesn't it. I'd 
always thought it was more exotic usages that raised these issues. Good to know.

  The correct way to do it is to only use
 fancy indexing, i.e. generate the indices of the sliced dimension as
 well.
 

Excellent -- thanks!
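(For the archives, a sketch of the all-fancy-indexing version being described here, 
using numpy.ix_ to build the index arrays; not part of the original exchange:)

import numpy

d = numpy.empty((10, 9, 8, 7))
d[:, 0, :, [0, 1]].shape                      # (2, 10, 8): mixed slice/fancy, axes reordered
ix = numpy.ix_(range(10), [0], range(8), [0, 1])
d[ix].shape                                   # (10, 1, 8, 2): axes stay where you expect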


 Stéfan
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x

2012-06-05 Thread Zachary Pincus
 There is a fine line here. We do need to make people clean up lax code in 
 order to improve numpy, but hopefully we can keep the cleanups reasonable.

Oh agreed. Somehow, though, I was surprised by this, even though I keep tabs on 
the numpy lists -- at no point did it become clear that big changes in how 
arrays get constructed and typecast were ahead that might require code fixes. 
That was my main point, but probably a PEBCAK issue more than anything.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x

2012-06-05 Thread Zachary Pincus
 On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus zachary.pin...@yale.edu
 wrote:
 
 There is a fine line here. We do need to make people clean up lax code
 in order to improve numpy, but hopefully we can keep the cleanups
 reasonable.
 
 Oh agreed. Somehow, though, I was surprised by this, even though I keep
 tabs on the numpy lists -- at no point did it become clear that big changes
 in how arrays get constructed and typecast are ahead that may require code
 fixes. That was my main point, but probably a PEBCAK issue more than
 anything.
 
 
 It was fairly extensively discussed when introduced,
 http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again
 at some later point.
 
 Those are the not-yet-finalized changes in 1.7; Zachary (I think) is
 talking about problems upgrading from ~1.5 to 1.6.

Yes, unless I'm wrong I experienced these problems from 1.5.something to 1.6.1. 
I didn't take notes as it was in the middle of a deadline-crunch so I just 
fixed the code and moved on (long, stupid story about why the upgrade before a 
deadline...). It's just that the issues mentioned above seem to have hit me too 
and I wanted to mention that. But unhelpfully, I think, without code, and now 
I've hijacked this thread! Sorry.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)

2012-10-14 Thread Zachary Pincus
 On 10.10.2012 15:42, Nathaniel Smith wrote:
 This PR submitted a few months ago adds a substantial new API to numpy,
 so it'd be great to get more review. No-one's replied yet, though...
 
 Any thoughts, anyone? Is it useful, could it be better...?
 
 Fast neighbor search is what scipy.spatial.cKDTree is designed for. 
 There is a brand new version in SciPy master.

It's not neighbor *search*, it's applying a function over an (arbitrarily chosen 
and weighted) moving neighborhood in an nd array.

It would be useful for the author of the PR to post a detailed comparison of 
this functionality with scipy.ndimage.generic_filter, which appears to have 
very similar functionality.

Zach

 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)

2012-10-15 Thread Zachary Pincus
Hi Tim,

It's tricky to find functionality sometimes because as you've seen numpy and 
especially scipy are spread over very diverse domains with very diverse 
terminology! Usually someone on one or the other of the lists can help folks 
find some functionality, if not by name then by description...

In any case, though, I hope you'll keep your code around and accessible to 
people in standalone form. Despite being a bit slower than the ndimage stuff, 
it seems like you've got a better set of boundary conditions, and some other 
niceties that may appeal to others too.

Zach



 On Sun, Oct 14, 2012 at 8:24 PM, Zachary Pincus zachary.pin...@yale.edu 
 wrote:
 It would be useful for the author of the PR to post a detailed comparison of 
 this functionality with scipy.ndimage.generic_filter, which appears to have 
 very similar functionality.
 
 I'll be durned.   I created neighbor because I didn't find what I wanted, and 
 to find now that I just didn't look in the right place is well ...  Let's 
 just say that I went for a long run last night.
 
 Searching for ndimage, I found that is has been around a long, long time.  
 First in numarray, then moved to scipy.
 
 Really I could only nitpick about minor differences - kinda like a primary 
 political campaign.  On the face of it though, generic_filter looks better.  
 First off it is written in C so likely will be faster and more efficient 
 memory use.  I didn't look at optimizing neighbor at all and at least my part 
 of it is pure Python.  Of course for all of the small differences, I like my 
 choices better.  :-)
 
 I would like to make a mild suggestion.  Emphasis on mild.  Maybe ndimage, in 
 all or in part, should be brought into (back into?) Numpy and renamed.  
 
 About the PR.  Given that the neighbor functionality exists already, I will 
 close the PR later today.  Move along, nothing to see here...
 
 Side note:  I wrote arraypad with the future idea that it would become 
 easyish to include that functionality in other places, for example in 
 neighbor.  A Don't Repeat Yourself idea.  Previously I had only seen Fortran 
 pad capabilities in some of the Fast Fourier Transform functions. The 
 generic_filter function includes several padding functions - written in C.  
 This means that if arraypad needs be more efficient we have C code to base a 
 better arraypad.
 
 Another side node:  The measurements functions in ndimage are called zonal 
 functions in the GIS field.
 
 Kindest regards,
 Tim
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Manipulate neighboring points in 2D array

2012-12-27 Thread Zachary Pincus
 I have 2D array, let's say: `np.random.random((100,100))` and I want to do 
 simple manipulation on each point neighbors, like divide their values by 3.
 
 So for each array value, x, and it neighbors n:
 
  n n n        n/3 n/3 n/3
  n x n   ->   n/3  x  n/3
  n n n        n/3 n/3 n/3
 
 I searched a bit, and found about scipy ndimage filters, but if I'm not 
 wrong, there is no such function. Of course me being wrong is quite possible, 
 as I did not comprehend whole ndimage module, but I tried generic filter for 
 example and browser other functions.
 
 Is there better way to make above manipulation, instead using for loop over 
 every array element?

I am not sure I understand the above manipulation... typically neighborhood 
operators take an array element and its neighborhood and then give a single 
output that becomes the value of the new array at that point. That is, a 3x3 
neighborhood filter would act as a function F(R^{3x3}) -> R. It appears that 
what you're talking about above is a function F(R^{3x3}) -> R^{3x3}. But how is 
this output to map onto the original array positions? Is the function to be 
applied to non-overlapping neighborhoods? Is it to be applied to all 
neighborhoods and then summed at each position to give the output array?

If you can describe the problem in a bit more detail, with perhaps some sample 
input and output for what you desire (and/or with some pseudocode describing 
how it would work in a looping-over-each-element approach), I'm sure folks can 
figure out how best to do this in numpy.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Manipulate neighboring points in 2D array

2012-12-28 Thread Zachary Pincus
 You are right. I needed generic filter - to update current point, and not the 
 neighbors as I wrote.
 Initial code is slow loop over 2D python lists, which I'm trying to convert 
 to numpy and make it useful. In that loop there is inner loop for calculating 
 neighbors properties, which confused me yesterday, and mislead to search for 
 something that probably does not make sense.
 
 It's clear now :)

It's possible that some generic filter operations can be cast in terms of 
pure-numpy operations, or composed out of existing filters available in 
scipy.ndimage. If you can describe the filter operation you wish to perform, 
perhaps someone can make some suggestions.

Alternately, scipy.ndimage.generic_filter can take an arbitrary python 
function. Though it's not really fast...
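(A tiny example of the generic_filter route, just to show the shape of the API; the 
mean-of-neighborhood function here is only a placeholder:)

import numpy as np
from scipy import ndimage

a = np.random.random((100, 100))
# apply an arbitrary python function to each 3x3 neighborhood
result = ndimage.generic_filter(a, np.mean, size=3, mode='nearest')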

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] sampling arrays

2013-06-18 Thread Zachary Pincus
 Thank you, 
  No, if the location (space, time or depth) of choice is not 
 available, then the function I was looking for should give an interpolated 
 value at the choice.
 with best regards,
 Sudheer

scipy.ndimage.map_coordinates may be exactly what you want.
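(A sketch of what that looks like for a 4-D array, with made-up axis meanings and 
sample locations:)

import numpy as np
from scipy import ndimage

data = np.random.random((12, 5, 50, 60))      # e.g. (time, depth, lat, lon)
# two sample points, each given as fractional (t, z, y, x) coordinates
coords = np.array([[2.5,  7.0],               # t for each point
                   [1.25, 3.0],               # z
                   [10.0, 44.5],              # y
                   [20.0,  0.75]])            # x
values = ndimage.map_coordinates(data, coords, order=1)   # linearly interpolated samples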


 - Original Message -
 
 From: Henry Gomersall h...@cantab.net
 To: Discussion of Numerical Python numpy-discussion@scipy.org
 Cc: 
 Sent: Sunday, 16 June 2013 2:49 PM
 Subject: Re: [Numpy-discussion] sampling arrays
 
 On Sun, 2013-06-16 at 14:48 +0800, Sudheer Joseph wrote:
 Is it possible to sample a 4D array of numpy at given dimensions with
 out writing loops? ie a smart python way?
 
 It's not clear how what you want to do is different from simply indexing
 the array...?
 
 Henry
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Relative speed

2013-08-29 Thread Zachary Pincus
 And as you pointed out, 
 most of the time for non-trivial datasets the numpy operations will be 
 faster. (I'm daunted by the notion of trying to do linear algebra on 
 lists of tuples, assuming that's the relevant set of operations given 
 the comparison to the matrix class.)

Note the important and pretty common exception of building up a list one 
element (or row of elements) at a time. Here, python lists usually rule, unless 
the final size is known in advance.
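(I.e., something like the following sketch, where compute_row() is just a stand-in for 
whatever produces each row:)

import numpy

def compute_row(i):               # hypothetical: whatever actually produces each row
    return [i, i ** 2, i ** 3]

rows = []
for i in range(1000):
    rows.append(compute_row(i))   # python-list append is cheap (amortized O(1))
arr = numpy.array(rows)           # convert to a (1000, 3) array once, at the end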

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] str/bytes object from arr.data?

2010-09-27 Thread Zachary Pincus
As str objects are supposed to be immutable, I think anything  
official that makes a string from a numpy array is supposed to copy  
the data. But I think you can use ctypes to wrap a pointer and a  
length as a python string.

Zach


On Sep 27, 2010, at 8:28 AM, Francesc Alted wrote:

 Hi,

 Anybody knows a way to get a str object (bytes for Python >= 2.6)  
 out of
 a buffer object (i.e. the .data attribute of ndarrays) without copying
 data?

 I need this for avoid creating a new function that deals with buffer
 objects (instead of reusing the one for str/byte objects that I  
 already
 have).  So, a matter of laziness :-)

 Thanks,

 -- 
 Francesc Alted
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] point line distance

2010-10-06 Thread Zachary Pincus
Here's a good list of basic geometry algorithms:
http://www.softsurfer.com/algorithms.htm
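(For the simplest case -- distance from a point to the infinite line through two 
points -- a plain numpy version might look like this sketch; clamping to a segment is 
left out:)

import numpy as np

def point_line_distance(p, a, b):
    # distance from point p to the infinite line through a and b (2-D or 3-D)
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    d = b - a
    t = np.dot(p - a, d) / np.dot(d, d)     # parameter of the projection of p onto the line
    return np.linalg.norm(p - (a + t * d))

point_line_distance([1, 1], [0, 0], [2, 0])   # -> 1.0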

Zach


On Oct 6, 2010, at 5:08 PM, Renato Fabbri wrote:

 suppose you have a line defined by two points and a point. you want  
 the distance

 what are easiest possibilities? i am doing it, but its nasty'n ugly

 -- 
 GNU/Linux User #479299
 skype: fabbri.renato
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] function name as parameter

2010-10-20 Thread Zachary Pincus
 I'm trying to write an implementation of the amoeba function from
 numerical recipes and need to be able to pass a function name and
 parameter list to be called from within the amoeba function.  Simply
 passing the name as a string doesn't work since python doesn't know it
 is a function and throws a typeerror.  Is there something similar to
 IDL's 'call_function' routine in python/numpy or a pythonic/numpy  
 means
 of passing function names?

Just pass the function itself! For example:

def foo():
    print 6

def call_function_repeatedly(func, count):
    for i in range(count):
        func()

call_function_repeatedly(foo, 2)  # calls foo twice

bar = foo
bar() # still calls foo... we've just assigned the function to a  
different name


In python, functions (and classes, and everything else) are first-class  
objects and can be assigned to variables, passed around, etc,  
etc, just as anything else.

However, note that scipy.optimize.fmin implements the Nelder-Mead  
simplex algorithm, which is (I think) the same as the amoeba  
optimizer. Also you might be interested in the openopt package, which  
implements more optimizers a bit more consistently than scipy.optimize.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] function name as parameter

2010-10-20 Thread Zachary Pincus
 Hi Robert,
 so in a big data analysis framework, that is essentially written in C 
 ++,
 exposed to python with SWIG, plus dedicated python modules, the user
 performs an analysis choosing some given modules by name,as in :
  myOpt = "foo"
  my_analyse.perform(use_optimizer=myOpt)

 The attribute use_optimizer is then checked to perform the right
 calls/instantiations of python but also C++ objects. and the  
 actual
 crunching number is in the C++ part.
 But then I realize that I need to tweak this optimizer's state, and  
 the
 optimizer object is accessible from a library pyOpt that has been
 swigified from a C++ library.
  Then I access the object by calling optObject =
  eval("pyOpt.%s(some_args)" % myOpt)
  where myOpt would be "foo" in this particular analysis. This is  
 because
 what the attribute use_optimizer expects is also the object name in  
 the
 library, of course.
 It goes without saying that I could do :
  if myOpt == "foo":
      optObject = pyOpt.foo(some_args)
 else:
 
 and if you guys tell me it is way safer, I will do that instead of the
 use of eval that I liked because of the compactness.

Well, optObject = getattr(pyOpt, myOpt) is a bit nicer and marginally  
more likely to return sane error messages than optObject =  
eval("pyOpt.%s" % myOpt). Then you could do result = optObject(some_args)...

But why not have the user just pass in the relevant optObject from the  
pyOpt namespace (or some restricted namespace that just has the  
relevant functions exposed)? E.g.
my_analysis.perform(optimizer=pyOpt.Amoeba)
rather than
my_analysis.perform(optimizer='Amoeba')

This lets users do introspection on the pyOpt namespace to see what  
functions they can choose from, which is rather friendlier in an  
interactive environment.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] short circuit != ?

2010-10-27 Thread Zachary Pincus
This is silly: the structure of the python language prevents  
meaningful short-circuiting in the case of
np.any(a!=b)

While it's true that np.any itself may short-circuit, the 'a!=b'  
statement itself will be evaluated in its entirety before the result  
(a boolean array) is passed to np.any. This is just how python (like  
most other languages) works. This means that the big-O complexity  
isn't reduced any: all elements of a and b have to be examined. If np.any  
short-circuits, then at least all the elements don't need to be  
examined twice!

If performance actually matters here (unlikely?), a quick bit of  
cython code could do this easily enough in the simple case where a and  
b are the same shape (so no broadcasting required). Note that the  
below hard-codes the dtype also.

import cython
cimport numpy
import numpy

ctypedef numpy.int64_t TYPE_T
TYPE = numpy.int64

@cython.boundscheck(False)
def equal(a, b):
    cdef:
        numpy.ndarray[TYPE_T, ndim=1, negative_indices=False] a_flat = a.astype(TYPE).flatten()
        numpy.ndarray[TYPE_T, ndim=1, negative_indices=False] b_flat = b.astype(TYPE).flatten()
        unsigned int i, l
    assert a_flat.shape[0] == b_flat.shape[0]
    l = a_flat.shape[0]
    for i in range(l):
        if a_flat[i] != b_flat[i]:
            return False
    return True

Zach



On Oct 27, 2010, at 9:44 AM, Skipper Seabold wrote:

 On Wed, Oct 27, 2010 at 9:37 AM, Johann Cohen-Tanugi
 co...@lpta.in2p3.fr wrote:


 On 10/27/2010 03:31 PM, Neal Becker wrote:
 Johann Cohen-Tanugi wrote:


 how about np.any(a!=b)  ??

 On 10/27/2010 12:25 PM, Neal Becker wrote:

 Is there a way to get a short circuit != ?

 That is, compare 2 arrays but stop as soon as the first element
 comparison fails?

 I'm assuming that np.all (a != b) will _not_ do this, but will  
 first
 compare all elements.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 I don't think that will do short circuit, will it?  I think that  
 will compare
 each element, returning a bool array, then short-circuit eval that  
 bool
 array.


 In [3]: a=np.array([2,3,

 In [4]: b=np.array([2,5,

 In [5]: np.any(a!=b)
 Out[5]: True

 it does not return a bool array, it seems. I do not see how you would
 broadcast the notion of any along axes maybe?


 Not definitive by any means, but

 In [28]: a = np.arange(1,1000)

 In [29]: b = np.arange(1,1000)

 In [30]: timeit np.any(a!=b)
 10 loops, best of 3: 37.8 ms per loop

 In [31]: a[0] = 10.

 In [32]: timeit np.any(a!=b)
 10 loops, best of 3: 24.7 ms per loop

 In [33]: a[0] = 1

 In [34]: a[-1] = 1

 In [35]: timeit np.any(a!=b)
 10 loops, best of 3: 37.7 ms per loop

 It seems to at least take less time when the difference is at the
 beginning, though I'm sure there could be exceptions.

 Skipper
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

2010-10-27 Thread Zachary Pincus
 Help! I'm having a problem in searching through the *elements* if a  
 2d array. I have a loop over a numpy array:

 n,m = G.shape
 print n,m
 for i in xrange(n):
 for j in xrange(m):
 print type(G), type(G[i,j]), type(float(G[i,j]))
 g = float(abs(G[i,j]))
  if g > cut:
 print i,j,G[i,j]

 However, I'm getting the error message:

 The truth value of an array with more than one element is  
 ambiguous. Use a.any() or a.all()

 despite the fact that I'm not computing a truth value of an array.  
 I'd like to just compare to G[i,j] or abs(G[i,j]), which should be a  
 *single* value, and *not* an array. I even call 'float' here to make  
 sure that I cast this to a normal python float. But I still get this  
 error message.

Which line is raising the error?
My guess is it's the only truth-value testing line: if g > cut. (I'm  
not sure what else could error here in that way...) It looks like g is  
probably a scalar, but your code isn't showing where cut comes from,  
nor have you printed out its type... is it an array?

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray __getattr__ to perform __getitem__

2010-10-29 Thread Zachary Pincus
 But wouldn't the performance hit only come when I use it in this  
 way?
 __getattr__ is only called if the named attribute is *not* found (I
 guess it falls off the end of the case statement, or is the result  
 of
 the attribute hash table miss).
 That's why I said that __getattr__ would perhaps work better.


 So do you want me to try out an implementation and supply a patch?  If
 so, where should I send the patch?

 Ian

Note that there are various extant projects that I think attempt to  
provide similar functionality to what you're wanting (unless I badly  
misread your original email, in which case apologies):
http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes

You might want to check these out (or pitch in with those efforts)  
before starting up your own variant. (Though if your idea has more  
modest goals, then it might be a good complement to these more  
extensive/intrusive solutions.)

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] N dimensional dichotomy optimization

2010-11-23 Thread Zachary Pincus

On Nov 23, 2010, at 10:57 AM, Gael Varoquaux wrote:

 On Tue, Nov 23, 2010 at 04:33:00PM +0100, Sebastian Walter wrote:
 At first glance it looks as if a relaxation is simply not possible:
 either there are additional rows or not.
 But with some technical transformations it is possible to reformulate
 the problem into a form that allows the relaxation of the integer
 constraint in a natural way.

 Maybe this is also possible in your case?

 Well, given that it is a cross-validation score that I am optimizing,
 there is no simple algorithm giving this score, so it's not obvious  
 at
 all that there is a possible relaxation. A road to follow would be to
 find an oracle giving empirical risk after estimation of the penalized
 problem, and try to relax this oracle. That's two steps further than  
 I am
 (I apologize if the above paragraph is incomprehensible, I am  
 getting too
 much into the technicalities of my problem.)

 Otherwise, well, let me know if you find a working solution ;)

 Nelder-Mead seems to be working fine, so far. It will take a few weeks
 (or more) to have a real insight on what works and what doesn't.

Jumping in a little late, but it seems that simulated annealing might  
be a decent method here: take random steps (drawing from a  
distribution of integer step sizes), reject steps that fall outside  
the fitting range, and accept steps according to the standard  
annealing formula.

Something with a global optimum but spikes along the way is pretty  
well-suited to SA in general, and it's also an easy algorithm to make  
work on a lattice. If you're in high dimensions, there are also bolt-on  
methods for biasing the steps toward good directions as opposed  
to just taking isotropic random steps. Again, pretty easy to think of  
discrete implementations of this...
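(To make that concrete, a bare-bones sketch of annealing on an integer lattice -- the 
objective f, the bounds, the step distribution and the cooling schedule are all 
placeholders to be tuned for the real problem, e.g. 
lattice_anneal(f, x0=[3, 3], bounds=[(0, 10), (0, 10)]):)

import numpy as np

def lattice_anneal(f, x0, bounds, steps=10000, t0=1.0, t1=1e-3, rng=None):
    # minimize f over integer points; bounds is a list of (low, high) per dimension
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=int)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for i in range(steps):
        t = t0 * (t1 / t0) ** (i / steps)             # geometric cooling schedule
        cand = x + rng.integers(-2, 3, size=x.shape)  # small random integer step
        if any(c < lo or c > hi for c, (lo, hi) in zip(cand, bounds)):
            continue                                  # reject steps outside the fitting range
        fc = f(cand)
        if fc < fx or rng.random() < np.exp((fx - fc) / t):
            x, fx = cand, fc                          # standard annealing acceptance rule
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f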

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Threshold

2010-12-02 Thread Zachary Pincus
 mask = numpy.zeros(medical_image.shape, dtype=uint16)
 mask[ numpy.logical_and( medical_image >= lower, medical_image <=  
 upper)] = 255

 Where lower and upper are the threshold bounds. Here I'm marking the
 array positions where medical_image is between the threshold bounds
 with 255, where it isn't with 0. The question is: Is there a better  
 way to do that?

This will give you a True/False boolean mask:
mask = numpy.logical_and(medical_image >= lower, medical_image <= upper)

And this gives a 0/255 mask:
mask = 255*numpy.logical_and(medical_image >= lower, medical_image <= upper)

You can make the code a bit more terse/idiomatic by using the bitwise  
operators, which do logical operations on boolean arrays:
mask = 255*((medical_image >= lower) & (medical_image <= upper))

Though this is a bit annoying as the bitwise ops (& | ^ ~) have higher  
precedence than the comparison ops (< <= > >=), so you need to  
parenthesize carefully, as above.

Zach


On Dec 2, 2010, at 7:35 AM, totonixs...@gmail.com wrote:

 Hi all,

 I'm developing a medical software named InVesalius [1], it is a free
 software. It uses numpy arrays to store the medical images (CT and
 MRI) and the mask, the mask is used to mark the region of interest and
 to create 3D surfaces. Those array generally have 512x512 elements.
 The mask is created based in threshold, with lower and upper bound,
 this way:

 mask = numpy.zeros(medical_image.shape, dtype=uint16)
 mask[ numpy.logical_and( medical_image >= lower, medical_image <=  
 upper)] = 255

 Where lower and upper are the threshold bounds. Here I'm marking the
 array positions where medical_image is between the threshold bounds
 with 255, where it isn't with 0. The question is: Is there a better way
 to do that?

 Thank!

 [1] - svn.softwarepublico.gov.br/trac/invesalius
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Arrays with aliased elements?

2011-01-01 Thread Zachary Pincus
import numpy

def repeat(arr, num):
    arr = numpy.asarray(arr)
    return numpy.ndarray(arr.shape+(num,), dtype=arr.dtype,
                         buffer=arr, strides=arr.strides+(0,))

There are limits to what these sorts of stride tricks can accomplish,  
but repeating as above, or similar, is feasible.
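(To get exactly the row-aliased layout from the original example -- three rows sharing 
one buffer -- the stride trick would look something like this sketch:)

import numpy

a = numpy.array([2, 3])
b = numpy.ndarray((3,) + a.shape, dtype=a.dtype, buffer=a,
                  strides=(0,) + a.strides)
b[0, 1] = 5     # writes through to a[1]...
b               # ...so every "row" now reads [2, 5]
a               # array([2, 5])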


On Jan 1, 2011, at 8:42 PM, Enzo Michelangeli wrote:

 Is there any way, not involving compilation of C code, to define  
 ndarrays
 where some rows or columns share the same data buffers? For example,
 something built by a hypothetical variant of the np.repeat()  
 function, such
 that, if a = array([2,3]), calling:

   b = np.aliasedrepeat(x, [1, 2], axis=0)

 would return in b:

   array([[2, 3],
  [2, 3],
  [2, 3]])

 ...with the understanding that the three rows would actually share  
 the same
 data, so setting e.g.:

   b[0,1] = 5

 ...would change b into:

   array([[2, 5],
  [2, 5],
  [2, 5]])

 In other words, something with a behaviour similar to a list of lists:

 a = [2,3]
 b = [a,a,a]
 b
 [[2, 3], [2, 3], [2, 3]]
 b[0][1] = 5
 b
 [[2, 5], [2, 5], [2, 5]]

 This would save memory (and time spent in unnecessary copying) in some
 applications with large arrays, and would allow to cope with the  
 current
 inability of weave.blitz to understand broadcasting rules, e.g. for
 calculating outer products (I mentioned this in a previous thread).

 Enzo

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Variable in an array name?

2011-01-12 Thread Zachary Pincus
 Is it possible to use a variable in an array name?  I am looping  
 through a
 bunch of calculations, and need to have each array as a separate  
 entity.
 I'm pretty new to python and numpy, so forgive my ignorance.  I'm  
 sure there
 is a simple answer, but I can't seem to find it.

 let's say i have a variable 'i':

 i = 5

 I would like my array to have the name array5

 I know how I could do this manually, but not in a loop where i is  
 redefined
 several times.

There are ways to do this, but what you likely actually want is just  
to put several arrays in a python list and then index into the list,  
instead of constructing numbered names.

e.g.:

array_list = []

for item in whatever:
    array_list.append(numpy.array(item))

for array in array_list:
    do_something(array)

given_array = array_list[i]
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Variable in an array name?

2011-01-12 Thread Zachary Pincus
 Thank you very much for the prompt response.  I have already done  
 what you
 have suggested, but there are a few cases where I do need to have an  
 array
 named with a variable (looping through large numbers of unrelated  
 files and
 calculations that need to be dumped into different analyses).  It  
 would be
 extraordinarily helpful if someone could post a solution to this  
 problem,
 regardless of inefficiency of the method.  Thanks a ton for any  
 additional
 help.

You could store arrays associated with string names, or other  
identifiers (as opposed to integer indices), in a python dict.

Global and local namespaces are also just dicts that you can grab with  
globals() and locals(), if you really want to look up variable names  
algorithmically, but I promise you that this is really not what you  
want to be doing.
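(I.e., something along these lines, with the file-reading call standing in for however 
each dataset actually gets loaded:)

import numpy

arrays = {}
for fname in ['test1.txt', 'test2.txt', 'test3.txt']:
    arrays[fname] = numpy.loadtxt(fname)   # or however each file is actually read

arrays['test2.txt'].mean()                 # look an array up by name later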

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NOOB Alert: Looping Through Text Files...

2011-01-14 Thread Zachary Pincus
 textlist = [test1.txt, test2.txt, test3.txt]

 for i in textlist:
    text_file = open(textlist, "a")
    text_file.write("\nI suck at Python and need help")
   text_file.close()

 But, this doesn't work.  It gives me the error:

 coercing to Unicode: need string or buffer, list found

Yeah, it's probably the wrong list; still, easy enough to answer...

You want this:
    text_file = open(i, "a")

to open the individual file, as opposed to:
    text_file = open(textlist, "a")

which tries to open the whole list of filenames. Python cannot figure  
out how to turn the list into a single (unicode) filename, hence the  
error.

As for your wanting to write files with new names:

for txtfile in txtlist:
   f = open(txtfile, 'r')
   data = f.read()
   f.close()
   fnew = open(get_new_name(txtfile), 'w')
   fnew.write(data)
   fnew.write('\nHelp is on the way.')
   fnew.close()

where get_new_name() or whatever is defined appropriately to transform  
the old name ('test1.txt', say) into 'test1_appended.txt'...

Zach


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] create a numpy array of images

2011-01-31 Thread Zachary Pincus
 I am using python for a while now and I have a requirement of  
 creating a
 numpy array of microscopic tiff images ( this data is 3d, meaning  
 there are
 100 z slices of 512 X 512 pixels.) How can I create an array of  
 images?

 It's quite straightforward to create a 3-d array to hold this kind  
 of data:

 image_block = np.empty((100, 512, 512), dtype=??)

 now you can load it up by using some lib (PIL, or ???) to load the  
 tif
 images, and then:

 for i, image in enumerate(images):
     image_block[i,:,:] = image

 Notice that since PIL 1.1.6, PIL Image objects support the numpy
 interface: http://effbot.org/zone/pil-changes-116.htm

For even longer than this, PIL has been somewhat broken with regard to  
16-bit images (very common in microscopy); you may run into strange  
byte-ordering issues that scramble the data on reading or writing.  
Also, PIL's numpy interface is somewhat broken in similar ways.  
(Numerous people have provided patches to PIL, but these are never  
incorporated into any releases, as far as I can tell.)

So try PIL, but if the images come out all wrong, you might want to  
check out the scikits.image package, which has hooks for various other  
image read/write tools.

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus
 In a 1-d array, find the first point where all subsequent points  
 have values
 less than a threshold, T.

Maybe something like:

last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
first_lower = last_greater + 1

There's probably a better way to do it, without the arange, though...



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus
 In a 1-d array, find the first point where all subsequent points
 have values
 less than a threshold, T.

 Maybe something like:

 last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
 first_lower = last_greater + 1

 There's probably a better way to do it, without the arange, though...

 I'm trying to find the first point in a power spectrum such that all  
 subsequent
 points are below some level.  I've started with:

 db is my power spectrum in dB,   It is already reversed.

 mag = np.maximum.accumulate (db) - db[-1]

 Now all I need is to find the first point such that mag < -50.  How  
 to do this
 efficiently?

Right -- that's what I showed above. Find the last point in mag that  
is >= -50, and by definition the next point is the first point such  
that the remainder of mag is < -50.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus

On Feb 9, 2011, at 10:58 AM, Neal Becker wrote:

 Zachary Pincus wrote:

 In a 1-d array, find the first point where all subsequent points
 have values
 less than a threshold, T.

 Maybe something like:

 last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]
 first_lower = last_greater + 1

 There's probably a better way to do it, without the arange,  
 though...

 I'm trying to find the first point in a power spectrum such that all
 subsequent
 points are below some level.  I've started with:

 db is my power spectrum in dB,   It is already reversed.

 mag = np.maximum.accumulate (db) - db[-1]

 Now all I need is to find the first point such that mag < -50.  How
 to do this
 efficiently?

 Right -- that's what I showed above. Find the last point in mag that
 is >= -50, and by definition the next point is the first point such
 that the remainder of mag is < -50.

 But where is numpy's 'find_first' function?  I can't seem to find it  
 so I had to
 make my own in C++.


As before, the line below does what you said you need, though not  
maximally efficiently. (Try it in an interpreter...) There may be  
another way in numpy that doesn't rely on constructing the index  
array, but this is the first thing that came to mind.

last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]

Let's unpack that dense line a bit:

mask = arr >= T
indices = numpy.arange(arr.shape[0])
above_threshold_indices = indices[mask]
last_above_threshold_index = above_threshold_indices[-1]

Does this make sense?











___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus
 As before, the line below does what you said you need, though not
 maximally efficiently. (Try it in an interpreter...) There may be
 another way in numpy that doesn't rely on constructing the index
 array, but this is the first thing that came to mind.

 last_greater = numpy.arange(arr.shape[0])[arr >= T][-1]

 Let's unpack that dense line a bit:

 mask = arr >= T
 indices = numpy.arange(arr.shape[0])
 above_threshold_indices = indices[mask]
 last_above_threshold_index = above_threshold_indices[-1]

 Does this make sense?

 This assumes monotonicity. Is that allowed?

The twice-stated problem was:

 In a 1-d array, find the first point where all subsequent points  
 have values less than a threshold, T.

So that should do the trick... Though Alan's argmax solution is  
definitely a better one than indexing an indices array. Same logic and  
result though, just more compact.
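(I don't have Alan's message in front of me, but the argmax version is presumably 
something along these lines -- it assumes at least one element is >= T:)

import numpy

arr = numpy.array([5.0, 1.0, 6.0, 2.0, 0.5, 0.25])
T = 1.0
mask = arr >= T
last_greater = len(arr) - 1 - numpy.argmax(mask[::-1])   # index of the last element >= T
first_lower = last_greater + 1                           # here: 4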
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus
 This assumes monotonicity. Is that allowed?

 The twice-stated problem was:

[Note to avert email-miscommunications] BTW, I wasn't trying to snipe  
at you with that comment, Josef...

I just meant to say that this solution solves the problem as Neal  
posed it, though that might not be the exact problem he has.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to do this efficiently?

2011-02-09 Thread Zachary Pincus

 In a 1-d array, find the first point where all subsequent points  
 have values
 less than a threshold.

 This doesn't imply monotonicity.
 Suppose with have a sin curve, and I want to find the last trough. Or
 a business cycle and I want to find the last recession.

 Unless my english deteriorated recently.


Not sure that I follow? I read the statement as:
find the smallest non-negative index i such that array[j] < t for all  
i <= j < len(array)

for which the various solutions proposed work, except for the corner  
case where array.min() >= t.

I'm not sure where monotonicity comes into play, unless one interprets  
the "all subsequent points" clause as "some number of subsequent  
points" or something...

For those sorts of problems, I usually wind up smoothing the array  
[optional], and then using scipy.ndimage.maximum_filter to calculate peak  
values within a given window, and then find the points that equal the  
peak values -- this works pretty robustly for peak/trough detection in  
arbitrary dimension.
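(Roughly like this, for the 1-d case -- the test signal, smoothing sigma and window 
size are arbitrary stand-ins:)

import numpy as np
from scipy import ndimage

x = np.linspace(0, 6 * np.pi, 500)
signal = np.sin(x) + 0.1 * np.random.standard_normal(500)
smoothed = ndimage.gaussian_filter1d(signal, sigma=5)      # optional smoothing step
window_max = ndimage.maximum_filter1d(smoothed, size=50)   # peak value within each window
peaks = np.nonzero(smoothed == window_max)[0]              # points equal to their local max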
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [OT] any image io module that works with python3?

2011-03-12 Thread Zachary Pincus
Here's a ctypes interface to FreeImage that I wrote a while back and  
was since cleaned up (and maintained) by the scikits.image folk:

https://github.com/stefanv/scikits.image/blob/master/scikits/image/io/_plugins/freeimage_plugin.py

If it doesn't work out of the box on python 3, then it should be  
pretty simple to fix.

Zach



On Mar 12, 2011, at 4:40 AM, Christoph Gohlke wrote:



 On 3/12/2011 1:08 AM, Nadav Horesh wrote:
 Having numpy, scipy, and matplotlib working reasonably with  
 python3, a
 major piece of code I miss for a major python3 migration is an  
 image IO.
 I found that pylab's imread works fine for png image, but I need to  
 read
 all the other image format as well as png and jpeg output.
 Any hints (including advices how easyly construct my own module) are
 appreciated.
 Nadav.


 On Windows, PIL (private port at
 http://www.lfd.uci.edu/~gohlke/pythonlibs/#pil), PythonMagick
 http://www.imagemagick.org/download/python/, and pygame 1.9.2pre
 http://www.pygame.org are working reasonably well for image IO. Also
 the FreeImage library http://freeimage.sourceforge.net/ is easy to  
 use
 with ctypes http://docs.python.org/py3k/library/ctypes.html.

 Christoph
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] strange behavior of np.minimum and np.maximum

2011-04-06 Thread Zachary Pincus

 >>> a, b, c = np.array([10]), np.array([2]), np.array([7])
 >>> min_val = np.minimum(a, b, c)
 >>> min_val
 array([2])
 >>> max_val = np.maximum(a, b, c)
 >>> max_val
 array([10])
 >>> min_val
 array([10])

 (I'm using numpy 1.4, and I observed the same behavior with numpy
 2.0.0.dev8600 on another machine). I'm quite surprised by this  
 behavior
 (It took me quite a long time to figure out what was happening in a
 script of mine that wasn't giving what I expected, because of
 np.maximum changing the output of np.minimum). Is it a bug, or am I
 missing something?

Read the documentation for numpy.minimum and numpy.maximum: they give  
you element-wise minimum values from two arrays passed as arguments.  
E.g.:

>>> numpy.minimum([1,2,3],[3,2,1])
array([1, 2, 1])

The optional third parameter to numpy.minimum is an out array - an  
array to place the results into instead of making a new array for that  
purpose. (This can save time / memory in various cases.)

This should therefore be enough to explain the above behavior. (That  
is, min_val and max_val wind up being just other names for the array  
'c', which gets modified in-place by the numpy.minimum and  
numpy.maximum.)

If you want the minimum value of a sequence of arbitrary length, use  
the python min() function. If you have a numpy array already and you  
want the minimum (global, or along a particular axis), use the min()  
method of the array, or numpy.min(arr).
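
If what you're after is the element-wise minimum over several arrays, a
minimal sketch:

import numpy as np

a, b, c = np.array([10]), np.array([2]), np.array([7])

np.minimum(a, b, c)   # third argument is the *output* array: writes [2] into c
np.maximum(a, b, c)   # and this overwrites c again with [10]

# element-wise minimum of several arrays, without clobbering any of them:
np.minimum(np.minimum(a, b), c)   # array([2])
np.minimum.reduce([a, b, c])      # array([2]) -- works for an arbitrary list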

Zach

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reading tiff images

2011-04-26 Thread Zachary Pincus

On Apr 26, 2011, at 2:31 PM, Daniel Lepage wrote:

 You need PIL no matter what; scipy.misc.imread, scipy.ndimage.imread,
 and scikits.image.io.imread all call PIL.

scikits.image.io also has a ctypes wrapper for the freeimage library.  
I prefer these (well, I wrote them), though apparently there are some  
64-bit issues (crashes?). I haven't been working on a 64-bit system so  
I haven't been able to address them, but I will be soon. It's a very  
thin wrapper around a simple image IO library, so there's lots of room  
to add and extend as need be...

All of the PIL wrappers are kluges around serious flaws in how PIL  
reads images, particularly non-8-bit images and in particular non- 
native-endian 16-bit images.

Zach


 Theoretically there's no difference between any of them, although in
 actuality some use import Image and others use from PIL import
 Image; one of these may fail depending on how you installed PIL. (I'm
 not sure which is supposed to be standard - the PIL docs use both
 interchangeably, and I think the latest version of PIL on pypi sets it
 up so that both will work).

 I'd use whichever tool you're already importing - if you're using
 ndimage anyway, just use ndimage.imread rather than adding more
 imports.

 Note that using PIL directly is easy, but does require adding an extra
 step; OTOH, if you're familiar with PIL, you can use some of its
 transformations from the start, e.g.

 def imread(fname, mode='RGBA'):
     return np.asarray(Image.open(fname).convert(mode))

 to ensure that you always get 4-channel images, even for images that
 were initially RGB or grayscale.

 HTH,
 Dan

 On Tue, Apr 26, 2011 at 2:00 PM, Mathew Yeates  
 mat.yea...@gmail.com wrote:
 Hi
 What is the current method of using ndimage on a Tiff file? I've seen
 different methods using ndimage itself, scipy.misc and Pil.

 Mathew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast grayscale conversion

2011-06-20 Thread Zachary Pincus
You could try:
src_mono = src_rgb.astype(float).sum(axis=-1) / 3.

But that speed does seem slow. Here are the relevant timings on my machine (a 
recent MacBook Pro) for a 3.1-megapixel-size array:
In [16]: a = numpy.empty((2048, 1536, 3), dtype=numpy.uint8)

In [17]: timeit numpy.dot(a.astype(float), numpy.ones(3)/3.)
10 loops, best of 3: 116 ms per loop

In [18]: timeit a.astype(float).sum(axis=-1)/3.
10 loops, best of 3: 85.3 ms per loop

In [19]: timeit a.astype(float)
10 loops, best of 3: 23.3 ms per loop
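
An equivalent spelling is a mean along the colour axis, which does the
float64 upcast internally -- a one-line sketch:

src_mono = src_rgb.mean(axis=-1)   # same values as src_rgb.astype(float).sum(axis=-1) / 3.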




On Jun 20, 2011, at 4:15 PM, Alex Flint wrote:

 At the moment I'm using numpy.dot to convert a WxHx3 RGB image to a grayscale 
 image:
 
 src_mono = np.dot(src_rgb.astype(np.float), np.ones(3)/3.);
 
 This seems quite slow though (several seconds for a 3 megapixel image) - is 
 there a more specialized routine better suited to this?
 
 Cheers,
 Alex
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] poor performance of sum with sub-machine-word integer types

2011-06-21 Thread Zachary Pincus
Hello all,

As a result of the fast greyscale conversion thread, I noticed an anomaly 
with numpy.ndarray.sum(): summing along certain axes is much slower with 
sum() than doing it explicitly, but only with integer dtypes and when 
the size of the dtype is less than the machine word. I checked in 32-bit and 
64-bit modes, and in both cases the speed difference only went away once the 
dtype was as large as the machine word. See below...

Is this something to do with numpy or something inexorable about machine / 
memory architecture?

Zach

Timings -- 64-bit mode:
--
In [2]: i = numpy.ones((1024,1024,4), numpy.int8)
In [3]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 2.57 ms per loop

In [5]: i = numpy.ones((1024,1024,4), numpy.int16)
In [6]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 4.75 ms per loop

In [8]: i = numpy.ones((1024,1024,4), numpy.int32)
In [9]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 6.37 ms per loop

In [11]: i = numpy.ones((1024,1024,4), numpy.int64)
In [12]: timeit i.sum(axis=-1)
100 loops, best of 3: 16.6 ms per loop
In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 15.1 ms per loop



Timings -- 32-bit mode:
--
In [2]: i = numpy.ones((1024,1024,4), numpy.int8)
In [3]: timeit i.sum(axis=-1)
10 loops, best of 3: 138 ms per loop
In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 3.68 ms per loop

In [5]: i = numpy.ones((1024,1024,4), numpy.int16)
In [6]: timeit i.sum(axis=-1)
10 loops, best of 3: 140 ms per loop
In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 4.17 ms per loop

In [8]: i = numpy.ones((1024,1024,4), numpy.int32)
In [9]: timeit i.sum(axis=-1)
10 loops, best of 3: 22.4 ms per loop
In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 12.2 ms per loop

In [11]: i = numpy.ones((1024,1024,4), numpy.int64)
In [12]: timeit i.sum(axis=-1)
10 loops, best of 3: 29.2 ms per loop
In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
10 loops, best of 3: 23.8 ms per loop

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] poor performance of sum with sub-machine-word integer types

2011-06-21 Thread Zachary Pincus
On Jun 21, 2011, at 1:16 PM, Charles R Harris wrote:

 It's because of the type conversion sum uses by default for greater precision.

Aah, makes sense. Thanks for the detailed explanations and timings!
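
(As an aside: one way to see the cost of the default upcast is to force the
accumulator dtype -- keeping in mind that a narrow accumulator will overflow
for anything but tiny sums. A minimal sketch:)

import numpy
i = numpy.ones((1024, 1024, 4), numpy.int8)
s_default = i.sum(axis=-1)                     # accumulates in the platform integer
s_narrow  = i.sum(axis=-1, dtype=numpy.int8)   # same-width accumulator; overflow risk!
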
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Zachary Pincus
Hello Keith,

While I also echo Johann's points about the arbitrariness and non-utility of 
benchmarking, I'll briefly comment on just a few tests to help out with 
getting things into idiomatic python/numpy:

Tests 1 and 2 are fairly pointless (an empty for loop and an empty procedure 
call), since they won't actually influence the running time of well-written, 
non-pathological code.

Test 3: 
#Test 3 - Add 20 scalar ints
nrep = 200 * scale_factor
for i in range(nrep):
    a = i + 1

well, python looping is slow... one doesn't do such loops in idiomatic code if 
the underlying intent can be re-cast into array operations in numpy. But here 
the test is on such a simple operation that it's not clear how to recast in a 
way that would remain reasonable. Ideally you'd test something like:
i = numpy.arange(20)
for j in range(scale_factor):
  a = i + 1

but that sort of changes what the test is testing.


Finally, test 21:
#Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
for i in range(nrep):
    b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

A median filter is definitely NOT a boxcar filter! You want uniform_filter:

In [4]: a = numpy.empty((1000,1000))

In [5]: timeit scipy.ndimage.filters.median_filter(a, size=(5, 5))
10 loops, best of 3: 93.2 ms per loop

In [6]: timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5))
10 loops, best of 3: 27.7 ms per loop

Zach


On Sep 26, 2011, at 10:19 AM, Keith Hughitt wrote:

 Hi all,
 
 Myself and several colleagues have recently started work on a Python library 
 for solar physics, in order to provide an alternative to the current mainstay 
 for solar physics, which is written in IDL.
 
 One of the first steps we have taken is to create a Python port of a popular 
 benchmark for IDL (time_test3) which measures performance for a variety of 
 (primarily matrix) operations. In our initial attempt, however, Python 
 performs significantly poorer than IDL for several of the tests. I have 
 attached a graph which shows the results for one machine: the x-axis is the 
 test # being compared, and the y-axis is the time it took to complete the 
 test, in milliseconds. While it is possible that this is simply due to 
 limitations in Python/Numpy, I suspect that this is due at least in part to 
 our lack in familiarity with NumPy and SciPy.
 
 So my question is, does anyone see any places where we are doing things very 
 inefficiently in Python?
 
 In order to try and ensure a fair comparison between IDL and Python there are 
 some things (e.g. the style of timing and output) which we have deliberately 
 chosen to do a certain way. In other cases, however, it is likely that we 
 just didn't know a better method.
 
 Any feedback or suggestions people have would be greatly appreciated. 
 Unfortunately, due to the proprietary nature of IDL, we cannot share the 
 original version of time_test3, but hopefully the comments in time_test3.py 
 will be clear enough.
 
 Thanks!
 Keith
 sunpy_time_test3_idl_python_2011-09-26.png___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-29 Thread Zachary Pincus
I think the remaining delta between the integer and float boxcar smoothing is 
that the integer version (test 21) still uses median_filter(), while the float 
one (test 22) is using uniform_filter(), which is a boxcar.

Other than that and the slow roll() implementation in numpy, things look pretty 
solid, yes?

Zach


On Sep 29, 2011, at 12:11 PM, Keith Hughitt wrote:

 Thank you all for the comments and suggestions.
 
 First off, I would like to say that I entirely agree with people's 
 suggestions about lack of objectiveness in the test design, and the caveat 
 about optimizing early. The main reason we put together the Python version of 
 the benchmark was as a quick sanity check to make sure that there are no 
 major show-stoppers before we began work on the library. We also wanted to 
 put together something to show other people who are firmly in the IDL camp 
 that this is a viable option.
 
 We did in fact put together another short test-suite (test_testr.py & 
 time_testr.pro) which consists of operations that are frequently used 
 by us, but it also tests only a very small portion of the kinds of things our 
 library will eventually do.
 
 That said, I made a few small changes to the original benchmark, based on 
 people's feedback, and put together a new plot.
 
 The changes made include:
 
 1. Using xrange instead of range
 2. Using uniform filter instead of median filter
 3. Fixed a typo for tests 2 & 3 which resulted in slower Python results
 
 Again, note that some of the tests are testing non-numpy functionality. 
 Several of the results still stand out,  but overall the results are much 
 more reasonable than before.
 
 Cheers,
 Keith
 time_test3_idl_vs_python.png___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] wanted: decent matplotlib alternative

2011-10-13 Thread Zachary Pincus
I keep meaning to use matplotlib as well, but every time I try I also get 
really turned off by the matlabish interface in the examples. I get that it's a 
selling point for matlab refugees, but I find it counterintuitive in the same 
way Christoph seems to.

I'm glad to hear the OO interface isn't as clunky as it looks on some of the 
doc pages, though. This is good news. Can anyone point out any good 
tutorials/docs on using matplotlib idiomatically via its OO interface?

Zach



On Oct 13, 2011, at 3:21 PM, Joe Kington wrote:

 Have a look at Chaco: http://code.enthought.com/chaco/  If you're wanting a 
 more pythonic api, it's a good choice.
 
 Personally, I still prefer matplotlib.
 
 You don't ever need to touch the state machine interface. 
 
 The OO interface is slightly un-pythonic, but it's hardly clunky. I think 
 you're referring to one of the webpage examples of it which avoids _any_ 
 convenience functions.  You can still use the convenience functions without 
 having to rely on the state machine in any way. E.g.:
 
 import matplotlib.pyplot as plt
 
 fig, axes = plt.subplots(ncols=4)
 
 for ax in axes:
 ax.plot(range(10))
 
 plt.show()
 
 All in all, matplotlib deliberately tries to mimic matlab for a lot of the 
 conventions.  This is mostly to make it easier to switch if you're already 
 familiar with matlab.
 
 To each his own, but for better or worse, matplotlib is the most widely used 
 plotting library for python.  It's worth getting a bit more familiar with, if 
 nothing else just to see past some of the rough edges.
 
 -Joe
 
 
 
 On Thu, Oct 13, 2011 at 1:58 PM, Christoph Groth c...@falma.de wrote:
 Hello,
 
 Is it just me who thinks that matplotlib is ugly and a pain to use?  So
 far I haven't found a decent alternative usable from within python.  (I
 haven't tried all the packages out there.)  I'm mostly interested in 2d
 plots.  Who is happy enough with a numpy-compatible plotting package to
 recommend it?
 
 Thanks,
 
 Christoph
 
 
 
 
 A few things I can't stand about matplotlib:
 
  * It works as a state machine.  (There is an OO-API, too, but it's ugly
   and cumbersome to use, and most examples use the state machine mode
   or, even worse, a mixture of OO and global state.)
 
  * It uses inches by default. (I propose to switch to nails: 1 nail = 3
   digits = 2 1/4 inches = 1/16 yard.)
 
  * subplot(211)  (ugh!)
 
  * Concepts are named in a confusing way. (ax = subplot(112) anyone?)
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy nogil API

2011-10-31 Thread Zachary Pincus
 As an example, it'd be nice to have scipy.ndimage available without the GIL:
 http://docs.scipy.org/doc/scipy/reference/ndimage.html
 
 Now, this *can* easily be done as the core is written in C++. I'm just 
 pointing out that some people may wish more for calling scipy.ndimage 
 inside their prange than for some parts of NumPy.

Not exactly to your larger point wrt the GIL, but I *think* some skimage (née 
scikits.image) folks are trying to rewrite most of ndimage's functionality in 
cython. I don't know what the status of this effort is though...

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] array copy-to-self and views

2007-02-01 Thread Zachary Pincus
Hello folks,

I recently was trying to write code to modify an array in-place (so  
as not to invalidate any references to that array) via the standard  
python idiom for lists, e.g.:

a[:] = numpy.flipud(a)

Now, flipud returns a view on 'a', so assigning that to 'a[:]'  
provides pretty strange results as the buffer that is being read (the  
view) is simultaneously modified. Here is an example:

In [2]: a = numpy.arange(10).reshape((5,2))
In [3]: a
Out[3]:
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
In [4]: numpy.flipud(a)
Out[4]:
array([[8, 9],
[6, 7],
[4, 5],
[2, 3],
[0, 1]])
In [5]: a[:] = numpy.flipud(a)
In [6]: a
Out[6]:
array([[8, 9],
[6, 7],
[4, 5],
[6, 7],
[8, 9]])

A question, then: Does this represent a bug? Or perhaps there is a  
better idiom for modifying an array in-place than 'a[:] = ...'? Or is  
incumbent on the user to ensure that any time an array is directly  
modified, that the modifying array is not a view of the original array?

Thanks for any thoughts,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array copy-to-self and views

2007-02-01 Thread Zachary Pincus
 Zachary Pincus wrote:
 Hello folks,

 I recently was trying to write code to modify an array in-place (so
 as not to invalidate any references to that array)

 I'm not sure what this means exactly.

Say one wants to keep two different variables referencing a single in- 
memory list, as so:
a = [1,2,3]
b = a
Now, if 'b' and 'a' go to live in different places (different class  
instances or whatever) but we want 'b' and 'a' to always refer to the  
same in-memory object, so that 'id(a) == id(b)', we need to make sure  
to not assign a brand new list to either one.

That is, if we do something like 'a = [i + 1 for i in a]' then 'id 
(a) != id(b)'. However, we can do 'a[:] = [i + 1 for i in a]' to  
modify a in-place. This is not super-common, but it's also not an  
uncommon python idiom.

I was in my email simply pointing out that naïvely translating that  
idiom to the numpy case can cause unexpected behavior in the case of  
views.
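
A minimal sketch of a safe version of that idiom, at the cost of a
temporary copy:

import numpy
a = numpy.arange(10).reshape((5, 2))
b = a                            # a second name for the same array
a[:] = numpy.flipud(a).copy()    # the copy breaks the aliasing with the view
# a is now flipped, and b still refers to the same (now flipped) array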

I think that this is is unquestionably a bug -- isn't the point of  
views that the user shouldn't need to care if a particular array  
object is a view or not? Given the lack of methods to query whether  
an array is a view, or what it might be a view on, this seems like a  
reasonable perspective... I mean, if certain operations produce  
completely different results when one of the operands is a view, that  
*seems* like a bug. It might not be worth fixing, but I can't see how  
that behavior would be considered a feature.

However, I do think there's a legitimate question about whether it  
would be worth fixing -- there could be a lot of complicated checks  
to catch these kind of corner cases.

 via the standard
 python idiom for lists, e.g.:

 a[:] = numpy.flipud(a)

 Now, flipud returns a view on 'a', so assigning that to 'a[:]'
 provides pretty strange results as the buffer that is being read (the
 view) is simultaneously modified.

 yes, weird. So why not just:

 a = numpy.flipud(a)

 Since flipud returns a view, the new a will still be using the same
 data array. Does this satisfy your need above?

Nope -- though 'a' and 'numpy.flipud(a)' share the same data, the  
actual ndarray instances are different. This means that any other  
references to the 'a' array (made via 'b = a' or whatever) now refer  
to the old 'a', not the flipped one.

The only other option for sharing arrays is to encapsulate them as  
attributes of *another* object, which itself won't change. That seems  
a bit clumsy.

 It's too bad that to do this you need to know that flipud created a
 view, rather than a copy of the data, as if it were a copy, you would
 need to do the a[:] trick to make sure a kept the same data, but  
 that's
 the price we pay for the flexibility and power of numpy -- the
  alternative is to have EVERYTHING create a copy, but there would be a
  substantial performance hit for that.

Well, Anne's email suggests another alternative -- each time a view  
is created, keep track of the original array from whence it came, and  
then only make a copy when collisions like the above would take place.

And actually, I suspect that views already need to keep a reference  
to their original array in order to keep that array from being  
deleted before the view is. But I don't know the guts of numpy well  
enough to say for sure.

 NOTE: the docstring doesn't make it clear that a view is created:

 help(numpy.flipud)
 Help on function flipud in module numpy.lib.twodim_base:

 flipud(m)
  returns an array with the columns preserved and rows flipped in
  the up/down direction.  Works on the first dimension of m.

 NOTE2: Maybe these kinds of functions should have an optional flag  
 that
 specified whether you want a view or a copy -- I'd have expected a  
 copy
 in this case!

Well, it seems like in most cases one does not need to care whether  
one is looking at a view or an array. The only time that comes to  
mind is when you're attempting to modify the array in-place, e.g.
a[something] = something else

Even if the maybe-bug above were easily fixable (again, not sure  
about that), you might *still* want to be able to figure out if a  
were a view before such a modification. Whether this needs a runtime  
'is_view' method, or just consistent documentation about what returns  
a view, isn't clear to me. Certainly the latter couldn't hurt.

 QUESTION:
 How do you tell if two arrays are views on the same data: is  
 checking if
 they have the same .base reliable?

 a = numpy.array((1,2,3,4))
 b = a.view()
 a.base is b.base
 False

 No, I guess not. Maybe .base should return self if it's the originator
 of the data.

 Is there a reliable way? I usually just test by changing a value in  
 one
  to see if it changes in the other, but that's one heck of a kludge!

 a.__array_interface__['data'][0] == b.__array_interface__['data'] 
 [0]
 True

 seems to work, but that's pretty ugly!

Good question. As I mentioned above, I assume that this information  
is tracked internally to prevent

Re: [Numpy-discussion] array copy-to-self and views

2007-02-01 Thread Zachary Pincus
 A question, then: Does this represent a bug? Or perhaps there is a
 better idiom for modifying an array in-place than 'a[:] = ...'? Or is
 incumbent on the user to ensure that any time an array is directly
 modified, that the modifying array is not a view of the original  
 array?


 Yes, it is and has always been incumbent on the user to ensure that  
 any
 time an array is directly modified in-place that the modifying  
 array is
 not a view of the original array.

Fair enough. Now, how does a user ensure this -- say someone like me,  
who has been using numpy (et alia) for a couple of years, but clearly  
not long enough to have an 'intuitive' feel for every time something  
might be a view (a feeling that must seem quite natural to long-time  
numpy users, who may have forgotten precisely how long it takes to  
develop that level of intuition)?

Documentation of what returns views helps, for sure. Would any other  
'training' mechanisms help? Say a function that (despite Tim's pretty  
reasonable 'don't do that' warning) will return true when two arrays  
have overlapping memory? Or an 'inplace_modify' function that takes  
the time to make that check?
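
(For what it's worth, newer numpy versions provide a conservative check
along these lines, numpy.may_share_memory -- a quick sketch:)

import numpy
a = numpy.arange(10)
v = a[::2]                            # a view on a
numpy.may_share_memory(a, v)          # True (conservative: can give false positives)
numpy.may_share_memory(a, a.copy())   # False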

Perhaps I'm the first to have views bite me in this precise way.  
However, if there are common failure-modes with views, I hope it's  
not too unreasonable to ask about ways that those common problems  
might be addressed. (Other than just saying train for ten years, and  
you too will have numpy-fu, my son.) Giving newbies tools to deal  
with common problems with admittedly dangerous constructs might be  
useful.

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] linalg.eigh orders eigenvalues/eigenvectors differently than linalg.eig

2007-02-19 Thread Zachary Pincus
Hello all,

It seems that the 'eigh' routine from numpy.linalg does not follow  
the same convention as numpy.linalg.eig in terms of the order of the  
returned eigenvalues. (And thus eigenvectors as well...)

Specifically, eig returns eigenvalues in order from largest to  
smallest, while eigh returns them from smallest to largest.

Example:
  a = numpy.array([[21, 28, 35],[28, 38, 48],[35, 48, 61]])
  numpy.linalg.eigh(a)
(array([ -1.02825542e-14,   7.04131679e-01,   1.19295868e+02]),
array([[ 0.40824829, -0.81314396, -0.41488581],
[-0.81649658, -0.12200588, -0.56431188],
[ 0.40824829,  0.56913221, -0.71373795]]))

  numpy.linalg.eig(a)
(array([  1.19295868e+02,   7.04131679e-01,   4.62814557e-15]),
array([[-0.41488581, -0.81314396,  0.40824829],
[-0.56431188, -0.12200588, -0.81649658],
[-0.71373795,  0.56913221,  0.40824829]]))

Is this a bug? If it is, though, fixing it now might break code that  
depends on this 'wrong' order. (This is also present in  
scipy.linalg.) If not a bug, or not-fixable-now, then at least some  
documentation as to the convention regarding ordering of eigenvalues  
is probably worthwhile...
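
(One possible workaround, if all you want is a consistent order, is to sort
the output of either routine explicitly -- a minimal sketch:)

import numpy
a = numpy.array([[21, 28, 35], [28, 38, 48], [35, 48, 61]])
w, v = numpy.linalg.eigh(a)
order = numpy.argsort(w)[::-1]   # largest first, matching what eig happened to return above
w = w[order]
v = v[:, order]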

Any thoughts?

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Distributing prebuilt numpy and other extensions

2007-02-20 Thread Zachary Pincus
Hello folks,

I've developed some command-line tools for biologists using python/ 
numpy and some custom C and Fortran extensions, and I'm trying to  
figure out how to easily distribute them...

For people using linux, I figure a source distribution is no problem  
at all. (Right?)
On the other hand, for Mac users (whose computers by default don't  
have the dev tools, and even then would need to get a fortran  
compiler elsewhere) I'd like to figure out something a bit easier.

I'd like to somehow provide an installer (double-clickable or python  
script) that does a version check and then installs an appropriate  
version of prebuilt binaries for numpy and my C and Fortran  
extensions. Is this possible within the bounds of the python or numpy  
distutils? Would setuptools be a better way to go? Preferably it  
would be a dead easy, one-step thing...

Or is this whole idea problematic, and better to stick with source  
distribution in all cases?

Thanks for any advice,

Zach Pincus


Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Greek Letters

2007-02-20 Thread Zachary Pincus
I have found that the python 'unicode name' escape sequence, combined  
with the canonical list of unicode names ( http://unicode.org/Public/ 
UNIDATA/NamesList.txt ), is a good way of getting the symbols you  
want and still keeping the python code legible.

 From the above list, we see that the symbol name we want is GREEK  
SMALL LETTER CHI, so:
chi = u'\N{GREEK SMALL LETTER CHI}'
will do the trick. For chi^2, use:
chi2 = u'\N{GREEK SMALL LETTER CHI}\N{SUPERSCRIPT TWO}'

Note that to print these characters, we usually need to encode them  
somehow. My terminal supports UTF-8, so the following works for me:
import codecs
print codecs.encode(chi2, 'utf8')

giving (if your mail reader supports utf8 and mine encodes it  
properly...):
χ²

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine


On Feb 20, 2007, at 3:56 PM, Mark Janikas wrote:

 Hello all,



 I was wondering how I could print the chi-squared symbol in  
 python.  I have been looking at the Unicode docs, but I figured I  
 would ask for assistance here while I delve into it.  Thanks for  
 any help in advance.



 Mark Janikas

 Product Engineer

 ESRI, Geoprocessing

 380 New York St.

 Redlands, CA 92373

 909-793-2853 (2563)

 [EMAIL PROTECTED]



 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] what goes wrong with cos(), sin()

2007-02-21 Thread Zachary Pincus
Your results are indeed around zero.

  numpy.allclose(0, 1.22460635382e-016)
True

It's not exactly zero because floating point math is in general not  
exact. You'll need to check out a reference about doing floating  
point operations numerically for more details, but in general you  
should not expect exact results due to the limited precision of any  
fixed-width digital representation of floats.

A corollary: in general, do not compare two floating-point values for  
equality -- use something like numpy.allclose. (Exception: equality is  
expected if the exact sequence of operations used to generate the two  
numbers was identical.)
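
Continuing the example above:

import numpy as n
n.sin(n.pi) == 0            # False -- the result is ~1.22e-16, not exactly 0
n.allclose(n.sin(n.pi), 0)  # True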

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

On Feb 21, 2007, at 10:11 AM, WolfgangZillig wrote:

 Hi,

 I'm quite new to numpy/scipy so please excuse if my problem is too  
 obvious.

 example code:

 import numpy as n
 print n.sin(n.pi)
 print n.cos(n.pi/2.0)

 results in:
 1.22460635382e-016
 6.12303176911e-017

 I've expected something around 0. Can anybody explain what I am doing
 wrong here?

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the neighbourhood of each element of an array

2007-02-23 Thread Zachary Pincus
Scipy's ndimage module has a function that takes a generic callback  
and calls it with the values of each neighborhood (of a given size,  
and optionally with a particular mask footprint) centered on each  
array element. That function handles boundary conditions, etc nicely.

Unfortunately, I'm not sure if it works with masked arrays, and I  
think it hands a ravel'd set of pixels back to the callback function.  
You could probably hack masking in there by passing it the mask  
concatenated to the array, and then deal with the mask explicitly.
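
(The function in question is scipy.ndimage.generic_filter; a minimal sketch,
with an illustrative 3x3 neighbourhood and callback:)

import numpy
from scipy import ndimage

x = numpy.arange(25, dtype=float).reshape((5, 5))

def neighbourhood_mean(values):
    # 'values' is the ravel'd 3x3 neighbourhood around one element
    return values.mean()

y = ndimage.generic_filter(x, neighbourhood_mean, size=3, mode='reflect')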

Zach


On Feb 23, 2007, at 11:39 AM, Bryan Cole wrote:

 On Fri, 2007-02-23 at 17:38 +0100, [EMAIL PROTECTED] wrote:
 Hi,

 Given a (possibly masked) 2d array x, is there a fast(er) way in  
 Numpy to obtain
 the same result as the following few lines?

 d = 1  # neighbourhood 'radius'
 Nrow = x.shape[0]
 Ncol = x.shape[1]
 y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol- 
 d)]  \
for i in range(d,Nrow-d)])


 how about something like

 er = Nrow - d
 ec = Ncol - d
 y = array([x[i:er+i, j:ec+j] for j in arange(-d,d)
   for i in arange(-d,d)])

 now you're looping over a small array and combining slices of the big
 array (as opposed to looping over the big array and combining slices
 from a small one). This should be faster for large Nrow, Ncol.

 BC


 What you get is an array containing all the elements in a  
 neighbourhood for each
 element, disregarding the edges to avoid out-of-range problems.  
 The code above
 becomes quite slow for e.g. a 2000x2000 array. Does anyone know a  
 better
 approach?

 Ciao,
 Joris



 Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Questions about cross-compiling extensions for mac-ppc and mac-intel

2007-02-24 Thread Zachary Pincus
Hi folks,

I've been doing a lot of web-reading on the subject, but have not  
been completely able to synthesize all of the disparate bits of  
advice about building python extensions as Mac-PPC and Mac-Intel fat  
binaries, so I'm turning to the wisdom of this list for a few questions.

My general goal is to make a double-clickable Mac installer of a set  
of tools built around numpy, numpy's distutils, a very hacked-up  
version of PIL, and some fortran code too. To this end, I need to  
figure out how to get the numpy distutils to cross-compile,  
generating PPC code and Intel code in separate builds -- and/or  
generating a universal binary all in one go. (I'd like to distribute  
a universal version of numpy, but I think that my own code needs to  
be built/distributed separately for each architecture due to endian- 
ness issues.)

Is there explicit support in distutils for this, or is it a matter of  
setting the proper environment variables to entice gcc and gfortran  
to generate code for a specific architecture?

One problem is that PIL is a tricky beast, even in the neutered form  
that I'm using it. It does a compile-time check for the endian-ness  
of the system, and a compile-time search for the zlib to use, both of  
which are problematic.

To address the former, I'd like to be able to (say) include something  
like 'config_endian --big' on the 'python setup.py' command-line, and  
have that information trickle down to the PIL config script (a few  
subpackages deep). Is this easy or possible?

To address the latter, I think I need to have the PIL extensions  
dynamically link against '/Developer/SDKs/MacOSX10.4u.sdk/usr/lib/ 
libz.dylib' which is the fat-binary version of the library, using the  
headers from '/Developer/SDKs/MacOSX10.4u.sdk/usr/include/zlib.h
'. Right now, PIL is using system_info from numpy.distutils to find  
the valid library paths on which libz and its headers might live.  
This is nice and more or less platform-neutral, which I like. How  
best should I convince/configure numpy.distutils.system_info to put '/ 
Developer/SDKs/MacOSX10.4u.sdk/usr/{lib,include}' on the output to  
get_include_dirs() and get_lib_dirs()?

Thanks for any advice or counsel,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] import numpy segmentation fault

2007-03-14 Thread Zachary Pincus
If I recall correctly, there's a bug in numpy 1.0.1 on Linux-x86-64  
that causes this segfault. This is fixed in the latest SVN version of  
numpy, so if you can grab that, it should work.

I can't find the trac ticket, but I ran into this some weeks ago.

Zach



On Mar 14, 2007, at 1:36 PM, Cory Davis wrote:

 Hi there,

 I have just installed numpy-1.0.1 from source, which seemed to go
 fine.  However when I try to import numpy I get a segmentation
 fault.

  I have a 64-bit machine running RedHat Enterprise Linux and Python
  2.3.4

 Any clues greatly appreciated.

 Cheers,
 Cory.
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nd_image.affine_transform edge effects

2007-03-22 Thread Zachary Pincus
Hello all,

 By the way, ringing at sharp edges is an intrinsic feature of higher-
 order spline interpolation, right? I believe this kind of interpolant
 is really intended for smooth (band-limited) data. I'm not sure why
 the pre-filtering makes a difference though; I don't yet understand
 well enough what the pre-filter actually does.

 I'm not sure what people normally do in computer graphics, since I'm
 working more with natural band-limited images, but in the case of
 Stefan's rotate_artifacts example, wouldn't it be appropriate to
 use the 1st- or 0th-order spline instead of the 2nd order? If your
 real-life data are smooth enough, however, then in theory the
 ringing with higher orders should go away.

James is indeed correct. This whole thing is my mistake based on a  
mis-interpretation of terminology. I've more carefully re-read the  
paper on which the ndimage spline code is based ('Splines: A Perfect  
Fit for Signal/Image Processing' by Michael Unser; it's on citeseer).  
I now (hopefully) understand that the spline pre-filter, while  
explicitly analogous to a traditional anti-aliasing pre-filter, is  
actually computing the spline coefficients via a filtering-type  
operation. While a traditional anti-aliasing filter should blur an  
image (as a band-limiting step), the fact that the spline pre-filter  
does not is of no concern since the filtered output isn't a  
band-limited set of pixels, but a set of coefficients for a band- 
limited spline.

The actual transform operators then use these coefficients to  
(properly) compute pixel values at different locations. I just  
assumed that the pre-filtering was broken (even on natural images  
with smooth variations) because images pre-filtered with the spline  
filter didn't look like traditional pre-filtered images ought.

IMO, ticket 213 should be closed as PEBCAK (I'll do that forthwith);  
further I'm sorry to have caused Peter to be further bothered about  
this non-issue.

Zach






On Mar 22, 2007, at 8:13 PM, James Turner wrote:

 By the way, ringing at sharp edges is an intrinsic feature of higher-
 order spline interpolation, right? I believe this kind of interpolant
 is really intended for smooth (band-limited) data. I'm not sure why
 the pre-filtering makes a difference though; I don't yet understand
 well enough what the pre-filter actually does.

 I'm not sure what people normally do in computer graphics, since I'm
 working more with natural band-limited images, but in the case of
 Stefan's rotate_artifacts example, wouldn't it be appropriate to
 use the 1st- or 0th-order spline instead of the 2nd order? If your
 real-life data are smooth enough, however, then in theory the
 ringing with higher orders should go away.

 Sorry if I'm stating the obvious and missing the real point! I just
 wanted to make sure this hasn't been overlooked... Likewise, sorry I
 didn't mention this before if it does turn out to be relevant. Let
 me know if you want references to explain what I said above.

 James.

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nd_image.affine_transform edge effects

2007-03-23 Thread Zachary Pincus
Hello all,

On Mar 23, 2007, at 3:04 AM, Stefan van der Walt wrote:

 On Thu, Mar 22, 2007 at 11:20:37PM -0700, Zachary Pincus wrote:
 The actual transform operators then use these coefficients to
 (properly) compute pixel values at different locations. I just
 assumed that the pre-filtering was broken (even on natural images
 with smooth variations) because images pre-filtered with the spline
 filter didn't look like traditional pre-filtered images ought.

 IMO, ticket 213 should be closed as PEBCAK (I'll do that forthwith);
 further I'm sorry to have caused Peter to be further bothered about
 this non-issue.

 The ticket is more concerned with the usability of ndimage.
 Currently, the principle of least surprise is violated.  If you use
 the default arguments of ndimage.rotate (except for axes, which should
 still be fixed to be (1,0) by default) and rotate Lena by 30 degrees,
 you get something with green splotches on (attached to the ticket).

 IMO, we should either change the default parameters or update the
 documentation before closing the ticket.

Hmm, this is worrisome. There really shouldn't be ringing on  
continuous-tone images like Lena  -- right? (And at no step in an  
image like that should gaussian filtering be necessary if you're  
doing spline interpolation -- also right?)

I assumed the problem was a non-issue after some testing with a few  
natural images didn't introduce any artifacts like those, but now  
with the Lena example I'm not sure. This is frustrating, to be sure,  
mostly because of my limited knowledge on the subject -- just enough  
to be dangerous.

Zach



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nd_image.affine_transform edge effects

2007-03-24 Thread Zachary Pincus
Hello folks,

 Hmm, this is worrisome. There really shouldn't be ringing on
 continuous-tone images like Lena  -- right? (And at no step in an
 image like that should gaussian filtering be necessary if you're
 doing spline interpolation -- also right?)

 That's hard to say. Just because it's mainly a continuous-tone image
 doesn't necessarily mean it is well sampled everywhere. This depends
 both on the subject and the camera optics. Unlike the data I usually
 work with, I think everyday digital photographs (probably a photo
 scan in the case of Lena) do not generally have the detector sampling
 frequency matched to the optical resolution of the image. If that's
 true, the presence of aliasing in interpolated images depends on the
 structure of the subject and whether the scene has edges or high-
 frequency patterns in it.

Based on my reading of the two excellent Unser papers (both the one  
that ndimage's spline code is based on, and the one that Travis  
linked to), it seems like a major point of using splines for  
interpolation is *better* behavior in the case of non-band-limited  
data than the traditional 'anti-aliasing [e.g. lowpass] filter' +  
'resampling' + 'reconstruction [e.g. lowpass] filter' procedure.

That is, based on my (very limited) understanding (I understand the  
basics of the sampling theorem and can follow the Unser paper's  
update thereof, but not much more than that), in the spline case the  
spline fitting step *replaces* the anti-aliasing filter in the above  
procedure as the method for dealing with non-band-limited data. And  
the claim is that it in fact works better in many cases.

So it seems that something is wrong if the spline interpolation tools  
in ndimage only work properly in the band-limited case, or require  
lowpass filtering.


 Stefan's rotated Lena example is indeed a bit bizarre on zooming in!
 However, the artefacts are clearly localized to distinct edges, so I
 suspect this is indeed some kind of aliasing. Moreover, it looks like
 Lena has been decimated (reduced in size) prior to the rotation. That
 is definitely a good way to get artefacts, unless an anti-aliasing
 filter is applied before shrinking the image. My impression is that
 this image is probably somewhat undersampled (to understand exactly
 what that means, read up on the Sampling Theorem).

Agreed, it looks like aliasing. Nevertheless, any resampling  
procedure is supposed to deal with this internally, right? Either by  
lowpass filtering (traditional case), or by spline fitting (spline  
case as described by Unser and understood by me) -- it shouldn't be  
letting aliasing bubble through, correct?

 The first was on Stefan's artificial data which had sharp edges, and
 got very nasty ringing artifacts even with 3rd order splines. From
 your recollection, is this expected behavior based on splines and the
 nature of Stefan's image, or more likely to be a bug?

 Your question was aimed at Travis, so I don't want to discourage him
 from answering it :-), but looking at this in more detail, I do think
 the amplitude of the artefacts here is greater than I might expect due
 to ringing with a quadratic b-spline kernel, which I think has minima
 with amplitudes 10% of the central peak. There has to be SOME
 oscillation, but in Stefan's rotate_artifacts example it seems to be
 at the level of ~100%. Also, it is not present on one of the inner
 edges for some reason. So I do wonder if the algorithm in nd_image is
 making this worse than it needs to be.

Good observation! Here are cubic spline fits to a step and a delta  
function, which both have quite small amounts of ringing compared to  
what's observed -- at least on the color images. Maybe 10% ringing in  
each color channel adds up perceptually more than it does  
mathematically?

import numpy, Gnuplot, scipy.interpolate

# step
x = numpy.arange(-10, 10)
y = numpy.where(x < 0, 1, 0)
tck = scipy.interpolate.splrep(x, y, s=0, k=3)
nx = numpy.linspace(-10, 9, 200, True)
ny = scipy.interpolate.splev(nx, tck)
d = Gnuplot.Data(numpy.transpose([x, y]), with='points')
nd = Gnuplot.Data(numpy.transpose([nx,ny]), with='lines')
g = Gnuplot.Gnuplot()
g.plot(d, nd)
ny.min()
# -0.107490563074 -- largest ringing is ~10% of step size

# delta
x = numpy.arange(-10, 10)
y = numpy.where(x == 0, 1, 0)
tck = scipy.interpolate.splrep(x, y, s=0, k=3)
nx = numpy.linspace(-10, 9, 200, True)
ny = scipy.interpolate.splev(nx, tck)
d = Gnuplot.Data(numpy.transpose([x, y]), with='points')
nd = Gnuplot.Data(numpy.transpose([nx,ny]), with='lines')
g = Gnuplot.Gnuplot()
g.plot(d, nd)
ny.min()
# -0.136449610107 -- largest ringing is ~13% of impulse size

Now let's use ndimage to rotate a step function by 45 degrees, or zoom in on  
it:

import numpy, scipy.ndimage
o = numpy.ones((100, 50), dtype=float)
z = numpy.zeros((100, 50), dtype=float)
a = numpy.concatenate([o, z], axis=1)
b = scipy.ndimage.rotate(a, 45, order=3)
b.max()
# 1.08832325055
b.min()
# -0.0883232505527
c = 

Re: [Numpy-discussion] nd_image.affine_transform edge effects

2007-03-26 Thread Zachary Pincus
Thanks for the information and the paper link, James. I certainly  
appreciate the perspective, and now see why the anti-aliasing and  
reconstruction filtering might best be left to clients of a  
resampling procedure.

Hopefully at least some of the kinks in the spline interpolation (to  
date: obligate mirror boundary conditions, extra edge values, integer  
overflow) can be smoothed out.
I can't guarantee that I'll have the time or expertise to deal with  
ni_interpolation.c, but I'll try to give it a shot some time here.

Zach


On Mar 26, 2007, at 1:16 AM, James Turner wrote:

 PS... (sorry for all the posts, for anyone who isn't interested...)

 Agreed, it looks like aliasing. Nevertheless, any resampling
 procedure is supposed to deal with this internally, right? Either by
 lowpass filtering (traditional case), or by spline fitting (spline
 case as described by Unser and understood by me) -- it shouldn't be
 letting aliasing bubble through, correct?

 In the general case, I don't think it is appropriate for the  
 resampling
 procedure to use low-pass filtering internally to avoid artefacts,
 except perhaps when downsampling. It probably makes sense for computer
 graphics work, but there are cases where the input data are band
 limited to begin with and any degradation in resolution is  
 unacceptable.
 Where needed, I think low-pass filtering should either be the
 responsibility of the main program or an option. It's not even  
 possible
 for the resampling procedure to prevent artefacts in every case, since
 the aliasing in a badly undersampled image cannot be removed post
 factum (this is for undersampled photos rather than artificial  
 graphics,
 which I think are fundamentally different because everything is  
 defined
 on the grid, although I haven't sat down and proved it  
 mathematically).
 I'm also not sure how the procedure could decide on the level of
 smoothing needed for a given dataset without external information.

 Of course intermediate-order splines will probably keep everyone  
 happy,
 being reasonably robust against ringing effects without causing much
 smoothing or interpolation error :-).

 By the way, I think you and Stefan might be interested in a medical
 imaging paper by Lehmann et al. (1999), which gives a very nice  
 overview
 of the properties of different interpolation kernels:

 http://ieeexplore.ieee.org/Xplore/login.jsp?url=/ 
 iel5/42/17698/00816070.pdf?arnumber=816070

 For what it's worth, I'd agree with both of you that the numeric
 overflow should be documented if not fixed. It sounds like Stefan has
 figured out a solution for it though. If you make sense of the code in
 ni_interpolation.c, Stefan, I'd be very interested in how to make it
 calculate one less value at the edges :-).

 Cheers,

 James.

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New Operators in Python

2007-03-26 Thread Zachary Pincus
Hi folks,

Sorry to rain on this parade, but unicode variable names and/or other  
syntactic elements have already been rejected for Python 3:

http://www.python.org/dev/peps/pep-3099/
 Python 3000 source code won't use non-ASCII Unicode characters for  
 anything except string literals or comments.
To tell the truth, in my code (for an n=1 example), I use elementwise  
(or broadcasted) multiplication about an order of magnitude more than  
matrix multiplication. Now, there's plenty of linear algebra in my  
work, but it's usually boxed off enough from the rest that converting  
everything to matrices, or using the 'dot' function (etc.), is just  
fine.

Personally, I even prefer the current way that numpy does things --  
I've always *really* disliked matlab's special syntax for elementwise  
multiplication. Now, clearly there are cases where this is handy, but  
at least in looking over my code, I find that those cases are quite  
rare, really.

So, in large part, I really don't feel a strong need for new  
operators in numpy. (And in the rest of python too! The @ decorator  
pie-syntax is quite enough for line-noise characters, in my personal  
opinion. And I think that python-dev pretty well agrees too, based on  
the raging flame wars about adding even that.)

Now, this is of course just my single opinion, but folks should  
recall that even C++, which rarely met a feature that it didn't like,  
drew the line at adding extra syntactic operators to the language  
that existed only to be overridden/implemented by users. (Which is  
precisely what's being proposed here.)

Anyhow, feel free to disagree with me -- I'm no expert here. I'm only  
mentioning this as a public service to make it clear that most of  
what's being proposed in this thread is, for better or worse, 100%  
dead-in-the-water for Python 3, and the rest will have a fairly  
significant uphill battle.

Zach




On Mar 26, 2007, at 2:42 AM, dmitrey wrote:

 Unicode keyboards appearing everywhere is just a matter of time,
 and Python's two-symbol operators will soon look obsolete; this will
 increase migration from Python to Sun's Fortress etc. I took a look at
 their unicode syntax for math formulas
 http://research.sun.com/projects/plrg/faq/NAS-CG.pdf
 it looks (vs Python) (or even MATLAB) like Pentium 4 vs Intel 386
 processors.
 It allows copy-pasting many formulas straight from articles, including
 rho, 'beta' and other non-ascii symbols.
 Also, implementing unicode will allow separate operators for (for
 example) MATLAB cross() equivalent (vector multiplication of vectors).
 WBR, D.

 René Bastian wrote:
 Hello,

 I am interested both in numarray-type multiplication and matrix-type
 multiplication.

 But I am not sure that I can buy a Unicode keyboard.

 Maybe it would be possible to implement a class with
 user-definable (?) signs.

 My choice :

 a * b - numarray type multi
 a !* b - matrix




 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix indexing question

2007-03-26 Thread Zachary Pincus
 Since matrices are an iterable Python object,
 we *expect* to iterate over the contained objects.
 (Arrays.)  I am not sure why this is not evident to all,
 but it is surely the sticking point in this discussion.

 A matrix is not a container of matrices.
 That it acts like one is surprsing.
 Surprises are bad unless they have a clear justification.
 Perhaps a clear justification exists,
 but it has not been offered in this discussion.

I think that a clear justification has been offered several times on  
the list recently, though maybe not all in this thread.

Matrices in numpy seem to exist as a construct to facilitate linear  
algebra. One such property is that row- and column-vector slices of  
matrices are (1xN) or (Nx1) matrices, because otherwise common linear  
algebra things -- like how you multiply a row-vector or a column  
vector by a matrix, and whether and when it needs to be transposed --  
do not translate properly from linear algebra notation like in  
textbooks and papers.

Once the matrix class is committed to providing row- and column- 
vector slices as other matrices, it makes no sense to have iteration  
over the matrix provide a different data type than slicing.

Basically, my rule of thumb has been to *only* use matrices when I'm  
copying linear algebra operations out of textbooks and papers, and I  
want the notations to align. Doing other, non-linear-algebra  
operations with matrices -- like iterating over their elements -- is  
not really worth it.

There's a meta question, of course: should things be changed to make  
it worth it to do pythonic tasks with matrices? Should there be  
special elementwise vs. matrix-operation operators? Should numpy work  
a lot more like matlab? That discussion is on-going in another  
thread, I think.

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix indexing question

2007-03-27 Thread Zachary Pincus
 Exactly: that was one other thing I found artificial.
 Surely the points will then be wanted as arrays.

 So my view is that we still do not have a use case
 for wanting matrices yielded when iterating across
 rows of a matrix.

It's pretty clear from my perspective: 1-D slices of matrices *must*  
be matrices. I gave an intuitive make-it-fit-with-known-notation  
explanation, and Charles gave a theoretically-grounded explanation.  
There was no objection to this from any quarter.

Given that 1-D slices of matrices must be matrices for deep  
mathematical reasons as well as notational compatibility with the  
rest of linear algebra -- e.g. that m[0] yields a matrix if m is a  
matrix-- it almost certainly would violate the principle of least  
surprise for iteration over m (intuitively understood to be choosing m 
[0] then m[1] and so forth) to yield anything other than a matrix.  
This can't possibly be what you're asking for, right? You aren't  
suggesting that m[0] and list(iter(m))[0] should be different types?

There are many clear and definite use cases for m[0] being a matrix,  
by the way, in addition to the theoretical arguments -- these aren't  
hard to come by. Whether or nor there are actual good use-cases for  
iterating over a matrix, if you believe that m[0] should be a matrix,  
it's madness to suggest that list(iter(m))[0] should be otherwise.

My opinion? Don't iterate over matrices. Use matrices for linear  
algebra. There's no iteration construct in linear algebra. The  
behavior you find so puzzling is a necessary by-product of the  
fundamental behavior of the matrix class -- which has been explained  
and which you offered no resistance to.

Zach

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix indexing question

2007-03-27 Thread Zachary Pincus
Hello all,

I suspect my previous email did not contain the full chain of my  
reasoning, because I thought that some parts were basically obvious.  
My point was merely that given some pretty basic fundamental tenants  
of numpy, Alan's suggestions quickly lead to *serious issues* far  
worse than the original problem. Thanks Chris for making this more  
explicit; here I will do so further.

Let A be an array of arbitrary shape, and M be m-by-n matrix.

(1) A[i] equals by definition A[i,...]

This is a pretty fundamental part of numpy, right? That property (and  
its relatives) along with the broadcasting behavior, are in large  
part what make numpy so very expressive compared to matlab, etc.

I can write code in numpy to do multidimensional operations over  
arbitrary-shaped and arbitrary-dimensioned data that look *exactly*  
like the corresponding 1D operations. It's amazing, really, how  
numpy's implicit indexing (this property), broadcasting (sort of  
the inverse of this, really, where shape tuples are extended as  
necessary with implicit 1's, instead of slices being extended  
with :'s as necessary), and fancy indexing, allow complex operations  
to be written compactly, clearly, and in a fully-general manner.

I thought that point (1) was obvious, as it's a pretty fundamental  
part of numpy (and Numeric and Numarray before that). It's in  
absolutely no way a trivial convenience. Of course, broadcasting  
and ufuncs, etc, contribute a lot more to expressiveness than this  
property, but it's part of the family.


(2) Thus, M[i] equals by definition M[i,:]. By the mathematical  
definition of matrices and for pragmatic reasons, M[i,:] is another  
matrix.

Here the trouble starts. Should matrices, not being arbitrary in  
their dimension, even have the ability to be indexed as M[i]? Or  
should they be completely different beasts than arrays? Which  
violates the principle of least surprise more?

Now, I think we all more or less agree that M[i,:] has to be a  
matrix, because otherwise the matrix class is pretty worthless for  
linear algebra, and doesn't operate according to the mathematical  
definition of a matrix. Indeed, this is one of the basic properties  
of the numpy matrix -- that, and that the multiplication operator is  
defined as matrix-multiplication. Remove that and there's almost no  
point in *having* a separate matrix class.

Want a use-case? Do the singular value decomposition, and get the  
product of the second-smallest singular vectors as Vt[:,-2] * U 
[-2,:]. (Things vaguely akin to this come up a lot.) You can copy the  
operation from any linear algebra text if the row- and column-vectors  
are treated as above. Otherwise you have to wonder whether that's  
supposed to be an outer or inner product, and use the corresponding  
operator -- and at that point, why was I using matrix classes anyway?

(3) If M[i] = M[i,:], and M[i,:] is a matrix, then the iteration  
behavior of matrices *has* to yield other matrices, unless Alan is  
willing to suggest that list(iter(m))[i] should have a different type  
than M[i] -- a clear absurdity. This is the point that I was trying  
to make previously, having mistakenly assumed that point (1) was  
clear to all, and that point (2) had been clearly established.

So, if we trace the reasoning of Alan's suggestion, coupled with  
these above properties, we get just that:
(a) Iteration over M should yield arrays.
(b) Ergo M[i] should be an array.
(c) Ergo M[i,:] should be an array.
(d) Ergo the matrix class should be identical to the array class  
except for overloaded multiplication, and the only way to get a 'row'  
or 'column' vector from a matrix that operates as a proper row or  
column vector would be M[i:i+1,:]. Which is a lot of typing to get  
the 'i-th' vector from the matrix.

I don't think these changes are worth it -- it basically discards the  
utility of the matrix class for linear algebra in order to make the  
matrix class more useful for general purpose data storage (a role  
already filled by the array type).

Here is another set of changes which would make matrices fully  
consistent with their linear-algebra roots:
(a) M[i,:] should be a matrix.
(b) M[i] is an error.
(c) Iteration over M is an error.

That's kind of lame though, right? Because then matrices are  
completely inconsistent with their numpy roots.

So we are left with the current behavior:
(a) M[i,:] is a matrix.
(b) M[i] is a matrix.
(c) Iteration over M yields matrices.


My view is that, fundamentally, they are linear algebra constructs.  
However, I also think it would be far more harmful to remove basic  
behavior common to the rest of numpy (e.g. that M[i] = M[i,:]), and  
behavior that comes along with that (iterability). Hence my  
suggestion: treat matrices like linear algebra constructs -- don't  
iterate over them (unless you know what you're doing). Don't index  
them like M[i] (unless you know what you're doing).

Maybe it's just a little 

Re: [Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Zachary Pincus
Hi,

I have a specific question and then a general question, and some  
minor issues for clarification.

Specifically, regarding the arguments to getbufferproc:
 166 format
 167address of a format-string (following extended struct
 168syntax) indicating what is in each element of
 169of memory.  The number of elements is len / itemsize,
 170where itemsize is the number of bytes implied by the format.
 171NULL if not needed in which case format is B for
 172unsigned bytes.  The memory for this string must not
 173be freed by the consumer --- it is managed by the exporter.

Is this saying that either NULL or a pointer to "B" can be supplied  
by getbufferproc to indicate to the caller that the array is unsigned  
bytes? If so, is there a specific reason to put the (minor)  
complexity of handling this case in the caller's hands, instead of  
dealing with it internally to getbufferproc? In either case, the  
wording is a bit unclear, I think.

The general question is that there are several other instances where  
getbufferproc is allowed to return ambiguous information which must  
be handled on the client side. For example, C-contiguous data can be  
indicated either by a NULL strides pointer or a pointer to a properly- 
constructed strides array. Clients that can't handle C-contiguous  
data (contrived example, I know there is a function to deal with  
that) would then need to check both for NULL *and* inside the strides  
array if not null, before properly deciding that the data isn't  
usable to them. Similarly, the suboffsets can be either all negative or  
NULL to indicate the same thing.

Might it be more appropriate to specify only one canonical behavior  
in these cases? Otherwise clients which don't do all the checks on  
the data might not properly interoperate with providers which format  
these values in the alternate manner.


Also, some typos, and places additional clarification could help:

 253 PYBUF_STRIDES (strides and isptr)
Should 'isptr' be 'suboffsets'?

 75 of a larger array can be described without copying the data.   T
Dangling 'T'.

 279 Get the buffer and optional information variables about the  
 buffer.
 280 Return an object-specific view object (which may be simply a
 281 borrowed reference to the object itself).
This phrasing (and similar phrasing elsewhere) is somewhat opaque to  
me. What's an object-specific view object?

 287 Call this function to tell obj that you are done with your view
Similarly, the 'view' concept and terminology should be defined more  
clearly in the preamble.

 333 The struct string-syntax is missing some characters to fully
 334 implement data-format descriptions already available elsewhere (in
 335 ctypes and NumPy for example).  Here are the proposed additions:
Is the following table just the additions? If so, it might be good to  
show the full spec, and flag the specific additions. If not, then the  
additions should be flagged.

 341 't'   bit (number before states how many bits)
vs.
 372 According to the struct-module, a number can preceed a character
 373 code to specify how many of that type there are.  The
I'm confused -- could this be phrased more clearly? Does '5t' refer  
to a field 5-bits wide, or 5-one bit fields? Is 't' allowed? If  
so, is it equivalent to or different from '5t'?

 378 Functions should be added to ctypes to create a ctypes object from
 379 a struct description, and add long-double, and ucs-2 to ctypes.
Very cool.

In general, the logic of the 'locking mechanism' should be described  
at a high level at some point. It's described in nitty-gritty  
details, but at least I would have appreciated a bit more of a  
discussion about the general how and why -- this would be helpful to  
clients trying to use the locking mechanism properly.

Thanks to Travis and everyone else involved for your work on this  
cogent and sorely-needed PEP.

Zach



On Mar 27, 2007, at 12:42 PM, Travis Oliphant wrote:


 Hi all,

 I'm having a good discussion with Carl Banks and Greg Ewing over on
 python-dev about the buffer interface. We are converging on a pretty
 good design, IMHO.   If anybody wants to participate in the  
 discussion,
 please read the PEP (there are a few things that still need to change)
 and add your two cents over on python-dev (you can read it through the
 gmane gateway and participate without signing up).

 The PEP is stored in numpy/doc/pep_buffer.txt on the SVN tree for  
 NumPy

 Best regards,

 -Travis


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A unique function...

2007-03-27 Thread Zachary Pincus
Hello,

There's unique and unique1d, but these don't output the number of  
occurrences.
There's also bincount, which outputs the number of each element, but  
includes zeros for non-present elements and so could be problematic  
for certain data.
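
(For what it's worth, here's a quick sort-based sketch that returns unique values plus counts -- not claiming this is the function you remember, and the name unique_counts is made up:)

import numpy

def unique_counts(a):
    # unique values of a 1-D array, plus how often each occurs
    a = numpy.sort(numpy.asarray(a).ravel())
    change = numpy.concatenate(([True], a[1:] != a[:-1]))  # start of each run
    values = a[change]
    starts = numpy.flatnonzero(change)
    counts = numpy.diff(numpy.concatenate((starts, [a.size])))
    return values, counts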

Zach



On Mar 27, 2007, at 2:28 PM, Pierre GM wrote:

 All,
 I could swear that I ran once into a numpy (or scipy) function that  
 output the
 unique values of a 1d ndarray along with the number of occurences  
 of each
 element. If I'm not completely mistaken, it 's a private function  
 initially
 in Fortran.
 Does this ring a bell to anyone ? Where could I find this function ?
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Zachary Pincus
 Is this saying that either NULL or a pointer to B can be supplied
 by getbufferproc to indicate to the caller that the array is unsigned
 bytes? If so, is there a specific reason to put the (minor)
 complexity of handling this case in the caller's hands, instead of
 dealing with it internally to getbufferproc? In either case, the
 wording is a bit unclear, I think.


 Yes, the wording could be more clear.   I'm trying to make it easy for
 exporters to change
 to the new buffer interface.

 The main idea I really want to see is that if the caller just passes
 NULL instead of an address then it means they are assuming the data  
 will
 be unsigned bytes.  It is up to the exporter to either allow this or
 raise an error.

 The exporter should always be explicit if an argument for returning  
 the
 format is provided (I may have thought differently a few days ago).

Understood -- I'm for the exporters being as explicit as possible if  
the argument is provided.

 The general question is that there are several other instances where
 getbufferproc is allowed to return ambiguous information which must
 be handled on the client side. For example, C-contiguous data can be
 indicated either by a NULL strides pointer or a pointer to a  
 properly-
 constructed strides array.

 Here.  I'm trying to be easy on the exporter and the consumer.  If the
 data is contiguous, then neither the exporter nor the consumer will likely  
 care about
 the strides.  Allowing this to be NULL is like the current array
 protocol convention which allows this to be None.

See below. My comments here aren't suggesting that NULL should be  
disallowed. I'm basically wondering whether it is a good idea to  
allow NULL and something else to represent the same information.  
(E.g. as above, an exporter could choose to show C-contiguous data  
with a NULL returned to the client, or with a trivial strides array).

Otherwise two different exporters exporting identical data could  
provide different representations, which the clients would need to be  
able to handle. I'm not sure that this is a recipe for perfect  
interoperability.

 Clients that can't handle C-contiguous
 data (contrived example, I know there is a function to deal with
 that) would then need to check both for NULL *and* inside the strides
 array if not null, before properly deciding that the data isn't
 usable to them.
 Not really.  A client that cannot deal with strides will simply not  
 pass
 an address to a stride array to the buffer protocol (that argument  
 will
 be NULL).  If the exporter cannot provide memory without stride
 information, then an error will be raised.

This doesn't really address my question, which I obscured with a  
poorly-chosen example. The PEP says (or at least that's how I read  
it) that if the client *does* provide an address for the stride  
array, then for un-strided arrays, the exporter may either choose to  
fill on NULL at that address, or provide a strides array.

Might it be easier for clients if the PEP required that NULL be  
returned if the array is C-contiguous? Or at least strongly suggested  
that? (I understand that there might be cases where an naive exporter  
thinks it is dealing with a strided array but it really is  
contiguous, and the exporter shouldn't be required to do that  
detection.)

The use-case isn't too strong here, but I think it's clear in the  
suboffsets case (see below).

 Similarly, the suboffsets can be either all negative or
 NULL to indicate the same thing.
 I think it's much easier to check if suboffsets is NULL rather than
 checking all the entries to see if they are -1 for the very common  
 case
 (i.e. the NumPy case) of no dereferencing.Also, if you can't deal
 with suboffsets you would just not provide an address for them.

My point exactly! As written, the PEP allows an exporter to either  
return NULL, or an array of all negative numbers (in the case that  
the client requested that information), forcing a fully-conforming  
client to make *both* checks in order to decide what to do.

Especially in this case, it would make sense to require a NULL be  
returned in the case of no suboffsets. This makes things easier for  
both clients that can deal with both suboffsets or non-offsets (e.g.  
they can branch on NULL, not on NULL or all-are-negative), and also  
for clients that can *only* deal with suboffsets.

Now, in these two cases, the use-case is pretty narrow, I agree.  
Basically it makes things easier for savvy clients that can deal with  
different data types, by not forcing them to make two checks (strides  
== NULL or strides array is trivial; suboffsets == NULL or suboffsets  
are all negative) when one would do. Again, this PEP allows the same  
information can be passed in two very different ways, when it really  
doesn't seem like that ambiguity makes life that much easier for  
exporters.

Maybe I'm wrong about this last point, though. Then there comes the  
trade-off -- should savvy clients 

Re: [Numpy-discussion] matrix indexing question

2007-03-29 Thread Zachary Pincus
Looks promising!

 On 3/29/07, Bill Baxter [EMAIL PROTECTED] wrote: On 3/30/07,  
 Timothy Hochberg [EMAIL PROTECTED] wrote:
  Note, however that you can't (for instance) multiply column  
 vector with
  a row vector:
 
   (c)(r)
  Traceback (most recent call last):
...
  TypeError: Cannot matrix multiply columns with anything
 

 That should be allowed.  (N,1)*(1,M) is just an (N,M) matrix with
 entries C[i,j] = A[i,0]*B[0,j]

 I thought about that a little, and while I agree that it could be  
 allowed, I'm not sure that it should be allowed. It's a trade off  
 between a bit of what I would guess is little used functionality  
 with some enhanced error checking (I would guess that usually  
 row*column signals a mistake). However, I don't care much one way  
 or the other; it's not hard to allow.

I'm also for allowing it; from the perspective of one writing code  
that looks like typical (if potentially technically incorrect)  
notation, being able to write direct analogs to a'b (to get a scalar)  
or ab' (to get an MxN matrix) and have everything work would be  
quite useful. Especially in various reconstruction tasks (e.g. using  
svd to approximate a given matrix with a lower-rank one), the outer  
product of a row and a column vector comes up often enough that I  
would be loathe to have it raise an error.
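
(To make that concrete with the current matrix class -- nothing here depends on the proposed row/column vector types:)

import numpy

col = numpy.matrix([[1.0], [2.0], [3.0]])   # 3x1 column vector
row = numpy.matrix([[4.0, 5.0, 6.0]])       # 1x3 row vector

outer = col * row   # 3x3 matrix: outer[i, j] = col[i, 0] * row[0, j]
inner = row * col   # 1x1 matrix holding the scalar product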

 I kind of like the idea of using call for multiply, though.  If it
 doesn't turn out to have any major down sides it could be a good way
 to give ndarray a concise syntax for dot.

 We'll see how it goes down this time. I've proposed using call  
 before, since I've thought the matrix situation was kind of silly  
 for what seems like ages now, but it always sinks without a ripple.

The call syntax is nice, if a bit opaque to someone looking at the  
code for the first time. On the other hand, it's explicit in a way  
that overloaded multiplication just isn't. I like it (for what little  
that's worth).


Zach




 -- 

 //=][=\\

 [EMAIL PROTECTED]
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort and take along arbitrary axes

2007-05-14 Thread Zachary Pincus
Hello all,

I've got a few questions that came up as I tried to calculate various  
statistics about an image time-series. For example, I have an array  
of shape (t,x,y) representing t frames of a time-lapse of resolution  
(x,y).

Now, say I want to both argsort and sort this time-series, pixel- 
wise. (For example.)

In 1-d it's easy:
indices = a.argsort()
sorted = a[indices]

I would have thought that doing this on my 3-d array would work  
similarly:
indices = a.argsort(axis=0)
sorted = a.take(indices, axis=0)

Unfortunately, this gives a ValueError of dimensions too large.  
Now, I know that 'a.sort(axis=0)' works fine for the given example,  
but I'm curious about how to this sort of indexing operation in the  
general case.

Thanks for any insight,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort and take along arbitrary axes

2007-05-14 Thread Zachary Pincus
 I've got a few questions that came up as I tried to calculate various
 statistics about an image time-series. For example, I have an array
 of shape (t,x,y) representing t frames of a time-lapse of resolution
 (x,y).

 Now, say I want to both argsort and sort this time-series, pixel-
 wise. (For example.)

 In 1-d it's easy:
 indices = a.argsort()
 sorted = a[indices]

 I would have thought that doing this on my 3-d array would work
 similarly:
 indices = a.argsort(axis=0)
 sorted = a.take(indices, axis=0)

 Unfortunately, this gives a ValueError of dimensions too large.
 Now, I know that 'a.sort(axis=0)' works fine for the given example,
 but I'm curious about how to this sort of indexing operation in the
 general case.

 Unfortunately, argsort doesn't work transparently with take or  
 fancy indexing for multidimensional arrays. I am thinking of adding  
 a function argtake for this, and also for the results returned by  
 argmax and argmin, but at the moment you have to fill in the   
 values of the other indices and use fancy indexing. For now, it is  
 probably simpler, prettier, and faster to just sort the array.

Thanks Charles. Unfortunately, the argsort/sort buisness was, as I  
mentioned, just an example of the kind of 'take' operation that I am  
trying to figure out how to do. There are other operations that will  
have similarly-formatted 'indices' arrays (as above) that aren't  
generated from argsort...

As such, how do I fill in the values of the other indices and use  
fancy indexing? Even after reading the numpy book about that, and  
reading the docstring for numpy.take, I'm still vague on this. Would  
I use numpy.indices to get a list of index arrays, and then swap in  
(at the correct position in this list) the result of argsort (or the  
other operations), and use that for fancy indexing? Is there an  
easier/faster way?
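
(To make the question concrete, here's the sort of fill-in-the-other-indices recipe I'm imagining, using numpy.indices -- corrections welcome:)

import numpy

a = numpy.random.random((4, 3, 2))     # shape (t, x, y)
order = a.argsort(axis=0)              # indices into axis 0, same shape as a

# index arrays for the remaining axes, spanning the full shape
ti, xi, yi = numpy.indices(a.shape)
sorted_a = a[order, xi, yi]            # should match numpy.sort(a, axis=0)
assert numpy.alltrue(sorted_a == numpy.sort(a, axis=0))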

Thanks again,

Zach





___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] very large matrices.

2007-05-14 Thread Zachary Pincus
Hello Dave,

I don't know if this will be useful to your research, but it may be  
worth pointing out in general. As you know PCA (and perhaps some  
other spectral algorithms?) use eigenvalues of matrices that can be  
factored out as A'A (where ' means transpose). For example, in the  
PCA case, if A is a centered data matrix (i.e. the mean value of each  
data point has been subtracted off), and if the data elements are row- 
vectors, then A'A is the covariance matrix. PCA then examines the  
(nonzero) eigenvectors of this covariance matrix A'A.

Now, it's possible to determine the eigenvalues/eigenvectors for A'A  
from the matrix AA', which in many useful cases is much smaller than  
A'A. For example, imagine that you have 100 data points, each in  
10,000 dimensions. (This is common in imaging applications.) A'A is  
10,000x10,000, but AA' is only 100x100. We can get the eigenvalues/ 
vectors of A'A from those of AA', as described below. (This works in  
part because in such cases, the larger matrix is rank-deficient, so  
there will only be e.g. 100 nonzero eigenvalues anyway.)

If you're lucky enough to have this kind of structure in your  
problem, feel free to use the following code which exploits that (and  
explains in a bit more detail how it works).


Zach Pincus
Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine



def _symm_eig(a):
    """Return the eigenvalues and eigenvectors of the symmetric matrix a'*a.
    If a has more columns than rows, then that matrix will be rank-deficient,
    and the non-zero eigenvalues and eigenvectors can be more easily extracted
    from the matrix a*a', from the properties of the SVD:
      if a of shape (m,n) has SVD u*s*v', then:
        a'*a = v*s'*s*v'
        a*a' = u*s*s'*u'
    That is, v contains the eigenvectors of a'*a, with s'*s the eigenvalues,
    according to the eigen-decomposition theorem.
    Now, let s_hat, an array of shape (n,m), be such that s * s_hat = I(m,m)
    and s_hat * s = I(n,n). With that, we can solve for u or v in terms of
    the other:
      v = a'*u*s_hat'
      u = a*v*s_hat
    """
    m, n = a.shape
    if m >= n:
        # just return the eigenvalues and eigenvectors of a'a
        vals, vecs = _eigh(numpy.dot(a.transpose(), a))
        vals = numpy.where(vals < 0, 0, vals)
        return vals, vecs
    else:
        # figure out the eigenvalues and vectors based on aa', which is smaller
        sst_diag, u = _eigh(numpy.dot(a, a.transpose()))
        # in case due to numerical instabilities we have sst_diag < 0 anywhere,
        # peg them to zero
        sst_diag = numpy.where(sst_diag < 0, 0, sst_diag)
        # now get the inverse square root of the diagonal, which will form the
        # main diagonal of s_hat
        err = numpy.seterr(divide='ignore', invalid='ignore')
        s_hat_diag = 1/numpy.sqrt(sst_diag)
        numpy.seterr(**err)
        s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0)
        # s_hat_diag is a list of length m, a'u is (n,m), so we can just use
        # numpy's broadcasting instead of matrix multiplication, and only
        # create the upper mxm block of a'u, since that's all we'll use anyway
        v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag
        return sst_diag, v

def _eigh(m):
    """Return eigenvalues and corresponding eigenvectors of the Hermitian
    array m, ordered by eigenvalues in decreasing order.
    Note that numpy.linalg.eigh makes no order guarantees."""
    values, vectors = numpy.linalg.eigh(m)
    order = numpy.flipud(values.argsort())
    return values[order], vectors[:,order]



On May 13, 2007, at 8:35 PM, Dave P. Novakovic wrote:

 There are definitely elements of spectral graph theory in my research
 too. I'll summarise

 We are interested in seeing the each eigenvector from svd can
 represent in a semantic space
 In addition to this we'll be testing it against some algorithms like
 concept indexing (uses a bipartitional k-meansish method for dim
 reduction)
 also testing against Orthogonal Locality Preserving indexing, which
 uses the laplacian of a similarity matrix to calculate projections of
 a document (or term) into a manifold.

 These methods have been implemented and tested for document
 classification, I'm interested in seeing their applicability to
 modelling semantics with a system known as Hyperspace to analog
 language.

 I was hoping to do svd to my HAL built out of reuters, but that was
 way too big. instead i'm trying with the traces idea i mentioned
 before (ie contextually grepping a keyword out of the docs to build a
 space around it.)

 Cheers

 Dave

 On 5/14/07, Charles R Harris [EMAIL PROTECTED] wrote:


 On 5/13/07, Dave P. Novakovic [EMAIL PROTECTED] wrote:
 Are you trying some sort of principal components analysis?

 PCA is indeed one part of the research I'm doing.

 I had the impression you were trying to build a linear space in  
 which to
 embed a model, like atmospheric folk do when they try to invert  
 

[Numpy-discussion] Change to get_printoptions?

2007-07-25 Thread Zachary Pincus
Hello all,

I just recently updated to the SVN version of numpy to test my code  
against it, and found that a small change made to  
numpy.get_printoptions (it now returns a dictionary instead of a  
list) breaks my code.

Here's the changeset:
http://projects.scipy.org/scipy/numpy/changeset/3877

I'm not really looking forward to needing to detect numpy versions  
just so I can do the right thing with get_printoptions, but I do  
agree that the new version of the function is more sensible. My  
question is if there's any particular policy about backwards- 
incompatible python api changes, or if I need to be aware of their  
possibility at every point release. (Either is fine -- I'm happy for  
numpy to be better at the cost of incompatibility, but I'd like to  
know if changes like these are the rule or exception.)

Also, in terms of compatibility checking, has anyone written a little  
function to check if numpy is within a particular version range?  
Specifically, one that handles numpy's built from SVN as well as from  
release tarballs.
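
(Absent anything built in, here is the rough sort of helper I have in mind -- the name version_at_least is made up, and it just ignores any '.dev' suffix that SVN builds carry:)

import numpy

def version_at_least(required):
    # required is a tuple like (1, 0, 4)
    release = numpy.version.version.split('.dev')[0]
    parts = []
    for piece in release.split('.'):
        digits = ''.join(c for c in piece if c.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= tuple(required)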


Thanks,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] getting numPy happening for sciPy

2007-07-27 Thread Zachary Pincus
On Jul 27, 2007, at 2:42 AM, Nils Wagner wrote:

 I cannot reproduce the problem concerning #401. It is Mac specific
 problem. Am I missing something ?

I can't reproduce this problem either. I just yesterday built scipy  
from SVN on two different OS X 10.4.10 boxes, one using the fortran  
compiler from hpc.sourceforge.net (not the latest 2007 release, but  
the december 2006 one), and the other using the compiler from  
r.research.att.com/tools. Everything else was similar, and everything  
worked fine with regard to ticket 401.

On the other hand, when I tried to compile scipy using the latest  
(2007-05) gfortran from hpc.sourceforge.net, I got bizarre link  
errors about MACOSX_DEPLOYMENT_TARGET being set incorrectly. (See  
previous email here http://projects.scipy.org/pipermail/scipy-user/ 
2007-June/012542.html ). Interestingly, with the earlier version of  
gfortran from hpc.sourceforge.net, and with the r.research.att.com/ 
tools version, this problem  does not arise.

Anyhow, my point is that there are still odd linker errors (as in  
ticket 401) lurking that may or may not have anything to do with  
scipy per se, but might have to do with odd and perhaps buggy builds  
of gfortran. Feh -- I wish Apple would just start including a fortran  
compiler with the rest of their dev tools. The situation otherwise is  
not good.

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vector magnitude?

2007-09-05 Thread Zachary Pincus
Hello,

'len' is a (pretty basic) python builtin function for getting the  
length of anything with a list-like interface. (Or more generally,  
getting the size of anything that is sized, e.g. a set or dictionary.)

Numpy arrays offer a list-like interface allowing you to iterate  
along their first dimension, etc. (*) Thus, len(numpy_array) is  
equivalent to numpy_array.shape[0], which is the number of elements  
along the first dimension of the array.
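
(If what's actually wanted is the Euclidean magnitude of the vector, rather than its element count, a minimal sketch:)

import numpy

v = numpy.array([3.0, 4.0])
magnitude = numpy.sqrt(numpy.dot(v, v))   # 5.0
# or, equivalently:
magnitude = numpy.linalg.norm(v)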

Zach


(*) For example, this is useful if you've got numerous data vectors  
packed into an array along the first dimension, and want to iterate  
across the different vectors.

a = numpy.ones((number_of_data_elements, size_of_data_element))
for element in a:
# element is a 1-D array with a length of 'size_of_data_element'

Note further that this works even if your data elements are multi- 
dimensional; i.e. the above works the same if:
element_shape = (x,y,z)
a = numpy.ones((number_of_data_elements,)+element_shape)
for element in a:
# element is a 3-D array with a shape of (x,y,z)





On Sep 5, 2007, at 2:40 PM, Robert Dailey wrote:

 Thanks for your response.

 I was not able to find len() in the numpy documentation at the  
 following link:
 http://www.scipy.org/doc/numpy_api_docs/namespace_index.html

 Perhaps I'm looking in the wrong location?

 On 9/5/07, Matthieu Brucher [EMAIL PROTECTED]  wrote:

 2007/9/5, Robert Dailey  [EMAIL PROTECTED]: Hi,

 I have two questions:

 1) Is there any way in numpy to represent vectors? Currently I'm  
 using 'array' for vectors.


 A vector is an array with one dimension, it's OK. You could use a  
 matrix of dimension 1xn or nx1 as well.


 2) Is there a way to calculate the magnitude (length) of a vector  
 in numpy?

 Yes, len(a)

 Matthieu

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Find first occurrence?

2007-09-19 Thread Zachary Pincus
Hello all,

On several occasions, I've had the need to find only the first  
occurrence of a value in an unsorted numpy array. I usually use  
numpy.where(arr==val)[0] or similar, and don't worry about the fact  
that I'm iterating across the entire array.

However, sometimes the arrays are pretty big and the find operation  
is in an inner loop, so I was wondering if there's already a C  
extension function somewhere in numpy or scipy that does a fast find  
first operation, or anything similar. (Easy enough to write my own,  
and maybe given the issues inherent in comparing float equality,  
etc., it doesn't belong in the core numpy anyway...)
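
(In the meantime, here's the sort of pure-numpy workaround I've been using -- just a sketch, with a made-up name, that scans the flattened array in chunks so an early match avoids comparing everything:)

import numpy

def find_first(arr, val, chunk=4096):
    flat = numpy.asarray(arr).ravel()
    for start in xrange(0, flat.size, chunk):
        hits = numpy.flatnonzero(flat[start:start + chunk] == val)
        if hits.size:
            return start + hits[0]
    return -1  # not found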

Thanks,

Zach Pincus
Program in Biomedical Informatics and Department of Biochemistry

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] display numpy array as image

2007-11-29 Thread Zachary Pincus
Thanks for the suggestions, everyone! All very informative and most  
helpful.

For what it's worth, here's my application: I'm building a tool for  
image processing which needs some manual input in a few places (e.g.  
user draws a few lines). The images are greyscale images with 12-14  
bits of dynamic range (from a microscope), so I need to have some  
basic brightness/contrast/gamma controls, as well as allowing basic  
drawing on the image to get the needed user input. It looks like GL  
or wx will be best suited here, I think? (I presume that python/numpy/ 
[GL|wx] can keep up with things like dragging a slider to change  
brightness/contrast/other LUT changes, as long as I code reasonably.)

Anyhow, thanks for all the input,

Zach


On Nov 29, 2007, at 9:03 PM, Joe Harrington wrote:

 If you want to explore the array interactively, blink images, mess  
 with
 colormaps using the mouse, rescale the image values, mark regions, add
 labels, look at dynamic plots of rows and columns, etc., get the ds9
 image viewer and the xpa programs that come with it that allow it to
 communicate with other programs:

 ftp://sao-ftp.harvard.edu/pub/rd/ds9
 http://hea-www.harvard.edu/RD/ds9/index.html

 Then get the Python numdisplay package, which uses xpa.  You have  
 to get
 numdisplay from inside the stsci_python package:

 http://www.stsci.edu/resources/software_hardware/pyraf/stsci_python/ 
 current/download

 Just grab the numdisplay directory from within that.  Older  
 versions of
 numdisplay are standalone but don't work perfectly.  Beware, there are
 outdated web sites about numdisplay on the stsci site.  Don't google!

 Run ds9 before you load numdisplay.  Then you can send your python
 arrays to a real interactive data viewer at will.  There are even
 mechanisms to define physical coordinates mapped from the image
 coordinates.

 --jh--



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] display numpy array as image

2007-12-03 Thread Zachary Pincus
Hi Stéfan,

Thanks -- I hadn't realized matplotlib's user-interaction abilities  
were that sophisticated! I'll definitely give that route a shot.

Zach


On Dec 3, 2007, at 9:46 AM, Stefan van der Walt wrote:

 Hi Zach

 Attached is some code for removing radial distortion from images.  It
 shows how to draw lines based on user input using matplotlib.  It is
 not suited for a big application, but useful for demonstrations.

 Try it on

   http://mentat.za.net/results/window.jpg

 Regards
 Stéfan

 On Thu, Nov 29, 2007 at 11:59:05PM -0500, Zachary Pincus wrote:
 Thanks for the suggestions, everyone! All very informative and most
 helpful.

 For what it's worth, here's my application: I'm building a tool for
 image processing which needs some manual input in a few places (e.g.
 user draws a few lines). The images are greyscale images with 12-14
 bits of dynamic range (from a microscope), so I need to have some
 basic brightness/contrast/gamma controls, as well as allowing basic
 drawing on the image to get the needed user input. It looks like GL
 or wx will be best suited here, I think? (I presume that python/ 
 numpy/
 [GL|wx] can keep up with things like dragging a slider to change
 brightness/contrast/other LUT changes, as long as I code reasonably.)

 Anyhow, thanks for all the input,

 Zach

 [attachment: radial.py]
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how do I list all combinations

2007-12-26 Thread Zachary Pincus
Hi all,

I use numpy's own ndindex() for tasks like these:

 In: numpy.ndindex?
 Type:   type
 Base Class: <type 'type'>
 String Form:    <class 'numpy.lib.index_tricks.ndindex'>
 Namespace:  Interactive
 File:   /Library/Frameworks/Python.framework/Versions/2.4/ 
 lib/python2.4/site-packages/numpy/lib/index_tricks.py
 Docstring:
 Pass in a sequence of integers corresponding
 to the number of dimensions in the counter.  This iterator
 will then return an N-dimensional counter.

 Example:
  >>> for index in ndindex(3,2,1):
 ... print index
 (0, 0, 0)
 (0, 1, 0)
 (1, 0, 0)
 (1, 1, 0)
 (2, 0, 0)
 (2, 1, 0)

Here's a combination function using numpy.ndindex:

def combinations(*lists):
    lens = [len(l) for l in lists]
    for indices in numpy.ndindex(*lens):
        yield [l[i] for l, i in zip(lists, indices)]

r1=[dog,cat]
r2=[1,2]
list(combinations(r1, r2))

Out: [['dog', 1], ['dog', 2], ['cat', 1], ['cat', 2]]

Or you could use 'map' and 'operator.getitem', which might be faster(?):
def combinations(*lists):
    lens = [len(l) for l in lists]
    for indices in numpy.ndindex(*lens):
        yield map(operator.getitem, lists, indices)


In the python cookbook, there are numerous similar recipes for  
generating permutations, combinations, and the like.


Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine




On Dec 26, 2007, at 5:48 PM, Timothy Hochberg wrote:


 Here's a baroque way to do it using generated code:

 def cg_combinations(seqs):
     n = len(seqs)
     chunks = ["def f(%s):" % ', '.join('s%s' % i for i in range(n))]
     for i in reversed(range(n)):
         chunks.append(" " * (n - i) + "for x%s in s%s:" % (i, i))
     chunks.append(" " * n + " yield (" + ', '.join('x%s' % i for i in range(n)) + ')')
     code = '\n'.join(chunks)
     exec code
     return f(*seqs)

 It should be reasonably fast, if non-obvious. I've included a  
 version with some timing and testing against Chucks version below.  
 Enjoy.

 (Also, it can be simplified slightly, but I wanted to generate in  
 the same order as Chuck for comparison purposes).


 ==


 def count(seqs) :
     counter = [0 for i in seqs]
     maxdigit = [len(i) - 1 for i in seqs]
     yield counter
     while True :
         i = 0;
         while i < len(counter) and counter[i] >= maxdigit[i] :
             counter[i] = 0
             i += 1
         if i < len(counter) :
             counter[i] += 1
             yield counter
         else :
             return

 def count_combinations(seqs):
     for c in count(seqs):
         yield tuple(s[i] for (s, i) in zip(seqs, c))

 def cg_combinations(seqs):
     n = len(seqs)
     chunks = ["def f(%s):" % ', '.join('s%s' % i for i in range(n))]
     for i in reversed(range(n)):
         chunks.append(" " * (n - i) + "for x%s in s%s:" % (i, i))
     chunks.append(" " * n + " yield (" + ', '.join('x%s' % i for i in range(n)) + ')')
     code = '\n'.join(chunks)
     exec code
     return f(*seqs)

 a = "abcde"*10
 b = range(99)
 c = [x**2 for x in range(33)]
 seqs = [a, b, c]

 if __name__ == "__main__":
     assert list(count_combinations(seqs)) == list(cg_combinations(seqs))

     import timeit
     test = timeit.Timer('list(count_combinations(seqs))',
                         'from __main__ import count_combinations, seqs')
     t1 = test.timeit(number=10)
     test = timeit.Timer('list(cg_combinations(seqs))',
                         'from __main__ import cg_combinations, seqs')
     t2 = test.timeit(number=10)
     print t1, t2
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nasty bug using pre-initialized arrays

2008-01-04 Thread Zachary Pincus
Hello all,

 That's well and good.  But NumPy should *never* automatically -- and
 silently -- chop the imaginary part off your complex array elements,
 particularly if you are just doing an innocent assignment!
 Doing something drastic like silently throwing half your data away can
 lead to all kinds of bugs in code written by somebody who is unaware
 of this behavior (i.e. most people)!

 It sounds to me like the right thing is to throw an exception instead
 of downcasting a data object.

I'm not sure that I agree! I'd rather not have to litter my code with  
casting operations every time I wanted to down-convert data types  
(creating yet more temporary arrays...) via assignment. e.g.:

A[i] = calculate(B).astype(A.dtype)
vs.
A[i] = calculate(B)

Further, writing functions to operate on generic user-provided output  
arrays (or arrays of user-provided dtype; both of which are common  
e.g. in scipy.ndimage) becomes more bug-prone, as every assignment  
would need to be modified as above.

This change would break a lot of the image-processing code I've  
written (where one often does computations in float or even double,  
and then re-scales and down-converts the result to integer for  
display), for example.
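
(A minimal sketch of the pattern I mean -- a float intermediate assigned back into a pre-allocated integer output array, relying on the current implicit cast:)

import numpy

img = numpy.array([0, 40, 200], dtype=numpy.uint8)
out = numpy.empty_like(img)

scaled = img * (255.0 / img.max())   # float64 intermediate
out[...] = scaled                    # silently cast back to uint8 on assignment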

I guess that this could be rolled in via the geterr/seterr mechanism,  
and could by default just print warnings. I agree that silent  
truncation can be troublesome, but not having to spell every  
conversion out in explicit ( and error-prone) detail is also pretty  
useful. (And arguably more pythonic.)

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nasty bug using pre-initialized arrays

2008-01-05 Thread Zachary Pincus
 There are two related hierarchies of datatypes: different kinds  
 (integer,
 floating point, complex floating point) and different precisions  
 within a given
 kind (int8, int16, int32, int64). The term downcasting should  
 probably be
 reserved for the latter only.

 It seems to me that Zach and Scott are defending downcasting of  
 precisions
 within a given kind. It does not necessarily follow that the  
 behavior we choose
 for dtypes within a given kind should be the behavior when we are  
 dealing with
 dtypes across different kinds. We can keep the precision  
 downcasting behavior
 that you want while raising an error when one attempts to assign a  
 complex
 number into a floating point array.

Actually, my points were explicitly about cross-kind conversions! A  
lot of image-processing code features heavy use of converting integer  
images to float for intermediate calculations, and then rescaling and  
down-converting back to the original type for display, etc.

In fact, scipy.ndimage makes use of this, allowing for operations  
with output into user-specified arrays, or arrays of user-specified  
dtype, while (I believe) carrying out some of the intermediate  
calculations at higher precision.

As such, pretty much any code that takes a user-specified array and  
assigns to it the result of a (potentially) mixed-mode calculation  
would need to be changed from:

A[i] = calculate(B)
to
A[i] = calculate(B).astype(A.dtype)

with the proliferation of temp arrays that that entails.

I think, but am not sure, that there is a lot of code out there that  
does this, intentionally, which would be broken by this change.  
Explicit is indeed better than implicit, but in this case, finding  
all of the places where mixed-mode conversions happen and tracking  
them down could be a pain on the same scale as const chasing in C++,  
where fixing the error in one place makes it reappear elsewhere in  
the chain of usage, leading to long, painful, and often totally  
pointless debugging sessions.

In the specific case of converting from complex to anything else, I  
can see the desire for a warning to prevent data loss. But converting  
from float to integer is a lot more routine and is used a lot in the  
kind of work that I do, at least.

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nasty bug using pre-initialized arrays

2008-01-07 Thread Zachary Pincus
 For large arrays, it makes since to do automatic
 conversions, as is also the case in functions taking output arrays,
 because the typecast can be pushed down into C where it is time and
 space efficient, whereas explicitly converting the array uses up
 temporary space. However, I can imagine an explicit typecast  
 function,
 something like

 a[...] = typecast(b)

 that would replace the current behavior. I think the typecast  
 function
 could be implemented by returning a view of b with a castable flag  
 set
 to true, that should supply enough information for the assignment
 operator to do its job. This might be a good addition for Numpy 1.1.

 While that seems like an ok idea, I'm still not sure what's wrong with
 raising an exception when there will be information loss.  The  
 exception
 is already raised with standard python complex objects.  I can  
 think of
 many times in my code where explicit looping is a necessity, so
 pre-allocating the array is the only way to go.

The issue Charles is dealing with here is how to *suppress* the  
proposed exception in cases (as the several that I described) where  
the information loss is explicitly desired.

With what's currently in numpy now, you would have to do something  
like this:
A[...] = B.astype(A.dtype)
to set a portion of A to B, unless you are *certain* that A and B are  
of compatible types.

This is ugly and also bug-prone, seeing as how there's some violation  
of the don't-repeat-yourself principle. (I.e. A is mentioned twice,  
and to change the code to use a different array, you need to change  
the variable name A twice.)

Moreover, and worse, the phrase 'A[...] = B.astype(A.dtype)' creates and  
destroys a temporary array the same size as B. It's equivalent to:
temp = B.astype(A.dtype)
A[...] = temp

Which is no good if B is very large. Currently, the type conversion  
in 'A[...] = B' cases is handled implicitly, deep down in the C code  
where it is very very fast, and no temporary array is made.

Charles suggests a 'typecast' operator that would set a flag on the  
array B so that trying to convert it would *not* raise an exception,  
allowing for the fast, C-level conversion. (This assumes your  
proposed change in which by default such conversions would raise  
exceptions.) This 'typecast' operation wouldn't do anything but set a  
flag, so it doesn't create a temporary array or do any extra work.

But still, every time that you are not *certain* what the type of a  
result from a given operation is, any code that looks like:
A[i] = calculate(...)
will need to look like this instead:
A[i] = typecast(calculate(...))

I agree with others that such assignments aren't highly common, but  
there will be broken code from this. And as Charles demonstrates,  
getting the details right of how to implement such a change is non- 
trivial.

Zach

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] CASTABLE flag

2008-01-07 Thread Zachary Pincus
Hello all,

In order to help make things regarding this casting issue more  
explicit, let me present the following table of potential down-casts.

(Also, for the record, nobody is proposing automatic up-casting of  
any kind. The proposals on the table focus on preventing some or all  
implicit down-casting.)

(1) complex -> anything else:
Data is lost wholesale.

(2) large float -> smaller float (also large complex -> smaller  
complex):
Precision is lost.

(3) float -> integer:
Precision is lost, to a much greater degree than (2).

(4) large integer -> smaller integer (also signed/unsigned conversions):
Data gets lost/mangled/wrapped around.
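
A quick sketch of what each case does today, assigning numpy values into pre-allocated arrays (with the current behavior, all four happen silently; values chosen just to show the loss):

import numpy

a = numpy.zeros(1, dtype=numpy.float64)
a[:] = numpy.array([1+2j])          # (1) imaginary part dropped -> 1.0
b = numpy.zeros(1, dtype=numpy.float32)
b[:] = numpy.array([1 + 1e-12])     # (2) precision lost -> exactly 1.0
c = numpy.zeros(1, dtype=numpy.int32)
c[:] = numpy.array([3.9])           # (3) fractional part truncated -> 3
d = numpy.zeros(1, dtype=numpy.int8)
d[:] = numpy.array([300])           # (4) wraps around -> 44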

The original requests for exceptions to be raised focused on case  
(1), where it is most clear that loss-of-data is happening in a way  
that is unlikely to be intentional.

Robert Kern suggested that exceptions might be raised for cases (1)  
and (3), which are cross-kind casts, but not for within-kind  
conversions like (2) and (4). However, I personally don't think that  
down-converting from float to int is any more or less fraught than  
converting from int32 to int8: if you need a warning/exception in one  
case, you'd need it in the rest. Moreover, there's the principle of  
least surprise, which would suggest that complex rules for when  
exceptions get raised based on the kind of conversion being made is  
just asking for trouble.

So, as a poll, if you are in favor of exceptions instead of automatic  
down-conversion, where do you draw the line? What causes an error?  
Robert seemed to be in favor of (1) and (3). Anne seemed to think  
that only (1) was problematic enough to worry about. I am personally  
cool toward the exceptions, but I think that case (4) is just as  
bad as case (3) in terms of data-loss, though I agree that case (1)  
seems the worst (and I don't really find any of them particularly  
bad, though case (1) is something of an oddity for newcomers, I agree.)

Finally, if people really want these sort of warnings, I would  
suggest that they be rolled into the get/setoptions mechanism, so  
that there's a fine-grained mechanism for turning them to off/warn/ 
raise exception.

Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Unexpected integer overflow handling

2008-01-07 Thread Zachary Pincus
Hello all,

On my (older) version of numpy (1.0.4.dev3896), I found several  
oddities in the handling of assignment of long-integer values to  
integer arrays:

In : numpy.array([2**31], dtype=numpy.int8)
 
---
ValueErrorTraceback (most recent call  
last)
/Users/zpincus/ipython console
ValueError: setting an array element with a sequence.

While this might be reasonable to be an error condition, the precise  
error raised seems not quite right! But not all overflow errors are  
caught in this way:

In : numpy.array([2**31-1], dtype=numpy.int8)
Out: array([-1], dtype=int8)

As above, numpy is quite happy allowing overflows; it just breaks  
when doing a python long-int to int conversion. The conversion from  
numpy-long-int to int does the right thing though (if by right  
thing you mean allows silent overflow, which is a matter of  
discussion elsewhere right now...):

In : numpy.array(numpy.array([2**31], dtype=numpy.int64),  
dtype=numpy.int8)
Out: array([0], dtype=int8)

At least on item assignment, the overflow exception is less odd:

In : a = numpy.empty(shape=(1,), dtype=numpy.int8)
In : a[0] = 2**31
 
---
OverflowError Traceback (most recent call  
last)
/Users/zpincus/<ipython console>
OverflowError: long int too large to convert to int

Things work right with array element assignment:
In : a[0] = numpy.array([2**31], dtype=numpy.int64)[0]

But break again with array scalars, and with the strange ValueError  
again!
In : a[0] = numpy.array(2**31, dtype=numpy.int64)
 
---
ValueError  Traceback (most recent call  
last)
/Users/zpincus/<ipython console>
ValueError: setting an array element with a sequence.

Note that non-long-int-to-int array scalar conversions work:
In : a[0] = numpy.array(2**31-1, dtype=numpy.int64)

Is this still the case for the current version of numpy?

Best,
Zach
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] finding eigenvectors etc

2008-02-21 Thread Zachary Pincus

Hi all,


How are you using the values? How significant are the differences?



i am using these eigenvectors to do PCA on a set of images(of faces).I
sort the eigenvectors in descending order of their eigenvalues and
this is multiplied with the  (orig data of some images viz a matrix)to
obtain a facespace.


I've dealt with similar issues a lot -- performing PCA on data where  
the dimensionality of the data is much greater than the number of  
data points. (Like images.)


In this case, the maximum number of non-trivial eigenvectors of the  
covariance matrix of the data is min(dimension_of_data,  
number_of_data_points), so one always runs into the zero-eigenvalue  
problem; the matrix is thus always ill-conditioned, but that's not a  
problem in these cases.


Nevertheless, if you've got (say) 100 images that are each 100x100  
pixels, to do PCA in the naive way you need to make a 10,000x10,000  
covariance matrix and then decompose it into 10,000 eigenvectors and  
values just to get out the 100 non-trivial ones. That's a lot of  
computation wasted calculating noise! Fortunately, there are better  
ways. One is to perform the SVD on the 100x10,000 data matrix. Let the  
centered (mean-subtracted) data matrix be D, then the SVD provides  
matrices U, S, and V'. IIRC, the eigenvectors of D'D (the covariance  
matrix of interest) are then packed along the first dimension of V',  
and the eigenvalues are the square of the values in S.


But! There's an even faster way (from my testing). The trick is that  
instead of calculating the 10,000x10,000 outer covariance matrix D'D,  
or doing the SVD on D, one can calculate the 100x100 inner  
covariance matrix DD' and perform the eigen-decomposition thereon and  
then trivially transform those eigenvalues and vectors to the ones of  
the D'D matrix. This computation is often substantially faster than  
the SVD.


Here's how it works:
Let D, our re-centered data matrix, be of shape (n, k) -- that is, n  
data points in k dimensions.
We know that D has a singular value decomposition D = USV' (no need  
to calculate the SVD though; just enough to know it exists).

From this, we can rewrite the covariance matrices:
D'D = VS'SV'
DD' = USS'U'

Now, from the SVD, we know that S'S and SS' are diagonal matrices,  
and V and U (and V' and U') form orthogonal bases. One way to write  
the eigen-decomposition of a matrix is A = QLQ', where Q is  
orthogonal and L is diagonal. Since the eigen-decomposition is unique  
(up to a permutation of the columns of Q and L), we know that V must  
therefore contain the eigenvectors of D'D in its columns, and U must  
contain the eigenvectors of DD' in its columns. This is the origin of  
the SVD recipe for that I gave above.


Further, let S_hat, of shape (k, n), be the elementwise reciprocal of  
S, transposed (i.e. SS_hat = I of shape (n, n) and S_hatS = I of shape  
(k, k), where I is the identity matrix).

Then, we can solve for U or V in terms of the other:
V = D'US_hat'
U = DVS_hat

So, to get the eigenvectors and eigenvalues of D'D, we just calculate  
DD' and then apply the symmetric eigen-decomposition (symmetric  
version is faster, and DD' is symmetric) to get eigenvectors U and  
eigenvalues L. We know that L=SS', so S_hat = 1/sqrt(L) (where the  
sqrt is taken elementwise, of course). So, the eigenvectors we're  
looking for are:

V = D'US_hat
Then, the principal components (eigenvectors) are in the columns of V  
(packed along the second dimension of V).


Fortunately, I've packaged this all up into a python module for PCA  
that takes care of this all. It's attached.


Zach Pincus

Postdoctoral Fellow, Lab of Dr. Frank Slack
Molecular, Cellular and Developmental Biology
Yale University

# Copyright 2007 Zachary Pincus
# 
# This is free software; you can redistribute it and/or modify
# it under the terms of the Python License version 2.4 as published by the
# Python Software Foundation.


import numpy

def pca(data, algorithm='eig'):
  """pca(data) -> mean, pcs, norm_pcs, variances, positions, norm_positions
  Perform Principal Components Analysis on a set of n data points in k
  dimensions. The data array must be of shape (n, k).
  This function returns the mean data point, the principal components (of
  shape (p, k), where p is the number of principal components: p=min(n,k)),
  the normalized principal components, where each component is normalized by
  the data's standard deviation along that component (shape (p, k)), the
  variance each component represents (shape (p,)), the position of each data
  point along each component (shape (n, p)), and the position of each data
  point along each normalized component (shape (n, p)).
  
  The optional algorithm parameter can be either 'svd' to perform PCA with 
  the singular value decomposition, or 'eig' to use a symmetric eigenvalue
  decomposition. Empirically, eig is faster on the datasets I have tested.
  """
  data = numpy.asarray(data)
  mean = data.mean(axis = 0)
  centered = data

Re: [Numpy-discussion] ctypes and numpy

2009-11-07 Thread Zachary Pincus
Check out this thread:
http://www.mail-archive.com/numpy-discuss...@lists.sourceforge.net/msg01154.html

In shot, it can be done, but it can be tricky to make sure you don't  
leak memory. A better option if possible is to pre-allocate the array  
with numpy and pass that buffer into the C code -- but that's not  
always possible...
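
(A rough sketch of the no-copy route, assuming you know the element type and count from the C side, and that the C buffer outlives the array -- numpy does not take ownership here, and the name wrap_buffer is made up:)

import ctypes
import numpy

def wrap_buffer(void_p, count, ctype=ctypes.c_double, dtype=numpy.float64):
    # view the C memory as a ctypes array, then as a numpy array, without copying
    array_type = ctype * count
    buf = ctypes.cast(void_p, ctypes.POINTER(array_type)).contents
    return numpy.frombuffer(buf, dtype=dtype)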

Zach


On Nov 6, 2009, at 10:18 PM, Chris Colbert wrote:

 Are you bound to using ctypes as the only option?

 This could be done quite easily in Cython...

 On Fri, Nov 6, 2009 at 1:57 PM, Trevor Clarke tre...@notcows.com  
 wrote:
 I've found some info on ctypes and numpy but it mostly tells me how  
 to pass
 numpy objects to C funcs. I've got a C func which returns a  
 c_void_p to a
 buffer and I'd like to access the data using numpy without creating  
 a copy
 of the data. Could someone point me in the right direction?

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] neighborhood iterator

2009-11-23 Thread Zachary Pincus
Hi,

Having just written some cython code to iterate a neighborhood across  
an array, I have some ideas about features that would be useful for a  
general frame. Specifically, being able to pass in a footprint  
boolean array to define the neighborhood is really useful in many  
contexts. Also useful is the ability to query the offset of the  
current pixel from the center of the neighborhood. (These two  
features, plus very efficient handling of boundary conditions by  
breaking the image into regions where the conditions are and are not  
required, make the image iterators in ITK really nice to use.)
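
As a small illustration of the footprint idea, scipy.ndimage.generic_filter  
already accepts a boolean footprint (though calling a python function per  
pixel is of course much slower than a dedicated C/cython loop):

import numpy as np
from scipy import ndimage

image = np.arange(25, dtype=float).reshape(5, 5)

# A 'plus'-shaped neighborhood: the center pixel and its 4-connected neighbors.
footprint = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=bool)

# For each pixel, generic_filter passes the footprint-selected neighborhood
# (flattened to 1D) to the callable; here we just take its mean.
smoothed = ndimage.generic_filter(image, np.mean, footprint=footprint,
                                  mode='nearest')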

Zach


On Nov 23, 2009, at 9:12 AM, Nadav Horesh wrote:


 Thank you, this is a start. It seems that there are more issues to  
 resolve. I am trying to make a general framework that would enable one  
 to write filters using this iterator.


   Nadav

 -Original Message-
 From: numpy-discussion-boun...@scipy.org on behalf of David Warde- 
 Farley
 Sent: Mon 23-Nov-09 03:21
 To: Discussion of Numerical Python
 Subject: Re: [Numpy-discussion] neighborhood iterator


 On 22-Nov-09, at 12:50 PM, Nadav Horesh wrote:


 I wonder if the neighbourhood iterator can be used as a more
 efficient replacement for ndimage.generic_filter. Is there a way to
 use it from cython?

 Yes, using the NumPy C API, called like any other C function is from
 Cython. Something like:

 ##

 import numpy as np
 cimport numpy as np

 cdef extern from "numpy/arrayobject.h":
     object PyArray_NeighborhoodIterNew(object iter, np.npy_intp *bounds,
                                        int mode, np.ndarray fill_value)
     int PyArrayNeighborhoodIter_Next(object iter)
     int PyArrayNeighborhoodIter_Reset(object iter)

 ##

 should do the trick.

 Note that you'll need to call np.import_array() before using any of
 these functions to initialize the C API, I think.

 David
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] convert strides/shape/offset into nd index?

2009-11-29 Thread Zachary Pincus
Hi all,

I'm curious as to what the most straightforward way is to take an  
offset into the memory buffer underlying an arbitrarily strided array  
and convert it to the nd index into that array. (Let's assume for  
simplicity that each element is one byte...)

Does sorting the strides from largest to smallest and then using  
integer division and mod (in the obvious way) always work? (I can't  
seem to find a counterexample, but I am not used to thinking too  
deeply about bizarrely-strided configurations.) Is there a method that  
doesn't involve sorting?

Thanks,
Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] convert strides/shape/offset into nd index?

2009-12-01 Thread Zachary Pincus
 Not to be a downer, but this problem is technically NP-complete. The
 so-called knapsack problem is to find a subset of a collection of
 numbers that adds up to the specified number, and it is NP-complete.
 Unfortunately, it is exactly what you need to do to find the indices
 to a particular memory location in an array of shape (2,2,...,2).

Ha ha, right -- that is the knapsack problem isn't it.
Oh well... I'll just require fortran- or C-style strided arrays, in  
which case it is easy to unravel offsets into indices.
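
For the C-contiguous case, a minimal sketch of that unraveling (the helper  
name is just illustrative; numpy.unravel_index gives the same answer):

import numpy as np

def offset_to_index(offset, shape):
    # Unravel a flat C-order offset (counted in elements, not bytes) into
    # an nd index by repeated divmod over the shape, last axis fastest.
    index = []
    for dim in reversed(shape):
        offset, i = divmod(offset, dim)
        index.append(i)
    return tuple(reversed(index))

a = np.arange(24).reshape(2, 3, 4)
idx = offset_to_index(17, a.shape)
print(idx, a[idx])                       # (1, 1, 1) 17
print(np.unravel_index(17, a.shape))     # same answer from numpy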

Thanks everyone!

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] u in [u+1]

2010-02-05 Thread Zachary Pincus
 I'm having some trouble here.  I have a list of numpy arrays.  I  
 want to
 know if an array 'u' is in the list.

Try:

any(numpy.all(u == l) for l in array_list)

standard caveats about float comparisons apply; perhaps
any(numpy.allclose(u, l) for l in array_list)
is more appropriate in certain circumstances.

You can of course replace the first 'any' with 'all' or 'sum' to get  
different kinds of information, but using 'any' is equivalent to the  
'in' query that you wanted.

The reason the 'in' operator below fails is that, behind the scenes,  
'u not in [u+1]' causes Python to iterate through the list, testing each  
element for equality with u. Except that, as the error states, array  
comparisons are elementwise, so the resulting boolean array has no single,  
unambiguous truth value. (cf. many threads about this.)

Zach


On Feb 5, 2010, at 6:47 AM, Neal Becker wrote:

 I'm having some trouble here.  I have a list of numpy arrays.  I  
 want to
 know if an array 'u' is in the list.

 As an example,
 u = np.arange(10)

 : u not in [u+1]
 ---
 ValueErrorTraceback (most recent  
 call last)

 /home/nbecker/raysat/test/ipython console in module()

 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()

 What would be the way to do this?

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Loading bit strings

2010-03-05 Thread Zachary Pincus

 Is there a good way in NumPy to convert from a bit string to a boolean
 array?

 For example, if I have a 2-byte string s='\xfd\x32', I want to get a
 16-length boolean array out of it.

numpy.unpackbits(numpy.fromstring('\xfd\x32', dtype=numpy.uint8))
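
A quick demonstration (written for Python 3, so frombuffer and a bytes  
literal stand in for fromstring and the 2.x string above):

import numpy as np

bits = np.unpackbits(np.frombuffer(b'\xfd\x32', dtype=np.uint8)).astype(bool)
print(bits.astype(int))
# [1 1 1 1 1 1 0 1 0 0 1 1 0 0 1 0]  -- 0xfd then 0x32, most significant bit first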
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.array(arr.flat) mutates arr if arr.flags.fortran: bug?

2010-03-24 Thread Zachary Pincus
Hello,

I assume it is a bug that calling numpy.array() on a flatiter of a  
fortran-strided array that owns its own data causes that array to be  
rearranged somehow?

Not sure what happens with a fancier-strided array that also owns its  
own data (because I'm not sure how to create one of those in python).

This is from the latest svn version (2.0.0.dev8302) but was also  
present in a previous version too.

Zach


In [9]: a = numpy.array([[1,2],[3,4]]).copy('F')

In [10]: a
Out[10]:
array([[1, 2],
[3, 4]])

In [11]: list(a.flat)
Out[11]: [1, 2, 3, 4]

In [12]: a # no problem
Out[12]:
array([[1, 2],
[3, 4]])

In [13]: numpy.array(a.flat)
Out[13]: array([1, 2, 3, 4])

In [14]: a # this ain't right!
Out[14]:
array([[1, 3],
[2, 4]])

In [15]: a = numpy.array([[1,2],[3,4]]).copy('C')

In [16]: numpy.array(a.flat)
Out[16]: array([1, 2, 3, 4])

In [17]: a
Out[17]:
array([[1, 2],
[3, 4]])

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.array(arr.flat) mutates arr if arr.flags.fortran: bug?

2010-03-27 Thread Zachary Pincus
 You should open a ticket for this.

http://projects.scipy.org/numpy/ticket/1439



On Mar 26, 2010, at 11:26 AM, Charles R Harris wrote:



 On Wed, Mar 24, 2010 at 1:13 PM, Zachary Pincus zachary.pin...@yale.edu 
  wrote:
 Hello,

 I assume it is a bug that calling numpy.array() on a flatiter of a
 fortran-strided array that owns its own data causes that array to be
 rearranged somehow?

 Not sure what happens with a fancier-strided array that also owns its
 own data (because I'm not sure how to create one of those in python).

 This is from the latest svn version (2.0.0.dev8302) but was also
 present in a previous version too.

 Zach


 In [9]: a = numpy.array([[1,2],[3,4]]).copy('F')

 In [10]: a
 Out[10]:
 array([[1, 2],
[3, 4]])

 In [11]: list(a.flat)
 Out[11]: [1, 2, 3, 4]

 In [12]: a # no problem
 Out[12]:
 array([[1, 2],
[3, 4]])

 In [13]: numpy.array(a.flat)
 Out[13]: array([1, 2, 3, 4])

 In [14]: a # this ain't right!
 Out[14]:
 array([[1, 3],
[2, 4]])

 In [15]: a = numpy.array([[1,2],[3,4]]).copy('C')

 In [16]: numpy.array(a.flat)
 Out[16]: array([1, 2, 3, 4])

 In [17]: a
 Out[17]:
 array([[1, 2],
[3, 4]])


 You should open a ticket for this.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Replacing NANs

2010-03-30 Thread Zachary Pincus
 In an array I want to replace all NaNs with some number, say 100. I  
 found a method nan_to_num but it only replaces them with zero.
 Any solution for this?

Indexing with a mask is one approach here:

a[numpy.isnan(a)] = 100

also cf. numpy.isfinite as well in case you want the same with infs.
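
A tiny demonstration covering both cases:

import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf, 4.0])
a[np.isnan(a)] = 100        # replaces only the NaN
a[~np.isfinite(a)] = 100    # replaces NaNs and infinities alike
print(a)                    # [  1. 100. 100. 100.   4.]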

Zach


On Mar 30, 2010, at 2:59 PM, Vishal Rana wrote:

 Hi,

  In an array I want to replace all NaNs with some number, say 100. I  
  found a method nan_to_num but it only replaces them with zero.
 Any solution for this?

 Thanks
 Vishal
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to combine a pair of 1D arrays?

2010-04-14 Thread Zachary Pincus
 On Wed, Apr 14, 2010 at 10:25, Peter Shinners p...@shinners.org  
 wrote:
 Is there a way to combine two 1D arrays with the same size into a 2D
 array? It seems like the internal pointers and strides could be
 combined. My primary goal is to not make any copies of the data.

 There is absolutely no way to get around that, I am afraid.

You could start with the 2d array... instead of:
   a = np.array((1,2,3,4))
   b = np.array((11,12,13,14))
   c = np.magical_fuse(a, b)   # what goes here?

perhaps something like:

   c = np.empty((2,4), int)
   a = c[0]
   b = c[1]

Now a and b are views on c. So if you pass them to, say, some  
(strides-aware) C routine that fills them in element-by-element, c  
will get filled in at the same time.
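
A quick check that no copies are involved:

import numpy as np

c = np.empty((2, 4), int)
a = c[0]
b = c[1]
a[:] = (1, 2, 3, 4)                # writing through the views...
b[:] = (11, 12, 13, 14)
print(c)                           # ...fills in c: [[ 1  2  3  4]
                                   #                 [11 12 13 14]]
print(a.base is c, b.base is c)    # True True -- both share c's memory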

Zach


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 2D binning

2010-06-01 Thread Zachary Pincus
 Hi
 Can anyone think of a clever (non-looping) solution to the following?

 I have a list of latitudes, a list of longitudes, and a list of data  
 values. All lists are the same length.

 I want to compute an average of data values for each lat/lon pair.  
 e.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001], then
 data[1001] = (data[1001] + data[2001])/2

 Looping is going to take way too long.

As a start, are the equal lat/lon pairs exactly equal (i.e. either  
not floating-point, or floats that will always compare equal, that is,  
the floating-point bit-patterns will be guaranteed to be identical) or  
approximately equal to float tolerance?

If you're in the approx-equal case, then look at the KD-tree in scipy  
for doing near-neighbors queries.

If you're in the exact-equal case, you could consider hashing the lat/ 
lon pairs or something. At least then the looping is O(N) and not  
O(N^2):

import collections
grouped = collections.defaultdict(list)
for lt, ln, da in zip(lat, lon, data):
   grouped[(lt, ln)].append(da)

averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())

Is that fast enough?

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 2D binning

2010-06-01 Thread Zachary Pincus
 I guess it's as fast as I'm going to get. I don't really see any  
 other way. (BTW, the lat/lons are integers.)

You could (in c or cython) try a brain-dead hashtable with no  
collision detection:

for lat, long, data in dataset:
   bin = (lat ^ long) % num_bins
   hashtable[bin] = update_incremental_mean(hashtable[bin], data)

you'll of course want to do some experiments to see if your data are  
sufficiently sparse and/or you can afford a large enough hashtable  
array that you won't get spurious hash collisions. Adding error- 
checking to ensure that there are no collisions would be pretty  
trivial (just keep a table of the lat/long for each hash value, which  
you'll need anyway, and check that different lat/long pairs don't get  
assigned the same bin).
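
Since the lat/lons are integers, a fully vectorized alternative is also  
possible in pure numpy; a sketch (the key construction assumes every lon  
is below 100000):

import numpy as np

lat = np.array([10, 10, 20, 10])
lon = np.array([5, 5, 7, 5])
data = np.array([1.0, 3.0, 10.0, 5.0])

# One integer key per (lat, lon) pair; assumes lon < 100000 everywhere.
keys = lat.astype(np.int64) * 100000 + lon

uniq, inverse = np.unique(keys, return_inverse=True)
sums = np.bincount(inverse, weights=data)
counts = np.bincount(inverse)
means = sums / counts              # mean data value for each unique lat/lon pair
print(uniq)                        # [1000005 2000007]
print(means)                       # [ 3. 10.]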

Zach



 -Mathew

 On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus zachary.pin...@yale.edu 
  wrote:
  Hi
  Can anyone think of a clever (non-looping) solution to the  
 following?
 
  I have a list of latitudes, a list of longitudes, and a list of data
  values. All lists are the same length.
 
  I want to compute an average of data values for each lat/lon pair.
  e.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001], then
  data[1001] = (data[1001] + data[2001])/2
 
  Looping is going to take way too long.

 As a start, are the equal lat/lon pairs exactly equal (i.e. either
 not floating-point, or floats that will always compare equal, that is,
 the floating-point bit-patterns will be guaranteed to be identical) or
 approximately equal to float tolerance?

 If you're in the approx-equal case, then look at the KD-tree in scipy
 for doing near-neighbors queries.

 If you're in the exact-equal case, you could consider hashing the lat/
 lon pairs or something. At least then the looping is O(N) and not
 O(N^2):

 import collections
 grouped = collections.defaultdict(list)
 for lt, ln, da in zip(lat, lon, data):
   grouped[(lt, ln)].append(da)

 averaged = dict((ltln, numpy.mean(da)) for ltln, da in  
 grouped.items())

 Is that fast enough?

 Zach
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

