Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
Glad I finally found this discussion. I implemented some of the ideas from the SciPy BOF discussion, and Joshua has already merged them into his datarray on GitHub (thanks, Joshua, for being so fast on the merge button). To introduce these changes, here are a couple of examples of how you could

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Wed, Jul 7, 2010 at 10:25 PM, Rob Speer rsp...@mit.edu wrote: Glad I finally found this discussion. I implemented some of the ideas from the SciPy BOF discussion, and Joshua has already merged them into his datarray on GitHub (thanks, Joshua, for being so fast on the merge button). To

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:43 PM, Charles R Harris wrote: On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke cgoh...@uci.edu wrote: Dear NumPy developers, I am trying to solve some scipy.sparse TypeError failures reported in [1] and reduced them to the following

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:59 PM, Charles R Harris wrote: On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke cgoh...@uci.edu wrote: Dear NumPy developers, I am trying to solve some scipy.sparse TypeError failures reported in [1] and reduced them to the following

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: arr.country.named('Netherlands').year.named(2010) arr.country.named('Spain').year.named(slice(1994, 2010)) arr.year.named(2006).country[0:2] This looks too verbose to me. As axes always have a total order, I'd go for the most compact representation (assuming 'country' is

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Bruce Southey
On 07/08/2010 08:52 AM, Wes McKinney wrote: On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider hannes.bretschnei...@wiwi.hu-berlin.de wrote: Dear NumPy developers, I have to process some big data files with high-frequency financial data. I am trying to load a delimited text file

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 3:13 AM, Lluís xscr...@gmx.net wrote: Rob Speer writes: arr.country.named('Netherlands').year.named(2010) arr.country.named('Spain').year.named(slice(1994, 2010)) arr.year.named(2006).country[0:2] This looks too verbose to me. As axes always have a total order, I'd

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
On Thu, Jul 8, 2010 at 7:13 AM, Lluís xscr...@gmx.net wrote: Thus, we can use something in the middle:   arr[0,1]   arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' Ah ha. So this is the case with positional axes but named ticks, which we haven't really brought up

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Joshua Holbrook writes: On Thu, Jul 8, 2010 at 3:13 AM, Lluís xscr...@gmx.net wrote: Rob Speer writes: arr.country.named('Netherlands').year.named(2010) arr.country.named('Spain').year.named(slice(1994, 2010)) arr.year.named(2006).country[0:2] This looks too verbose to me. As axis

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
While I haven't had a chance to really look in-depth at the changes myself (I'm a busy man! So many mailing lists!), I so far like the look and sound of them. That's just my opinion, though. If people are okay with the attribute magic, I have a proposal for more of it. In my own project where

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote: While I haven't had a chance to really look in-depth at the changes myself (I'm a busy man! So many mailing lists!), I so far like the look and sound of them. That's just my opinion, though. If people are okay with the attribute

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: On Thu, Jul 8, 2010 at 7:13 AM, Lluís xscr...@gmx.net wrote: Thus, we can use something in the middle:   arr[0,1]   arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' Ah ha. So this is the case with positional axes but named ticks, which we

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Skipper Seabold writes: On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote: [...] My proposal is that datarray.row should be equivalent to datarray.axes[0], and datarray.column should be equivalent to datarray.axes[1], so that you can always ask for something like
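
The aliasing proposed in this message (row for axes[0], column for axes[1]) can be sketched as follows. This is only an illustration under stated assumptions: `AliasedAxes` and the two-field `Axis` tuple are hypothetical names made up for this note, not the datarray implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for an axis object; not the datarray Axis class.
Axis = namedtuple("Axis", ["label", "index"])


class AliasedAxes:
    """Illustrative container exposing .row and .column as axis aliases."""

    def __init__(self, axes):
        self.axes = tuple(axes)

    @property
    def row(self):
        # Alias for the first axis, whatever its label is.
        return self.axes[0]

    @property
    def column(self):
        # Alias for the second axis, whatever its label is.
        return self.axes[1]


table = AliasedAxes([Axis("city", 0), Axis("year", 1)])
print(table.row, table.column)
```

Properties keep the aliases read-only and always in sync with `axes`, which seems to be the intent of "should be equivalent to".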

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
But I don't understand your second example:   arr.country['Spain'].year[1994:2010] That seems to run straight into the index/name ambiguity. Shouldn't that take the 1994th through 2010th indices along the year axis? Not every axis will have names, so you can't make *all* the indexing go by

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: Or what I was striving for:   arr.year.named[1994:2010]   arr.year['1994':'2010']   arr.year['1994':-3] So your proposal is, whenever there's an index that is not an integer, look it up by name, and use .named only if you want integer tick names? This feels too

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Wes McKinney
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote: Forgive me if this has already been addressed, but my question is what happens when we have more than one label (not as in a labeled axis but an observation label -- but not a tick because they're not unique!) per, say, row axis

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote: Forgive me if this has already been addressed, but my question is what happens when we have more than one label (not as in a labeled axis but an observation label -- but not a tick because they're not unique!) per, say, row axis

[Numpy-discussion] Fwd: effect of shape=None (the default) in format.open_memmap

2010-07-08 Thread David Goldsmith
No reply? -- Forwarded message -- From: David Goldsmith d.l.goldsm...@gmail.com Date: Tue, Jul 6, 2010 at 7:03 PM Subject: effect of shape=None (the default) in format.open_memmap To: numpy-discussion@scipy.org Hi, I'm trying to wrap my brain around the effect of leaving
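
For readers with the same question, here is a minimal sketch of the two common cases, assuming a current NumPy where `numpy.lib.format.open_memmap` documents `shape` as required when creating a file in write mode and ignored otherwise. The file name and array contents are made up for illustration.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Illustrative scratch file; the name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "demo.npy")

# Creating a new file: the shape is passed explicitly.
created = open_memmap(path, mode="w+", dtype=np.float64, shape=(3, 4))
created[:] = 1.0
created.flush()
del created

# Reopening an existing file: shape is left at its default (None)
# and the shape is taken from the .npy header on disk.
reopened = open_memmap(path, mode="r")
print(reopened.shape, reopened.dtype)
```

The sketch deliberately does not exercise `shape=None` together with `mode="w+"`, since the thread snippet does not say what that combination did at the time.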

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 1:38 PM, Lluís xscr...@gmx.net wrote: Skipper Seabold writes: On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote: [...] My proposal is that datarray.row should be equivalent to datarray.axes[0], and datarray.column should be equivalent to

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold jsseab...@gmail.com wrote: On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote: Your labels are unique if you look at them the right way. Here's how I would represent that in a datarray: * axis0 = 'city', ['Austin', 'Boston', ...] *

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 2:41 PM, Rob Speer rsp...@mit.edu wrote: On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold jsseab...@gmail.com wrote: On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote: Your labels are unique if you look at them the right way. Here's how I would represent that

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
No. I'd rather go for eliminating the 'arr.year.named', and providing only:  * arr.__getitem__  * arr.named.__getitem__  * arr.label.__getitem__ The first being just the current ndarray.__getitem__, and the two last methods would accept both strings and integers, assuming that names/ticks

Re: [Numpy-discussion] [Numpy-svn] r8413 - trunk/numpy/lib - Author: oliphant - Add percentile function.

2010-07-08 Thread Keith Goodman
On Thu, Jul 8, 2010 at 12:27 PM, Sebastian Haase seb.ha...@gmail.com wrote: isn't this related to http://projects.scipy.org/numpy/ticket/626 percentile() and clamp() which was set to invalid -Sebastian The new percentile function has an axis input. I like that.
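
A short illustration of the axis argument mentioned above, using `numpy.percentile` as it exists in current NumPy (the r8413 signature may have differed in detail). The sample array is made up.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# 50th percentile down each column (reduces over axis 0).
per_column = np.percentile(x, 50, axis=0)

# 50th percentile along each row (reduces over axis 1).
per_row = np.percentile(x, 50, axis=1)

print(per_column, per_row)
```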

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Skipper Seabold writes: [...] If I understood well, you could have 4 axes (assuming that an Axis can only handle a single label/variable). a = DatArray(numpy.array([...], dtype = [(precipitation, float), (temperature, float)]), ((city,
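
The dtype in this message is a NumPy structured dtype. A minimal sketch of that part alone, without the DatArray wrapper, might look like this; the field names are quoted here as NumPy requires, and the readings are invented values for illustration.

```python
import numpy as np

# Two variables per observation, stored as named fields.
weather_dtype = [("precipitation", float), ("temperature", float)]

# Made-up readings, one tuple per observation.
readings = np.array([(0.5, 21.0), (1.2, 18.5)], dtype=weather_dtype)

# Each field can be pulled out by name as an ordinary float array.
temperatures = readings["temperature"]
print(readings.dtype.names, temperatures)
```

Keeping the variables as fields rather than as an extra axis is what lets each one carry its own type.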

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: No. I'd rather go for eliminating the 'arr.year.named', and providing only:  * arr.__getitem__  * arr.named.__getitem__  * arr.label.__getitem__ The first being just the current ndarray.__getitem__, and the two last methods would accept both strings and integers,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Fernando Perez
On Thu, Jul 8, 2010 at 1:15 PM, Lluís xscr...@gmx.net wrote: My impression from SciPy was that people would prefer separate accessors for names and indices, especially because integers (a really common data type, after all) shouldn't be forbidden. Also, working with strings of integers like

[Numpy-discussion] DataArray ticks

2010-07-08 Thread Keith Goodman
What do you think of adding a ticks parameter to DataArray? Would that make sense? Current behavior: x = DataArray([[1, 2], [3, 4]], (('row', ['A','B']), ('col', ['C', 'D']))) x.axes (Axis(label='row', index=0, ticks=['A', 'B']), Axis(label='col', index=1, ticks=['C', 'D'])) Proposed ticks
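
The "current behavior" output shown above can be mimicked with a small sketch. This assumes nothing about the real DataArray internals: `Axis` is a plain namedtuple and `make_axes` is a hypothetical helper written for this note.

```python
from collections import namedtuple

# Hypothetical stand-in that prints like the Axis objects in the message.
Axis = namedtuple("Axis", ["label", "index", "ticks"])


def make_axes(labels_and_ticks):
    """Build one Axis per (label, ticks) pair, numbering them in order."""
    return tuple(
        Axis(label, index, list(ticks))
        for index, (label, ticks) in enumerate(labels_and_ticks)
    )


axes = make_axes((("row", ["A", "B"]), ("col", ["C", "D"])))
print(axes)
```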

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Fernando Perez writes: The consensus at the BoF (not that it means it's set in stone, simply that there was good chance for back-and-forth on the topic with many voices) was that: 1. There are valid use cases for 'integer ticks', i.e. integers that index arbitrarily into an array instead

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 12:20 PM, Fernando Perez fperez@gmail.com wrote: On Thu, Jul 8, 2010 at 1:15 PM, Lluís xscr...@gmx.net wrote: My impression from SciPy was that people would prefer separate accessors for names and indices, especially because integers (a really common data type,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 12:55 PM, Lluís xscr...@gmx.net wrote: Fernando Perez writes: The consensus at the  BoF (not that it means it's set in stone, simply that there was  good chance for back-and-forth on the topic with many voices) was that: 1. There are valid use cases for 'integer

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Joshua Holbrook writes: arr[not int] - tick-based indexing At the BoF, we chose to drop this because we wanted to allow integer ticks (or implicit type conversion, either way) without the ambiguity of, did we mean that in the ndarray sense or in a tick with the name '1' sense? Sorry, I

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 1:30 PM, Lluís xscr...@gmx.net wrote: Joshua Holbrook writes: arr[not int] - tick-based indexing At the BoF, we chose to drop this because we wanted to allow integer ticks (or implicit type conversion, either way) without the ambiguity of, did we mean that in the

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
Still, I have a question. Did you also agree that this should forcibly index through ticks?  arr.something[int]      - tick-based indexing Yes. I feel like people are talking about different things because it's unclear what the .something is. If the .something is an axis name, then no.

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 1:39 PM, Rob Speer rsp...@mit.edu wrote: Still, I have a question. Did you also agree that this should forcibly index through ticks?  arr.something[int]      - tick-based indexing Yes. I feel like people are talking about different things because it's unclear what

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Bruce Southey
On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote: Still, I have a question. Did you also agree that this should forcibly index through ticks?  arr.something[int]      - tick-based indexing Yes. I feel like people are talking about different things because it's unclear what

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
3. That the best solution to allow integer ticks while retaining 'normal' indexing semantics for integers would be to have arr[int] - normal indexing arr.something[int] - tick-based indexing, where an int can mean anything. All right, it's clear lots of people like it better this way, so I
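
The split described in point 3 (plain brackets for positions, a separate accessor for ticks) can be sketched roughly as below. This is an assumption-laden toy, not the datarray code: `TickedArray`, `TickIndexer`, and the accessor name `named` are chosen for illustration, and only exact single-tick lookups are handled.

```python
import numpy as np


class TickIndexer:
    """Hypothetical accessor: looks values up by tick along each axis."""

    def __init__(self, data, ticks):
        self._data = data
        self._ticks = ticks

    def __getitem__(self, labels):
        if not isinstance(labels, tuple):
            labels = (labels,)
        # Translate each tick into its position on the matching axis.
        positions = tuple(
            self._ticks[axis].index(label)
            for axis, label in enumerate(labels)
        )
        return self._data[positions]


class TickedArray:
    """Toy array: arr[...] is positional, arr.named[...] is tick-based."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        self.ticks = [list(axis_ticks) for axis_ticks in ticks]
        self.named = TickIndexer(self.data, self.ticks)

    def __getitem__(self, index):
        # Ordinary ndarray semantics: integers are positions.
        return self.data[index]


arr = TickedArray(
    [[10, 20, 30], [40, 50, 60]],
    [["Netherlands", "Spain"], [1994, 2006, 2010]],
)
print(arr[1, 2], arr.named["Spain", 2010])
```

Because the year ticks are integers, `arr[1, 2]` and `arr.named["Spain", 2010]` address the same cell without either spelling being ambiguous.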

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
Then how is this not different than a record array? How is this the *same* as a record array? --Josh On Thu, Jul 8, 2010 at 2:03 PM, Rob Speer rsp...@mit.edu wrote: 3. That the best solution to allow integer ticks while retaining 'normal' indexing semantics for integers would be to have

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Robert Kern
On Thu, Jul 8, 2010 at 18:00, Bruce Southey bsout...@gmail.com wrote: On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote: Still, I have a question. Did you also agree that this should forcibly index through ticks?  arr.something[int]      - tick-based indexing Yes. I feel

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Hannes Bretschneider
Sebastian Haase seb.haase at gmail.com writes: I would expect a 700MB text file translate into less than 200MB of data - assuming that you are talking about decimal numbers (maybe total of 10 digits each + spaces) and saving as float32 binary. So the problem would only be the loading in -
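
The size comparison in this message can be checked on a small scale. The sketch below uses made-up data and assumes fixed-width numbers of 10 characters plus one separator; real files will differ, so the ratio is only indicative.

```python
import io

import numpy as np

# 1000 illustrative values, each written as 10 characters plus a newline.
values = np.linspace(0.0, 1.0, 1000)
text = "".join("%.8f\n" % v for v in values)
text_bytes = len(text.encode("ascii"))

# The same numbers parsed into binary arrays of two precisions.
as_float32 = np.loadtxt(io.StringIO(text), dtype=np.float32)
as_float64 = np.loadtxt(io.StringIO(text), dtype=np.float64)

print(text_bytes, as_float32.nbytes, as_float64.nbytes)
```

The in-memory array is only part of the picture: as the message notes, the peak usage while parsing the text is usually the real problem.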

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:13 PM, Christoph Gohlke wrote: Dear NumPy developers, I am trying to solve some scipy.sparse TypeError failures reported in [1] and reduced them to the following example: import numpy a = numpy.array([[1]]) numpy.dot(a.astype('single'), a.astype('longdouble'))
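
The reproduction from this report, laid out as a runnable script. On a current NumPy I would expect the call to succeed and return a longdouble result, which is what the assertions below assume; the TypeError described in the thread was specific to the 2010 builds under discussion.

```python
import numpy as np

a = np.array([[1]])

# Mixed-precision product from the report: single times longdouble.
mixed = np.dot(a.astype("single"), a.astype("longdouble"))

# Casting both operands to one type first sidesteps any mixed-type dispatch.
common = np.result_type(np.single, np.longdouble)
explicit = np.dot(a.astype(common), a.astype(common))

print(mixed.dtype, explicit.dtype)
```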

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
I think we have to start from the nD case, even if I (and I think most users) will tend to think in 2D.  The rest is just going to have to be up to developers how they want users to interact with what we, the developers, see as axes.  No end-user wants to think about the 6th axis of the data,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Bruce Southey
On Thu, Jul 8, 2010 at 5:09 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Jul 8, 2010 at 18:00, Bruce Southey bsout...@gmail.com wrote: On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote: Still, I have a question. Did you also agree that this should forcibly index through

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 4:05 PM, Lluís xscr...@gmx.net wrote: Another reason to have multiple variables is that the insertion of NaNs to maintain shape homogeneity will make these synthetic NaNs indistinguishable from other NaNs that might be in your original input data, unless you use a

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
http://github.com/rspeer/datarray represents my best guess at the SciPy BOF consensus. I recently switched the method of accessing named ticks from .named() to .named[] based on further discussion here. My implementation is still missing the case with named ticks but positional axes, however.