Not really. 1-D structured arrays can and do work well for the very
common case where one has unlabeled rows and labeled columns. They are
also a little bit more flexible in that the columns can be
heterogeneous in dtype, as columns are wont to do.
May I politely suggest that, just as some
Ryan May writes:
On Mon, Jul 12, 2010 at 8:04 AM, Neil Crighton neilcrigh...@gmail.com wrote:
Gael Varoquaux gael.varoquaux at normalesup.org writes:
I do such manipulation all the time, and keeping track of which axis is
what is fairly tedious and error prone. It would be much nicer to be
On Sun, Jul 11, 2010 at 11:59:30AM +, Neil Crighton wrote:
What is a use case for the new array type that can't be solved by
structured/record arrays? Sounds like it was decided at the SciPy
BOF they were a good idea, several people have implemented a
version of them and Fernando and Gael
Gael Varoquaux gael.varoquaux at normalesup.org writes:
Let's say that you have a dataset that is in a 3D array, where axis 0
corresponds to days, axis 1 to hours of the day, and axis 2 to
temperature, you might want to have the mean of the temperature in each
day, which would be in current
On Mon, Jul 12, 2010 at 8:04 AM, Neil Crighton neilcrigh...@gmail.com wrote:
Gael Varoquaux gael.varoquaux at normalesup.org writes:
I do such manipulation all the time, and keeping track of which axis is
what is fairly tedious and error prone. It would be much nicer to be able
to write:
On Mon, Jul 12, 2010 at 01:04:55PM +, Neil Crighton wrote:
Thanks, that's a really nice description. Instead of
data.ax_day.mean(axis=0)
I think it would be clearer to do something like
data.mean(axis='day')
Yes, that's even better. The problem is that it does not extend to
operations
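As a rough illustration of the `data.mean(axis='day')` spelling discussed above, here is a minimal sketch in plain NumPy. The helper `named_mean` and the `AXIS_NAMES` tuple are hypothetical and exist only for this example; they are not part of NumPy or datarray, and the axis names simply follow the days/hours/temperature example in the thread.

```python
import numpy as np

# Hypothetical axis names, following the example in the thread.
AXIS_NAMES = ("day", "hour", "temperature")

def named_mean(data, axis, names=AXIS_NAMES):
    """Mean over an axis given either by name (str) or by position (int)."""
    if isinstance(axis, str):
        # Translate the name into the positional axis NumPy expects.
        axis = names.index(axis)
    return data.mean(axis=axis)

# Made-up data with 2 days, 3 hours and 4 entries on the last axis.
data = np.arange(24, dtype=float).reshape(2, 3, 4)

by_name = named_mean(data, "day")   # spelled with the axis name
by_pos = data.mean(axis=0)          # the same reduction, spelled positionally
```

The point of the sketch is only that the name-to-position lookup is cheap; how such names should be stored on the array itself is what the thread is debating.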
Rob Speer rspeer at MIT.EDU writes:
It's not just about the rows: a 2-D datarray can also index by
columns, an operation that has no equivalent in a 1-D array of records
like your example.
rec['305'] effectively indexes by column. This is one of the main attractions of
structured/record arrays.
rec['305'] extracts a single value from a single record.
arr.named[:,305] extracts an *entire column* from a 2-D datarray,
returning you a 1-D datarray.
Once again, 1-D record arrays and 2-D labeled arrays look similar when
you print them, but the data structures are so unrelated that there is
On Mon, Jul 12, 2010 at 17:30, Rob Speer rsp...@mit.edu wrote:
rec['305'] extracts a single value from a single record.
No, in Neil's example `rec` was a structured array. You can index
structured arrays using the names of the record members, not just
scalars.
arr.named[:,305] extracts an
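The distinction in this exchange can be checked with plain NumPy. The sketch below uses made-up data; only the field name `'305'` comes from the thread, and the second field `count` is an assumption added to show mixed dtypes.

```python
import numpy as np

# A 1-D structured array with two fields of different dtype (made-up values).
rec = np.array(
    [(1.5, 10), (2.5, 20), (3.5, 30)],
    dtype=[("305", "f8"), ("count", "i4")],
)

# Indexing the whole array by field name gives that field for every record.
column = rec["305"]

# Indexing one record by field name gives a single value.
single = rec[0]["305"]
```

So indexing by field name behaves like column selection when applied to the array and like scalar lookup when applied to a single record.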
Robert Kern robert.kern at gmail.com writes:
Please install Fernando's datarray package, play with it, read its
documentation, then come back with objections or alternatives. I
really don't think you understand what is being proposed.
Well the discussion has been pretty confusing. For
But the utility of named indices is not so clear
to me. As I understand it, these new arrays will still only be
able to have a single type of data (one of float, str, int and so
on). This seems to be pretty limiting.
This just shows that people use NumPy for lots of different things. I
myself
Rob Speer writes:
Can axis labels be anything besides None or str?
Possibly. The part of this question I particularly like is accessing
attributes programmatically, using arr.axis[axisname]. That gives
.axis much more of a purpose. (Follow-up question: should we merge
.axis and .axes in the
Rob Speer writes:
My implementation is still missing the case with named ticks but
positional axes, however.
[...]
Just a quick hack: http://github.com/xscript/datarray
Lluis
--
And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that
On Fri, Jul 9, 2010 at 11:42 AM, Rob Speer rsp...@mit.edu wrote:
Now, the one part I've implemented that I just made up instead of
looking to the SciPy consensus (because there was no SciPy consensus)
was how to refer to multiple labeled axes without repeating .axis
all over the place. My
On Fri, Jul 9, 2010 at 1:17 PM, Joshua Holbrook josh.holbr...@gmail.com wrote:
On Fri, Jul 9, 2010 at 11:42 AM, Rob Speer rsp...@mit.edu wrote:
Now, the one part I've implemented that I just made up instead of
looking to the SciPy consensus (because there was no SciPy consensus)
was how to
On Fri, Jul 9, 2010 at 1:53 PM, Rob Speer rsp...@mit.edu wrote:
Keith Goodman wrote:
I ran into a few more questions while playing with datarrays, so I started a
list:
http://github.com/kwgoodman/datarrayQ
I have quick answers to some of the questions.
Thank you! Comments below.
Can I
Hi folks,
There has been a lot of great discussion about this -- and I'm really
glad to see it happening -- I've been putting off refactoring my own
lame, limited, poorly designed, special purpose version of this for ages.
One question I have (that may have been answered) is about slicing:
I
On Fri, Jul 9, 2010 at 4:05 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
So what would you get if you wanted:
MyDataArray['jones':'wilson']
or
MyDataArray.names[slice('jones','wilson')]
or whatever the syntax would be?
If it was in alphabetical order, you'd be all set, but what
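One possible reading of a label slice such as `['jones':'wilson']` is "everything between those two labels in stored order, whatever that order is". The sketch below shows that interpretation only; the helper `label_slice`, the inclusive end point, and the sample names are assumptions for illustration and were not settled in the thread.

```python
import numpy as np

def label_slice(labels, start, stop):
    """Turn a label range into a positional slice that includes `stop`.

    Assumes labels are unique and that the range follows stored order,
    not alphabetical order.
    """
    lo = labels.index(start)
    hi = labels.index(stop)
    return slice(lo, hi + 1)

# Hypothetical labels, deliberately not sorted alphabetically.
names = ["smith", "jones", "baker", "wilson", "adams"]
values = np.array([10, 20, 30, 40, 50])

selected = values[label_slice(names, "jones", "wilson")]
```

Under this reading the unsorted case is well defined, but the result depends on storage order rather than on the labels themselves, which is the concern raised in the question.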
Keith Goodman wrote:
On Fri, Jul 9, 2010 at 4:05 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
So what would you get if you wanted:
MyDataArray['jones':'wilson']
or
MyDataArray.names[slice('jones','wilson')]
or whatever the syntax would be?
If it was in alphabetical order,
On Fri, Jul 9, 2010 at 5:00 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
Keith Goodman wrote:
On Fri, Jul 9, 2010 at 4:05 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
So what would you get if you wanted:
MyDataArray['jones':'wilson']
or
On Fri, Jul 9, 2010 at 5:52 PM, Joshua Holbrook josh.holbr...@gmail.com wrote:
On Fri, Jul 9, 2010 at 4:22 PM, Keith Goodman kwgood...@gmail.com wrote:
On Fri, Jul 9, 2010 at 5:00 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
Keith Goodman wrote:
On Fri, Jul 9, 2010 at 4:05 PM,
On Fri, Jul 9, 2010 at 4:56 PM, Keith Goodman kwgood...@gmail.com wrote:
On Fri, Jul 9, 2010 at 5:52 PM, Joshua Holbrook josh.holbr...@gmail.com
wrote:
On Fri, Jul 9, 2010 at 4:22 PM, Keith Goodman kwgood...@gmail.com wrote:
On Fri, Jul 9, 2010 at 5:00 PM, Christopher Barker
Glad I finally found this discussion.
I implemented some of the ideas from the SciPy BOAF discussion, and
Joshua has already merged them into his datarray on GitHub (thanks,
Joshua, for being so fast on the merge button).
To introduce these changes, here's a couple of examples of how you
could
On Wed, Jul 7, 2010 at 10:25 PM, Rob Speer rsp...@mit.edu wrote:
Glad I finally found this discussion.
I implemented some of the ideas from the SciPy BOAF discussion, and
Joshua has already merged them into his datarray on GitHub (thanks,
Joshua, for being so fast on the merge button).
To
Rob Speer writes:
arr.country.named('Netherlands').year.named(2010)
arr.country.named('Spain').year.named(slice(1994, 2010))
arr.year.named(2006).country[0:2]
This looks too verbose to me.
As axes always have a total order, I'd go for the most compact representation
(assuming 'country' is
On Thu, Jul 8, 2010 at 3:13 AM, Lluís xscr...@gmx.net wrote:
Rob Speer writes:
arr.country.named('Netherlands').year.named(2010)
arr.country.named('Spain').year.named(slice(1994, 2010))
arr.year.named(2006).country[0:2]
This looks too verbose to me.
As axes always have a total order, I'd
On Thu, Jul 8, 2010 at 7:13 AM, Lluís xscr...@gmx.net wrote:
Thus, we can use something in the middle:
arr[0,1]
arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks'
Ah ha. So this is the case with positional axes but named ticks, which
we haven't really brought up
Joshua Holbrook writes:
On Thu, Jul 8, 2010 at 3:13 AM, Lluís xscr...@gmx.net wrote:
Rob Speer writes:
arr.country.named('Netherlands').year.named(2010)
arr.country.named('Spain').year.named(slice(1994, 2010))
arr.year.named(2006).country[0:2]
This looks too verbose to me.
As axis
While I haven't had a chance to really look in-depth at the changes
myself (I'm a busy man! So many mailing lists!), I so far like the
look and sound of them. That's just my opinion, though.
If people are okay with the attribute magic, I have a proposal for more of it.
In my own project where
On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote:
While I haven't had a chance to really look in-depth at the changes
myself (I'm a busy man! So many mailing lists!), I so far like the
look and sound of them. That's just my opinion, though.
If people are okay with the attribute
Rob Speer writes:
On Thu, Jul 8, 2010 at 7:13 AM, Lluís xscr...@gmx.net wrote:
Thus, we can use something in the middle:
arr[0,1]
arr.names['Netherlands',2010] # I'd rather go for 'names' instead of
'ticks'
Ah ha. So this is the case with positional axes but named ticks, which
we
Skipper Seabold writes:
On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote:
[...]
My proposal is that datarray.row should be equivalent to
datarray.axes[0], and datarray.column should be equivalent to
datarray.axes[1], so that you can always ask for something like
But I don't understand your second example:
arr.country['Spain'].year[1994:2010]
That seems to run straight into the index/name ambiguity. Shouldn't
that take the 1994th through 2010th indices along the year axis? Not
every axis will have names, so you can't make *all* the indexing go by
Rob Speer writes:
Or what I was striving for:
arr.year.named[1994:2010]
arr.year['1994':'2010']
arr.year['1994':-3]
So your proposal is, whenever there's an index that is not an integer,
look it up by name, and use .named only if you want integer tick
names? This feels too
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote:
Forgive me if this has already been addressed, but my question is
what happens when we have more than one label (not as in a labeled
axis but an observation label -- but not a tick because they're not
unique!) per say row axis
On Thu, Jul 8, 2010 at 1:38 PM, Lluís xscr...@gmx.net wrote:
Skipper Seabold writes:
On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer rsp...@mit.edu wrote:
[...]
My proposal is that datarray.row should be equivalent to
datarray.axes[0], and datarray.column should be equivalent to
On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold jsseab...@gmail.com wrote:
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote:
Your labels are unique if you look at them the right way. Here's how I
would represent that in a datarray:
* axis0 = 'city', ['Austin', 'Boston', ...]
*
On Thu, Jul 8, 2010 at 2:41 PM, Rob Speer rsp...@mit.edu wrote:
On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold jsseab...@gmail.com wrote:
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer rsp...@mit.edu wrote:
Your labels are unique if you look at them the right way. Here's how I
would represent that
No. I'd rather go for eliminating the 'arr.year.named', and providing only:
* arr.__getitem__
* arr.named.__getitem__
* arr.label.__getitem__
The first being just the current ndarray.__getitem__, and the two last methods
would accept both strings and integers, assuming that names/ticks
Skipper Seabold writes:
[...]
If I understood well, you could have 4 axes (assuming that an Axis can only
handle a single label/variable).
a = DatArray(numpy.array([...], dtype = [(precipitation, float),
(temperature, float)]),
((city,
Rob Speer writes:
No. I'd rather go for eliminating the 'arr.year.named', and providing only:
* arr.__getitem__
* arr.named.__getitem__
* arr.label.__getitem__
The first being just the current ndarray.__getitem__, and the two last
methods
would accept both strings and integers,
On Thu, Jul 8, 2010 at 1:15 PM, Lluís xscr...@gmx.net wrote:
My impression from SciPy was that people would prefer separate
accessors for names and indices, especially because integers (a really
common data type, after all) shouldn't be forbidden. Also, working
with strings of integers like
Fernando Perez writes:
The consensus at the BoF (not that it means it's set in stone, simply
that there was good chance for back-and-forth on the topic with many
voices) was that:
1. There are valid use cases for 'integer ticks', i.e. integers that
index arbitrarily into an array instead
On Thu, Jul 8, 2010 at 12:20 PM, Fernando Perez fperez@gmail.com wrote:
On Thu, Jul 8, 2010 at 1:15 PM, Lluís xscr...@gmx.net wrote:
My impression from SciPy was that people would prefer separate
accessors for names and indices, especially because integers (a really
common data type,
On Thu, Jul 8, 2010 at 12:55 PM, Lluís xscr...@gmx.net wrote:
Fernando Perez writes:
The consensus at the BoF (not that it means it's set in stone, simply
that there was good chance for back-and-forth on the topic with many
voices) was that:
1. There are valid use cases for 'integer
Joshua Holbrook writes:
arr[not int] - tick-based indexing
At the BoF, we chose to drop this because we wanted to allow integer ticks (or
implicit type conversion, either way) without the ambiguity of, did we mean
that in the ndarray sense or in a tick with the name '1' sense?
Sorry, I
On Thu, Jul 8, 2010 at 1:30 PM, Lluís xscr...@gmx.net wrote:
Joshua Holbrook writes:
arr[not int] - tick-based indexing
At the BoF, we chose to drop this because we wanted to allow integer ticks
(or
implicit type conversion, either way) without the ambiguity of, did we mean
that in the
Still, I have a question. Did you also agree that this should forcibly index
through ticks?
arr.something[int] - tick-based indexing
Yes.
I feel like people are talking about different things because it's
unclear what the .something is.
If the .something is an axis name, then no.
On Thu, Jul 8, 2010 at 1:39 PM, Rob Speer rsp...@mit.edu wrote:
Still, I have a question. Did you also agree that this should forcibly index
through ticks?
arr.something[int] - tick-based indexing
Yes.
I feel like people are talking about different things because it's
unclear what
On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote:
Still, I have a question. Did you also agree that this should forcibly index
through ticks?
arr.something[int] - tick-based indexing
Yes.
I feel like people are talking about different things because it's
unclear what
3. That the best solution to allow integer ticks while retaining
'normal' indexing semantics for integers would be to have
arr[int] - normal indexing
arr.something[int] - tick-based indexing, where an int can mean anything.
All right, it's clear lots of people like it better this way, so I
Then how is this not different than a record array?
How is this the *same* as a record array?
--Josh
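To make the `arr[int]` versus `arr.something[int]` split concrete, here is a toy sketch for a single axis. The classes `Labeled` and `TickIndexer` are hypothetical and are not the datarray API; the accessor is called `.named` here only because that is one of the spellings mentioned in the thread.

```python
import numpy as np

class TickIndexer:
    """Hypothetical accessor that looks up a position along axis 0 by tick value."""

    def __init__(self, data, ticks):
        self._data = data
        self._ticks = list(ticks)

    def __getitem__(self, tick):
        # An int here is treated as a tick value, never as a position.
        return self._data[self._ticks.index(tick)]

class Labeled:
    """Toy one-axis container used only for this illustration."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        self.named = TickIndexer(self.data, ticks)

    def __getitem__(self, index):
        # Plain brackets keep ordinary ndarray semantics.
        return self.data[index]

# Integer ticks that deliberately disagree with the positions.
arr = Labeled([1.0, 2.0, 3.0], ticks=[2010, 1994, 0])

positional = arr[0]      # element at position 0
by_tick = arr.named[0]   # element whose tick is 0 (stored at position 2)
```

Keeping the two lookups on separate accessors is what removes the ambiguity about what an integer means.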
On Thu, Jul 8, 2010 at 2:03 PM, Rob Speer rsp...@mit.edu wrote:
3. That the best solution to allow integer ticks while retaining
'normal' indexing semantics for integers would be to have
On Thu, Jul 8, 2010 at 18:00, Bruce Southey bsout...@gmail.com wrote:
On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote:
Still, I have a question. Did you also agree that this should forcibly
index
through ticks?
arr.something[int] - tick-based indexing
Yes.
I feel
I think we have to start from the nD case, even if I (and I think most
users) will tend to think in 2D. The rest is just going to have to be
up to developers how they want users to interact with what we, the
developers, see as axes. No end-user wants to think about the 6th
axis of the data,
On Thu, Jul 8, 2010 at 5:09 PM, Robert Kern robert.k...@gmail.com wrote:
On Thu, Jul 8, 2010 at 18:00, Bruce Southey bsout...@gmail.com wrote:
On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer rsp...@mit.edu wrote:
Still, I have a question. Did you also agree that this should forcibly
index
through
On Thu, Jul 8, 2010 at 4:05 PM, Lluís xscr...@gmx.net wrote:
Another reason to have multiple variables, is that the insertion of NaNs to
maintain shape homogeneity will make these synthetic NaNs indistinguishable
from other NaNs that might be on your original input data, unless you use a
http://github.com/rspeer/datarray represents my best guess at the
SciPy BOF consensus. I recently switched the method of accessing named
ticks from .named() to .named[] based on further discussion here.
My implementation is still missing the case with named ticks but
positional axes, however.
Bruce Southey writes:
1) Indexing especially related to slicing and broadcasting.
1.1) Absolute indexing/slicing
a[0], a['tickvalue']
1.2) Partial slicing
For the case of compound ticks, that is, merging multiple ticks into a
single one:
a['subtick1value-subtick2value']
Bruce Southey writes:
4) How should this interact with the rest of numpy?
BTW, now I remembered something I wanted to implement but required too much
monkeypatching right now.
For all functions accepting the 'axis' argument, I'd like to provide a string
that uniquely identifies a dimension/axis,
On Tue, Jul 6, 2010 at 10:47 AM, Joshua Holbrook
josh.holbr...@gmail.com wrote:
I really really really want to work on this. I already forked datarray
on github and did some research on What Other People Have Done (
http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
luck I'll
On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
I really really really want to work on this. I already forked datarray
on github and did some research on What Other People Have Done (
http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
luck I'll
On Tue, Jul 6, 2010 at 11:55 AM, Keith Goodman kwgood...@gmail.com wrote:
On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com
wrote:
I really really really want to work on this. I already forked datarray
on github and did some research on What Other People Have Done (
I'm kinda-sorta still getting around to building/reading the sphinx
docs for datarray. Like, I've gone through them before, but it was
more cursory than I'd like. Honestly, I kinda let myself get caught up
in trying to automate the process of getting them onto github pages.
I have to admit that
On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold jsseab...@gmail.com wrote:
On Tue, Jul 6, 2010 at 12:36 PM, Joshua Holbrook
josh.holbr...@gmail.com wrote:
I'm kinda-sorta still getting around to building/reading the sphinx
docs for datarray. Like, I've gone through them before, but it was
On Tue, Jul 6, 2010 at 9:52 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold jsseab...@gmail.com wrote:
On Tue, Jul 6, 2010 at 12:36 PM, Joshua Holbrook
josh.holbr...@gmail.com wrote:
I'm kinda-sorta still getting around to building/reading the
On Tue, Jul 6, 2010 at 12:56 PM, Keith Goodman kwgood...@gmail.com wrote:
On Tue, Jul 6, 2010 at 9:52 AM, Joshua Holbrook josh.holbr...@gmail.com
wrote:
On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold jsseab...@gmail.com wrote:
On Tue, Jul 6, 2010 at 12:36 PM, Joshua Holbrook
Just to give a data point, my research group and I would be very excited
at the idea of having Fernando's data arrays in Numpy. We can't offer to
maintain it, because we are already fairly involved in machine learning
and neuroimaging specific code, but we would be able to rely on it more
in our
My opinion on the matter is that, as a matter of purity, labels
should all have the string datatype. That said, I'd imagine that
passing an int as an argument would be fine, due to python's
loosey-goosey attitude towards datatypes. :) That, or, y'know,
str(myint).
That's kind of what I went
Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
Brett, Kilian Koepsell and Stefan van der Walt.
At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
discussion of this proposal.
The