Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-19 Thread Benjamin Root
matplotlib would be more than happy if numpy could take those functions off our hands! They don't get nearly the visibility they deserve in matplotlib because no one expects them to be in a plotting library, and they don't have any useful unit tests. None of us made them, so we are very hesitant

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-19 Thread josef.pktd
On Fri, Feb 19, 2016 at 12:08 PM, Allan Haldane wrote:
> I also want to add a historical note here, that 'groupby' has been
> discussed a couple times before.
>
> Travis Oliphant even made an NEP for it, and Wes McKinney lightly hinted
> at adding it to numpy.
>
>

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-19 Thread Allan Haldane
I also want to add a historical note here, that 'groupby' has been discussed a couple times before. Travis Oliphant even made an NEP for it, and Wes McKinney lightly hinted at adding it to numpy. http://thread.gmane.org/gmane.comp.python.numeric.general/37480/focus=37480

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-16 Thread Sérgio
e: Sat, 13 Feb 2016 22:41:13 -0500
> From: Allan Haldane <allanhald...@gmail.com>
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] [Suggestion] Labelled Array
> Message-ID: <56bff759.7010...@gmail.com>
> Content-Type: text/plain; charset=windows-1252; fo

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-15 Thread Paul Hobson
Just for posterity: any future readers of this thread who need to do pandas-like operations on record arrays should look at matplotlib's mlab submodule. I've been in situations (::cough:: Esri production ::cough::) where I've had one hand tied behind my back and been unable to install pandas. mlab was a big

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-15 Thread Lluís Vilanova
Benjamin Root writes:
> Seems like you are talking about xarray: https://github.com/pydata/xarray

Oh, I wasn't aware of xarray, but there's also this: https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#basic-indexing

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Allan Haldane
I've had a pretty similar idea for a new indexing function 'split_classes' which would help in your case, which essentially does

    def split_classes(c, v):
        return [v[c == u] for u in unique(c)]

Your example could be coded as

    >>> [sum(c) for c in split_classes(label, data)]

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Allan Haldane
Sorry to reply to myself here, but looking at it with fresh eyes maybe the performance of the naive version isn't too bad. Here's a comparison of the naive vs a better implementation:

    def split_classes_naive(c, v):
        return [v[c == u] for u in unique(c)]

    def split_classes(c, v):
        perm =
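For readers following along, here is a minimal runnable sketch of the two approaches being compared. The message above is cut off before the body of the faster version, so the sort-based function below is only an assumption about what such an implementation could look like; `split_classes_sorted` and its internals are illustrative names, not the code from the original post.

```python
import numpy as np


def split_classes_naive(c, v):
    # One boolean mask per distinct label: simple, but it rescans
    # the whole label array once for every unique label.
    return [v[c == u] for u in np.unique(c)]


def split_classes_sorted(c, v):
    # Hypothetical faster variant: sort once, then cut the sorted
    # values wherever the label changes.
    c = np.ravel(c)
    v = np.ravel(v)
    # A stable sort keeps the original order inside each group.
    perm = np.argsort(c, kind="stable")
    sorted_c = c[perm]
    # Positions in the sorted array where a new label starts.
    cuts = np.flatnonzero(sorted_c[1:] != sorted_c[:-1]) + 1
    return np.split(v[perm], cuts)


label = np.array([2, 0, 1, 0, 2])
data = np.array([10, 20, 30, 40, 50])
sums = [int(g.sum()) for g in split_classes_sorted(label, data)]
print(sums)  # [60, 30, 60], one sum per label 0, 1, 2
```

The naive version does work proportional to the number of labels times the array length, while the sorted version pays for one sort regardless of how many labels there are, which is presumably why the two only diverge noticeably when there are many groups.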

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Nathaniel Smith
I believe this is basically a groupby, which is one of pandas's core competencies... even if numpy were to add some utilities for this kind of thing, then I doubt we'd do as well as them, so you might check whether pandas works for you first :-) On Feb 12, 2016 6:40 AM, "Sérgio"

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane wrote:
> Sorry, to reply to myself here, but looking at it with fresh eyes maybe
> the performance of the naive version isn't too bad. Here's a comparison of
> the naive vs a better implementation:
>
> def

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Jeff Reback
In [10]: pd.options.display.max_rows=10

In [13]: np.random.seed(1234)

In [14]: c = np.random.randint(0,32,size=10)

In [15]: v = np.arange(10)

In [16]: df = DataFrame({'v' : v, 'c' : c})

In [17]: df
Out[17]:
    c  v
0  15  0
1  19  1
2   6  2
3  21
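The session above is cut off before the grouping step. As a hedged sketch of how the same frame could be summed per label with pandas, the following uses `DataFrame.groupby`; the variable name `group_sums` and the choice of `sum` as the aggregation are assumptions for illustration, not taken from the truncated message.

```python
import numpy as np
import pandas as pd

# Same construction as the session above: random labels in c, values in v.
rng = np.random.RandomState(1234)
c = rng.randint(0, 32, size=10)
v = np.arange(10)
df = pd.DataFrame({"v": v, "c": c})

# Group the values by label and reduce each group with a built-in aggregation.
group_sums = df.groupby("c")["v"].sum()
print(group_sums)
```

The result is a Series indexed by the distinct labels in `c`, so a single label's total can be looked up with `group_sums.loc[some_label]`.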

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Jeff Reback
These operations get slower as the number of groups increases, but with a faster function (e.g. the standard ones which are cythonized), the constant on the increase is pretty low.

In [23]: c = np.random.randint(0,1,size=10)

In [24]: df = DataFrame({'v' : v, 'c' : c})

In [25]: %timeit

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 1:42 PM, Jeff Reback wrote:
> These operations get slower as the number of groups increase, but with a
> faster function (e.g. the standard ones which are cythonized), the
> constant on
> the increase is pretty low.
>
> In [23]: c =

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread Allan Haldane
Impressive! Possibly there's still a case for including a 'groupby' function in numpy itself since it's a generally useful operation, but I do see less of a need given the nice pandas functionality. At least, next time someone asks a stackoverflow question like the ones below someone should

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-12 Thread Benjamin Root
Seems like you are talking about xarray: https://github.com/pydata/xarray

Cheers!
Ben Root

On Fri, Feb 12, 2016 at 9:40 AM, Sérgio wrote:
> Hello,
>
> This is my first e-mail, I will try to make the idea simple.
>
> Similar to masked array it would be interesting to use a

[Numpy-discussion] [Suggestion] Labelled Array

2016-02-12 Thread Sérgio
Hello,

This is my first e-mail, I will try to make the idea simple.

Similar to masked array it would be interesting to use a label array to guide operations.

Ex.:
>>> x
labelled_array(data = [[0 1 2]
                       [3 4 5]
                       [6 7 8]],
               label = [[0 1 2]
                        [0 1 2]
                        [0 1 2]])
>>> sum(x)
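What `sum(x)` was meant to return is not visible in the snippet above. Assuming the intent is one sum per distinct label, a minimal sketch with plain NumPy (no `labelled_array` type, which does not exist in NumPy) could use `np.bincount` with weights; the name `label_sums` is illustrative.

```python
import numpy as np

data = np.arange(9).reshape(3, 3)
label = np.array([[0, 1, 2],
                  [0, 1, 2],
                  [0, 1, 2]])

# bincount needs 1-D non-negative integer labels; the weights are added
# into the bin selected by each label, giving one sum per label.
label_sums = np.bincount(label.ravel(), weights=data.ravel())
print(label_sums.tolist())  # [9.0, 12.0, 15.0]
```

This only works when labels are small non-negative integers. Arbitrary labels would first need to be mapped to integer codes, for example with `np.unique(label, return_inverse=True)`.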

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-12 Thread Benjamin Root
Re-reading your post, I see you are talking about something different. Not exactly sure what your use-case is.

Ben Root

On Fri, Feb 12, 2016 at 9:49 AM, Benjamin Root wrote:
> Seems like you are talking about xarray: https://github.com/pydata/xarray
>
> Cheers!
> Ben Root