matplotlib would be more than happy if numpy could take those functions off
our hands! They don't get nearly the visibility they deserve in matplotlib
because no one expects to find them in a plotting library, and they
don't have any useful unit tests. None of us wrote them, so we are very
hesitant
On Fri, Feb 19, 2016 at 12:08 PM, Allan Haldane
wrote:
> I also want to add a historical note here, that 'groupby' has been
> discussed a couple times before.
>
> Travis Oliphant even made an NEP for it, and Wes McKinney lightly hinted
> at adding it to numpy.
>
>
I also want to add a historical note here, that 'groupby' has been
discussed a couple times before.
Travis Oliphant even made an NEP for it, and Wes McKinney lightly hinted
at adding it to numpy.
http://thread.gmane.org/gmane.comp.python.numeric.general/37480/focus=37480
e: Sat, 13 Feb 2016 22:41:13 -0500
> From: Allan Haldane <allanhald...@gmail.com>
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] [Suggestion] Labelled Array
> Message-ID: <56bff759.7010...@gmail.com>
> Content-Type: text/plain; charset=windows-1252; fo
Just for posterity -- any future readers of this thread who need to do
pandas-like operations on record arrays should look at matplotlib's mlab
submodule. I've been in situations (::cough:: Esri production ::cough::)
where I've had one hand tied behind my back and been unable to install
pandas. mlab was a
big
Benjamin Root writes:
> Seems like you are talking about xarray: https://github.com/pydata/xarray
Oh, I wasn't aware of xarray, but there's also this:
https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#basic-indexing
I've had a pretty similar idea for a new indexing function
'split_classes' which would help in your case, which essentially does
from numpy import unique

def split_classes(c, v):
    return [v[c == u] for u in unique(c)]
Your example could be coded as
>>> [sum(c) for c in split_classes(label, data)]
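For concreteness, here is a small self-contained sketch of that call using the 3x3 data and label arrays from the original post. The array construction is an assumption made for illustration, and np.unique stands in for the bare unique above.

```python
import numpy as np

def split_classes(c, v):
    # One boolean mask per distinct label; each mask picks out that label's values.
    return [v[c == u] for u in np.unique(c)]

# Data and labels as in the labelled_array example from the original post.
data = np.arange(9).reshape(3, 3)
label = np.tile(np.arange(3), (3, 1))  # every column carries its own label

sums = [int(group.sum()) for group in split_classes(label, data)]
print(sums)  # [9, 12, 15]
```

Each mask is a full pass over the label array, so the cost grows with the number of distinct labels.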
Sorry, to reply to myself here, but looking at it with fresh eyes maybe
the performance of the naive version isn't too bad. Here's a comparison
of the naive vs a better implementation:
def split_classes_naive(c, v):
    return [v[c == u] for u in unique(c)]

def split_classes(c, v):
    perm =
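The snippet above is cut off in this archive view, so the author's actual implementation is not shown here. As a rough sketch of one common way to avoid a separate mask per label (sort once, then cut at the label boundaries), something like the following could work. The name split_classes_sorted is hypothetical and this is not necessarily what the message contained.

```python
import numpy as np

def split_classes_sorted(c, v):
    # Hypothetical sort-based variant; works on flattened copies of the inputs.
    c = np.ravel(c)
    v = np.ravel(v)
    # A stable sort keeps values within a group in their original order.
    perm = np.argsort(c, kind="stable")
    sorted_c = c[perm]
    # Indices in the sorted array where the label changes.
    boundaries = np.flatnonzero(sorted_c[1:] != sorted_c[:-1]) + 1
    return np.split(v[perm], boundaries)

c = np.array([2, 0, 1, 0, 2])
v = np.array([10, 11, 12, 13, 14])
groups = split_classes_sorted(c, v)
print([g.tolist() for g in groups])  # [[11, 13], [12], [10, 14]]
```

The groups come back in sorted label order, matching what the naive version returns for the same input.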
I believe this is basically a groupby, which is one of pandas's core
competencies... even if numpy were to add some utilities for this kind of
thing, then I doubt we'd do as well as them, so you might check whether
pandas works for you first :-)
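As an illustration of the pandas route, here is a minimal sketch of the per-label sum from the original post written as a groupby. The column names "label" and "data" are chosen here for the example and are not from the thread.

```python
import numpy as np
import pandas as pd

# Same 3x3 data and labels as in the original post, flattened into columns.
data = np.arange(9).reshape(3, 3)
label = np.tile(np.arange(3), (3, 1))

df = pd.DataFrame({"label": label.ravel(), "data": data.ravel()})

# Group rows by label and sum the data values within each group.
sums = df.groupby("label")["data"].sum()
print(sums.to_dict())
```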
On Feb 12, 2016 6:40 AM, "Sérgio"
On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane
wrote:
> Sorry, to reply to myself here, but looking at it with fresh eyes maybe
> the performance of the naive version isn't too bad. Here's a comparison of
> the naive vs a better implementation:
>
> def
In [10]: pd.options.display.max_rows=10
In [13]: np.random.seed(1234)
In [14]: c = np.random.randint(0,32,size=10)
In [15]: v = np.arange(10)
In [16]: df = DataFrame({'v' : v, 'c' : c})
In [17]: df
Out[17]:
    c  v
0  15  0
1  19  1
2   6  2
3  21
These operations get slower as the number of groups increases, but with a
faster function (e.g. the standard ones, which are cythonized), the constant
on the increase is pretty low.
In [23]: c = np.random.randint(0,1,size=10)
In [24]: df = DataFrame({'v' : v, 'c' : c})
In [25]: %timeit
On Sat, Feb 13, 2016 at 1:42 PM, Jeff Reback wrote:
> These operations get slower as the number of groups increases, but with a
> faster function (e.g. the standard ones, which are cythonized), the
> constant on the increase is pretty low.
>
> In [23]: c =
Impressive!
Possibly there's still a case for including a 'groupby' function in
numpy itself since it's a generally useful operation, but I do see less
of a need given the nice pandas functionality.
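To make the idea concrete, a grouped sum can already be assembled from existing numpy pieces. The sketch below is only an illustration under that assumption; group_sum is a hypothetical helper, not a numpy function or a proposed API.

```python
import numpy as np

def group_sum(labels, values):
    # Hypothetical helper: sum `values` within each distinct label.
    labels = np.ravel(labels)
    values = np.ravel(values)
    # `inverse` maps every element to the index of its label in `keys`.
    keys, inverse = np.unique(labels, return_inverse=True)
    # bincount with weights accumulates the values per label index (as floats).
    sums = np.bincount(inverse, weights=values, minlength=keys.size)
    return keys, sums

data = np.arange(9).reshape(3, 3)
label = np.tile(np.arange(3), (3, 1))
keys, sums = group_sum(label, data)
print(keys.tolist(), sums.tolist())  # [0, 1, 2] [9.0, 12.0, 15.0]
```

This covers sums only; other reductions would need a different accumulation step.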
At least, next time someone asks a stackoverflow question like the ones
below someone should
Seems like you are talking about xarray: https://github.com/pydata/xarray
Cheers!
Ben Root
On Fri, Feb 12, 2016 at 9:40 AM, Sérgio wrote:
> Hello,
>
> This is my first e-mail, I will try to make the idea simple.
>
> Similar to masked array it would be interesting to use a
Hello,
This is my first e-mail, I will try to make the idea simple.
Similar to a masked array, it would be interesting to use a label array to
guide operations.
Ex.:
>>> x
labelled_array(data =
[[0 1 2]
[3 4 5]
[6 7 8]],
label =
[[0 1 2]
[0 1 2]
[0 1 2]])
>>> sum(x)
Re-reading your post, I see you are talking about something different. Not
exactly sure what your use-case is.
Ben Root
On Fri, Feb 12, 2016 at 9:49 AM, Benjamin Root wrote:
> Seems like you are talking about xarray: https://github.com/pydata/xarray
>
> Cheers!
> Ben Root