Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-14 Thread Stephen Simmons
I would really like to see this become a core part of numpy...

For groupby-like summing over arrays, I use a modified version of
numpy.bincount() which has optional arguments that greatly enhance its 
flexibility:
   bincount(bins, weights=, max_bins=, out=)
where:
   *  bins - numpy array of bin numbers (uint8, int16 or int32).
      Negative bin numbers indicate weights to be ignored. [1]
   *  weights - (opt) numpy array of weights (float or double)
   *  max_bins - (opt) bin numbers greater than this are ignored when
      counting. [2]
   *  out - (opt) numpy output array (int32 or double)

[1]  This is how I support Robert Kern's comment below: "If there are some
areas you want to ignore, that's difficult to do with reduceat()."

[2]  Specifying the number of bins up front has two benefits: (i) saves
scanning the bins array to see how big the output needs to be;
and (ii) allows you to control the size of the output array, as you may
want it bigger than the number of bins would suggest.
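For readers who want to try this behaviour with a stock NumPy, here is a rough sketch built on np.bincount and its minlength argument. It is not Stephen's modified bincount: the name bincount_ext is hypothetical, the out= argument is left out, and max_bins is read as the number of output bins (per footnote [2]), so bin numbers >= max_bins are dropped. The description above says "greater than", so that boundary is an assumption.

```python
import numpy as np


def bincount_ext(bins, weights=None, max_bins=None):
    """Hypothetical sketch of an extended bincount.

    - negative bin numbers are ignored (with their weights)
    - if max_bins is given, bins >= max_bins are ignored and the result
      has exactly max_bins entries
    """
    bins = np.asarray(bins)
    keep = bins >= 0                  # drop negative bin numbers
    if max_bins is not None:
        keep &= bins < max_bins       # drop bins past the requested size
    w = None if weights is None else np.asarray(weights)[keep]
    if max_bins is None:
        return np.bincount(bins[keep], weights=w)
    # minlength pads the result out to max_bins entries
    return np.bincount(bins[keep], weights=w, minlength=max_bins)


print(bincount_ext([0, 1, -1, 2, 5], weights=[1., 2., 3., 4., 5.], max_bins=3).tolist())
print(bincount_ext([1, 1], max_bins=4).tolist())
```

Filtering before the call means np.bincount never sees a negative index, which it would otherwise reject.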


I look forward to the draft NEP!

Best regards
Stephen Simmons



On 13/04/2010 10:34 PM, Robert Kern wrote:
 On Sat, Apr 10, 2010 at 17:59, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Apr 10, 2010 at 12:45, Pauli Virtanen p...@iki.fi wrote:

 On Sat, 2010-04-10 at 12:23 -0500, Travis Oliphant wrote:
 [clip]

 Here are my suggested additions to NumPy:
 ufunc methods:
  
 [clip]

* reducein (array, indices, axis=0)
 similar to reduce-at, but the indices provide both the
 start and end points (rather than being fence-posts like reduceat).
  
 Is the `reducein` important to have, as compared to `reduceat`?

 Yes, I think so. If there are some areas you want to ignore, that's
 difficult to do with reduceat().
  
 And conversely overlapping areas are highly useful but completely
 impossible to do with reduceat.



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-13 Thread Travis Oliphant


On Apr 12, 2010, at 5:31 PM, Robert Kern wrote:


We should collect all of these proposals into a NEP.  To clarify what I
mean by group-by behavior.
Suppose I have an array of floats and an array of integers.   Each element
in the array of integers represents a region in the float array of a certain
kind.   The reduction should take place over like-kind values:
Example:
add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
results in the calculations:
1 + 3 + 6 + 7
2 + 4
5 + 8 + 9
and therefore the output (notice the two arrays --- perhaps a structured
array should be returned instead...)
[0,1,2],
[17, 6, 22]

The real value is when you have tabular data and you want to do reductions
in one field based on values in another field.   This happens all the time
in relational algebra and would be a relatively straightforward thing to
support in ufuncs.


I might suggest a simplification where the by array must be an array
of non-negative ints such that they are indices into the output. For
example (note that I replace 2 with 3 and have no 2s in the by array):

add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
[17, 6, 0, 22]

This basically generalizes bincount() to other binary ufuncs.
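NumPy 1.8 and later ship ufunc.at, which applies a ufunc in place at given indices without buffering, so repeated indices accumulate. That covers this simplified "by as output index" form for any binary ufunc. The sketch below is an illustration, not the proposed reduceby method: reduceby_index is a hypothetical name, and the caller supplies the value used for groups that never occur (0 for add, -inf for maximum).

```python
import numpy as np


def reduceby_index(ufunc, array, by, initial):
    """Hypothetical helper: out[k] = ufunc reduction of array[by == k].

    `by` holds non-negative ints that index the output; output entries
    that no element maps to keep the value `initial`.
    """
    array = np.asarray(array)
    by = np.asarray(by)
    out = np.full(by.max() + 1, initial, dtype=np.result_type(array, initial))
    ufunc.at(out, by, array)  # unbuffered, so repeated indices accumulate
    return out


vals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by = [0, 1, 0, 1, 3, 0, 0, 3, 3]
print(reduceby_index(np.add, vals, by, 0).tolist())            # [17, 6, 0, 22]
print(reduceby_index(np.maximum, vals, by, -np.inf).tolist())  # [7.0, 4.0, -inf, 9.0]
```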



Interesting proposal.   I do like having only one output.

I'm particularly interested in reductions with by arrays of strings,
i.e. something like:


add.reduceby([10,11,12,13,14,15,16],
             by=['red','green','red','green','red','blue','blue'])


resulting in:

10+12+14
11+13
15+16

In practice, these would have to be essentially mapped to the kind of  
integer array I used in the original example, and so I suppose if we  
couple your proposal with the segment function from the rest of my  
original proposal, then the same resulting functionality is available  
(with perhaps the extra intermediate integer array that may not be  
strictly necessary).


But, having simple building blocks is usually better in the long run  
(and typically leads to better optimizations by human programmers).


Thanks,

-Travis


--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliph...@enthought.com







Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-13 Thread Travis Oliphant


On Apr 12, 2010, at 5:54 PM, Warren Weckesser wrote:


A bit more generalization of `by` gives behavior like matlab's accumarray
(http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html),
which I partly cloned here:
[This would be a link to the scipy cookbook, but scipy.org is not
responding.]


Reading the accumarray docstring, it does seem related, but they use
the subs array as an index into the original array (instead of an
index into the output array like Robert's simplification).  I do
like the Matlab functionality, but would propose a different reduction
function, reduceover, to implement it.


It also feels like we should figure out different kinds of reductions
for generalized ufuncs as well.  If anyone has a primer on
generalized ufuncs, I would love to see it.   Isn't a reduction on a
generalized ufunc just another generalized ufunc?   Perhaps we could
automatically create these reduced generalized ufuncs.


I would love to explore just how general these generalized ufuncs are  
and what can be subsumed by them.



-Travis





Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-13 Thread josef . pktd
On Tue, Apr 13, 2010 at 10:03 AM, Travis Oliphant
oliph...@enthought.com wrote:

 On Apr 12, 2010, at 5:31 PM, Robert Kern wrote:

 We should collect all of these proposals into a NEP.  To clarify what I
 mean by group-by behavior.
 Suppose I have an array of floats and an array of integers.   Each element
 in the array of integers represents a region in the float array of a certain
 kind.   The reduction should take place over like-kind values:
 Example:
 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
 results in the calculations:
 1 + 3 + 6 + 7
 2 + 4
 5 + 8 + 9
 and therefore the output (notice the two arrays --- perhaps a structured
 array should be returned instead...)
 [0,1,2],
 [17, 6, 22]

 The real value is when you have tabular data and you want to do reductions
 in one field based on values in another field.   This happens all the time
 in relational algebra and would be a relatively straightforward thing to
 support in ufuncs.

 I might suggest a simplification where the by array must be an array
 of non-negative ints such that they are indices into the output. For
 example (note that I replace 2 with 3 and have no 2s in the by array):

 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
 [17, 6, 0, 22]

 This basically generalizes bincount() to other binary ufuncs.


 Interesting proposal.   I do like having only one output.
 I'm particularly interested in reductions with by arrays of strings, i.e.
 something like:
 add.reduceby([10,11,12,13,14,15,16],
 by=['red','green','red','green','red','blue', 'blue']).
 resulting in:
 10+12+14
 11+13
 15+16
 In practice, these would have to be essentially mapped to the kind of
 integer array I used in the original example, and so I suppose if we couple
 your proposal with the segment function from the rest of my original
 proposal, then the same resulting functionality is available (with perhaps
 the extra intermediate integer array that may not be strictly necessary).
 But, having simple building blocks is usually better in the long run (and
 typically leads to better optimizations by human programmers).

Currently I'm using unique with return_inverse to do the recoding into integers:

>>> np.unique(['red','green','red','green','red','blue',
...            'blue'], return_inverse=True)
(array(['blue', 'green', 'red'],
      dtype='|S5'), array([2, 1, 2, 1, 2, 0, 0]))

and then feed into bincount.
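Spelled out with today's NumPy, the two-step recipe (recode the keys with np.unique, then sum with np.bincount) looks roughly like this for the string example above; the variable names are illustrative only.

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14, 15, 16])
colors = np.array(['red', 'green', 'red', 'green', 'red', 'blue', 'blue'])

# Recode the string keys as integers 0..k-1 (labels come back sorted).
labels, codes = np.unique(colors, return_inverse=True)

# Sum the values that share a code; with weights, bincount returns floats.
sums = np.bincount(codes, weights=values)

print(labels.tolist())  # ['blue', 'green', 'red']
print(sums.tolist())    # [31.0, 24.0, 36.0]
```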

Your plans are a good generalization and speedup.

Josef




Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-13 Thread Robert Kern
On Sat, Apr 10, 2010 at 17:59, Robert Kern robert.k...@gmail.com wrote:
 On Sat, Apr 10, 2010 at 12:45, Pauli Virtanen p...@iki.fi wrote:
 On Sat, 2010-04-10 at 12:23 -0500, Travis Oliphant wrote:
 [clip]
 Here are my suggested additions to NumPy:
 ufunc methods:
 [clip]
       * reducein (array, indices, axis=0)
                similar to reduce-at, but the indices provide both the
 start and end points (rather than being fence-posts like reduceat).

 Is the `reducein` important to have, as compared to `reduceat`?

 Yes, I think so. If there are some areas you want to ignore, that's
 difficult to do with reduceat().

And conversely overlapping areas are highly useful but completely
impossible to do with reduceat.
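As an illustration of the semantics being argued for, here is a hypothetical reducein written as a loop over half-open (start, end) slices, plus a vectorised special case for addition using a cumulative sum. Neither is a NumPy API; the names and the half-open convention are assumptions, and the cumsum trick can lose precision on long float arrays.

```python
import numpy as np


def reducein(ufunc, array, starts, ends):
    """Hypothetical reducein: one reduction per slice array[starts[i]:ends[i]].

    Slices may overlap or leave gaps, which reduceat's fence posts cannot do.
    """
    array = np.asarray(array)
    return np.array([ufunc.reduce(array[s:e]) for s, e in zip(starts, ends)])


def sum_in(array, starts, ends):
    """Vectorised special case for addition: differences of a cumulative sum."""
    c = np.concatenate(([0], np.cumsum(array)))
    return c[np.asarray(ends)] - c[np.asarray(starts)]


a = np.arange(10)
starts, ends = [0, 2, 7], [4, 6, 9]  # [0,4) and [2,6) overlap; 6 and 9 are skipped
print(reducein(np.add, a, starts, ends).tolist())      # [6, 14, 15]
print(sum_in(a, starts, ends).tolist())                # [6, 14, 15]
print(reducein(np.maximum, a, starts, ends).tolist())  # [3, 5, 8]
```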

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-12 Thread Travis Oliphant


On Apr 11, 2010, at 2:56 PM, Anne Archibald wrote:


2010/4/10 Stéfan van der Walt ste...@sun.ac.za:

On 10 April 2010 19:45, Pauli Virtanen p...@iki.fi wrote:
Another addition to ufuncs that should be thought about is specifying the
Python-side interface to generalized ufuncs.


This is an interesting idea; what do you have in mind?


I can see two different kinds of answer to this question: one is a
tool like vectorize/frompyfunc that allows construction of generalized
ufuncs from python functions, and the other is thinking out what
methods and support functions generalized ufuncs need.

The former would be very handy for prototyping gufunc-based libraries
before delving into the templated C required to make them actually
efficient.

The latter is more essential in the long run: it'd be nice to have a
reduce-like function, but obviously only when the arity and dimensions
work out right (which I think means (shape1,shape2)->(shape2) ). This
could be applied along an axis or over a whole array. reduceat and the
other, more sophisticated, schemes might also be worth supporting. At
a more elementary level, gufunc objects should have good introspection
- docstrings, shape specification accessible from python, named formal
arguments, et cetera. (So should ufuncs, for that matter.)


We should collect all of these proposals into a NEP.  To clarify  
what I mean by group-by behavior.


Suppose I have an array of floats and an array of integers.   Each
element in the array of integers represents a region in the float
array of a certain kind.   The reduction should take place over
like-kind values:


Example:

add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])

results in the calculations:

1 + 3 + 6 + 7
2 + 4
5 + 8 + 9

and therefore the output (notice the two arrays --- perhaps a  
structured array should be returned instead...)


[0,1,2],
[17, 6, 22]


The real value is when you have tabular data and you want to do  
reductions in one field based on values in another field.   This  
happens all the time in relational algebra and would be a relatively  
straightforward thing to support in ufuncs.
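One way to get the "structured array instead of two arrays" output mentioned above, sketched with existing pieces: np.unique finds the keys and ufunc.at reduces into them. The function name, the field names key and value, and the initial argument are all hypothetical choices.

```python
import numpy as np


def reduceby_struct(ufunc, array, by, initial):
    """Hypothetical group-by reduction returning one (key, value) record per key."""
    array = np.asarray(array)
    keys, codes = np.unique(by, return_inverse=True)
    values = np.full(len(keys), initial, dtype=np.result_type(array, initial))
    ufunc.at(values, codes, array)  # reduce each element into its key's slot
    out = np.empty(len(keys), dtype=[('key', keys.dtype), ('value', values.dtype)])
    out['key'] = keys
    out['value'] = values
    return out


res = reduceby_struct(np.add, [1, 2, 3, 4, 5, 6, 7, 8, 9],
                      [0, 1, 0, 1, 2, 0, 0, 2, 2], 0)
print(res['key'].tolist())    # [0, 1, 2]
print(res['value'].tolist())  # [17, 6, 22]
```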


-Travis








Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-12 Thread Robert Kern
On Mon, Apr 12, 2010 at 17:26, Travis Oliphant oliph...@enthought.com wrote:

 On Apr 11, 2010, at 2:56 PM, Anne Archibald wrote:

 2010/4/10 Stéfan van der Walt ste...@sun.ac.za:

 On 10 April 2010 19:45, Pauli Virtanen p...@iki.fi wrote:

 Another addition to ufuncs that should be thought about is specifying the
 Python-side interface to generalized ufuncs.

 This is an interesting idea; what do you have in mind?

 I can see two different kinds of answer to this question: one is a
 tool like vectorize/frompyfunc that allows construction of generalized
 ufuncs from python functions, and the other is thinking out what
 methods and support functions generalized ufuncs need.

 The former would be very handy for prototyping gufunc-based libraries
 before delving into the templated C required to make them actually
 efficient.

 The latter is more essential in the long run: it'd be nice to have a
 reduce-like function, but obviously only when the arity and dimensions
 work out right (which I think means (shape1,shape2)->(shape2) ). This
 could be applied along an axis or over a whole array. reduceat and the
 other, more sophisticated, schemes might also be worth supporting. At
 a more elementary level, gufunc objects should have good introspection
 - docstrings, shape specification accessible from python, named formal
 arguments, et cetera. (So should ufuncs, for that matter.)

 We should collect all of these proposals into a NEP.  To clarify what I
 mean by group-by behavior.
 Suppose I have an array of floats and an array of integers.   Each element
 in the array of integers represents a region in the float array of a certain
 kind.   The reduction should take place over like-kind values:
 Example:
 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
 results in the calculations:
 1 + 3 + 6 + 7
 2 + 4
 5 + 8 + 9
 and therefore the output (notice the two arrays --- perhaps a structured
 array should be returned instead...)
 [0,1,2],
 [17, 6, 22]

 The real value is when you have tabular data and you want to do reductions
 in one field based on values in another field.   This happens all the time
 in relational algebra and would be a relatively straightforward thing to
 support in ufuncs.

I might suggest a simplification where the by array must be an array
of non-negative ints such that they are indices into the output. For
example (note that I replace 2 with 3 and have no 2s in the by array):

add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
[17, 6, 0, 22]

This basically generalizes bincount() to other binary ufuncs.

-- 
Robert Kern



Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-12 Thread Anne Archibald
On 12 April 2010 18:26, Travis Oliphant oliph...@enthought.com wrote:

 We should collect all of these proposals into a NEP.

Or several NEPs, since I think they are quasi-orthogonal.

 To clarify what I
 mean by group-by behavior.
 Suppose I have an array of floats and an array of integers.   Each element
 in the array of integers represents a region in the float array of a certain
 kind.   The reduction should take place over like-kind values:
 Example:
 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
 results in the calculations:
 1 + 3 + 6 + 7
 2 + 4
 5 + 8 + 9
 and therefore the output (notice the two arrays --- perhaps a structured
 array should be returned instead...)
 [0,1,2],
 [17, 6, 22]

 The real value is when you have tabular data and you want to do reductions
 in one field based on values in another field.   This happens all the time
 in relational algebra and would be a relatively straightforward thing to
 support in ufuncs.

As an example, if I understand correctly, this would allow the
histogram functions to be replaced by a one-liner, e.g.:

add.reduceby(array=1, by=((A-min)*n/(max-min)).astype(int))

It would also be valuable to support output arguments of some sort, so
that, for example, reduceby could be used to accumulate values into an
output array at supplied indices. (I leave the value of using this on
matrix multiplication or arctan2 to the reader's imagination.)
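Something close to that one-liner can already be written with np.add.at, which also shows the "accumulate into an output array" use. This is a sketch under simple assumptions: the hypothetical histogram_by_index uses n equal-width bins over [min, max], clips the maximum into the last bin as np.histogram does, and does not handle max == min; samples that land exactly on an interior edge could be binned differently from np.histogram because of rounding.

```python
import numpy as np


def histogram_by_index(A, n):
    """Hypothetical sketch: n equal-width bins over [A.min(), A.max()]."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(), A.max()
    idx = ((A - lo) * n / (hi - lo)).astype(int)  # bin index per sample
    idx = np.clip(idx, 0, n - 1)                  # A == max goes in the last bin
    counts = np.zeros(n, dtype=int)
    np.add.at(counts, idx, 1)                     # the "array=1" reduction, into an output array
    return counts


A = [0.0, 0.5, 1.0, 1.5, 2.0]
print(histogram_by_index(A, 4).tolist())    # [1, 1, 1, 2]
print(np.histogram(A, bins=4)[0].tolist())  # [1, 1, 1, 2]
```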

Anne


Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-12 Thread Warren Weckesser
Robert Kern wrote:
 On Mon, Apr 12, 2010 at 17:26, Travis Oliphant oliph...@enthought.com wrote:
   
 On Apr 11, 2010, at 2:56 PM, Anne Archibald wrote:

 2010/4/10 Stéfan van der Walt ste...@sun.ac.za:

 On 10 April 2010 19:45, Pauli Virtanen p...@iki.fi wrote:

 Another addition to ufuncs that should be thought about is specifying the
 Python-side interface to generalized ufuncs.

 This is an interesting idea; what do you have in mind?

 I can see two different kinds of answer to this question: one is a
 tool like vectorize/frompyfunc that allows construction of generalized
 ufuncs from python functions, and the other is thinking out what
 methods and support functions generalized ufuncs need.

 The former would be very handy for prototyping gufunc-based libraries
 before delving into the templated C required to make them actually
 efficient.

 The latter is more essential in the long run: it'd be nice to have a
 reduce-like function, but obviously only when the arity and dimensions
 work out right (which I think means (shape1,shape2)->(shape2) ). This
 could be applied along an axis or over a whole array. reduceat and the
 other, more sophisticated, schemes might also be worth supporting. At
 a more elementary level, gufunc objects should have good introspection
 - docstrings, shape specification accessible from python, named formal
 arguments, et cetera. (So should ufuncs, for that matter.)

 We should collect all of these proposals into a NEP.  To clarify what I
 mean by group-by behavior.
 Suppose I have an array of floats and an array of integers.   Each element
 in the array of integers represents a region in the float array of a certain
 kind.   The reduction should take place over like-kind values:
 Example:
 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
 results in the calculations:
 1 + 3 + 6 + 7
 2 + 4
 5 + 8 + 9
 and therefore the output (notice the two arrays --- perhaps a structured
 array should be returned instead...)
 [0,1,2],
 [17, 6, 22]

 The real value is when you have tabular data and you want to do reductions
 in one field based on values in another field.   This happens all the time
 in relational algebra and would be a relatively straightforward thing to
 support in ufuncs.
 

 I might suggest a simplification where the by array must be an array
 of non-negative ints such that they are indices into the output. For
 example (note that I replace 2 with 3 and have no 2s in the by array):

 add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
 [17, 6, 0, 22]

 This basically generalizes bincount() to other binary ufuncs.

   


A bit more generalization of `by` gives behavior like matlab's accumarray
(http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html),
which I partly cloned here:
[This would be a link to the scipy cookbook, but scipy.org is not 
responding.]
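A minimal accumarray-style sketch along those lines uses one index column per output dimension and ufunc.at. The function is hypothetical and uses 0-based subscripts (MATLAB's accumarray is 1-based); any binary ufunc with an .at method can be passed in.

```python
import numpy as np


def accumarray(subs, vals, shape=None, ufunc=np.add, fill=0):
    """Hypothetical accumarray-style helper (0-based subscripts).

    subs: (N, D) integer array; row i names the output cell for vals[i].
    """
    subs = np.asarray(subs)
    vals = np.asarray(vals)
    if shape is None:
        shape = tuple(subs.max(axis=0) + 1)
    out = np.full(shape, fill, dtype=np.result_type(vals, fill))
    ufunc.at(out, tuple(subs.T), vals)  # one index array per output dimension
    return out


subs = [[0, 0], [1, 1], [0, 0], [2, 1]]
print(accumarray(subs, [1, 2, 3, 4]).tolist())                    # [[4, 0], [0, 2], [0, 4]]
print(accumarray(subs, [1, 2, 3, 4], ufunc=np.maximum).tolist())  # [[3, 0], [0, 2], [0, 4]]
```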

Warren



Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-12 Thread Robert Kern
On Mon, Apr 12, 2010 at 17:54, Warren Weckesser
warren.weckes...@enthought.com wrote:

 A bit more generalization of `by` gives behavior like matlab's accumarray
 (http://www.mathworks.com/access/helpdesk/help/techdoc/ref/accumarray.html),
 which I partly cloned here:
 [This would be a link to the scipy cookbook, but scipy.org is not
 responding.]

I've bounced the server. Try again.

-- 
Robert Kern



Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-11 Thread Anne Archibald
2010/4/10 Stéfan van der Walt ste...@sun.ac.za:
 On 10 April 2010 19:45, Pauli Virtanen p...@iki.fi wrote:
 Another addition to ufuncs that should be thought about is specifying the
 Python-side interface to generalized ufuncs.

 This is an interesting idea; what do you have in mind?

I can see two different kinds of answer to this question: one is a
tool like vectorize/frompyfunc that allows construction of generalized
ufuncs from python functions, and the other is thinking out what
methods and support functions generalized ufuncs need.

The former would be very handy for prototyping gufunc-based libraries
before delving into the templated C required to make them actually
efficient.

The latter is more essential in the long run: it'd be nice to have a
reduce-like function, but obviously only when the arity and dimensions
work out right (which I think means (shape1,shape2)->(shape2) ). This
could be applied along an axis or over a whole array. reduceat and the
other, more sophisticated, schemes might also be worth supporting. At
a more elementary level, gufunc objects should have good introspection
- docstrings, shape specification accessible from python, named formal
arguments, et cetera. (So should ufuncs, for that matter.)
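For the prototyping half, np.vectorize has since gained a signature argument (NumPy 1.12+) that builds a gufunc-like callable from a Python function, at Python-loop speed. The reduce half is sketched only for the special case where both inputs and the output share one core shape, (n,n),(n,n)->(n,n); gufunc_reduce is a hypothetical name, not a NumPy method.

```python
import functools

import numpy as np


def _matvec(m, v):
    # Core operation on a single (m, n) matrix and a single (n,) vector.
    return m @ v


# np.vectorize loops in Python, but broadcasts over the loop dimensions
# according to the gufunc-style signature.
matvec = np.vectorize(_matvec, signature="(m,n),(n)->(m)")

mats = np.arange(24, dtype=float).reshape(2, 3, 4)  # a stack of two 3x4 matrices
vecs = np.ones((2, 4))                              # one length-4 vector per matrix
result = matvec(mats, vecs)                         # shape (2, 3)


def gufunc_reduce(func, stack):
    """Hypothetical reduce for a gufunc with signature (n,n),(n,n)->(n,n):
    fold func over the leading (loop) axis of stack."""
    return functools.reduce(func, stack)


chain = gufunc_reduce(np.matmul, np.stack([np.eye(2), 2 * np.eye(2), 3 * np.eye(2)]))
print(result.shape)     # (2, 3)
print(chain.tolist())   # [[6.0, 0.0], [0.0, 6.0]]
```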

Anne




[Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread Travis Oliphant

Hi,

I've been mulling over a couple of ideas for new ufunc methods plus a  
couple of numpy functions that I think will help implement group-by  
operations with NumPy arrays.

I wanted to discuss them on this list before putting forward an actual  
proposal or patch to get input from others.

The group-by operation is very common in relational algebra and NumPy  
arrays (especially structured arrays) can often be seen as a database  
table. There are common and easy-to-implement approaches for select
and other relational algebra concepts, but group-by basically has to  
be implemented yourself.

Here are my suggested additions to NumPy:

ufunc methods:
* reduceby (array, by, sorted=1, axis=0)

  array is the array to reduce
 by is the array to provide the grouping (can be a structured  
array or a list of arrays)

  if sorted is 1, then possibly a faster algorithm can be  
used.

* reducein (array, indices, axis=0)

   similar to reduce-at, but the indices provide both the  
start and end points (rather than being fence-posts like reduceat).
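To make the sorted=1 remark concrete: once the keys are sorted, each group is a contiguous run, so reduceby comes down to finding the fence posts and calling reduceat. The sketch below assumes a 1-D array and a 1-D by; reduceby_sorted is a hypothetical name, and the argsort step is what a caller promising sorted input could skip.

```python
import numpy as np


def reduceby_sorted(ufunc, array, by):
    """Hypothetical reduceby: stable-sort by key, then reduce each run of
    equal keys with ufunc.reduceat. Returns (keys, results)."""
    array = np.asarray(array)
    by = np.asarray(by)
    if by.size == 0:
        return by[:0], array[:0]
    order = np.argsort(by, kind="stable")  # the step a sorted=1 caller could skip
    sorted_by, sorted_vals = by[order], array[order]
    # Fence posts for reduceat: 0 plus every position where the key changes.
    changes = np.concatenate(([True], sorted_by[1:] != sorted_by[:-1]))
    starts = np.flatnonzero(changes)
    return sorted_by[starts], ufunc.reduceat(sorted_vals, starts)


keys, sums = reduceby_sorted(np.add, [1, 2, 3, 4, 5, 6, 7, 8, 9],
                             [0, 1, 0, 1, 2, 0, 0, 2, 2])
print(keys.tolist(), sums.tolist())  # [0, 1, 2] [17, 6, 22]
```

The same function accepts string keys, since only sorting and equality tests are needed.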

numpy functions (or methods):

 * segment(array)

   (produce an array of integers from an array, labelling the
different regions of the array):

segment([10,20,10,20,30,30,10])  would produce [0,1,0,1,2,2,0]


 * edges(array, at=True)

   produce an index array providing the edges (with either fence-post-like
syntax for reduceat or both boundaries like reducein).


Thoughts?

-Travis






Thoughts on the general idea?




Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread Pauli Virtanen
On Sat, 2010-04-10 at 12:23 -0500, Travis Oliphant wrote:
[clip]
 Here are my suggested additions to NumPy:
 ufunc methods:
[clip]
   * reducein (array, indices, axis=0)
similar to reduce-at, but the indices provide both the  
 start and end points (rather than being fence-posts like reduceat).

Is the `reducein` important to have, as compared to `reduceat`?

[clip]
 numpy functions (or methods):

I'd prefer functions here. ndarray already has a huge number of methods.

* segment(array)
 
  (produce an array of integers from an array producing the  
 different regions of an array:
 
   segment([10,20,10,20,30,30,10])  would produce ([0,1,0,1,2,2,0])

Sounds like `np.digitize(x, bins=np.unique(x))-1`. What would the
behavior be with structured arrays?
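Pauli's expression can be checked directly, and np.unique(..., return_inverse=True) produces the same sorted-order labels in one call. If segment() should instead number regions in order of first appearance (the example above cannot tell the two apart), a hypothetical variant re-ranks the unique values by their first index.

```python
import numpy as np

x = np.array([10, 20, 10, 20, 30, 30, 10])

# Pauli's suggestion: position of each value among the sorted unique values.
seg_digitize = np.digitize(x, bins=np.unique(x)) - 1

# The same labels in one call: the inverse indices returned by np.unique.
seg_unique = np.unique(x, return_inverse=True)[1]


def segment_first_seen(x):
    """Hypothetical variant: number regions in order of first appearance
    rather than in sorted order of their values."""
    _, first, inv = np.unique(x, return_index=True, return_inverse=True)
    rank = np.argsort(np.argsort(first))  # rank of each unique value by first index
    return rank[inv]


print(seg_digitize.tolist())                          # [0, 1, 0, 1, 2, 2, 0]
print(segment_first_seen([30, 10, 30, 20]).tolist())  # [0, 1, 0, 2]
```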

* edges(array, at=True)
   
  produce an index array providing the edges (with either fence-post  
 like syntax for reduce-at or both boundaries like reducein.

This can probably easily be based on segment().
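A sketch of how edges() might sit on top of a label array such as segment() returns, assuming "edges" means the boundaries of runs of equal labels in a non-empty 1-D array (groups that are not contiguous would need sorting first). The at flag mirrors the proposal: fence posts for reduceat, or half-open (start, end) pairs.

```python
import numpy as np


def edges(labels, at=True):
    """Hypothetical edges(): boundaries of runs of equal labels.

    at=True  -> fence posts suitable for ufunc.reduceat
    at=False -> half-open (start, end) pairs, one row per run
    """
    labels = np.asarray(labels)
    starts = np.flatnonzero(np.concatenate(([True], labels[1:] != labels[:-1])))
    if at:
        return starts
    ends = np.append(starts[1:], len(labels))
    return np.column_stack((starts, ends))


labels = [0, 0, 1, 1, 1, 0]
print(edges(labels).tolist())                                # [0, 2, 5]
print(edges(labels, at=False).tolist())                      # [[0, 2], [2, 5], [5, 6]]
print(np.add.reduceat(np.arange(6), edges(labels)).tolist())  # [1, 9, 5]
```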

 Thoughts on the general idea?

One question is whether these methods should be stuffed into the main
namespace, or under numpy.rec.


Another addition to ufuncs that should be thought about is specifying the
Python-side interface to generalized ufuncs.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread josef . pktd
On Sat, Apr 10, 2010 at 1:23 PM, Travis Oliphant oliph...@enthought.com wrote:

 Hi,

 I've been mulling over a couple of ideas for new ufunc methods plus a
 couple of numpy functions that I think will help implement group-by
 operations with NumPy arrays.

 I wanted to discuss them on this list before putting forward an actual
 proposal or patch to get input from others.

 The group-by operation is very common in relational algebra and NumPy
 arrays (especially structured arrays) can often be seen as a database
 table.    There are common and easy-to-implement approaches for select
 and other relational algebra concepts, but group-by basically has to
 be implemented yourself.

 Here are my suggested additions to NumPy:

 ufunc methods:
        * reduceby (array, by, sorted=1, axis=0)

              array is the array to reduce
             by is the array to provide the grouping (can be a structured
 array or a list of arrays)

              if sorted is 1, then possibly a faster algorithm can be
 used.

How is the grouping in `by` specified?

These functions would be very useful for statistics. One problem with
the current bincount is that it doesn't allow multi-dimensional weight
arrays (with axis argument).
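Until bincount grows an axis argument, one workaround for 2-D weights is ufunc.at on a 2-D output, which adds whole rows at once. bincount_2d is a hypothetical helper that groups along axis 0 only.

```python
import numpy as np


def bincount_2d(labels, weights, minlength=0):
    """Hypothetical helper: bincount for 2-D weights, grouping rows (axis 0).

    Returns an array of shape (nbins, weights.shape[1]).
    """
    labels = np.asarray(labels)
    weights = np.asarray(weights)
    nbins = max(minlength, labels.max() + 1)
    out = np.zeros((nbins, weights.shape[1]), dtype=np.result_type(weights, 0.0))
    np.add.at(out, labels, weights)  # row i of weights is added to out[labels[i]]
    return out


labels = [0, 1, 0]
w = [[1, 2], [3, 4], [5, 6]]
print(bincount_2d(labels, w).tolist())  # [[6.0, 8.0], [3.0, 4.0]]
```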

Josef

        * reducein (array, indices, axis=0)

               similar to reduce-at, but the indices provide both the
 start and end points (rather than being fence-posts like reduceat).

 numpy functions (or methods):

         * segment(array)

           (produce an array of integers from an array producing the
 different regions of an array:

            segment([10,20,10,20,30,30,10])  would produce ([0,1,0,1,2,2,0])


         * edges(array, at=True)

           produce an index array providing the edges (with either fence-post
 like syntax for reduce-at or both boundaries like reducein.


 Thoughts?

 -Travis






 Thoughts on the general idea?




Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread Stéfan van der Walt
On 10 April 2010 19:45, Pauli Virtanen p...@iki.fi wrote:
 Another addition to ufuncs that should be thought about is specifying the
 Python-side interface to generalized ufuncs.

This is an interesting idea; what do you have in mind?

Regards
Stéfan


Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread Robert Kern
On Sat, Apr 10, 2010 at 12:45, Pauli Virtanen p...@iki.fi wrote:
 On Sat, 2010-04-10 at 12:23 -0500, Travis Oliphant wrote:
 [clip]
 Here are my suggested additions to NumPy:
 ufunc methods:
 [clip]
       * reducein (array, indices, axis=0)
                similar to reduce-at, but the indices provide both the
 start and end points (rather than being fence-posts like reduceat).

 Is the `reducein` important to have, as compared to `reduceat`?

Yes, I think so. If there are some areas you want to ignore, that's
difficult to do with reduceat().

-- 
Robert Kern
