Re: [Jprogramming] Neophyte Performance Question - GROUPBY

Rob Hodgkinson Fri, 08 Feb 2008 12:44:47 -0800

Mike, something further to Devon's email...

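You do not need (i.~) in your expression. Key (dyadic u/.) already
groups the items of its right argument by matching items of its left
argument, so ...
>    ...  (i.~ xy) sum/. z
becomes ...     xy sum/. z       NB. Same result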
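
A quick check on your small example, as a sketch only. It reuses your own
names (x, y, z, sum, xy) and assumes nothing beyond what is in your mail:

```j
   x =: 1 1 2 2 3 3 4 4
   y =: 1 1 1 2 2 2 3 3
   z =: 1 2 3 4 5 6 7 8
   sum =: +/
   xy =: x ,. y
   xy sum/. z                          NB. table left argument: each row is a key
3 3 4 11 15
   (xy sum/. z) -: (i.~ xy) sum/. z    NB. identical with or without i.~
1
```

Since Key matches whole items of the left argument, the same form should
carry over to more grouping columns by stitching them into the table, for
example (x ,. y ,. w) sum/. z, where w is a hypothetical third column.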


See the Vocabulary page for u/. (Key is the dyadic case, Oblique the monadic one) ... eg

   1 2 3 1 3 2 1 </. 'abcdefg'
+---+--+--+
|adg|bf|ce|
+---+--+--+
   (i.~ 1 2 3 1 3 2 1) </. 'abcdefg'
+---+--+--+
|adg|bf|ce|
+---+--+--+

But dropping (i.~) has no noticeable impact on overall performance.
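
One thing that may be worth trying, given your observation that the nub of
a list is much cheaper than the nub of a table, is to pack each row of keys
into a single integer before grouping. This is a sketch of that idea, not
something I have timed. It assumes every key column holds non-negative
integers below 999 (as in your ? n $ 999 test data), and the names base, k
and keys are made up for the illustration:

```j
   NB. Sketch only: assumes each key column is an integer in 0..998.
   x =: 1 1 2 2 3 3 4 4
   y =: 1 1 1 2 2 2 3 3
   z =: 1 2 3 4 5 6 7 8
   sum =: +/

   base =: 999 999          NB. one radix per key column (hypothetical name)
   k    =: base #. x ,. y   NB. one integer per row instead of a 2-item row
   keys =: base #: ~. k     NB. decode the nub back into distinct (x,y) rows

   keys -: ~. x ,. y                    NB. same distinct keys, same order
1
   (k sum/. z) -: (x ,. y) sum/. z      NB. same groups, same order
1
```

The table would then be formed with keys ; ,. k sum/. z. For more grouping
columns, base gets one radix per column; the product of the radices has to
stay within integer range for the packing to remain exact.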

Rob Hodgkinson


On 8/02/08 10:39 PM, "Mike Thompson" <[EMAIL PROTECTED]> wrote:

> 
> I'm experimenting with 'groupby-like' operations across columns of an
> 'inverted table'.
> 
>    NB.  Inverted table has three columns: x,  y,  z
>    x =: 1 1 2 2 3 3 4 4
>    y =: 1 1 1 2 2 2 3 3
>    z =: 1 2 3 4 5 6 7 8
> 
>    sum =: +/
> 
>    xy =:  x ,. y      NB.  I want to groupby x and y
>    (('x' , ' ', 'y'); 'sum z') ,: (~. xy) ; ,.  (i.~ xy) sum/. z     NB. Sum z for distinct x, y pairs
> 
> Which yields this table:
> 
> +---+-----+
> |x y|sum z|
> +---+-----+
> |1 1| 3   |
> |2 1| 3   |
> |2 2| 4   |
> |3 2|11   |
> |4 3|15   |
> +---+-----+
> 
> Thrilled that I can at least produce right answers, I now want to improve
> the performance.
> 
> So, I've been experimenting with:
> 
>     x =: ? 10000000 $ 999
>     y =: ? 10000000 $ 999
>     z =: ? 10000000 $ 999
> 
> Cutting away the formatting fluff from the table-forming expression above,
> the CPU consuming core is this:
>      
>       (~. xy) ;  (i.~ xy) sum/. z
> 
> Any suggestions on how to do this more efficiently (faster)?  For a start,
> I feel as if my approach must be calculating the nub of xy twice.  Also,
> perhaps xy, as I've created it, is a poor choice of structure to work with
> (I found the nub of x was massively faster to calculate than the nub of xy):
> 
>      Ts '~. x'
> 0.21264 6784
> 
>    Ts '~. xy' 
> 9.13063 2.68436e8
> 
> 
> Note:  to simplify the explanation above, I used 'sum' but actually I want
> to 'collect' partitions of z:
> 
> collect =: <@,  
> 
> (i.~ xy) collect/. z
> 
> +---+-+-+---+---+
> |1 2|3|4|5 6|7 8|
> +---+-+-+---+---+
> 
> Finally, I'm keen to have a generalised form of this groupby available to
> me.  Ie. Group by an arbitrary number of columns, not just two.
> 
> Many thanks for any insights,
> Mike
> 
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

