Mike, something further to Devon's email...
You do not need (i.~) in your expression...
> ... (i.~ xy) sum/. z
Becomes ... xy sum/. z NB. Same result
See Vocabulary page on oblique (u/.) ... eg
1 2 3 1 3 2 1 </. 'abcdefg'
+---+--+--+
|adg|bf|ce|
+---+--+--+
(i.~ 1 2 3 1 3 2 1) </. 'abcdefg'
+---+--+--+
|adg|bf|ce|
+---+--+--+
But this has no noticeable impact on overall performance.
Rob Hodgkinson
On 8/02/08 10:39 PM, "Mike Thompson" <[EMAIL PROTECTED]> wrote:
>
> I'm experimenting with 'groupby-like' operations across columns of an
> 'inverted table'.
>
> NB. Inverted table has three columns: x, y, z
> x =: 1 1 2 2 3 3 4 4
> y =: 1 1 1 2 2 2 3 3
> z =: 1 2 3 4 5 6 7 8
>
> sum =: +/
>
> xy =: x ,. y NB. I want to groupby x and y
> (('x' , ' ', 'y'); 'sum z') ,: (~. xy) ; ,. (i.~ xy) sum/. z NB.
> Sum z for distinct x, y pairs
>
> Which yields this table:
>
> ----T-----┐
> │x y│sum z│
> +---+-----+
> │1 1│ 3 │
> │2 1│ 3 │
> │2 2│ 4 │
> │3 2│11 │
> │4 3│15 │
> L---+------
>
> Thrilled that I can at least produce right answers, I now want to improve
> the performance.
>
> So, I've been experiementing with:
>
> x =: ? 10000000 $ 999
> y =: ? 10000000 $ 999
> z =: ? 10000000 $ 999
>
> Cutting away the formatting fluff from the table-forming expression above,
> the CPU consuming core is this:
>
> (~. xy) ; (i.~ xy) sum/. z
>
> Any suggestions on how to do this more efficiently (faster)? For a start,
> I feel as if my
> approach must be calculating the nub of xy twice. Also, perhaps xy, as I've
> created it, is a poor choice
> of structure to work with (I found the nub of x was massively faster to
> calculate than the nub of xy):
>
> Ts '~. x'
> 0.21264 6784
>
> Ts '~. xy'
> 9.13063 2.68436e8
>
>
> Note: to simplify the explanation above, I used 'sum' but actually I want
> to 'collect' partitions of z:
>
> collect =: <@,
>
> (i.~ xy) collect/. z
>
> ----T-T-T---T---┐
> │1 2│3│4│5 6│7 8│
> L---+-+-+---+----
>
> Finally, I'm keen to have a generalised form of this groupby available to
> me. Ie. Group by an arbitrary number of columns, not just two.
>
> Many thanks for any insights,
> Mike
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm