Thanks Thomas. I started working from your original answer and came up with:

groups=:[: (}."1 each </. ])  ({."1 </. ])
ids=: ; L:2 @: ([: (~.L:1) 0{"1 L:1 ])

ids groups t
+-------+----+----+
|+--+--+|+--+|+--+|
||c1|c2|||c3|||c4||
|+--+--+|+--+|+--+|
+-------+----+----+
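As a cross-check, here is a rough JavaScript analogue of `groups` and `ids`, using JavaScript as the original prototype does. `keyBy` and `customersWithSameRows` are hypothetical names. `keyBy` only imitates the dyadic key `x </. y`: it collects the items of y into groups keyed by the matching items of x, in order of first appearance. This is a sketch and does not describe how J implements key.

```javascript
// Hypothetical helper imitating J's dyadic key (x </. y): collect `items`
// into groups keyed by the matching entry of `keys`, in first-seen order.
function keyBy(keys, items, keyOf = JSON.stringify) {
  const groups = new Map();
  keys.forEach((k, i) => {
    const id = keyOf(k);
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(items[i]);
  });
  return [...groups.values()];
}

// Rough analogue of `ids groups t`:
//   ({."1 </. ]) t        -> one table of rows per customer
//   (}."1 each </. ])     -> key those tables by their product/value rows
//   ids                   -> take the customer name out of each table
function customersWithSameRows(rows) {
  const perCustomer = keyBy(rows.map(r => r[0]), rows);
  const signatures = perCustomer.map(tbl => tbl.map(r => r.slice(1)));
  const bySignature = keyBy(signatures, perCustomer);
  // the first row of each table carries that customer's name
  return bySignature.map(group => group.map(tbl => tbl[0][0]));
}

// Sample table from Thomas's message (rows are [customer, product, value])
const t = [
  ['c1', 'p1', '0.25'], ['c1', 'p2', '0.35'],
  ['c2', 'p1', '0.25'], ['c2', 'p2', '0.35'],
  ['c3', 'p1', '0.25'], ['c3', 'p2', '0.35'], ['c3', 'p3', '0.45'],
  ['c4', 'p1', '0.25'],
];
console.log(JSON.stringify(customersWithSameRows(t)));
```

Like the J version, this compares each customer's rows in the order they appear. Two customers with the same pairs listed in a different order would end up in different groups unless the rows are sorted first.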

Yours seems cleaner to me:

key=. (}."1</.{."1) t
m=:(<"1 t) ,.~ (;key) ,. (<"1|:e.key)
~. each 0{"1 each (1&{"1 </. ]) m
+-------+----+----+
|+--+--+|+--+|+--+|
||c1|c2|||c3|||c4||
|+--+--+|+--+|+--+|
+-------+----+----+


And yours provides easier access to the intermediate values.
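The intermediate values in Thomas's expression can be mimicked the same way. Below is a loose JavaScript sketch with hypothetical names:

1. Key the customers by each distinct product/value pair (`key`).
2. Raze that list.
3. Give every razed entry a membership signature recording which pair groups contain its customer (the `|: e. key` step).
4. Collect the distinct customers for each signature.

It follows the J only in outline.

```javascript
// Loose sketch of the "raze in" signature idea; not J's implementation.
// Rows are [customer, product, value].
function groupBySignature(rows) {
  // key =. (}."1 </. {."1) t : customers boxed by distinct product/value pair
  const pairGroups = new Map();
  for (const [customer, product, value] of rows) {
    const pair = JSON.stringify([product, value]);
    if (!pairGroups.has(pair)) pairGroups.set(pair, []);
    pairGroups.get(pair).push(customer);
  }
  const key = [...pairGroups.values()];
  // ;key : one customer entry per original row
  const razed = key.flat();
  // |: e. key : for each razed entry, which pair groups contain that customer
  // (conceptually a razed.length x key.length table of flags)
  const signatures = razed.map(c => key.map(g => (g.includes(c) ? '1' : '0')).join(''));
  // ~. each 0{"1 each (1&{"1 </. ]) m : distinct customers per signature
  const bySignature = new Map();
  razed.forEach((c, i) => {
    if (!bySignature.has(signatures[i])) bySignature.set(signatures[i], new Set());
    bySignature.get(signatures[i]).add(c);
  });
  return [...bySignature.values()].map(s => [...s]);
}

const t = [
  ['c1', 'p1', '0.25'], ['c1', 'p2', '0.35'],
  ['c2', 'p1', '0.25'], ['c2', 'p2', '0.35'],
  ['c3', 'p1', '0.25'], ['c3', 'p2', '0.35'], ['c3', 'p3', '0.45'],
  ['c4', 'p1', '0.25'],
];
console.log(JSON.stringify(groupBySignature(t)));
```

The signature table holds one flag per razed row for every distinct pair, so its size grows with rows × distinct pairs. That is a plausible reason this approach needs so much memory on 1.6M rows, although it is an inference and was not measured.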

However, yours gives an out-of-memory error on my 4 GB laptop with
j64... I only have about 2 GB of RAM free.

My sloppier version did complete:

timespacex 'ids groups t2'
13.7034 2.5414e8

$ t2
1609875 3

$ ids groups t2
3639

Thank you again

On Fri, Jul 25, 2014 at 7:46 AM, Thomas Costigliola <[email protected]>
wrote:

> Sorry, I hit send too early. My complete answer is:
>
>  t=. ;: ;._2 ( 0 : 0 )
> c1 p1 0.25
> c1 p2 0.35
> c2 p1 0.25
> c2 p2 0.35
> c3 p1 0.25
> c3 p2 0.35
> c3 p3 0.45
> c4 p1 0.25
> )
>
> ]key=. (}."1</.{."1) t
> (<"1 t) ,.~ (;key) ,. (<"1|:e.key)
>
> Rows with the same signature, i.e. the same 'raze in' result (2nd column),
> share the same values. You'll have to try it on the 1.6M rows to test its
> speed. I'm curious.
>
> On Fri, Jul 25, 2014 at 7:19 AM, Thomas Costigliola <[email protected]>
> wrote:
>
> > You could key the first column on the values.
> >
> > (}."1 </. {."1) t
> >  On Jul 25, 2014 7:07 AM, "Joe Bogner" <[email protected]> wrote:
> >
> >> Given the following data:
> >>
> >> t =: ;: ;._2 (0 : 0)
> >> c1 p1 0.25
> >> c1 p2 0.35
> >> c2 p1 0.25
> >> c2 p2 0.35
> >> c3 p1 0.25
> >> c3 p2 0.35
> >> c3 p3 0.45
> >> )
> >>
> >>
> >> c1 has two rows (p1 0.25) and (p2 0.35)
> >> c2 has two rows (p1 0.25) and (p2 0.35)
> >> c3 has three rows (p1 0.25), (p2 0.35), (p3 0.45)
> >>
> >> How can I identify that c1 and c2 have the same set of values and that c3
> >> is different?
> >>
> >> I'd like to run the algorithm on a 1.6M-row table.
> >>
> >> I created a prototype in JavaScript using a rough approach, but I haven't
> >> translated it to J in case there is a better way:
> >>
> >> 1. Sort array by column 2 (product)
> >> 2. Loop through the array and create a hash table mapping each customer
> >> to its concatenated product/value pairs (e.g. p2 0.35)
> >> 3. Loop through the hash table and create a list of customers for each
> >> unique string of product/value pairs
> >>
> >> var t = function(){/*
> >> c1 p1 0.25
> >> c1 p2 0.35
> >> c2 p1 0.25
> >> c2 p2 0.35
> >> c3 p1 0.25
> >> c3 p2 0.35
> >> c3 p3 0.45
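> >> */}.toString().split('/*')[1].split('*/')[0].trim().split(/\r?\n/)
> >>   .map(function(x) { return x.split(' ') })
> >> // sort by product; a sort comparator must return a number, not a boolean
> >> t = t.sort(function(x,y) { return x[1] < y[1] ? -1 : x[1] > y[1] ? 1 : 0 })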
> >>
> >> var cs = t.reduce(function(memo,val) { memo[val[0]] =
> >> (memo[val[0]]||'')+val[1]+val[2]; return memo;}, {});
> >>
> >> //JSON.stringify(cs)
> >> //"{"c1":"p10.25p20.35","c2":"p10.25p20.35","c3":"p10.25p20.35p30.45"}"
> >>
> >> var matches = Object.keys(cs).reduce(function(memo,val) { var key =
> >> memo[cs[val]] = (memo[cs[val]] || []); key.push(val);  return memo;}, {})
> >>
> >> JSON.stringify(matches)
> >>
> >> "{"p10.25p20.35":["c1","c2"],"p10.25p20.35p30.45":["c3"]}"
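One caveat about the prototype is how it builds each customer's signature. It concatenates product and value with no separator, so distinct pairs can collapse into the same string. For example, a hypothetical product `p10` with value `.25` reads the same as `p1` with `0.25`. The sketch below runs the same three steps with an explicit delimiter. It assumes product and value strings never contain a space or `|`, and `matchCustomers` is a hypothetical name.

```javascript
// Same three steps as the prototype, but joining pairs with delimiters so
// that distinct product/value pairs cannot run together into one string.
function matchCustomers(rows) {
  // 1. sort by product (the comparator returns a number)
  const sorted = [...rows].sort((x, y) => (x[1] < y[1] ? -1 : x[1] > y[1] ? 1 : 0));
  // 2. customer -> list of "product value" strings, in product order
  const cs = {};
  for (const [customer, product, value] of sorted) {
    (cs[customer] = cs[customer] || []).push(product + ' ' + value);
  }
  // 3. signature string -> customers sharing it
  const matches = {};
  for (const customer of Object.keys(cs)) {
    const sig = cs[customer].join('|');
    (matches[sig] = matches[sig] || []).push(customer);
  }
  return matches;
}

const t = [
  ['c1', 'p1', '0.25'], ['c1', 'p2', '0.35'],
  ['c2', 'p1', '0.25'], ['c2', 'p2', '0.35'],
  ['c3', 'p1', '0.25'], ['c3', 'p2', '0.35'], ['c3', 'p3', '0.45'],
];
console.log(JSON.stringify(matchCustomers(t)));
```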
> >>
> >> How should this problem be approached in J?
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >>
> >