Given the following data:

t =: ;: ;._2 (0 : 0)
c1 p1 0.25
c1 p2 0.35
c2 p1 0.25
c2 p2 0.35
c3 p1 0.25
c3 p2 0.35
c3 p3 0.45
)


c1 has two rows: (p1 0.25) and (p2 0.35).
c2 has two rows: (p1 0.25) and (p2 0.35).
c3 has three rows: (p1 0.25), (p2 0.35), and (p3 0.45).

How can I identify that c1 and c2 have the same set of values and that c3
is different?

I'd like to run the algorithm on a 1.6M-row table.

I created a prototype in JavaScript using a rough approach, but I haven't
translated it to J in case there is a better way:

1. Sort array by column 2 (product)
2. Loop through the array and create a hash table of the concatenated
product/value pairs (e.g. p2 0.35) for each customer
3. Loop through the hash table and create a list of customers for each
unique string of product/value pairs

var t = [
'c1 p1 0.25',
'c1 p2 0.35',
'c2 p1 0.25',
'c2 p2 0.35',
'c3 p1 0.25',
'c3 p2 0.35',
'c3 p3 0.45'
].map(function(x) { return x.split(' ') })
// sort by product (column 2); the comparator must return a number, not a boolean
t = t.sort(function(x,y) { return x[1] < y[1] ? -1 : x[1] > y[1] ? 1 : 0 })

var cs = t.reduce(function(memo,val) { memo[val[0]] =
(memo[val[0]]||'')+val[1]+val[2]; return memo;}, {});

//JSON.stringify(cs)
//"{"c1":"p10.25p20.35","c2":"p10.25p20.35","c3":"p10.25p20.35p30.45"}"

var matches = Object.keys(cs).reduce(function(memo,val) { var key =
memo[cs[val]] = (memo[cs[val]] || []); key.push(val);  return memo;}, {})

JSON.stringify(matches)

"{"p10.25p20.35":["c1","c2"],"p10.25p20.35p30.45":["c3"]}"
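
One weakness of the prototype above is that product and value are concatenated without a separator, so different pairs could in principle produce the same string. The following is only a sketch of the same three steps with delimited keys. The name groupCustomers and the separators ' ' and '|' are my own assumptions, and it assumes neither character occurs in the data and that a customer has no duplicate rows.

```javascript
// Hypothetical helper, not part of the original prototype.
// rows: array of [customer, product, value] string triples.
function groupCustomers(rows) {
  // Collect each customer's product/value pairs.
  var pairsByCustomer = new Map();
  rows.forEach(function (row) {
    var pair = row[1] + ' ' + row[2]; // assumed separator: a space
    if (!pairsByCustomer.has(row[0])) pairsByCustomer.set(row[0], []);
    pairsByCustomer.get(row[0]).push(pair);
  });

  // Sort each customer's pairs so row order does not matter, join them
  // into one signature string, and list the customers per signature.
  var customersBySignature = new Map();
  pairsByCustomer.forEach(function (pairs, customer) {
    var signature = pairs.sort().join('|'); // assumed separator: '|'
    if (!customersBySignature.has(signature)) {
      customersBySignature.set(signature, []);
    }
    customersBySignature.get(signature).push(customer);
  });
  return customersBySignature;
}

var rows = [
  ['c1', 'p1', '0.25'], ['c1', 'p2', '0.35'],
  ['c2', 'p1', '0.25'], ['c2', 'p2', '0.35'],
  ['c3', 'p1', '0.25'], ['c3', 'p2', '0.35'], ['c3', 'p3', '0.45']
];
var result = Array.from(groupCustomers(rows).values());
console.log(JSON.stringify(result));
// [["c1","c2"],["c3"]]
```

Sorting each customer's pairs separately avoids sorting the whole table up front, which may matter at 1.6M rows, though I have not measured it.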

How should this problem be approached in J?
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
