Given the following data:
t =: ;: ;._2 (0 : 0)
c1 p1 0.25
c1 p2 0.35
c2 p1 0.25
c2 p2 0.35
c3 p1 0.25
c3 p2 0.35
c3 p3 0.45
)
c1 has two rows: (p1 0.25) and (p2 0.35)
c2 has two rows: (p1 0.25) and (p2 0.35)
c3 has three rows: (p1 0.25), (p2 0.35) and (p3 0.45)
How can I identify that c1 and c2 have the same set of product/value pairs
and that c3 is different?
I'd like to run the algorithm on a 1.6M-row table.
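To make "same set" concrete: two customers match when their collections of (product, value) pairs are equal, regardless of row order. A minimal sketch in JavaScript (the language of the prototype below), assuming fields never contain a tab character; `sameSet` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: true when two customers' (product, value) pairs form
// the same set, regardless of row order. Assumes no tab characters in fields.
function sameSet(a, b) {
  // Join product and value with a tab so ("p1","0.25") and ("p10",".25") stay distinct
  const encode = (pair) => pair[0] + '\t' + pair[1];
  const sa = new Set(a.map(encode));
  const sb = new Set(b.map(encode));
  if (sa.size !== sb.size) return false;
  for (const x of sa) {
    if (!sb.has(x)) return false;
  }
  return true;
}

// Pairs for each customer, taken from the sample data
const c1 = [['p1', '0.25'], ['p2', '0.35']];
const c2 = [['p1', '0.25'], ['p2', '0.35']];
const c3 = [['p1', '0.25'], ['p2', '0.35'], ['p3', '0.45']];

console.log(sameSet(c1, c2));                 // true
console.log(sameSet(c1, c3));                 // false
console.log(sameSet(c1, [...c1].reverse()));  // true (row order is ignored)
```

Comparing every pair of customers this way grows quadratically with the number of customers, so for a large table it is better to reduce each customer to a single signature and use it as a hash key, which is the idea behind the steps below.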
I created a prototype in JavaScript using a rough approach, but I haven't
translated it to J in case there is a better way:
1. Sort the array by column 2 (product).
2. Loop through the array and build a hash table that maps each customer
to the concatenation of its product/value pairs (e.g. p2 0.35).
3. Loop through the hash table and build a list of customers for each
unique string of product/value pairs.
var t = `c1 p1 0.25
c1 p2 0.35
c2 p1 0.25
c2 p2 0.35
c3 p1 0.25
c3 p2 0.35
c3 p3 0.45`.split('\n').map(function(x) { return x.split(' '); });
// 1. sort by column 2 (product); a sort comparator must return a number, not a boolean
t = t.sort(function(x,y) { return x[1] < y[1] ? -1 : x[1] > y[1] ? 1 : 0; });
// 2. map each customer to its concatenated product/value pairs
var cs = t.reduce(function(memo,val) {
  memo[val[0]] = (memo[val[0]] || '') + val[1] + val[2];
  return memo;
}, {});
//JSON.stringify(cs)
//"{"c1":"p10.25p20.35","c2":"p10.25p20.35","c3":"p10.25p20.35p30.45"}"
// 3. group customers that share the same pair string
var matches = Object.keys(cs).reduce(function(memo,val) {
  var key = memo[cs[val]] = (memo[cs[val]] || []);
  key.push(val);
  return memo;
}, {});
JSON.stringify(matches)
"{"p10.25p20.35":["c1","c2"],"p10.25p20.35p30.45":["c3"]}"
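The prototype above concatenates product and value with no delimiter, so distinct pair lists can produce the same key (p1 with 0.25 and p10 with .25 both yield p10.25), and it sorts the whole table to get a canonical order. A minimal sketch of the same three steps that avoids both, assuming rows are already split into [customer, product, value] string triples and that fields contain no tab or newline; `groupCustomers` is a hypothetical name, and a `Map` stands in for the plain-object hash table.

```javascript
// Hypothetical sketch: group customers whose (product, value) pair lists are identical.
// Input rows are [customer, product, value] string triples.
function groupCustomers(rows) {
  // Step 2: collect each customer's pairs, with a tab between product and value
  const pairsByCustomer = new Map();
  for (const [customer, product, value] of rows) {
    if (!pairsByCustomer.has(customer)) pairsByCustomer.set(customer, []);
    pairsByCustomer.get(customer).push(product + '\t' + value);
  }
  // Step 1 (moved): sort each customer's own pairs instead of the whole table,
  // then join with a newline so the signature is unambiguous
  const groups = new Map();
  for (const [customer, pairs] of pairsByCustomer) {
    const signature = pairs.sort().join('\n');
    if (!groups.has(signature)) groups.set(signature, []);
    groups.get(signature).push(customer);
  }
  // Step 3: one list of customers per distinct signature
  return [...groups.values()];
}

const rows = [
  ['c1', 'p1', '0.25'], ['c1', 'p2', '0.35'],
  ['c2', 'p1', '0.25'], ['c2', 'p2', '0.35'],
  ['c3', 'p1', '0.25'], ['c3', 'p2', '0.35'], ['c3', 'p3', '0.45'],
];
console.log(JSON.stringify(groupCustomers(rows))); // [["c1","c2"],["c3"]]
```

Because only each customer's short pair list is sorted, the input rows can arrive in any order; for the 1.6M-row table this still keeps one encoded string per row in memory.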
How should this problem be approached in J?
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm