If your non-key columns are very repetitive, you might find symbols worthwhile:

   ({. (, <) [: s: }.)"1 t
┌──┬─────────┐ 
│c1│`p1 `0.25│ 
├──┼─────────┤ 
│c1│`p2 `0.35│ 
├──┼─────────┤ 
│c2│`p1 `0.25│ 
├──┼─────────┤ 
│c2│`p2 `0.35│ 
├──┼─────────┤ 
│c3│`p1 `0.25│ 
├──┼─────────┤ 
│c3│`p2 `0.35│ 
├──┼─────────┤ 
│c3│`p3 `0.45│ 
└──┴─────────┘

This alternative turns the non-key columns into a single symbol per row, which
should make comparisons faster for your particular requirement.

   ({. (, <)  [: s: <@:(1&{::  , ' ' , [: ": 2&{::))"1 t 
┌──┬────────┐ 
│c1│`p1 0.25│ 
├──┼────────┤ 
│c1│`p2 0.35│ 
├──┼────────┤ 
│c2│`p1 0.25│ 
├──┼────────┤ 
│c2│`p2 0.35│ 
├──┼────────┤ 
│c3│`p1 0.25│ 
├──┼────────┤ 
│c3│`p2 0.35│ 
├──┼────────┤ 
│c3│`p3 0.45│ 
└──┴────────┘ 

Sticking the bossman's function ahead of it:

   (([: ~. {."1) </.~ [: (i.~ ~.)({."1 </.}."1)) symboled=. ( ({. (, <)  [: s: <@:(1&{::  , ' ' , [: ": 2&{::))) "1 t
┌───────┬────┐ 
│┌──┬──┐│┌──┐│ 
││c1│c2│││c3││ 
│└──┴──┘│└──┘│ 
└───────┴────┘


----- Original Message -----
From: R.E. Boss <[email protected]>
To: [email protected]
Cc: 
Sent: Saturday, July 26, 2014 6:40:45 AM
Subject: Re: [Jprogramming] finding matching sets


   (~.{."1 t)</.~ (i.~ ~.)({."1 </.}."1) t
+-------+----+
|+--+--+|+--+|
||c1|c2|||c3||
|+--+--+|+--+|
+-------+----+
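Taking the one-liner apart shows what each piece contributes. The following is a minimal sketch in script form. It assumes `t` is the boxed table from the original question, and the names `grp`, `cls`, `cust` and `res` are made up here for illustration:

```j
NB. the question's table: 7 rows of (customer;product;value) boxes
t =: ;: ;._2 (0 : 0)
c1 p1 0.25
c1 p2 0.35
c2 p1 0.25
c2 p2 0.35
c3 p1 0.25
c3 p2 0.35
c3 p3 0.45
)

NB. key (/.) on the customer column: one box per customer
NB. holding that customer's (product;value) rows
grp =: ({."1 t) </. }."1 t

NB. classify the groups: index of each in the nub of groups, 0 0 1 here
cls =: (i.~ ~.) grp

NB. distinct customers, in order of first appearance
cust =: ~. {."1 t

NB. box together the customers that share a class: (c1 c2) (c3)
res =: cust </.~ cls
```

Key (`/.`) returns its groups in order of first appearance, which is also the order `~.` uses. That is why the class numbers line up with the distinct customers.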
  
If that costs too much memory, try 

   [T=:(i.~ ~.)"1 &.|:t
0 0 0
0 1 1
1 0 0
1 1 1
2 0 0
2 1 1
2 2 2

   (~.{."1 t)</.~ (i.~ ~.)({."1 </.}."1) T
+-------+----+
|+--+--+|+--+|
||c1|c2|||c3||
|+--+--+|+--+|
+-------+----+
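Both versions compare each customer's rows in the order they occur in the table. As a result, a customer that lists the same pairs in a different order would end up in a separate group. The original question's JavaScript prototype sidesteps this by sorting on the product column first. If row order within a customer should not matter, one way to get a set-like comparison in J is to sort each group with `/:~` before boxing it. This is only a minimal sketch under that assumption, and `t2` is a hypothetical table, not data from the thread:

```j
NB. hypothetical data: c2 holds the same pairs as c1, in a different order
t2 =: ;: ;._2 (0 : 0)
c1 p1 0.25
c1 p2 0.35
c2 p2 0.35
c2 p1 0.25
c3 p1 0.25
)

NB. as in the reply: groups compared in row order, so c1, c2 and c3 all differ
plain =: (~.{."1 t2) </.~ (i.~ ~.) ({."1 </. }."1) t2

NB. <@/:~ sorts each customer's rows before boxing, so c1 and c2 now match
sorted =: (~.{."1 t2) </.~ (i.~ ~.) ({."1 <@/:~/. }."1) t2

NB. the same change works on the integer-coded table
T2 =: (i.~ ~.)"1 &.|: t2
sortedT =: (~.{."1 t2) </.~ (i.~ ~.) ({."1 <@/:~/. }."1) T2
```

Key keeps rows in their original order within each group. Sorting the whole table once before grouping (for example `t2 /: t2`) therefore has the same effect.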
  

R.E. Boss




> -----Original Message-----
> From: [email protected] [mailto:programming-
> [email protected]] On Behalf Of Joe Bogner
> Sent: Friday, 25 July 2014 13:08
> To: [email protected]
> Subject: [Jprogramming] finding matching sets
> 
> Given the following data:
> 
> t =: ;: ;._2 (0 : 0)
> c1 p1 0.25
> c1 p2 0.35
> c2 p1 0.25
> c2 p2 0.35
> c3 p1 0.25
> c3 p2 0.35
> c3 p3 0.45
> )
> 
> 
> c1 has two rows (p1 0.25) and (p2 0.35)
> c2 has two rows (p1 0.25) and (p2 0.35)
> c3 has three rows (p1 0.25), (p2 0.35) and (p3 0.45)
> 
> How can I identify that c1 and c2 have the same set of values and that c3
> is different?
> 
> I'd like to run the algorithm on a 1.6M-row table.
> 
> I created a prototype in JavaScript using a rough approach, but I haven't
> translated it to J in case there is a better way:
> 
> 1. Sort array by column 2 (product)
> 2. Loop through the array and create a hash table of the concatenated
> product/value pairs (e.g. p2 0.35) for each customer
> 3. Loop through the hash table and create a list of customers for each
> unique string of product/value pairs
> 
> var t = function(){/*
> c1 p1 0.25
> c1 p2 0.35
> c2 p1 0.25
> c2 p2 0.35
> c3 p1 0.25
> c3 p2 0.35
> c3 p3 0.45
> */}.toString().split('/*')[1].split('*/')[0].trim().split('\n').map(function(x) { return x.trim().split(' ') })
> t = t.sort(function(x,y) { return x[1] < y[1] ? -1 : x[1] > y[1] ? 1 : 0 })
> 
> var cs = t.reduce(function(memo,val) { memo[val[0]] =
> (memo[val[0]]||'')+val[1]+val[2]; return memo;}, {});
> 
> //JSON.stringify(cs)
> //"{"c1":"p10.25p20.35","c2":"p10.25p20.35","c3":"p10.25p20.35p30.45"}"
> 
> var matches = Object.keys(cs).reduce(function(memo,val) { var key =
> memo[cs[val]] = (memo[cs[val]] || []); key.push(val);  return memo;}, {})
> 
> JSON.stringify(matches)
> 
> "{"p10.25p20.35":["c1","c2"],"p10.25p20.35p30.45":["c3"]}"
> 
> How should this problem be approached in J?
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
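For comparison with the JavaScript prototype, here is a minimal sketch that follows its three steps in J. It sorts on the product column, joins each customer's product/value strings into one key, and then groups customers with identical keys. It redefines `t` as in the question so it runs on its own; `s`, `keys` and `matches` are hypothetical names:

```j
NB. the question's table, redefined so this block runs on its own
t =: ;: ;._2 (0 : 0)
c1 p1 0.25
c1 p2 0.35
c2 p1 0.25
c2 p2 0.35
c3 p1 0.25
c3 p2 0.35
c3 p3 0.45
)

NB. step 1: sort rows on the product column (column 1)
s =: t /: 1 {"1 t

NB. step 2: raze each customer's (product;value) boxes into one string,
NB. e.g. 'p10.25p20.35' for c1 and c2
keys =: ({."1 s) <@;/. }."1 s

NB. step 3: group the distinct customers by identical key strings
matches =: (~. {."1 s) </.~ (i.~ ~.) keys
```

As in the JavaScript version, joining without a separator could in principle let different pairs yield the same key: `p1` with `10.25` and `p11` with `0.25` both give `p110.25`. Comparing boxed groups, as in R.E. Boss's reply, avoids that.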


