Thanks for letting us know, Joe.
You might see a big improvement by converting to unboxed symbols just before
the comparisons, as in:
(~.{."1 t)</.~ ( i.~ ~.) s:@:<@:, &> ({."1 </.([:;}.)"1) t
or
lr =: 3 : ' 5!:5 < ''y'''
(~.{."1 t)</.~ ( i.~ ~.) s: lr each ({."1 </.([:;}.)"1) t
(~.{."1 t)</.~ ( i.~ ~.) s: lr each ({."1 </.}."1) (i.~ ~.)"1 &.|: t
(I'd guess the previous one is slightly faster.)
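Here is a small sketch of the symbol step on a hypothetical toy table. The names toy, grp, sym and cls are made up for illustration, and toy is not data from this thread. Each id's group of razed rows becomes one symbol, so the final classification with (i.~ ~.) compares scalars instead of boxed character matrices.

```j
NB. Hypothetical toy table (id ; attribute ; value), not data from this thread
toy =: 5 3 $ 'a';'x';'1' ; 'a';'y';'2' ; 'b';'x';'1' ; 'b';'y';'2' ; 'c';'z';'9'
NB. One character matrix per id: the razed tails of that id's rows
grp =: ({."1 </. ([: ; }.)"1) toy
NB. Open each box, ravel the matrix, box it, and turn it into a single symbol
sym =: s:@:<@:, &> grp
NB. Classify ids by comparing symbols (scalars): a and b share class 0
cls =: (i.~ ~.) sym
NB. Group the unique ids by class
(~. {."1 toy) </.~ cls
```

Here cls is 0 0 1: ids a and b have identical rows, and c stands alone.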
----- Original Message -----
From: Joe Bogner <[email protected]>
To: [email protected]
Cc:
Sent: Sunday, July 27, 2014 9:18:31 AM
Subject: Re: [Jprogramming] finding matching sets
Thank you all for the suggestions and alternate implementations. I ran
benchmarks on them all and wanted to share:
$ t2
1763140 3
NB. Thomas's suggestion
groups1 =: 3 : 0
]key=. (}."1</.{."1) y
(<"1 y) ,.~ (;key) ,. (<"1|:e.key)
)
timespacex 'groups1 t2'
NB. out of memory
NB. Joe's implementation
groups2 =: 3 : 0
groups=. [: (}."1 each </. ]) ({."1 </. ])
ids=. ; L:2 @: ([: (~.L:1) 0{"1 L:1 ])
ids groups y
)
timespacex 'groups2 t2'
NB. 7.87302 2.63376e8
NB. R.E. Boss's 1st implementation
groups3a=: 3 : 0
(~. {."1 y) </.~ (i.~ ~.) ({."1 </. }."1) y
)
timespacex 'groups3a t2'
NB. 6.53307 2.23953e8
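To make the idea behind groups3a concrete, here is a walk-through on a small hypothetical table named toy, made up for illustration.

1. The inner key collects each id's remaining columns into one box.
2. (i.~ ~.) numbers identical boxes by first occurrence.
3. The outer key groups the unique ids that received the same number.

Two ids only match if their rows appear in the same order, because boxed tables are compared as a whole.

```j
NB. Hypothetical toy table (id ; attribute ; value), not data from this thread
toy =: 5 3 $ 'a';'x';'1' ; 'a';'y';'2' ; 'b';'x';'1' ; 'b';'y';'2' ; 'c';'z';'9'
NB. One box per unique id, holding that id's remaining columns as a boxed table
grp =: ({."1 </. }."1) toy
NB. Number each distinct box by first occurrence: a and b get 0, c gets 1
cls =: (i.~ ~.) grp
NB. Group the unique ids by class number: (a;b) and (,c)
res =: (~. {."1 toy) </.~ cls
res
```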
NB. R.E. Boss's 2nd implementation
groups3b =: 3 : 0
T=.(i.~ ~.)"1 &.|:y
(~.{."1 y) </.~ (i.~ ~.)({."1 </.}."1) T
)
timespacex 'groups3b t2'
NB. 2.46257 2.33259e8
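The T line in groups3b recodes every column as indices into that column's own nub. It transposes the table, applies (i.~ ~.) to each row, and transposes back. The later key and match steps therefore work on small integers rather than boxed strings, which is presumably where most of the speedup comes from. The toy table below is hypothetical.

```j
NB. Hypothetical toy table (id ; attribute ; value), not data from this thread
toy =: 5 3 $ 'a';'x';'1' ; 'a';'y';'2' ; 'b';'x';'1' ; 'b';'y';'2' ; 'c';'z';'9'
NB. Transpose, recode each original column as indices into its own nub, transpose back
T =: (i.~ ~.)"1 &.|: toy
NB. T is a 5 3 integer table; grouping and matching now compare integers
cls =: (i.~ ~.) ({."1 </. }."1) T
NB. Ids are still taken from the original boxed table
(~. {."1 toy) </.~ cls
```

For this toy table, T has rows 0 0 0, 0 1 1, 1 0 0, 1 1 1 and 2 2 2.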
NB. Greg Heil's implementation
groups4 =: 3 : 0
(~.{."1 y) </. ~ (i.~ ~.) ({."1 </.([:;}.)"1) y
)
timespacex 'groups4 t2'
NB. 2.74571 3.24026e8
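Greg's version avoids the inner boxes by razing columns 2 and 3 of each row into a single string. The whole tail becomes one character matrix, and each id's group is a plain character table. A sketch on a hypothetical toy table follows.

Razing drops the boundary between the two fields. For example, 'ab';'c' and 'a';'bc' would raze to the same string. This shortcut is therefore safe only when the data cannot produce such collisions.

```j
NB. Hypothetical toy table (id ; attribute ; value), not data from this thread
toy =: 5 3 $ 'a';'x';'1' ; 'a';'y';'2' ; 'b';'x';'1' ; 'b';'y';'2' ; 'c';'z';'9'
NB. Raze columns 2 and 3 of each row into one string: a 5 2 character matrix
r =: ([: ; }.)"1 toy
NB. Group those strings by id, then classify and group ids as before
cls =: (i.~ ~.) ({."1 </. ([: ; }.)"1) toy
(~. {."1 toy) </.~ cls
```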
NB. Pascal Jasmin's symbol test
groups5 =: 3 : 0
symboled=. ( ({. (, <) [: s: <@:(1&{:: , ' ' , [: ": 2&{::))) "1 y
(([: ~. {."1) </.~ [: (i.~ ~.)({."1 </.}."1)) symboled
)
timespacex 'groups5 t2'
NB. 11.5519 5.73956e8
NB. grouping step alone (assumes symboled was also assigned globally with =:)
timespacex '(([: ~. {."1) </.~ [: (i.~ ~.)({."1 </.}."1)) symboled'
NB. 5.42007 1.38506e8
If I apply Greg's version to symboled:
timespacex '(~.{."1 symboled) </. ~ (i.~ ~.) ({."1 </.([:;}.)"1) symboled'
NB. 1.43217 1.09604e8
So it seems that the symbol comparisons are faster, but in this case they are
probably not worth the roughly six-second penalty to create the symbols.
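The trade-off can be checked directly from the timings above. This assumes the difference between groups5 and the grouping step alone is the cost of building symboled.

```j
NB. Timings copied from the benchmarks above (seconds)
build =: 11.5519 - 5.42007    NB. approx. cost of creating symboled inside groups5
saved =: 2.74571 - 1.43217    NB. gain from running Greg's version on symboled
build , saved
```

Building the symbols costs about 6.1 seconds but saves only about 1.3 seconds for Greg's version.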
Of the 1.7M rows, there are 2104 unique values in the first column:
$ ~. 0{"1 t2
2104
and 7,277 unique pairs in the second and third columns:
$ ~. }."1 t2
7277 2
On Sun, Jul 27, 2014 at 12:58 AM, greg heil <[email protected]> wrote:
> Joe
>
> If the only thing that matters is the car and cdr of each entry, why
> not just raze the cdr? E.g.
>
> (~.{."1 t)</.~ (i.~ ~.) ({."1 </.([:;}.)"1) t
>
> Or just don't box them separately in the first place ;-)
>
> Independently, one could convert to symbols and time that as well.
>
> greg
> ~krsnadas.org
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm