Pascal,
Thank you for the additional examples. Here are some timings on the same data:
timespacex '(~.{."1 t2)</.~ ( i.~ ~.) s:@:<@:, &> ({."1 </.([:;}.)"1) t2'
1.95001 3.24026e8
timespacex '(~.{."1 t2)</.~ ( i.~ ~.) s: lr each ({."1 </.}."1) (i.~ ~.)"1 &.|: t2'
3.2968 2.33261e8
It looks like using unboxed symbols just before comparison (the first
variant, 1.95 s) is a bit faster than the previous winner, R.E. Boss's
groups3b at 2.46 s; the lr variant is slower. For this data set, I don't
think it's worth the slight added complexity; however, it's a valuable
tool to have in the future. I keep thinking that symbols are the way to
go, and then I'm surprised by how fast comparisons are without them.
Thanks again.
@R.E. Boss: The plot was a good way to look at the differences. I
don't use plotting often enough. Thank you.
On Sun, Jul 27, 2014 at 10:22 AM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> Thanks for letting us know, Joe.
>
> You might see a big improvement with unboxed symbols taken just before
> comparisons as in:
>
> (~.{."1 t)</.~ ( i.~ ~.) s:@:<@:, &> ({."1 </.([:;}.)"1) t
>
> or
>
> lr =: 3 : ' 5!:5 < ''y'''
>
> (~.{."1 t)</.~ ( i.~ ~.) s: lr each ({."1 </.([:;}.)"1) t
> (~.{."1 t)</.~ ( i.~ ~.) s: lr each ({."1 </.}."1) (i.~ ~.)"1 &.|: t
> (I'm guessing the previous line is slightly faster.)
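For readers less familiar with J, here is a rough Python sketch of the idea of taking symbols just before comparison. It is a transliteration under stated assumptions, not Pascal's code: each first-column key owns a list of row tuples, `repr` stands in for J's linear representation (`lr`), and a dict that hands out small integer ids stands in for `s:` followed by `(i.~ ~.)`. The name `classes_via_symbols` is hypothetical. Once every group is reduced to one integer, the final grouping compares atoms instead of whole boxed groups.

```python
def classes_via_symbols(keys, groups):
    # Hypothetical Python analogue of: (~.{."1 t) </.~ (i.~ ~.) s: lr each ...
    # keys[i] is the first-column value that owns groups[i] (a list of row tuples).
    # Assumption: repr() equality matches value equality, which holds for
    # simple tuples of str/int like the ones used here.
    symbol_of = {}  # string form of a group -> small integer id (the "symbol")
    # setdefault assigns len(symbol_of) before inserting, so ids are
    # first-occurrence indices, like (i.~ ~.) applied to the symbols.
    ids = [symbol_of.setdefault(repr(tuple(g)), len(symbol_of)) for g in groups]
    # Collect keys by id; dicts keep insertion order, so classes come out in
    # order of first appearance, as with J's key (</.).
    classes = {}
    for key, i in zip(keys, ids):
        classes.setdefault(i, []).append(key)
    return list(classes.values())


if __name__ == "__main__":
    groups = [[("x", 1), ("y", 2)], [("x", 1), ("y", 2)], [("x", 1)]]
    print(classes_via_symbols(["a", "b", "c"], groups))  # [['a', 'b'], ['c']]
```

The string step is only there to mirror `lr`; in Python the tuple itself could be the dict key, which is the same "compare once, then compare ids" trade-off without the conversion cost.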
>
>
> ----- Original Message -----
> From: Joe Bogner <[email protected]>
> To: [email protected]
> Cc:
> Sent: Sunday, July 27, 2014 9:18:31 AM
> Subject: Re: [Jprogramming] finding matching sets
>
> Thank you all for the suggestions and alternate implementations. I ran
> benchmarks on them all and wanted to share:
>
> $ t2
> 1763140 3
>
> NB. Thomas's suggestion
> groups1 =: 3 : 0
> ]key=. (}."1</.{."1) y
> (<"1 y) ,.~ (;key) ,. (<"1|:e.key)
> )
> timespacex 'groups1 t2'
> NB. out of memory
>
> NB. Joe's implementation
> groups2 =: 3 : 0
> groups=. [: (}."1 each </. ]) ({."1 </. ])
> ids=. ; L:2 @: ([: (~.L:1) 0{"1 L:1 ])
> ids groups y
> )
> timespacex 'groups2 t2'
> NB. 7.87302 2.63376e8
>
> NB. R.E. Boss's 1st implementation
> groups3a=: 3 : 0
> (~. {."1 y) </.~ (i.~ ~.) ({."1 </. }."1) y
> )
> timespacex 'groups3a t2'
> NB. 6.53307 2.23953e8
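The idiom shared by groups3a, groups3b and groups4 works in three steps: group the remaining columns by the first column (`{."1 </. }."1`), classify each group by its first occurrence among the distinct groups (`(i.~ ~.)`), then key the distinct first-column values (`~. {."1 y`) by that class. A minimal Python sketch, assuming rows are `(key, col2, col3)` tuples; `matching_sets` is a hypothetical name. As in the J, two keys match only if their rows are equal and appear in the same order.

```python
def matching_sets(rows):
    # 1. Group the tail of each row by its first item ({."1 </. }."1).
    #    Dicts keep insertion order, matching the order of ~. {."1 y.
    groups = {}
    for key, *rest in rows:
        groups.setdefault(key, []).append(tuple(rest))
    # 2. Keys whose whole groups are equal (same rows, same order) share a
    #    class; collect them in order of first appearance, like </.
    classes = {}
    for key, tails in groups.items():
        classes.setdefault(tuple(tails), []).append(key)
    return list(classes.values())


if __name__ == "__main__":
    t = [("a", "x", 1), ("a", "y", 2), ("b", "x", 1), ("b", "y", 2), ("c", "x", 1)]
    print(matching_sets(t))  # [['a', 'b'], ['c']]
```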
>
> NB. R.E. Boss's 2nd implementation
> groups3b =: 3 : 0
> T=.(i.~ ~.)"1 &.|:y
> (~.{."1 y) </.~ (i.~ ~.)({."1 </.}."1) T
> )
> timespacex 'groups3b t2'
> NB. 2.46257 2.33259e8
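The only difference between groups3b and groups3a is the line `T=.(i.~ ~.)"1 &.|:y`, which replaces every column by the first-occurrence index of each value, so the later grouping and matching compare small integers instead of boxed values; that appears to account for the drop from 6.53 s to 2.46 s. A Python sketch of the same pre-encoding, under the same assumption that rows are `(key, col2, col3)` tuples (`encode_columns` and `matching_sets_encoded` are hypothetical names):

```python
def encode_columns(rows):
    # Sketch of (i.~ ~.)"1 &.|: y: classify each column independently, so
    # equal values within a column become equal small integers.
    encoded = []
    for col in zip(*rows):  # iterate over columns (the |: step)
        index = {}
        encoded.append([index.setdefault(v, len(index)) for v in col])
    return [tuple(r) for r in zip(*encoded)]  # transpose back (&. undoes |:)


def matching_sets_encoded(rows):
    codes = encode_columns(rows)
    # Nub of the original first column, like ~.{."1 y; the integer code of a
    # key is exactly its position in this list.
    keys = list(dict.fromkeys(r[0] for r in rows))
    groups = {}
    for code0, *rest in codes:
        groups.setdefault(code0, []).append(tuple(rest))
    classes = {}  # integer-only comparisons from here on
    for code0, tails in groups.items():
        classes.setdefault(tuple(tails), []).append(keys[code0])
    return list(classes.values())


if __name__ == "__main__":
    t = [("a", "x", 1), ("a", "y", 2), ("b", "x", 1), ("b", "y", 2), ("c", "x", 1)]
    print(encode_columns(t))         # [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (2, 0, 0)]
    print(matching_sets_encoded(t))  # [['a', 'b'], ['c']]
```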
>
> NB. Greg Hei's implementation
> groups4 =: 3 : 0
> (~.{."1 y) </. ~ (i.~ ~.) ({."1 </.([:;}.)"1) y
> )
> timespacex 'groups4 t2'
> NB. 2.74571 3.24026e8
>
> NB. Pascal Jasmin's symbol test
> groups5 =: 3 : 0
> symboled=. ( ({. (, <) [: s: <@:(1&{:: , ' ' , [: ": 2&{::))) "1 y
> (([: ~. {."1) </.~ [: (i.~ ~.)({."1 </.}."1)) symboled
> )
> timespacex 'groups5 t2'
> NB. 11.5519 5.73956e8
>
> NB. grouping step only, on a precomputed symboled
> timespacex '(([: ~. {."1) </.~ [: (i.~ ~.)({."1 </.}."1)) symboled'
> NB. 5.42007 1.38506e8
>
> If I apply Greg's version to symboled:
> timespacex '(~.{."1 symboled) </. ~ (i.~ ~.) ({."1 </.([:;}.)"1) symboled'
> NB. 1.43217 1.09604e8
>
> So it seems that the symbol comparisons are faster, but in this case
> probably not worth the roughly six-second penalty to create the symbols
> (11.55 s for groups5 versus 5.42 s for the grouping step alone).
>
>
> Of the 1.7M rows:
>
> There are 2,104 unique values in the first column:
> $ ~. 0{"1 t2
> 2104
>
> And 7,277 unique pairs in the last two columns:
> $ ~. }."1 t2
> 7277 2
>
> On Sun, Jul 27, 2014 at 12:58 AM, greg heil <[email protected]> wrote:
>> Joe
>>
>> If the only thing that matters is the car and cdr of each entry, why
>> not just raze the cdr, e.g.
>>
>> (~.{."1 t)</.~ (i.~ ~.) ({."1 </.([:;}.)"1) t
>>
>> or just not box them separately in the first place;-)
>>
>> One could also convert to symbols independently and run that timing experiment.
>>
>> greg
>> ~krsnadas.org
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm