Depending on how you intend to use this, the following may offer a
more substantial speed-up. I'll first recap what we've seen so far
to provide a basis for my timings, then sketch out an unfinished
idea for potentially speeding up the process.
findSDM=: +/@:="1/~ NB. Tarmo Veskioja's original
findSDMvc=: +/"1@:(="1/~) NB. Victor Cerovski's
findSDMrm=: +/@:(=/~"1) NB. Raul Miller's
(10) 6!:2 'findSDM t2'
5.68061
(10) 6!:2 'findSDMvc t2'
4.32129
(10) 6!:2 'findSDMrm T2' [ T2=: |:t2
4.21779
NB. So both suggestions are slightly faster than the original on my machine.
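As a quick sanity check, the three definitions can be compared on a small random table. This is only a sketch: the table t (6 records, 4 attributes) is a hypothetical stand-in for t2. Note that findSDMrm has to be given the transposed table, just as in the timing line for it.

```j
findSDM=: +/@:="1/~              NB. Tarmo Veskioja's original
findSDMvc=: +/"1@:(="1/~)        NB. Victor Cerovski's
findSDMrm=: +/@:(=/~"1)          NB. Raul Miller's (expects records as columns)
t=: 6 4 ?@$ 5                    NB. hypothetical test data: 6 records, 4 attributes, values 0-4
(findSDM t) -: findSDMvc t       NB. 1: same 6x6 matrix of attribute-match counts
(findSDM t) -: findSDMrm |: t    NB. 1: same result once t is transposed
*./ 4 = (<0 1) |: findSDM t      NB. 1: diagonal; each record matches itself on all 4 attributes
```

The random values differ from run to run, but all three comparisons should give 1 for any table.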
NB. A preliminary idea for speeding up the process by reducing the amount
NB. of data processed per invocation by grouping "like" items:
<.%:#t2 NB. Try to scale as the square root of the number of records...
70
refpts=: 70 40 ?@$ 5 NB. Random reference points...
$keys=. +/+/"1 refpts="1/t2 NB. Group by similarity to reference points
5000
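To see what the key computation does, here it is on a tiny table. This is a sketch: t and refpts below are small hypothetical stand-ins for t2 and the 70 reference points. Each record's key is its total number of attribute matches summed over all reference points, and key (/.) then collects the records that share a key value into one box.

```j
t=: 12 4 ?@$ 5                   NB. hypothetical: 12 records, 4 attributes, values 0-4
refpts=: 3 4 ?@$ 5               NB. hypothetical: 3 random reference points
m=: +/"1 refpts ="1/ t           NB. 3x12: matches of each reference point with each record
keys=: +/ m                      NB. 12 keys: total matches per record
grps=: keys </. t                NB. one box of records per distinct key value
(# ~. keys) = # grps             NB. 1: as many groups as distinct keys
(# t) = +/ #&> grps              NB. 1: every record falls in exactly one group
```

Records with equal keys are only "alike" in this aggregate sense; two records can share a key while matching different reference points.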
$findSDM&.>keys </. t2
136
NB. This gives matches only within groups: a partial, approximate solution,
NB. since records that fall into different groups are never compared...
$&.>findSDM&.>keys </. t2
+-----+-----+-------+-------+-----+-----+-----+-----...
|57 57|73 73|107 107|116 116|76 76|29 29|74 74|65 65...
+-----+-----+-------+-------+-----+-----+-----+-----...
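One way to gauge how approximate the grouped result is: count what fraction of all record pairs actually gets compared. A group of n records contributes n*n pairs, so the fraction is the sum of squared group sizes divided by the square of the number of records. This is a sketch on hypothetical uniform random data (t stands in for t2); real data will give different group sizes and hence a different fraction.

```j
t=: 500 10 ?@$ 5                          NB. hypothetical stand-in for t2
refpts=: ((<.%:#t),1{$t) ?@$ 5            NB. 22 random reference points, scaled as in the post
keys=: +/+/"1 refpts="1/t                 NB. one similarity key per record
sizes=: #&> keys </. t                    NB. number of records in each group
frac=: (+/ *: sizes) % *: #t              NB. share of all record pairs that are compared
frac
```

Ignoring the cost of computing the keys, frac is also a rough indication of how much of the full comparison work the grouping does.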
NB. Combining these ideas:
findSDMdhm=: 3 : 'refpts;findSDM &.> (+/+/"1 (refpts=: ((<.%:#y),1{$y)?@$5)="1/y) </. y'
(10) 6!:2 'findSDMdhm t2'
0.150275
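findSDMdhm returns the reference points it used, linked to one match-count matrix per group; because it assigns refpts with =:, it also leaves them in the global refpts. A sketch of unpacking that result, on a hypothetical 100-record table standing in for t2 (the names r, rp and sdms are only illustrative):

```j
findSDM=: +/@:="1/~
findSDMdhm=: 3 : 'refpts;findSDM &.> (+/+/"1 (refpts=: ((<.%:#y),1{$y)?@$5)="1/y) </. y'
t=: 100 8 ?@$ 5                  NB. hypothetical: 100 records, 8 attributes
r=: findSDMdhm t
rp=: 0 {:: r                     NB. the 10x8 reference points (10 = <.%:100)
sdms=: }. r                      NB. boxed matrices, one per group
(#t) = +/ #&> sdms               NB. 1: group sizes add up to the number of records
*./ (=/@$)&> sdms                NB. 1: each group's matrix is square
```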
--
Devon McCormick, CFA
^me^ at acm.org is my preferred e-mail
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm