Re: [Jprogramming] Any faster way of computing a similarity distance matrix ?

Devon McCormick Thu, 17 May 2012 10:45:57 -0700

Depending on how you intend to use this, the following might suggest a
more substantial speed-up.  I'll first recap what we've seen so far
to provide a basis for my timings, then sketch out an unfinished
idea for potentially speeding up the process.


   findSDM=: +/@:="1/~        NB. Tarmo Veskioja's original
   findSDMvc=: +/"1@:(="1/~)  NB. Victor Cerovski's
   findSDMrm=: +/@:(=/~"1)    NB. Raul Miller's

   (10) 6!:2 'findSDM t2'
5.68061
   (10) 6!:2 'findSDMvc t2'
4.32129
   (10) 6!:2 'findSDMrm T2' [ T2=: |:t2
4.21779

NB. So, the two suggestions are both a little bit better on my machine.
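
To check that the three definitions agree before comparing timings, here is a small sketch on a made-up 3-by-4 table. The name t and its values are assumptions for illustration only (t2 itself is not shown in this message). Each entry of the result counts the positions in which two rows hold the same value.

```j
   findSDM=: +/@:="1/~        NB. definitions copied from above
   findSDMvc=: +/"1@:(="1/~)
   findSDMrm=: +/@:(=/~"1)

   t=: 3 4 $ 0 1 2 3  0 1 0 3  4 4 4 4   NB. hypothetical toy data
   findSDM t                  NB. rows 0 and 1 agree in 3 of 4 positions
4 3 0
3 4 0
0 0 4
   (findSDM t) -: findSDMvc t          NB. same result?
1
   (findSDM t) -: findSDMrm |: t       NB. note the transposed argument
1
```

Raul's version works column by column, which is why it takes the transpose, as in the T2 timing above.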

NB. A preliminary idea for speeding up the process by reducing the amount
NB. of data processed per invocation by grouping "like" items:

   <.%:#t2              NB. Try to scale as square root of number of records...
70
   refpts=: 70 40 ?@$ 5            NB. Random reference points...
   $keys=. +/+/"1 refpts="1/t2     NB. Group by similarity to reference points
5000
   $findSDM&.>keys </. t2
136

NB. This gives matches within groups - a partial, approximate solution...
   $&.>findSDM&.>keys </. t2
+-----+-----+-------+-------+-----+-----+-----+-----...
|57 57|73 73|107 107|116 116|76 76|29 29|74 74|65 65...
+-----+-----+-------+-------+-----+-----+-----+-----...
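
The grouping step can be traced by hand on a toy case. This is only a sketch: t, refs and idx are hypothetical names, and refs is fixed rather than random so the numbers can be checked. The key of a row is its total number of matches summed over all reference points, and </. collects the rows that share a key.

```j
   t=: 5 4 $ 0 1 2 3  0 1 0 3  4 4 4 4  0 1 2 3  4 4 4 0   NB. toy data
   refs=: 2 4 $ 0 1 2 3  4 4 4 4       NB. fixed reference points (assumption)
   ]keys=: +/ +/"1 refs ="1/ t         NB. total matches against all refs
4 3 4 4 3
   keys #/. t                          NB. number of rows in each group
3 2
   idx=: keys </. i. # t               NB. original row numbers per group
```

Keeping something like idx alongside the per-group matrices would let the within-group results be mapped back to rows of the original table. The toy case also shows one way the result is approximate: rows 0 and 2 receive the same key even though they agree in no position, because the key only records a total over the reference points.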

NB. Combining these ideas:
   findSDMdhm=: 3 : 'refpts;findSDM &.> (+/+/"1 (refpts=: ((<.%:#y),1{$y)?@$5)="1/y) </. y'
   (10) 6!:2 'findSDMdhm t2'
0.150275
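
For readability, the same steps can be spelled out as a multi-line explicit definition. This is a sketch intended to be equivalent to the one-liner; findSDMdhm2 and the local name keys are made up here, and the hard-coded 5 assumes, as in the refpts line earlier, that the table entries are drawn from 0 through 4.

```j
findSDM=: +/@:="1/~        NB. Tarmo Veskioja's original, as above

findSDMdhm2=: 3 : 0
  NB. About sqrt(number of rows) random reference points, one column per
  NB. column of y; assigned globally so the caller can inspect them.
  refpts=: ((<.%:#y) , 1{$y) ?@$ 5
  NB. Key of each row: total matches against all reference points.
  keys=. +/ +/"1 refpts ="1/ y
  NB. Reference points linked with one similarity matrix per group.
  refpts ; findSDM&.> keys </. y
)
```

As noted above, this yields matches within groups only, so the timing is for a partial result rather than for the full matrix computed by the earlier definitions.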

-- 
Devon McCormick, CFA
^me^ at acm.org is my preferred e-mail
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
