In http://www.jsoftware.com/jwiki/Essays/Longest%20Common%20Subsequence lcs is
described which gives the length of the longest
common subsequence.
The following solution is quite a bit more efficient.
'X Y'=: <@('ACTG'{~ 4 ?@$~ ])"0 [200 200
X lcs Y
127
{: ((>./\@:+ |.!.0)>. ])/|. X=/Y
127
ts'X lcs Y'
0.099401083 7474688
ts'{: ((>./\@:+ |.!.0)>. ])/|. X=/Y'
0.00057205218 148864
0.099401083 7474688 % 0.00057205218 148864
173.76227 50.211522
It scales rather well:
'X Y'=: <@('ACTG'{~ 4 ?@$~ ])"0 [2000 2000
lcsB=: 4 : '{: ((>./\@:+ |.!.0)>. ])/|. x=/y'
ts' X lcsB Y'
0.040288245 8485248
'X Y'=: <@('ACTG'{~ 4 ?@$~ ])"0 [20000 20000
ts' X lcsB Y '
4.1813038 1.0751246e9
So it can be easily used for script comparison.
The longest common subsequence is given by:
'p q'=: 'armageddon in the desert';'armed and dangerous' NB.
http://www.jsoftware.com/pipermail/general/2003-March/015449.html
(q;p)(];{~L:0"0 1) |:>({:"1 {.L:0@{.@(~.@:({:"1) /:~ {:"1 </. }:"1)/.
}:"1) (5&$. ,.~ [: (,+/)"(1) 4&$.) $. M * ((>./\@:+
|.!.0)>. ])/\.&.|. M=: q=/p
+---------------------------+-----------+-----------+
|0 1 2 3 4 5 7 9 10 14 15|armed n der|armed n der|
|0 1 2 5 6 10 12 13 18 19 22| | |
+---------------------------+-----------+-----------+
R.E. Boss
(Add your info to http://www.jsoftware.com/jwiki/Community/Demographics )
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm