Genius! I tried (and failed) to convert your reply to a script on my iPhone:
X=:'Now is the time for all good men to come ti'
Y=: 'Now is the time for the quick brown good men to come ti'
trig=: 3,\&.> X;Y
NB. Get the nub of the union of both sets of trigrams and prepend it to each trigram set.
supertrig=: (,~&.> <@~.@;) trig
NB. Now we can use Key to count the trigrams in each set and decrement by 1 (for the extra copy that we added).
show=: verb define
<: #/.~&>
)
show supertrig

It almost worked, in particular your code did (j701):

   load'~/user/ric.ijs'
|syntax error: show
|   show supertrig
|[-11] /j/user/ric.ijs
   supertrig
+---+---+
|Now|Now|
|ow |ow |
|w i|w i|
| is| is|
|is |is |
|s t|s t|
| th| th|
|the|the|
|he |he |
|e t|e t|
| ti| ti|
|tim|tim|
. Clip...
|   |to |
|   |o c|
|   | co|
|   |com|
|   |ome|
|   |me |
|   |e t|
|   | ti|
+---+---+
   show
3 : '<: #/.~&>'
   <: #/.~&> supertrig
1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 2 2 2 2 2 1 1 2 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

> On Oct 12, 2019, at 1:22 AM, Ric Sherlock <tikk...@gmail.com> wrote:
>
> Here's one approach...
>
> I find it much easier to work with if there is actual data. The following
> may not be representative of your data but it gives us somewhere to start.
>
>    ]'X Y'=: 'actg' {~ 2 30 ?@$ 4
> ggtaaaatgactgtagtgaagaaggagtcc
> ctgattaaggttcggtgtcgataccgcgca
>
> We now have 2 strings X and Y. Let's obtain the trigrams for each string
>
>    trig=: 3,\&.> X;Y
>
> Get the nub of the union of both sets of trigrams and
> prepend it to each trigram set.
>
>    supertrig=: (,~&.> <@~.@;) trig
>
> Now we can
> use Key to count the trigrams in each set and decrement by 1 (for the extra
> copy that we added).
>
>    <: #/.~&> supertrig
> 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 2 0 1 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 0 1 0 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1
>
> Or to summarise by trigram:
>
>    (~.@; trig);|: <: #/.~&> supertrig
> +---+---+
> |ggt|1 2|
> |gta|2 0|
> |taa|1 1|
> |aaa|2 0|
> |aat|1 0|
> |atg|1 0|
> |tga|2 1|
> |gac|1 0|
> |act|1 0|
> |ctg|1 1|
> |tgt|1 1|
> |tag|1 0|
> |agt|2 0|
> |gtg|1 1|
> |gaa|2 0|
> |aag|2 1|
> |aga|1 0|
> |agg|1 1|
> |gga|1 0|
> |gag|1 0|
> |gtc|1 1|
> |tcc|1 0|
> |gat|0 2|
> |att|0 1|
> |tta|0 1|
> |gtt|0 1|
> |ttc|0 1|
> |tcg|0 2|
> |cgg|0 1|
> |cga|0 1|
> |ata|0 1|
> |tac|0 1|
> |acc|0 1|
> |ccg|0 1|
> |cgc|0 2|
> |gcg|0 1|
> |gca|0 1|
> +---+---+
>
>> On Sat, Oct 12, 2019 at 4:40 PM 'Jim Russell' via Programming <
>> programm...@jsoftware.com> wrote:
>>
>> Sure, thanks. I'm working to re-implement a text comparison program I did
>> using VBA & Microsoft Access a number of years back.
>>
>> The object is to compare two text documents and see how similar one is to
>> the other by comparing the number of unique trigrams that are found in
>> each.
>> For each text string a table of trigrams is constructed with the
>> expression 3,\x. The resulting table of 3-character samples m is then
>> tallied using #/.~m. This yields a vector of counts of each unique trigram
>> corresponding to (an unseen) nub of m. The count, and a copy of the nub of
>> m, represent a summary of the text in string x.
>> This same process is then repeated to create a summary for the second
>> string, y.
>>
>> The next step in the process is to assign a score of 0 to 1 based on a
>> comparison of the two string summaries. It would seem sensible to compare
>> the nub of the two text strings to each other. What is the difference in
>> counts between the trigrams they have in common, and how many trigram hits
>> for each are unique?
>> That is where using nub1 #/.
>> nub2 would be attractive, were it not
>> required that the arguments had the same row counts, and Key could not
>> count unmatched rows.
>>
>> As it stands, I fear I am duplicating effort to find the nubs in preparing
>> the summaries, and again if I have to use i. to calculate the scores. If I
>> get a vector result when I use key on vectors, might I expect a table
>> result (including the counts and the nub) when key is applied to tables?
>>
>> Or is there a more appropriate approach? (In Access and VBA, I used
>> dictionary objects with 3 character keys, as I recall. But I was very
>> pleasantly surprised at how well the 3 character trigrams recognized text
>> similarities.)
>>
>> I really appreciate any insights you might have, Ric, and thanks for
>> tolerating my ignorance.
>>
>>>> On Oct 11, 2019, at 10:23 PM, Ric Sherlock <tikk...@gmail.com> wrote:
>>>
>>> Not sure I'm understanding your questions. Maybe including some of the
>>> expressions you've tried to illustrate your points would help?
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm