Genius! I tried (and failed) to convert your reply to a script on my iPhone:

X=:'Now is the time for all good men to come ti'
Y=: 'Now is the time for the quick brown good men to come ti'
trig=: 3,\&.> X;Y
NB. Get the nub of the union of both sets of trigrams and prepend it to each trigram set.
supertrig=: (,~&.> <@~.@;) trig 
NB. Now we can use Key to count the trigrams in each set and decrement by 1 (for the extra copy that we added).

show=: verb define
<: #/.~&>
)
show supertrig

It almost worked; in particular, your code did (j701):
   load'~/user/ric.ijs'
|syntax error: show
|       show supertrig
|[-11] /j/user/ric.ijs
   supertrig
+---+---+
|Now|Now|
|ow |ow |
|w i|w i|
| is| is|
|is |is |
|s t|s t|
| th| th|
|the|the|
|he |he |
|e t|e t|
| ti| ti|
|tim|tim|

. Clip...

|   |to |
|   |o c|
|   | co|
|   |com|
|   |ome|
|   |me |
|   |e t|
|   | ti|
+---+---+
   show
3 : '<: #/.~&>'
 
   <: #/.~&> supertrig
1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 2 2 2 2 2 1 1 2 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1
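
A likely cause of the syntax error above: the body of an explicit verb is executed as a sentence when the verb is called, and `<: #/.~&>` on its own evaluates to a verb rather than a noun because it never mentions the argument y. A minimal sketch of two possible repairs follows. The short strings and the name show2 are illustrative assumptions, not part of the original script.

```j
NB. Small stand-in data (assumed for illustration only)
X=: 'abcabc'
Y=: 'abcd'
supertrig=: (,~&.> <@~.@;) 3,\&.> X;Y

NB. Explicit form: apply the phrase to the argument y
show=: verb define
<: #/.~&> y
)

NB. Tacit form (hypothetical name show2): the cap [: turns the train
NB. into "decrement after counting" instead of a hook
show2=: [: <: #/.~&>

show supertrig
NB. 2 1 1 0
NB. 1 0 0 1
show2 supertrig   NB. same result as show
```

Note that a bare tacit `show=: <: #/.~&>` would not do the same thing: two verbs in a row form a hook, so `<:` would be applied dyadically (less than or equal) instead of as decrement.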

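With the two rows of counts in hand, the 0-to-1 score mentioned in the original question could be sketched as below. This is one possible measure (shared counts divided by combined counts, sometimes called weighted Jaccard), chosen here as an assumption; it is not necessarily the scoring used in the earlier VBA program. The name score and the literal count table are hypothetical.

```j
NB. Hypothetical score: sum of the per-trigram minima divided by the
NB. sum of the per-trigram maxima of the two rows of counts
score=: +/@:(<./) % +/@:(>./)

NB. Counts for the trigrams of 'abcabc' (row 0) and 'abcd' (row 1)
counts=: 2 4 $ 2 1 1 0 1 0 0 1

score counts              NB. 1 shared out of 5 combined: 0.2
score 2 3 $ 1 1 0 1 1 0   NB. identical rows score 1
score 2 2 $ 1 0 0 1       NB. no trigrams in common score 0
```

In the script above the same verb would be applied as `score <: #/.~&> supertrig`.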

> On Oct 12, 2019, at 1:22 AM, Ric Sherlock <tikk...@gmail.com> wrote:
> 
> Here's one approach...
> 
> I find it much easier to work with if there is actual data. The following
> may not be representative of your data but it gives us somewhere to start.
> 
>    ]'X Y'=: 'actg' {~ 2 30 ?@$ 4
> ggtaaaatgactgtagtgaagaaggagtcc
> ctgattaaggttcggtgtcgataccgcgca
> 
> We now have 2 strings X and Y. Let's obtain the trigrams for each string:
> 
>    trig=: 3,\&.> X;Y
> 
> Get the nub of the union of both sets of trigrams and prepend it to each
> trigram set.
> 
>    supertrig=: (,~&.> <@~.@;) trig
> 
> Now we can use Key to count the trigrams in each set and decrement by 1 (for
> the extra copy that we added).
> 
>    <: #/.~&> supertrig
> 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 2 0 1 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 0 1 0 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1
> 
> Or to summarise by trigram:
> 
> (~.@; trig);|: <: #/.~&> supertrig
> 
> +---+---+
> |ggt|1 2|
> |gta|2 0|
> |taa|1 1|
> |aaa|2 0|
> |aat|1 0|
> |atg|1 0|
> |tga|2 1|
> |gac|1 0|
> |act|1 0|
> |ctg|1 1|
> |tgt|1 1|
> |tag|1 0|
> |agt|2 0|
> |gtg|1 1|
> |gaa|2 0|
> |aag|2 1|
> |aga|1 0|
> |agg|1 1|
> |gga|1 0|
> |gag|1 0|
> |gtc|1 1|
> |tcc|1 0|
> |gat|0 2|
> |att|0 1|
> |tta|0 1|
> |gtt|0 1|
> |ttc|0 1|
> |tcg|0 2|
> |cgg|0 1|
> |cga|0 1|
> |ata|0 1|
> |tac|0 1|
> |acc|0 1|
> |ccg|0 1|
> |cgc|0 2|
> |gcg|0 1|
> |gca|0 1|
> +---+---+
> 
>> On Sat, Oct 12, 2019 at 4:40 PM 'Jim Russell' via Programming <
>> programm...@jsoftware.com> wrote:
>> 
>> Sure, thanks. I'm working to re-implement a text comparison program I did
>> using VBA & Microsoft Access a number of years back.
>> 
>> The object is to compare two text documents and see how similar one is to
>> the other by comparing the number of unique trigrams that are found in
>> each.
>> For each text string a table of trigrams is constructed with the
>> expression 3,\x. The resulting table of 3-character samples m is then
>> tallied using #/.~m . This yields a vector of counts of each unique trigram
>> corresponding to (an unseen) nub of m. The count, and a copy of the nub of
>> m, represent a summary of the text in string x.
>> This same process is then repeated to create a summary for the second
>> string, y.
>> 
>> The next step in the process is to assign a score of 0 to 1 based on a
>> comparison of the two string summaries. It would seem sensible to compare
>> the nub of the two text strings to each other. What is the difference in
>> counts between the trigrams they have in common, and how many trigram hits
>> for each are unique?
>> That is where using nub1 #/. nub2 would be attractive, were it not
>> required that the arguments had the same row counts, and Key could not
>> count unmatched rows.
>> 
>> As it stands, I fear I am duplicating effort to find the nubs in preparing
>> the summaries, and again if I have to use i. to calculate the scores. If I
>> get a vector result when I use key on vectors, might I expect a table
>> result (including the counts and the nub) when key is applied to tables?
>> 
>> Or is there a more appropriate approach? (In Access and VBA, I used
>> dictionary objects with 3 character keys, as I recall. But I was very
>> pleasantly surprised at how well the 3 character trigrams recognized text
>> similarities.)
>> 
>> I really appreciate any insights you might have, Ric, and thanks for
>> tolerating my ignorance.
>> 
>>>> On Oct 11, 2019, at 10:23 PM, Ric Sherlock <tikk...@gmail.com> wrote:
>>> 
>>> Not sure I'm understanding your questions. Maybe including some of the
>>> expressions you've tried to illustrate your points would help?
>> 
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 