Sure, thanks. I'm working to re-implement a text comparison program I wrote
using VBA & Microsoft Access a number of years back. 

The object is to compare two text documents and see how similar one is to the 
other by comparing the number of unique trigrams that are found in each. 
For each text string a table of trigrams is constructed with the expression 
3 ,\ x. The resulting table of 3-character samples, m, is then tallied using 
#/.~ m. This yields a vector of counts, one per unique trigram, in the order 
of the (unseen) nub of m. The counts, together with a copy of the nub of m, 
form a summary of the text in string x. 
This same process is then repeated to create a summary for the second string, 
y. 
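
To make that concrete, here is a minimal sketch of the summary step. The 
sample string and the names txt, m, counts and nub are only for 
illustration; txt plays the role of x above. 

```j
NB. Illustrative sample string; any text would do.
txt    =: 'abcabc'
m      =: 3 ,\ txt      NB. table of overlapping 3-character samples
counts =: #/.~ m        NB. tally of each unique trigram, in nub order
nub    =: ~. m          NB. the unique trigrams themselves
```

Here m has the four rows abc, bca, cab, abc, so counts is 2 1 1 and nub has 
the three rows abc, bca, cab. 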

The next step in the process is to assign a score of 0 to 1 based on a 
comparison of the two string summaries. It would seem sensible to compare the 
nub of the two text strings to each other. What is the difference in counts 
between the trigrams they have in common, and how many trigram hits for each 
are unique?
That is where using nub1 #/. nub2 would be attractive, were it not that Key 
requires its two arguments to have the same number of items and cannot count 
unmatched rows. 
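
One possible way around this, offered only as a sketch and not necessarily 
the best approach: take the nub of both trigram tables together, then put 
that nub in front of each table before applying Key and subtract one from 
every count. Trigrams missing from a string then get a count of 0, and the 
two count vectors line up item for item. The verb name score, the local 
names, and the min-over-max ratio used as the 0-to-1 score are all my own 
assumptions. 

```j
NB. Hypothetical scoring verb: x and y are the two text strings.
score =: 4 : 0
 ta  =. 3 ,\ x                NB. trigram table of the first string
 tb  =. 3 ,\ y                NB. trigram table of the second string
 all =. ~. ta , tb            NB. nub of every trigram seen in either string
 ca  =. <: #/.~ all , ta      NB. counts in x aligned with all (0 if absent)
 cb  =. <: #/.~ all , tb      NB. counts in y aligned with all (0 if absent)
 (+/ ca <. cb) % +/ ca >. cb  NB. shared counts over combined counts
)
```

For example, 'abcabc' score 'abcabd' gives aligned counts 2 1 1 0 and 
1 1 1 1, so the result is 3 % 5, and a string scored against itself gives 1. 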

As it stands, I fear I am duplicating effort to find the nubs in preparing the 
summaries, and again if I have to use i. to calculate the scores. If I get a 
vector result when I use key on vectors, might I expect a table result 
(including the counts and the nub) when key is applied to tables?
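
A small experiment along these lines, with an illustrative sample string 
and names of my own choosing: as far as I can tell, Key treats each row of 
a table as one item, so #/.~ still returns a vector of counts, while 
pairing the first row of each group with its tally returns a boxed table. 

```j
NB. Illustrative sample string and trigram table.
txt =: 'abcabc'
m   =: 3 ,\ txt
cnt =: #/.~ m            NB. vector of counts, one per unique row of m
tbl =: ({. ; #)/.~ m     NB. boxed table: unique trigram ; its count
```

Each row of tbl holds one unique trigram and its count, so the nub is 
found only once. 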

Or is there a more appropriate approach? (In Access and VBA, I used dictionary 
objects with 3-character keys, as I recall. But I was very pleasantly surprised 
at how well the trigrams recognized text similarities.)
 
I really appreciate any insights you might have, Ric, and thanks for tolerating 
my ignorance.

> On Oct 11, 2019, at 10:23 PM, Ric Sherlock <tikk...@gmail.com> wrote:
> 
> Not sure I'm understanding your questions. Maybe including some of the
> expressions you've tried to illustrate your points would help?

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
