I use the "Tanimoto correlation" often. John Randall had sent me his notes on the subject over a year ago, and it proved quite useful (Thanks John!). Note: this operates on binary arrays, not sets.

   TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
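
To make the definition concrete, here is a small check on two made-up binary vectors (the names a and b are only for illustration and are not from John's notes). The numerator counts the positions set in both arrays, and the denominator counts the positions set in either one.

```j
NB. Definition repeated so the example is self-contained
TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.

NB. Hypothetical sample data
a=: 1 1 0 1 0
b=: 1 0 0 1 1

a *. b            NB. 1 0 0 1 0  (set in both, sum is 2)
a +. b            NB. 1 1 0 1 1  (set in either, sum is 4)
a TanimotoCor b   NB. 0.5, i.e. 2 % 4
a TanimotoCor a   NB. 1, identical arrays
```

Since both sums ravel their argument with , before adding, the same verb should also accept binary matrices of matching shape.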
The "Tanimoto metric" discussed is related quite simply.

   TanimotoMet=: 1: - TanimotoCor

And if you wanted to operate on arrays as sets instead of binary arrays,

   TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])

This looks to me like it acts the same as Henry Rich's verb.

Jordan

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Henry Rich
Sent: Monday, December 21, 2009 11:18 AM
To: Programming forum
Subject: [Jprogramming] Tanimoto distance

   tanimoto =: (+&# %/@:- 2 1&*@:(+/@:e.))&~."1

seems to do what that paper suggests.

Henry Rich

Jan Jacobs wrote:
> Henry, David,
> I'm sorry, the distance is better known as "Tanimoto metric" and is
> described in
> http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf,
> page 14(3/3)
> ((almost) similar to David's Jaccard index). In my applications I use
> Levenshtein for catching typing/OCR errors and I'm interested to test the
> Tanimoto distance for document spaces where each document is expressed in a
> "bag of words" vector.
>
> The urgency of the Tanimoto distance is not that high. I was merely
> suggesting that several standard distance or "difference" metrics could be
> integrated in the minus (-) or s: derivatives instead of coding them
> explicitly in J.
>
> If you can give a nice tacit solution that would be fine but don't spend too
> much time on it.
> Thanks for all your help,
> Jan.
>
> On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell <[email protected]> wrote:
>
>> Perhaps this is what Jan had in mind:
>>
>> Tanimoto coefficient (extended Jaccard coefficient)
>>
>> "Cosine similarity is a measure of similarity between two vectors of n
>> dimensions by finding the angle between them, often used to compare
>> documents in text mining."
>>
>> http://en.wikipedia.org/wiki/Jaccard_index
>>
>> --
>> David Mitchell
>>
>> On 12/20/2009 22:13, Henry Rich wrote:
>>> Jan,
>>>
>>> I have looked for a description of Tanimoto distance but have not found
>>> anything useful. Can you describe what it is or point to a description
>>> of it?
>>>
>>> Henry Rich
>>>
>>> Jan Jacobs wrote:
>>>> Henry,
>>>> very good. For longer strings it is more than twice as fast as the previous
>>>> version. In my test cases it even consumes less memory.
>>>> Is it possible to include this as a native function in J (e.g. overloading
>>>> -. or s:)?
>>>> Same question but now for Tanimoto distance?
>>>> Jan.
>>>>
>>>>
>>>> On 12/19/09, R.E. Boss <[email protected]> wrote:
>>>>> Smart analysis. Chapeau!
>>>>>
>>>>>
>>>>> R.E. Boss
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Henry Rich
>>>>> Sent: Saturday, 19 December 2009 3:00
>>>>> To: Programming forum
>>>>> Subject: [Jprogramming] Levenshtein distance
>>>>>
>>>>> I was working with R. E.'s compact implementation of the Levenshtein
>>>>> distance and I found an interesting equivalence:
>>>>>
>>>>> (<.>:)/\.&.|.
>>>>>
>>>>> can be replaced by
>>>>>
>>>>> (<./\@:- + ]) i.@#
>>>>>
>>>>> which uses a little more space but is quite a bit faster for large
>>>>> operands. So now I have the version:
>>>>>
>>>>> NB. Levenshtein distance between two strings
>>>>> levdist=: 4 : 0
>>>>> 'a b'=. (/: #&>)x;y
>>>>> z=. i.>:#b
>>>>> for_j. a do.
>>>>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
>>>>> end.
>>>>> {:z
>>>>> )
>>>>>
>>>>>
>>>>> Henry Rich
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
