Jordan, Henry, thank you very much for the help. More than I expected :-) Jan.
On 12/22/09, Tirrell, Jordan (Consultant) wrote:
>
> I use the "Tanimoto correlation" often. John Randall had sent me his
> notes on the subject over a year ago, and it proved quite useful (Thanks
> John!). Note: this operates on binary arrays, not sets.
> TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
>
> The "Tanimoto metric" discussed is related to it quite simply:
> TanimotoMet=: 1: - TanimotoCor
>
> And if you wanted to operate on arrays as sets instead of binary arrays:
> TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])
> This looks to me like it acts the same as Henry Rich's verb.
>
> Jordan
>
> -----Original Message-----
> From: Henry Rich
> Sent: Monday, December 21, 2009 11:18 AM
> To: Programming forum
> Subject: [Jprogramming] Tanimoto distance
>
> tanimoto =: (+&# %/@:- 2 1&*@:(+/@:e.))&~."1
>
> seems to do what that paper suggests.
>
> Henry Rich
>
> Jan Jacobs wrote:
> > Henry, David,
> > I'm sorry, the distance is better known as the "Tanimoto metric" and is
> > described in
> > http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf,
> > page 14 (3/3)
> > ((almost) similar to David's Jaccard index). In my applications I use
> > Levenshtein for catching typing/OCR errors, and I'm interested in testing
> > the Tanimoto distance for document spaces where each document is
> > expressed as a "bag of words" vector.
> >
> > The urgency of the Tanimoto distance is not that high. I was merely
> > suggesting that several standard distance or "difference" metrics could
> > be integrated into the minus (-) or s: derivatives instead of coding them
> > explicitly in J.
> >
> > If you can give a nice tacit solution that would be fine, but don't spend
> > too much time on it.
> > Thanks for all your help,
> > Jan.
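For readers without J at hand, here is a minimal Python sketch of Jordan's three verbs and of Henry's `tanimoto`. The function names are hypothetical, chosen only to mirror the J names, and the binary form assumes equal-length 0/1 sequences. The two set-based forms should agree: with n = |A| + |B| and c = |A ∩ B|, Henry's (n - 2c)/(n - c) equals (|A ∪ B| - |A ∩ B|)/|A ∪ B|, that is, 1 - |A ∩ B|/|A ∪ B|.

```python
def tanimoto_cor(a, b):
    """Like TanimotoCor: count of positions where both binary vectors are 1,
    divided by the count where either is 1. Assumes equal-length 0/1 sequences;
    raises ZeroDivisionError if neither vector contains a 1."""
    if len(a) != len(b):
        raise ValueError("binary vectors must have equal length")
    both = sum(1 for p, q in zip(a, b) if p and q)    # +/@:, applied to *.
    either = sum(1 for p, q in zip(a, b) if p or q)   # +/@:, applied to +.
    return both / either


def tanimoto_met(a, b):
    """Like TanimotoMet: 1 minus the Tanimoto correlation."""
    return 1 - tanimoto_cor(a, b)


def tanimoto_met_sets(xs, ys):
    """Like TanimotoMetSets: membership vectors over the nub of xs followed by ys."""
    universe = list(dict.fromkeys([*xs, *ys]))  # ~.@, (order-preserving nub)
    sx, sy = set(xs), set(ys)
    a = [int(u in sx) for u in universe]        # (~.@, e. [)
    b = [int(u in sy) for u in universe]        # (~.@, e. ])
    return tanimoto_met(a, b)


def tanimoto_henry(xs, ys):
    """Like Henry's tanimoto on a single pair: (n - 2c) / (n - c), where
    n = |A| + |B| and c = |A & B| after duplicates are removed (&~.)."""
    a, b = set(xs), set(ys)
    c = len(a & b)
    n = len(a) + len(b)
    return (n - 2 * c) / (n - c)


print(tanimoto_met_sets("abcd", "bcde"), tanimoto_henry("abcd", "bcde"))
```

Henry's verb also carries rank 1 (`"1`), so in J it applies row by row to tables of lists; the sketch handles just one pair of lists.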
> >
> > On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell wrote:
> >
> >> Perhaps this is what Jan had in mind:
> >>
> >> Tanimoto coefficient (extended Jaccard coefficient)
> >>
> >> "Cosine similarity is a measure of similarity between two vectors of n
> >> dimensions by finding the angle between them, often used to compare
> >> documents in text mining."
> >>
> >> http://en.wikipedia.org/wiki/Jaccard_index
> >>
> >> --
> >> David Mitchell
> >>
> >> On 12/20/2009 22:13, Henry Rich wrote:
> >>> Jan,
> >>>
> >>> I have looked for a description of Tanimoto distance but have not found
> >>> anything useful. Can you describe what it is or point to a description
> >>> of it?
> >>>
> >>> Henry Rich
> >>>
> >>> Jan Jacobs wrote:
> >>>> Henry,
> >>>> very good. For longer strings it is more than twice as fast as the
> >>>> previous version. In my test cases it even consumes less memory.
> >>>> Is it possible to include this as a native function in J (e.g.
> >>>> overloading -. or s:)?
> >>>> Same question, but now for the Tanimoto distance?
> >>>> Jan.
> >>>>
> >>>> On 12/19/09, R.E. Boss wrote:
> >>>>> Smart analysis. Chapeau!
> >>>>>
> >>>>> R.E. Boss
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Henry Rich
> >>>>> Sent: Saturday, December 19, 2009 3:00
> >>>>> To: Programming forum
> >>>>> Subject: [Jprogramming] Levenshtein distance
> >>>>>
> >>>>> I was working with R. E.'s compact implementation of the Levenshtein
> >>>>> distance and I found an interesting equivalence:
> >>>>>
> >>>>> (<.>:)/\.&.|.
> >>>>>
> >>>>> can be replaced by
> >>>>>
> >>>>> (<./\@:- + ]) i.@#
> >>>>>
> >>>>> which uses a little more space but is quite a bit faster for large
> >>>>> operands. So now I have the version:
> >>>>>
> >>>>> NB. Levenshtein distance between two strings
> >>>>> levdist=: 4 : 0
> >>>>> 'a b'=. (/: #&>)x;y
> >>>>> z=. i.>:#b
> >>>>> for_j. a do.
> >>>>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
> >>>>> end.
> >>>>> {:z
> >>>>> )
> >>>>>
> >>>>> Henry Rich
> >>>>> ----------------------------------------------------------------------
> >>>>> For information about J forums see http://www.jsoftware.com/forums.htm

--
Jan Jacobs
Esdoornstraat 33
5995AN Kessel
T: +31 77 462 1887
M: +31 6 23 82 55 21
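Henry's equivalence and his `levdist` loop can be traced in a short Python sketch. This is an illustrative transliteration, not how J evaluates the verbs, and the helper names are hypothetical. `running_min_form` plays the role of `(<.>:)/\.&.|.`: each entry becomes the minimum of itself and its left neighbour's new value plus one. `prefix_min_form` plays the role of `(<./\@:- + ]) i.@#`: subtract each index, take a running minimum, add the index back. Both give, at each position k, the minimum over m ≤ k of r[m] + (k - m), which is the cost of reaching column k through insertions.

```python
from itertools import accumulate


def running_min_form(r):
    r"""Sequential insertion pass: out[k] = min(r[k], out[k-1] + 1).
    Same result as J's (<.>:)/\.&.|. applied to the list r."""
    out = []
    for v in r:
        out.append(v if not out else min(v, out[-1] + 1))
    return out


def prefix_min_form(r):
    r"""Henry's replacement (<./\@:- + ]) i.@# : subtract each index,
    take the running minimum, then add the index back."""
    shifted = [v - k for k, v in enumerate(r)]
    return [m + k for k, m in enumerate(accumulate(shifted, min))]


def levdist(x, y):
    """Levenshtein distance, following the structure of Henry's J levdist."""
    # 'a b'=. (/: #&>)x;y : the shorter string drives the loop
    a, b = sorted((x, y), key=len)
    # z=. i.>:#b : row 0 of the edit-distance table
    z = list(range(len(b) + 1))
    for ch in a:
        inc = [v + 1 for v in z]                                       # >:z
        subst = [(ch != bc) + zv for bc, zv in zip(b, z[:-1])]         # (j ~: b) + }:z
        row = [inc[0]] + [min(s, d) for s, d in zip(subst, inc[1:])]   # ({.@] , (<. }.))
        z = prefix_min_form(row)                                       # insertion pass
    return z[-1]                                                       # {:z


print(levdist("kitten", "sitting"))  # 3
```

The prefix-minimum form trades the element-by-element dependency of the scan for three whole-array steps (a subtraction, a running minimum, an addition), which plausibly explains why Henry measured it as faster on large operands despite the extra temporary array.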
