Jordan, Henry,
thank you very much for the help. More than I expected :-)
Jan.


On 12/22/09, Tirrell, Jordan (Consultant) <[email protected]> wrote:
>
>
> I use the "Tanimoto correlation" often. John Randall had sent me his
> notes on the subject over a year ago, and it proved quite useful (Thanks
> John!). Note: this operates on binary arrays, not sets.
> TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
>
> The "Tanimoto metric" discussed is related quite simply.
> TanimotoMet=: 1: - TanimotoCor
>
> And if you wanted to operate on arrays as sets instead of binary arrays,
> TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])
> This looks to me like it acts the same as Henry Rich's verb.
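
A small worked example of the three verbs above may help. This is only a sketch: the sample data and the names p and q are made up for illustration, and the verb definitions are copied unchanged from Jordan's message.

```j
NB. Definitions as given in the message above
TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
TanimotoMet=: 1: - TanimotoCor
TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])

NB. Hypothetical binary arrays of equal shape
p=: 1 1 0 1 0
q=: 1 0 0 1 1

p TanimotoCor q     NB. 2 ones in common, 4 positions set in either: 0.5
p TanimotoMet q     NB. 1 - 0.5 = 0.5

NB. Arrays treated as sets: 2 shared items out of 6 distinct items
'abcd' TanimotoMetSets 'cdef'    NB. 1 - 2%6, about 0.666667
```

The set version first builds the nub of both arguments joined together, then turns each argument into a membership mask over that nub, so the binary verbs can be reused as they are.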
>
> Jordan
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Henry Rich
> Sent: Monday, December 21, 2009 11:18 AM
> To: Programming forum
> Subject: [Jprogramming] Tanimoto distance
>
> tanimoto =: (+&#   %/@:-   2 1&*@:(+/@:e.))&~."1
>
> seems to do what that paper suggests.
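
To see what the verb above computes, it can be taken apart on a small example. This is an illustrative sketch only: the names s and t and the sample strings are assumptions, not part of Henry's message.

```j
NB. Definition as given in the message above
tanimoto =: (+&#   %/@:-   2 1&*@:(+/@:e.))&~."1

NB. Hypothetical sample sets (already free of duplicates)
s=: 'abcd'
t=: 'cdef'

s +&# t            NB. 8: total number of items in both arguments
+/ s e. t          NB. 2: items of s that also occur in t
8 - 2 1 * 2        NB. 4 6: size of the symmetric difference, size of the union
s tanimoto t       NB. 4 % 6, about 0.666667

NB. Duplicates are removed by ~. before counting, so this gives the same result
'aabbcd' tanimoto t
```

In other words the result is (n - 2c) % (n - c), where n is the combined item count and c the number of shared items, which equals 1 minus the ratio of intersection size to union size.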
>
> Henry Rich
>
> Jan Jacobs wrote:
> > Henry, David,
> > I'm sorry, the distance is better known as the "Tanimoto metric" and is
> > described in
> > http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf,
> > page 14 (3/3)
> > (almost the same as David's Jaccard index). In my applications I use
> > Levenshtein for catching typing/OCR errors, and I'm interested in testing
> > the Tanimoto distance for document spaces where each document is expressed
> > as a "bag of words" vector.
> >
> > The urgency of the Tanimoto distance is not that high. I was merely
> > suggesting that several standard distance or "difference" metrics could be
> > integrated into minus (-) or s: derivatives instead of coding them
> > explicitly in J.
> >
> > If you can give a nice tacit solution that would be fine, but don't spend
> > too much time on it.
> > Thanks for all your help,
> > Jan.
> >
> > On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell <[email protected]> wrote:
> >
> >> Perhaps this is what Jan had in mind:
> >>
> >> Tanimoto coefficient (extended Jaccard coefficient)
> >>
> >> "Cosine similarity is a measure of similarity between two vectors of n
> >> dimensions by finding the angle between them, often used to compare
> >> documents in text mining."
> >>
> >> http://en.wikipedia.org/wiki/Jaccard_index
> >>
> >> --
> >> David Mitchell
> >>
> >> On 12/20/2009 22:13, Henry Rich wrote:
> >>> Jan,
> >>>
> >>> I have looked for a description of Tanimoto distance but have not found
> >>> anything useful.  Can you describe what it is or point to a description
> >>> of it?
> >>>
> >>> Henry Rich
> >>>
> >>> Jan Jacobs wrote:
> >>>> Henry,
> >>>> very good. For longer strings it is more than twice as fast as the
> >>>> previous version. In my test cases it even consumes less memory.
> >>>> Is it possible to include this as a native function in J (e.g. by
> >>>> overloading -. or s:)?
> >>>> Same question, but now for Tanimoto distance?
> >>>> Jan.
> >>>>
> >>>>
> >>>> On 12/19/09, R.E. Boss<[email protected]>  wrote:
> >>>>> Smart analysis. Chapeau!
> >>>>>
> >>>>>
> >>>>> R.E. Boss
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: [email protected]
> >>>>> [mailto:[email protected]] On Behalf Of Henry Rich
> >>>>> Sent: Saturday, December 19, 2009 3:00
> >>>>> To: Programming forum
> >>>>> Subject: [Jprogramming] Levenshtein distance
> >>>>>
> >>>>> I was working with R. E.'s compact implementation of the Levenshtein
> >>>>> distance and I found an interesting equivalence:
> >>>>>
> >>>>> (<.>:)/\.&.|.
> >>>>>
> >>>>> can be replaced by
> >>>>>
> >>>>> (<./\@:- + ]) i.@#
> >>>>>
> >>>>> which uses a little more space but is quite a bit faster for large
> >>>>> operands.  So now I have the version:
> >>>>>
> >>>>> NB. Levenshtein distance between two strings
> >>>>> levdist=: 4 : 0
> >>>>> 'a b'=. (/: #&>)x;y
> >>>>> z=. i.>:#b
> >>>>> for_j. a do.
> >>>>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
> >>>>> end.
> >>>>> {:z
> >>>>> )
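
The equivalence Henry describes can be checked on a small list. This is a sketch under one assumption: the second phrase is read as (<./\@:- + ]) i.@# (the archive appears to have mangled the i.@# part). The names fixScan and fixMin and the sample list are hypothetical.

```j
NB. Original phrase: reverse, suffix-scan with min(x, 1+y), reverse back
fixScan=: (<.>:)/\.&.|.

NB. Replacement: subtract the index, take the running minimum, add the index back
fixMin=: (<./\@:- + ]) i.@#

z=: 3 1 4 1 5 9 2 6
fixScan z               NB. 3 1 2 1 2 3 2 3
fixMin z                NB. 3 1 2 1 2 3 2 3
(fixScan -: fixMin) z   NB. 1: the two results match
```

Both phrases replace each item by the smallest value of (an earlier-or-equal item plus its distance to the current position), which is the step in the Levenshtein recurrence that needs the value just computed to its left. The second form does this with one subtraction, one prefix scan, and one addition instead of a scan with a compound verb.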
> >>>>>
> >>>>>
> >>>>> Henry Rich
> >>>>>
> ----------------------------------------------------------------------
> >>>>> For information about J forums see
> http://www.jsoftware.com/forums.htm
>



-- 
Jan Jacobs
Esdoornstraat 33
5995AN Kessel
T: +31 77 462 1887
M: +31 6 23 82 55 21
E: [email protected]
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
