Henry, David,
I'm sorry; the distance is better known as the "Tanimoto metric" and is
described in
http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf,
page 14 (3/3).
It is almost the same as the Jaccard index David mentioned. In my applications
I use Levenshtein for catching typing/OCR errors, and I'm interested in testing
the Tanimoto distance for document spaces where each document is expressed as
a "bag of words" vector.

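A minimal explicit sketch of such a distance in J, for two equal-length
numeric vectors. It assumes the commonly quoted definition
T(x,y) = (x.y) / ((x.x) + (y.y) - (x.y)) and takes the distance as 1 - T; this
has not been checked against the exact formulation in the lecture notes, and
the names tanimoto and tanidist are illustrative only.

```j
NB. Tanimoto similarity of two equal-length numeric vectors (explicit sketch)
NB. Assumed definition: (x.y) % ((x.x) + (y.y) - (x.y))
tanimoto=: 4 : 0
xy=. +/ x * y                      NB. dot product of x and y
xy % (+/ *: x) + (+/ *: y) - xy    NB. divide by sum of squares less the dot product
)

NB. Distance taken here as 1 - similarity (an assumption, not from the thread)
tanidist=: 4 : '1 - x tanimoto y'

NB. Two hypothetical bag-of-words count vectors over a 4-word vocabulary
1 1 0 1 tanidist 1 0 1 1           NB. dot = 2, squares = 3 and 3, so 1 - 2%4 = 0.5
```

For 0/1 vectors the similarity reduces to the Jaccard index of the two word
sets, which is why the two measures are so closely related.
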
The urgency of the Tanimoto distance is not that high. I was merely
suggesting that several standard distance or "difference" metrics could be
integrated as derivatives of minus (-) or s: instead of coding them
explicitly in J.

If you can give a nice tacit solution, that would be fine, but don't spend too
much time on it.
Thanks for all your help,
Jan.

On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell <davidmitch...@att.net> wrote:

> Perhaps this is what Jan had in mind:
>
> Tanimoto coefficient (extended Jaccard coefficient)
>
> "Cosine similarity is a measure of similarity between two vectors of n
> dimensions by finding the angle between them, often used to compare
> documents in
> text mining."
>
> http://en.wikipedia.org/wiki/Jaccard_index
>
> --
> David Mitchell
>
> On 12/20/2009 22:13, Henry Rich wrote:
> > Jan,
> >
> > I have looked for a description of Tanimoto distance but have not found
> > anything useful.  Can you describe what it is or point to a description
> > of it?
> >
> > Henry Rich
> >
> > Jan Jacobs wrote:
> >> Henry,
> >> very good. For longer strings it is more than twice as fast as the
> >> previous version. In my test cases it even consumes less memory.
> >> Is it possible to include this as a native function in J (e.g.
> >> overloading -. or s:)?
> >> Same question, but now for the Tanimoto distance?
> >> Jan.
> >>
> >>
> >> On 12/19/09, R.E. Boss<r.e.b...@planet.nl>  wrote:
> >>> Smart analysis. Chapeau!
> >>>
> >>>
> >>> R.E. Boss
> >>>
> >>>
> >>> -----Original message-----
> >>> From: programming-boun...@jsoftware.com
> >>> [mailto:programming-boun...@jsoftware.com] On behalf of Henry Rich
> >>> Sent: Saturday, 19 December 2009 3:00
> >>> To: Programming forum
> >>> Subject: [Jprogramming] Levenshtein distance
> >>>
> >>> I was working with R. E.'s compact implementation of the Levenshtein
> >>> distance and I found an interesting equivalence:
> >>>
> >>> (<.>:)/\.&.|.
> >>>
> >>> can be replaced by
> >>>
> >>> (<./\@:- + ]) i.@#
> >>>
> >>> which uses a little more space but is quite a bit faster for large
> >>> operands.  So now I have the version:
> >>>
> >>> NB. Levenshtein distance between two strings
> >>> levdist=: 4 : 0
> >>> 'a b'=. (/: #&>)x;y
> >>> z=. i.>:#b
> >>> for_j. a do.
> >>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
> >>> end.
> >>> {:z
> >>> )
> >>>
> >>>
> >>> Henry Rich
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm



-- 
Jan Jacobs
Esdoornstraat 33
5995AN Kessel
T: +31 77 462 1887
M: +31 6 23 82 55 21
E: jan.jac...@sommaps.com