Henry, David,
I'm sorry, the distance is better known as the "Tanimoto metric" and is described in http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf, page 14 (3/3); it is almost the same as the Jaccard index David mentioned. In my applications I use Levenshtein distance for catching typing/OCR errors, and I'm interested in testing the Tanimoto distance on document spaces where each document is expressed as a "bag of words" vector.
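For concreteness, here is a minimal sketch of the Tanimoto coefficient for two equal-length word-count vectors. It assumes the commonly cited definition T(x,y) = (x.y) % ((x.x) + (y.y) - (x.y)), where "." is the dot product; I have not checked that it matches the lecture notes term by term. The names tanimoto and tanimotoT are illustrative only and are not part of J. One common choice for the corresponding distance is 1 minus this value.

```j
NB. Tanimoto coefficient of two equal-length numeric vectors (sketch)
NB. assumed definition: (x.y) % ((x.x) + (y.y) - (x.y))
NB. J evaluates right to left, so the denominator is xx + (yy - xy)
tanimoto=: 4 : '(+/ x*y) % (+/ x*x) + (+/ y*y) - +/ x*y'

NB. tacit form of the same computation:
NB. a 7-verb train, read as  xy % (xx + (yy - xy))
tanimotoT=: +/@:* % +/@:*:@[ + +/@:*:@] - +/@:*

NB. usage: two documents over a 4-word vocabulary
NB. 1 1 0 0 tanimoto 0 1 1 0     gives 1%3
NB. 2 0 3 tanimoto 2 0 3         gives 1 (identical vectors)
```

For 0/1 vectors this reduces to the Jaccard index (shared words divided by words in either document), which is why the two are so close.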
The Tanimoto distance is not that urgent. I was merely suggesting that several standard distance or "difference" metrics could be integrated into the minus (-) or s: derivatives instead of coding them explicitly in J. If you can give a nice tacit solution, that would be fine, but don't spend too much time on it.
Thanks for all your help,
Jan.

On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell <davidmitch...@att.net> wrote:

> Perhaps this is what Jan had in mind:
>
> Tanimoto coefficient (extended Jaccard coefficient)
>
> "Cosine similarity is a measure of similarity between two vectors of n
> dimensions by finding the angle between them, often used to compare
> documents in text mining."
>
> http://en.wikipedia.org/wiki/Jaccard_index
>
> --
> David Mitchell
>
> On 12/20/2009 22:13, Henry Rich wrote:
> > Jan,
> >
> > I have looked for a description of Tanimoto distance but have not found
> > anything useful. Can you describe what it is or point to a description
> > of it?
> >
> > Henry Rich
> >
> > Jan Jacobs wrote:
> >> Henry,
> >> very good. For longer strings it is more than twice as fast as the
> >> previous version. In my test cases it even consumes less memory.
> >> Is it possible to include this as a native function in J (e.g. by
> >> overloading -. or s:)?
> >> Same question, but now for Tanimoto distance?
> >> Jan.
> >>
> >> On 12/19/09, R.E. Boss <r.e.b...@planet.nl> wrote:
> >>> Smart analysis. Chapeau!
> >>>
> >>> R.E. Boss
> >>>
> >>> -----Original message-----
> >>> From: programming-boun...@jsoftware.com
> >>> [mailto:programming-boun...@jsoftware.com] On behalf of Henry Rich
> >>> Sent: Saturday, 19 December 2009 3:00
> >>> To: Programming forum
> >>> Subject: [Jprogramming] Levenshtein distance
> >>>
> >>> I was working with R. E.'s compact implementation of the Levenshtein
> >>> distance and I found an interesting equivalence:
> >>>
> >>> (<.>:)/\.&.|.
> >>>
> >>> can be replaced by
> >>>
> >>> (<./\@:- + ]) i.@#
> >>>
> >>> which uses a little more space but is quite a bit faster for large
> >>> operands. So now I have the version:
> >>>
> >>> NB. Levenshtein distance between two strings
> >>> levdist=: 4 : 0
> >>> 'a b'=. (/: #&>)x;y
> >>> z=. i.>:#b
> >>> for_j. a do.
> >>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
> >>> end.
> >>> {:z
> >>> )
> >>>
> >>> Henry Rich
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm

--
Jan Jacobs
Esdoornstraat 33
5995AN Kessel
T: +31 77 462 1887
M: +31 6 23 82 55 21
E: jan.jac...@sommaps.com