I use the "Tanimoto correlation" often. John Randall sent me his
notes on the subject over a year ago, and they have proved quite useful
(thanks, John!). Note: this operates on binary arrays, not sets.
TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.

The "Tanimoto metric" under discussion is simply one minus that correlation.
TanimotoMet=: 1: - TanimotoCor
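
To make the two definitions concrete, here is a small session sketch. The
arrays a and b are made-up sample data (not from John's notes), and the two
verbs are repeated so the snippet runs on its own.

```j
   TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
   TanimotoMet=: 1: - TanimotoCor

   a=: 1 1 0 1 0 0      NB. hypothetical binary feature vectors
   b=: 1 0 0 0 1 0
   +/ a *. b            NB. positions set in both
1
   +/ a +. b            NB. positions set in either
4
   a TanimotoCor b      NB. 1 % 4
0.25
   a TanimotoMet b      NB. 1 - 0.25
0.75
   a TanimotoMet a      NB. identical arguments are at distance 0
0
```

Because both sums ravel their argument first, the same verbs work unchanged
on binary matrices of matching shape.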

And if you wanted to operate on arrays as sets instead of binary arrays,
TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])
This looks to me like it acts the same as Henry Rich's verb.
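
A quick check of that impression, as a sketch only: s, t, u and v are made-up
sample sets, and Henry's tanimoto is copied from his message so the snippet
is self-contained. Agreement on these inputs is suggestive, not a proof.

```j
   TanimotoCor=: (+/@:,)@:*. % (+/@:,)@:+.
   TanimotoMet=: 1: - TanimotoCor
   TanimotoMetSets=: (~.@, e. [) TanimotoMet (~.@, e. ])
   tanimoto=: (+&#   %/@:-   2 1&*@:(+/@:e.))&~."1

   s=: 1 2 3 4          NB. hypothetical sample sets
   t=: 3 4 5 6 7 8
   (~. s,t) e. s        NB. s as a binary mask over the union
1 1 1 1 0 0 0 0
   (~. s,t) e. t        NB. t as a binary mask over the union
0 0 1 1 1 1 1 1
   s TanimotoMetSets t  NB. 1 - 2%8
0.75
   s tanimoto t
0.75

   u=: 1 1 2            NB. duplicates do not change either result
   v=: 2 3
   (u TanimotoMetSets v) = u tanimoto v
1
```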

Jordan

-----Original Message-----
From: Henry Rich
Sent: Monday, December 21, 2009 11:18 AM
To: Programming forum
Subject: [Jprogramming] Tanimoto distance

tanimoto =: (+&#   %/@:-   2 1&*@:(+/@:e.))&~."1

seems to do what that paper suggests.

Henry Rich

Jan Jacobs wrote:
> Henry, David,
> I'm sorry, the distance is better known as the "Tanimoto metric" and is
> described in
> http://www.dcs.gla.ac.uk/~girolami/Machine_Learning_Module_2006/week_5/Lectures/lect_1.pdf,
> page 14 (3/3)
> (nearly the same as David's Jaccard index). In my applications I use
> Levenshtein for catching typing/OCR errors, and I'm interested in testing
> the Tanimoto distance for document spaces where each document is expressed
> as a "bag of words" vector.
>
> The urgency of the Tanimoto distance is not that high. I was merely
> suggesting that several standard distance or "difference" metrics could be
> integrated into the minus (-) or s: derivatives instead of coding them
> explicitly in J.
>
> If you can give a nice tacit solution, that would be fine, but don't spend
> too much time on it.
> Thanks for all your help,
> Jan.
> 
> On Mon, Dec 21, 2009 at 10:07 AM, David Mitchell wrote:
> 
>> Perhaps this is what Jan had in mind:
>>
>> Tanimoto coefficient (extended Jaccard coefficient)
>>
>> "Cosine similarity is a measure of similarity between two vectors of n
>> dimensions by finding the angle between them, often used to compare
>> documents in text mining."
>>
>> http://en.wikipedia.org/wiki/Jaccard_index
>>
>> --
>> David Mitchell
>>
>> On 12/20/2009 22:13, Henry Rich wrote:
>>> Jan,
>>>
>>> I have looked for a description of Tanimoto distance but have not found
>>> anything useful.  Can you describe what it is or point to a description
>>> of it?
>>>
>>> Henry Rich
>>>
>>> Jan Jacobs wrote:
>>>> Henry,
>>>> very good. For longer strings it is more than twice as fast as the
>>>> previous version. In my test cases it even consumes less memory.
>>>> Is it possible to include this as a native function in J (e.g. by
>>>> overloading -. or s:)?
>>>> Same question, but now for the Tanimoto distance?
>>>> Jan.
>>>>
>>>>
>>>> On 12/19/09, R.E. Boss wrote:
>>>>> Smart analysis. Chapeau!
>>>>>
>>>>>
>>>>> R.E. Boss
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Henry Rich
>>>>> Sent: Saturday, December 19, 2009 3:00
>>>>> To: Programming forum
>>>>> Subject: [Jprogramming] Levenshtein distance
>>>>>
>>>>> I was working with R. E.'s compact implementation of the Levenshtein
>>>>> distance and I found an interesting equivalence:
>>>>>
>>>>> (<.>:)/\.&.|.
>>>>>
>>>>> can be replaced by
>>>>>
>>>>> (<./\@:- + ]) i.@#
>>>>>
>>>>> which uses a little more space but is quite a bit faster for large
>>>>> operands.  So now I have the version:
>>>>>
>>>>> NB. Levenshtein distance between two strings
>>>>> levdist=: 4 : 0
>>>>> 'a b'=. (/: #&>)x;y
>>>>> z=. i.>:#b
>>>>> for_j. a do.
>>>>> z=. ((<./\@:- + ]) i.@#) ((j ~: b) + }:z) ({.@] , (<. }.))>:z
>>>>> end.
>>>>> {:z
>>>>> )
>>>>>
>>>>>
>>>>> Henry Rich
>>>>>
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
