Re: [Jprogramming] Vector Similarity

Raul Miller Tue, 20 Feb 2018 14:52:24 -0800

Oops, sorry about that.

Here's a fixed implementation:


   1 1 1 (prod % %:@*&(prod~)) 0 3 3
0.816497
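
As a quick sanity check on the fixed verb, here is a small sketch. The name cos is introduced only for illustration and does not appear in the thread. The monadic use of prod~ gives y prod y (the squared length), so the denominator becomes the product of the two vector lengths.

```j
prod=: +/ .*
cos=: prod % %:@*&(prod~)    NB. hypothetical name for the fixed verb above

1 1 1 cos 0 3 3     NB. 6 % %: 3 * 18, i.e. 0.816497 as shown above
1 0 0 cos 0 1 0     NB. orthogonal vectors: 0
1 2 3 cos 2 4 6     NB. parallel vectors: 1
1 0 0 cos _1 0 0    NB. opposite vectors: _1
```

The last line suggests that the result stays within 0 and 1 only when all components are non-negative; in general the cosine ranges from _1 to 1.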

Thanks,

-- 
Raul


On Tue, Feb 20, 2018 at 4:41 PM, Skip Cave <[email protected]> wrote:
> Very nice! Thanks Raul.
>
> However, there is something wrong with the cosine similarity,
> which should always be between -1 and 1 (between 0 and 1 for
> vectors with non-negative components)
>
> prod=:+/ .*
>
> 1 1 1 (prod % %:@*&prod) 0 3 3
>
> 1.41421
>
> Skip
>
> On Tue, Feb 20, 2018 at 2:27 PM, Raul Miller <[email protected]> wrote:
>
>> I don't know about blog entries - I think there are probably some that
>> partially cover this topic.
>>
>> But it shouldn't be hard to implement most of these operations:
>>
>> Euclidean distance:
>>
>>    1 0 0 +/&.:*:@:- 0 1 0
>> 1.41421
>>
>> Manhattan distance:
>>
>>    1 0 0 +/@:|@:- 0 1 0
>> 2
>>
>> Minkowski distances:
>>
>>    minkowski=: 1 :'m %: [:+/ m ^~ [:| -'
>>    1 0 0 (1 minkowski) 0 1 0
>> 2
>>    1 0 0 (2 minkowski) 0 1 0
>> 1.41421
>>
>> Cosine similarity:
>>
>>    prod=:+/ .*
>>    1 0 0 (prod % %:@*&prod) 0 1 0
>> 0
>>
>> Jaccard Similarity:
>>
>>    union=: ~.@,
>>    intersect=: [ ~.@:-. -.
>>    1 0 0 (intersect %&# union) 0 1 0
>> 1
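
The example above returns 1 because both arguments contain only the items 0 and 1. A sketch with arguments that differ as sets may be more telling; the name jaccard is hypothetical and used here only for illustration.

```j
union=: ~.@,
intersect=: [ ~.@:-. -.
jaccard=: intersect %&# union    NB. hypothetical name for the train above

1 2 3 jaccard 2 3 4     NB. 2 shared items out of 4 distinct items: 0.5
1 0 0 jaccard 0 1 0     NB. both arguments reduce to the set 0 1: 1
```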
>>
>> You'll probably want to use these at rank 1 ("1) if you're operating
>> on collections of vectors.
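
A minimal sketch of the rank 1 usage, assuming the vectors are stored one per row. The names euclid and pts and the data are made up for illustration.

```j
euclid=: +/&.:*:@:-                 NB. hypothetical name for the Euclidean distance verb
pts=: 3 3 $ 1 0 0  0 1 0  0 0 1     NB. made-up data: three vectors, one per row

1 0 0 euclid"1 pts      NB. distance from one vector to each row
pts euclid"1/ pts       NB. 3 by 3 table of all pairwise distances
```

Without "1 the subtraction would pair the arguments as whole arrays, and +/ would sum across rows rather than within each vector.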
>>
>> But, I'm a little dubious about the usefulness of Jaccard Similarity,
>> because of the assumptions it brings to bear (you're basically
>> encoding sets as vectors, which means your multidimensional vector
>> space is just a way of encoding a single unordered dimension).
>>
>> Anyways, I hope this helps,
>>
>> --
>> Raul
>>
>>
>>
>> On Tue, Feb 20, 2018 at 2:08 PM, Skip Cave <[email protected]> wrote:
>> > One of the hottest topics in data science today is the representation of
>> > data characteristics using large multi-dimensional arrays. Each datum is
>> > represented as a data point or multi-element vector in an array that can
>> > have hundreds of dimensions. In these arrays, each dimension represents a
>> > different attribute of the data.
>> >
>> > Much useful information can be gleaned by examining the similarity, or
>> > distance between vectors in the array. However, there are many different
>> > ways to measure the similarity of two or more vectors in a
>> > multidimensional space.
>> >
>> > Some common similarity/distance measures:
>> >
>> > 1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
>> > The length of the line between two data points.
>> >
>> > 2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>:
>> > Also known as Manhattan length, rectilinear distance, L1 distance or
>> > L1 norm, Minkowski’s L1 distance, taxi-cab metric, or city block
>> > distance.
>> >
>> > 3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>:
>> > A generalized metric form of Euclidean distance and Manhattan distance.
>> >
>> > 4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>:
>> > The cosine of the angle between two vectors. In general it lies between
>> > -1 and 1; for vectors with non-negative components it lies between 0
>> > and 1, where 1 is alike and 0 is not alike.
>> >
>> > 5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>:
>> > The cardinality of the intersection of the sample sets divided by the
>> > cardinality of their union.
>> >
>> > Each of these metrics is useful in specific data analysis situations.
>> >
>> > In many cases, one also wants to know the similarity between clusters of
>> > points, or a point and a cluster of points. In these cases, the centroid
>> > of a set of points is also a useful metric to have, which can then be
>> > used with the various distance/similarity measurements.
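
For the centroid mentioned here, one possible sketch is the column-wise mean of the points, which can then be fed to any of the distance verbs. The names centroid, euclid and cluster and the data are assumptions made for illustration.

```j
centroid=: +/ % #                  NB. mean of the rows; hypothetical name
euclid=: +/&.:*:@:-                NB. Euclidean distance; hypothetical name
cluster=: 3 2 $ 0 0  2 0  1 3      NB. made-up data: three 2-element points

centroid cluster                   NB. 1 1
4 5 euclid centroid cluster        NB. distance from a point to the centroid: 5
```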
>> >
>> > Is there any essay or blog covering these common metrics using the J
>> > language? It would seem that J is perfectly suited for calculating these
>> > metrics, but I haven't been able to find anything much on this topic on
>> > the J software site. I thought I would ask on this forum, before I go
>> > off to see what my rather rudimentary J skills can come up with.
>> >
>> > Skip
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
