Oops, sorry about that. Here's a fixed implementation:
   1 1 1 (prod % %:@*&(prod~)) 0 3 3
0.816497

Thanks,

--
Raul

On Tue, Feb 20, 2018 at 4:41 PM, Skip Cave <[email protected]> wrote:
> Very nice! Thanks Raul.
>
> However, there is something wrong about the cosine similarity,
> which should always be between 0 & 1
>
> prod=:+/ .*
>
> 1 1 1 (prod % %:@*&prod) 0 3 3
>
> 1.41421
>
> Skip
>
> On Tue, Feb 20, 2018 at 2:27 PM, Raul Miller <[email protected]> wrote:
>
>> I don't know about blog entries - I think there are probably some that
>> partially cover this topic.
>>
>> But it shouldn't be hard to implement most of these operations:
>>
>> Euclidean distance:
>>
>>    1 0 0 +/&.:*:@:- 0 1 0
>> 1.41421
>>
>> Manhattan distance:
>>
>>    1 0 0 +/@:|@:- 0 1 0
>> 2
>>
>> Minkowski distances:
>>
>>    minkowski=: 1 :'m %: [:+/ m ^~ [:| -'
>>    1 0 0 (1 minkowski) 0 1 0
>> 2
>>    1 0 0 (2 minkowski) 0 1 0
>> 1.41421
>>
>> Cosine similarity:
>>
>>    prod=:+/ .*
>>    1 0 0 (prod % %:@*&prod) 0 1 0
>> 0
>>
>> Jaccard Similarity:
>>
>>    union=: ~.@,
>>    intersect=: [ ~.@:-. -.
>>    1 0 0 (intersect %&# union) 0 1 0
>> 1
>>
>> You'll probably want to use these at rank 1 ("1) if you're operating
>> on collections of vectors.
>>
>> But, I'm a little dubious about the usefulness of Jaccard Similarity,
>> because of the assumptions it brings to bear (you're basically
>> encoding sets as vectors, which means your multidimensional vector
>> space is just a way of encoding a single unordered dimension).
>>
>> Anyways, I hope this helps,
>>
>> --
>> Raul
>>
>> On Tue, Feb 20, 2018 at 2:08 PM, Skip Cave <[email protected]> wrote:
>> > One of the hottest topics in data science today is the representation of
>> > data characteristics using large multi-dimensional arrays. Each datum is
>> > represented as a data point or multi-element vector in an array that can
>> > have hundreds of dimensions. In these arrays, each dimension represents a
>> > different attribute of the data.
>> >
>> > Much useful information can be gleaned by examining the similarity, or
>> > distance between vectors in the array. However, there are many different
>> > ways to measure the similarity of two or more vectors in a
>> > multidimensional space.
>> >
>> > Some common similarity/distance measures:
>> >
>> > 1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
>> > The length of the line between two data points.
>> >
>> > 2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>:
>> > Also known as Manhattan length, rectilinear distance, L1 distance or L1
>> > norm, city block distance, Minkowski’s L1 distance, or taxi-cab metric.
>> >
>> > 3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>:
>> > A generalized metric form of Euclidean distance and Manhattan distance.
>> >
>> > 4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>:
>> > The cosine of the angle between two vectors. The cosine will be between
>> > 0 & 1, where 1 is alike, and 0 is not alike.
>> >
>> > 5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>:
>> > The cardinality of the intersection of sets divided by the cardinality
>> > of the union of the sample sets.
>> >
>> > Each of these metrics is useful in specific data analysis situations.
>> >
>> > In many cases, one also wants to know the similarity between clusters of
>> > points, or a point and a cluster of points. In these cases, the centroid
>> > of a set of points is also a useful metric to have, which can then be
>> > used with the various distance/similarity measurements.
>> >
>> > Is there any essay or blog covering these common metrics using the J
>> > language?
>> > It would seem that J is perfectly suited for calculating these
>> > metrics, but I haven't been able to find anything much on this topic on
>> > the J software site. I thought I would ask on this forum, before I go off
>> > to see what my rather rudimentary J skills can come up with.
>> >
>> > Skip
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
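
For reference, the corrected cosine similarity from the top of this message can be wrapped in a named verb and checked in a J session. This is only a sketch: the name cossim is hypothetical and does not appear in the thread. The earlier version went wrong because &prod applies prod monadically to each argument, which is not the dot product of a vector with itself. Used monadically, prod~ computes y prod y, so the denominator becomes the square root of (x prod x) * (y prod y).

```j
   NB. Sketch only: cossim is a hypothetical name, not from the thread.
   prod   =: +/ .*                  NB. dyadic use: dot product of two vectors
   cossim =: prod % %:@*&(prod~)    NB. (x prod y) % %: (x prod x) * (y prod y)

   1 1 1 cossim 0 3 3
0.816497
   1 0 0 cossim 0 1 0
0
   1 2 3 cossim 2 4 6
1
```

As with the other verbs in the thread, this is meant for vectors; apply it at rank 1 (cossim"1) when working with tables of vectors.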
