One of the hottest topics in data science today is the representation of
data characteristics using large multi-dimensional arrays. Each datum is
represented as a data point or multi-element vector in an array that can
have hundreds of dimensions. In these arrays, each dimension represents a
different attribute of the data.

Much useful information can be gleaned by examining the similarity, or
distance, between vectors in the array. However, there are many different
ways to measure the similarity of two or more vectors in a multidimensional
space.

Some common similarity/distance measures:

1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
The length of the straight line segment between two data points.

2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>: The
sum of the absolute differences between the coordinates of two data points.
Also known as Manhattan length, rectilinear distance, L1 distance or L1
norm, Minkowski's L1 distance, taxi-cab metric, or city block distance.

3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>: A
generalized metric of order p that includes Manhattan distance (p = 1) and
Euclidean distance (p = 2) as special cases.

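4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>: The
cosine of the angle between two vectors. The cosine ranges from -1 to 1,
where 1 means the vectors point in the same direction, 0 means they are
orthogonal (not alike), and -1 means they point in opposite directions.
For vectors with only non-negative components, it falls between 0 and 1.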

5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>: The
cardinality of the intersection of the sample sets divided by the
cardinality of the union of the sample sets.

Each of these metrics is useful in specific data analysis situations.
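
To make the five definitions above concrete, here is a minimal reference
sketch. It is in Python rather than J, using only the standard library, and
is meant as a statement of what each measure computes, not as a tuned
implementation. The function names are my own, vectors are assumed to be
equal-length sequences of numbers, and returning 1.0 for the Jaccard
similarity of two empty sets is a convention I have assumed.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p >= 1) between equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def manhattan(x, y):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Straight-line distance (Minkowski with p = 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; undefined for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard(s, t):
    """Size of the intersection divided by size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumed convention for two empty sets
    return len(s & t) / len(s | t)
```

For example, euclidean([0, 0], [3, 4]) is the hypotenuse of a 3-4-5
triangle, while manhattan on the same pair adds the two legs.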

In many cases, one also wants to know the similarity between clusters of
points, or between a point and a cluster of points. In these cases, the
centroid (the component-wise mean) of a set of points is a useful
representative point, which can then be used with the various
distance/similarity measurements.

Is there any essay or blog post covering these common metrics using the J
language? It would seem that J is perfectly suited for calculating these
metrics, but I haven't been able to find much on this topic on the J
software site. I thought I would ask on this forum before I go off to see
what my rather rudimentary J skills can come up with.

Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
