One of the hottest topics in data science today is the representation of data characteristics using large multi-dimensional arrays. Each datum is represented as a data point or multi-element vector in an array that can have hundreds of dimensions. In these arrays, each dimension represents a different attribute of the data.
Much useful information can be gleaned by examining the similarity, or distance, between vectors in the array. However, there are many different ways to measure the similarity of two or more vectors in a multidimensional space.

Some common similarity/distance measures:

1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>: The length of the straight line between two data points.
2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>: The sum of the absolute differences of the coordinates. Also known as Manhattan length, rectilinear distance, L1 distance or L1 norm, city block distance, Minkowski's L1 distance, or taxi-cab metric.
3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance> (see also <https://i2.wp.com/dataaspirant.com/wp-content/uploads/2015/04/minkowski.png>): A generalized metric that includes Euclidean distance and Manhattan distance as special cases.
4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>: The cosine of the angle between two vectors. In general it ranges from -1 to 1; for vectors with no negative components it lies between 0 and 1, where 1 means alike and 0 means not alike.
5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>: The cardinality of the intersection of the sample sets divided by the cardinality of their union.

Each of these metrics is useful in specific data analysis situations. In many cases, one also wants to know the similarity between clusters of points, or between a point and a cluster of points. In these cases, the centroid of a set of points is also a useful quantity to have, since it can then be used with the various distance/similarity measurements.

Is there any essay or blog covering these common metrics using the J language? It would seem that J is perfectly suited for calculating these metrics, but I haven't been able to find much on this topic on the J software site. I thought I would ask on this forum before I go off to see what my rather rudimentary J skills can come up with.
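To make the question concrete, here is the sort of thing I have in mind. It is only a rough sketch, assuming each point is a plain numeric list of equal length, each set is a list of items, and a cluster is a table with one point per row. The verb names (euclid, manhattan, minkowski, norm, cosine, jaccard, centroid) are my own invention for illustration, not taken from any J library.

```j
NB. Hypothetical names throughout; a sketch, not a vetted library.

NB. x euclid y : square root of the sum of squared differences
euclid    =: [: %: [: +/ [: *: -

NB. x manhattan y : sum of absolute differences
manhattan =: [: +/ [: | -

NB. x (p minkowski) y : p-th root of the sum of |x-y|^p
NB. (p = 1 gives Manhattan, p = 2 gives Euclidean)
minkowski =: 1 : '(+/ (| x - y) ^ m) ^ % m'

NB. norm y : Euclidean length of a vector
norm      =: [: %: [: +/ *:

NB. x cosine y : dot product divided by the product of the lengths
cosine    =: +/@:* % norm@[ * norm@]

NB. x jaccard y : size of intersection over size of union (duplicates ignored)
jaccard   =: 4 : '(# (~. x) ([ -. -.) ~. y) % # ~. x , y'

NB. centroid y : mean of the rows of a table of points
centroid  =: +/ % #

   0 0 euclid 3 4
5
   0 0 manhattan 3 4
7
   0 0 (2 minkowski) 3 4
5
   1 0 cosine 0 1
0
   1 2 3 jaccard 2 3 4
0.5
   centroid 3 2 $ 0 0 2 2 4 4
2 2
```

The centroid can then be fed to any of the distance verbs, for example (centroid cluster) euclid point, to get the distance between a point and a cluster. Note that cosine as written divides by zero for an all-zero vector, so that case would need separate handling.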
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
