Bo,

Academia.edu <https://www.academia.edu/> Advanced Search (which searches the full text of 20 million papers) found 9,227 papers containing "ORDINAL FRACTIONS".
Skip Cave
Cave Consulting LLC

On Tue, Feb 20, 2018 at 4:20 PM, 'Bo Jacoby' via Programming <[email protected]> wrote:

> ORDINAL FRACTIONS - the algebra of data
> This paper was submitted to the 10th World Computer Congress, IFIP 1986
> conference, but rejected by the referee....
>
> On Tuesday, 20 February 2018 at 22:42, Skip Cave <[email protected]> wrote:
>
> Very nice! Thanks Raul.
>
> However, there is something wrong with the cosine similarity, which
> should always be between 0 and 1 for non-negative vectors like these:
>
>    prod=:+/ .*
>    1 1 1 (prod % %:@*&prod) 0 3 3
> 1.41421
>
> Skip
>
> On Tue, Feb 20, 2018 at 2:27 PM, Raul Miller <[email protected]> wrote:
>
> > I don't know about blog entries - I think there are probably some that
> > partially cover this topic.
> >
> > But it shouldn't be hard to implement most of these operations:
> >
> > Euclidean distance:
> >
> >    1 0 0 +/&.:*:@:- 0 1 0
> > 1.41421
> >
> > Manhattan distance:
> >
> >    1 0 0 +/@:|@:- 0 1 0
> > 2
> >
> > Minkowski distances:
> >
> >    minkowski=: 1 :'m %: [:+/ m ^~ [:| -'
> >    1 0 0 (1 minkowski) 0 1 0
> > 2
> >    1 0 0 (2 minkowski) 0 1 0
> > 1.41421
> >
> > Cosine similarity:
> >
> >    prod=:+/ .*
> >    1 0 0 (prod % %:@*&prod) 0 1 0
> > 0
> >
> > Jaccard similarity:
> >
> >    union=: ~.@,
> >    intersect=: [ ~.@:-. -.
> >    1 0 0 (intersect %&# union) 0 1 0
> > 1
> >
> > You'll probably want to use these at rank 1 ("1) if you're operating
> > on collections of vectors.
> >
> > But I'm a little dubious about the usefulness of Jaccard similarity,
> > because of the assumptions it brings to bear (you're basically
> > encoding sets as vectors, which means your multidimensional vector
> > space is just a way of encoding a single unordered dimension).
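The 1.41421 Skip reports comes from the denominator of Raul's cosine verb. In `%:@*&prod` the verb `prod` is applied monadically, and monadic `+/ .*` on a list is not a dot product: it works out to the sum of the items, which matches Skip's result (6 % %: 3*6 is 1.41421). Raul's own example only looked right because its numerator was 0. A minimal sketch of a fix, assuming the intended denominator is the product of the two Euclidean norms; the names `norm`, `cosine`, and `pts` are illustrative, not from the thread:

```j
NB. Dot product, as in Raul's message: x prod y is +/ x * y
prod =: +/ .*

NB. Euclidean norm: square root of the dot product of y with itself.
NB. (prod~ y) is y prod y; plain monadic prod would just sum the items.
norm =: %:@(prod~)

NB. Cosine similarity: (x prod y) % (norm x) * (norm y)
cosine =: prod % *&norm

1 1 1 cosine 0 3 3      NB. %: 2%3, about 0.816497
1 0 0 cosine 0 1 0      NB. 0 for orthogonal vectors

NB. Pairwise similarities of the rows of a matrix, using rank 1 as Raul suggests
pts =: 3 3 $ 1 1 1  0 3 3  1 0 0
pts cosine"1/ pts       NB. 3x3 table, 1s on the diagonal
```

The `"1/` idiom pairs every row of the left argument with every row of the right one, so the same pattern works for any of the distance verbs in the thread. Note that an all-zero vector gives 0%0, which J evaluates to 0 rather than signalling an error.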
> >
> > Anyways, I hope this helps,
> >
> > --
> > Raul
> >
> > On Tue, Feb 20, 2018 at 2:08 PM, Skip Cave <[email protected]> wrote:
> >
> > > One of the hottest topics in data science today is the representation of
> > > data characteristics using large multi-dimensional arrays. Each datum is
> > > represented as a data point or multi-element vector in an array that can
> > > have hundreds of dimensions. In these arrays, each dimension represents a
> > > different attribute of the data.
> > >
> > > Much useful information can be gleaned by examining the similarity, or
> > > distance, between vectors in the array. However, there are many different
> > > ways to measure the similarity of two or more vectors in a
> > > multidimensional space.
> > >
> > > Some common similarity/distance measures:
> > >
> > > 1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
> > > The length of the straight line between two data points.
> > >
> > > 2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>:
> > > Also known as Manhattan length, rectilinear distance, L1 distance or L1
> > > norm, city block distance, Minkowski's L1 distance, or taxi-cab metric.
> > >
> > > 3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>:
> > > A generalized metric form of Euclidean distance and Manhattan distance
> > > (order 1 gives Manhattan distance, order 2 gives Euclidean distance).
> > >
> > > 4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>:
> > > The cosine of the angle between two vectors. In general it ranges from
> > > -1 to 1; for vectors with non-negative components it lies between 0 and
> > > 1, where 1 is alike and 0 is not alike.
> > >
> > > 5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>: The
> > > cardinality of the intersection of the sample sets divided by the
> > > cardinality of their union.
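Raul's `intersect %&# union` treats the *values* in each vector as the set, so 1 0 0 and 0 1 0 both become the set {1,0} and score 1, which is the behaviour he is dubious about. If instead each 0/1 vector is read as an indicator of which elements are present (a common way to apply the definition in item 5), the intersection and union can be counted directly. This is a sketch under that assumption; `jaccard` is a hypothetical name, and `*.` and `+.` act as AND and OR only on 0/1 data (on other integers they are LCM and GCD):

```j
NB. Assumes x and y are 0/1 indicator vectors of equal length.
NB. |A intersect B| is +/ x *. y   (AND, then count the 1s)
NB. |A union B|     is +/ x +. y   (OR, then count the 1s)
jaccard =: (+/@:*.) % (+/@:+.)

1 0 0 jaccard 0 1 0     NB. 0: no element in common
1 1 0 jaccard 0 1 1     NB. 1 shared out of 3 present, about 0.333333
```

When both vectors are all zeros the result is 0%0, which J evaluates to 0; whether that is the right convention depends on the application.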
> > >
> > > Each of these metrics is useful in specific data analysis situations.
> > >
> > > In many cases, one also wants to know the similarity between clusters of
> > > points, or between a point and a cluster of points. In these cases, the
> > > centroid of a set of points is also useful to have, since it can then be
> > > used with the various distance/similarity measurements.
> > >
> > > Is there any essay or blog covering these common metrics using the J
> > > language? It would seem that J is perfectly suited for calculating these
> > > metrics, but I haven't been able to find much on this topic on the J
> > > software site. I thought I would ask on this forum before I go off to see
> > > what my rather rudimentary J skills can come up with.
> > >
> > > Skip
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
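For the centroid Skip mentions, one straightforward reading is the mean of the points when they are stored one per row; `+/ % #` sums the rows and divides by their count, giving the column means. The sketch below pairs that with Raul's Euclidean distance verb; `centroid`, `dist`, and `cluster` are illustrative names, and measuring point-to-cluster distance via the centroid is only one of several possible conventions:

```j
NB. Centroid of a cluster stored one point per row: column means
centroid =: +/ % #

NB. Euclidean distance, as in Raul's message: %: +/ *: x - y
dist =: +/&.:*:@:-

cluster =: 3 2 $ 0 0  2 0  1 3
centroid cluster                    NB. 1 1
0 0 dist centroid cluster           NB. distance from a point to the centroid, %: 2
(centroid cluster) dist"1 cluster   NB. 1.41421 1.41421 2
```

The same pattern gives a cluster-to-cluster measure, for example `(centroid a) dist centroid b` for two point matrices `a` and `b`, and any of the other verbs in the thread can be substituted for `dist`.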
