ORDINAL FRACTIONS - the algebra of data
This paper was submitted to the 10th World Computer Congress, IFIP 1986
conference, but rejected by the referee....
On Tuesday, 20 February 2018 at 22:42, Skip Cave
<[email protected]> wrote:
Very nice! Thanks Raul.
However, there is something wrong with the cosine similarity,
which should always be between 0 and 1 for vectors with
non-negative components:
prod=:+/ .*
1 1 1 (prod % %:@*&prod) 0 3 3
1.41421
Skip
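
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the expression above, `prod` is applied monadically to each argument inside `%:@*&prod`, so the denominator is not built from the squared lengths x.x and y.y. The definition below uses `prod~` instead, which as a monad computes y prod y. The name `cos` is introduced here for illustration and does not appear in the thread.

```j
prod=: +/ .*                    NB. dyadic use: inner (dot) product
cos=: prod % %:@*&(prod~)       NB. (x.y) divided by the square root of (x.x) times (y.y)

   1 1 1 cos 0 3 3
0.816497
   1 0 0 cos 0 1 0
0
```

With this denominator the result for 1 1 1 and 0 3 3 is 6 divided by the square root of 54, which stays inside the expected range.
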
On Tue, Feb 20, 2018 at 2:27 PM, Raul Miller <[email protected]> wrote:
> I don't know about blog entries - I think there are probably some that
> partially cover this topic.
>
> But it shouldn't be hard to implement most of these operations:
>
> Euclidean distance:
>
> 1 0 0 +/&.:*:@:- 0 1 0
> 1.41421
>
> Manhattan distance:
>
> 1 0 0 +/@:|@:- 0 1 0
> 2
>
> Minkowski distances:
>
> minkowski=: 1 :'m %: [:+/ m ^~ [:| -'
> 1 0 0 (1 minkowski) 0 1 0
> 2
> 1 0 0 (2 minkowski) 0 1 0
> 1.41421
>
> Cosine similarity:
>
> prod=:+/ .*
> 1 0 0 (prod % %:@*&prod) 0 1 0
> 0
>
> Jaccard similarity:
>
> union=: ~.@,
> intersect=: [ ~.@:-. -.
> 1 0 0 (intersect %&# union) 0 1 0
> 1
>
> You'll probably want to use these at rank 1 ("1) if you're operating
> on collections of vectors.
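
As a sketch of the rank-1 usage mentioned above (the names `euclid` and `Y` are assumptions introduced for this example): with a single vector on the left and a table of vectors on the right, `"1` pairs the left vector with each row in turn.

```j
euclid=: +/&.:*:@:-            NB. Euclidean distance, as given earlier in this message
Y=: 2 3 $ 0 1 0 1 0 0          NB. two 3-element vectors, one per row

   1 0 0 euclid"1 Y            NB. distance from 1 0 0 to each row of Y
1.41421 0
```

The same `"1` can be attached to the other verbs in this message when the right argument is a table.
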
>
> But, I'm a little dubious about the usefulness of Jaccard similarity,
> because of the assumptions it brings to bear (you're basically
> encoding sets as vectors, which means your multidimensional vector
> space is just a way of encoding a single unordered dimension).
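
One possible alternative reading, sketched under the assumption that each vector is a 0/1 membership mask (position i is 1 when element i belongs to the set): the intersection is then the element-wise AND and the union the element-wise OR. The name `jaccard` is an assumption for this example, not something defined in the thread.

```j
jaccard=: +/@:*. % +/@:+.      NB. (count of positions set in both) % (count set in either)

   1 0 0 jaccard 0 1 0         NB. no shared positions
0
   1 1 0 jaccard 0 1 1         NB. one shared position out of three
0.333333
```

Under this reading 1 0 0 and 0 1 0 score 0 rather than 1, because the vectors are compared position by position instead of as sets of their element values.
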
>
> Anyways, I hope this helps,
>
> --
> Raul
>
>
>
> On Tue, Feb 20, 2018 at 2:08 PM, Skip Cave <[email protected]> wrote:
> > One of the hottest topics in data science today is the representation of
> > data characteristics using large multi-dimensional arrays. Each datum is
> > represented as a data point or multi-element vector in an array that can
> > have hundreds of dimensions. In these arrays, each dimension represents a
> > different attribute of the data.
> >
> > Much useful information can be gleaned by examining the similarity, or
> > distance, between vectors in the array. However, there are many different
> > ways to measure the similarity of two or more vectors in a
> > multidimensional space.
> >
> > Some common similarity/distance measures:
> >
> > 1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
> > The length of the straight line segment between two data points.
> >
> > 2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>:
> > Also known as Manhattan length, rectilinear distance, L1 distance or L1
> > norm, city block distance, Minkowski’s L1 distance, or taxi-cab metric.
> >
> > 3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>:
> > A generalized metric that includes Euclidean distance and Manhattan
> > distance as special cases.
> >
> > 4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>:
> > The cosine of the angle between two vectors. For vectors with
> > non-negative components the cosine lies between 0 and 1, where 1 means
> > alike and 0 means not alike.
> >
> > 5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>:
> > The cardinality of the intersection of the sample sets divided by the
> > cardinality of their union.
> >
> > Each of these metrics is useful in specific data analysis situations.
> >
> > In many cases, one also wants to know the similarity between clusters of
> > points, or between a point and a cluster of points. In these cases, the
> > centroid of a set of points is also useful to have, since it can then be
> > used with the various distance/similarity measurements.
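
A minimal sketch of the centroid idea, assuming the cluster is stored as a table with one point per row (the names `centroid`, `euclid`, and `C` are assumptions for this example): the centroid is the column-wise mean, and the point-to-cluster distance is then an ordinary vector distance to it.

```j
centroid=: +/ % #              NB. mean of the items (rows) of a table
euclid=: +/&.:*:@:-            NB. Euclidean distance between two vectors
C=: 3 2 $ 0 0 4 0 2 6          NB. three 2-D points: 0 0, 4 0 and 2 6

   centroid C
2 2
   5 6 euclid centroid C       NB. distance from the point 5 6 to the centroid
5
```

Any of the other distance or similarity verbs could be used in place of `euclid` here.
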
> >
> > Is there any essay or blog covering these common metrics using the J
> > language? It would seem that J is perfectly suited for calculating these
> > metrics, but I haven't been able to find much on this topic on the
> > J software site. I thought I would ask on this forum before I go off to
> > see what my rather rudimentary J skills can come up with.
> >
> > Skip
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm