Re: [Jprogramming] Vector Similarity

Skip Cave Wed, 28 Feb 2018 06:26:26 -0800

Bo,

Academia.edu <https://www.academia.edu/>
Advanced Search found 9,227 papers containing “ORDINAL FRACTIONS”
(searching within the full text of 20 million papers).



Skip Cave
Cave Consulting LLC

On Tue, Feb 20, 2018 at 4:20 PM, 'Bo Jacoby' via Programming <
[email protected]> wrote:

> ORDINAL FRACTIONS - the algebra of data
>
> This paper was submitted to the 10th World Computer Congress, IFIP 1986
> conference, but rejected by the referee....
>
> On Tuesday, 20 February 2018 at 22:42, Skip Cave <[email protected]> wrote:
>
>
>  Very nice! Thanks Raul.
>
> However, there is something wrong about the cosine similarity,
> which should always be between 0 & 1
>
> prod=:+/ .*
>
> 1 1 1 (prod % %:@*&prod) 0 3 3
>
> 1.41421
>
> Skip
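
For what it is worth, the 1.41421 above looks like it comes from the
denominator rather than from the dot product. In %:@*&prod the verb prod is
applied monadically to each argument, and monadic +/ .* is a permanent-style
generalized determinant, not the dot product of a vector with itself. On these
vectors it appears to reduce to the plain sums 3 and 6, which gives
6 % %: 18. Below is a minimal sketch of one possible repair, assuming the
intent is the dot product divided by the product of the two vector lengths.
The name cossim is mine, and prod~ (each vector dotted with itself) is a
swapped-in denominator, not something posted in the thread:

```j
prod   =: +/ .*                 NB. dyadic use: dot product of two vectors
cossim =: prod % %:@*&(prod~)   NB. (x dot y) % sqrt of (x dot x) * (y dot y)

1 1 1 cossim 0 3 3              NB. 6 % %: 3 * 18, about 0.816497
1 0 0 cossim 0 1 0              NB. orthogonal vectors give 0
```

With non-negative components the result stays between 0 and 1; vectors with
mixed signs can go as low as _1.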
>
> On Tue, Feb 20, 2018 at 2:27 PM, Raul Miller <[email protected]>
> wrote:
>
> > I don't know about blog entries - I think there are probably some that
> > partially cover this topic.
> >
> > But it shouldn't be hard to implement most of these operations:
> >
> > Euclidean distance:
> >
> >    1 0 0 +/&.:*:@:- 0 1 0
> > 1.41421
> >
> > Manhattan distance:
> >
> >    1 0 0 +/@:|@:- 0 1 0
> > 2
> >
> > Minkowski distances:
> >
> >    minkowski=: 1 :'m %: [:+/ m ^~ [:| -'
> >    1 0 0 (1 minkowski) 0 1 0
> > 2
> >    1 0 0 (2 minkowski) 0 1 0
> > 1.41421
> >
> > Cosine similarity:
> >
> >    prod=:+/ .*
> >    1 0 0 (prod % %:@*&prod) 0 1 0
> > 0
> >
> > Jaccard Similarity:
> >
> >    union=: ~.@,
> >    intersect=: [ ~.@:-. -.
> >    1 0 0 (intersect %&# union) 0 1 0
> > 1
> >
> > You'll probably want to use these at rank 1 ("1) if you're operating
> > on collections of vectors.
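
A minimal sketch of what rank 1 buys here, assuming a table with one vector
per row. The names euclid and pts are hypothetical, and the data is made up
for illustration:

```j
euclid =: +/&.:*:@:-                 NB. the Euclidean phrase above, given a name
pts    =: 3 3 $ 1 0 0 0 1 0 0 0 1    NB. hypothetical table, one vector per row

1 0 0 euclid"1 pts                   NB. distance from 1 0 0 to each row: 0 1.41421 1.41421
pts euclid"1/ pts                    NB. every row against every row, a 3 by 3 table
```

Without the "1, the subtraction pairs each item of the left vector with a
whole row of the table and the sum runs down the columns, which is a
different calculation.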
> >
> > But, I'm a little dubious about the usefulness of Jaccard Similarity,
> > because of the assumptions it brings to bear (you're basically
> > encoding sets as vectors, which means your multidimensional vector
> > space is just a way of encoding a single unordered dimension).
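
To make that assumption concrete: if the vectors are instead read as 0/1
membership indicators (my assumption, not something stated above), the index
can be sketched as the count of positions set in both vectors divided by the
count of positions set in either. The name jaccard is hypothetical:

```j
jaccard =: (+/@:*.) % (+/@:+.)   NB. (count of x AND y) % (count of x OR y)

1 1 0 jaccard 0 1 1              NB. 1 shared position of 3 occupied: 0.333333
1 0 0 jaccard 0 1 0              NB. nothing shared: 0
```

Under this reading 1 0 0 and 0 1 0 score 0, whereas the set-of-values version
above gives 1 because both vectors contain the same values 0 and 1.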
> >
> > Anyways, I hope this helps,
> >
> > --
> > Raul
> >
> >
> >
> > On Tue, Feb 20, 2018 at 2:08 PM, Skip Cave <[email protected]>
> > wrote:
> > > One of the hottest topics in data science today is the representation of
> > > data characteristics using large multi-dimensional arrays. Each datum is
> > > represented as a data point or multi-element vector in an array that can
> > > have hundreds of dimensions. In these arrays, each dimension represents a
> > > different attribute of the data.
> > >
> > > Much useful information can be gleaned by examining the similarity, or
> > > distance, between vectors in the array. However, there are many different
> > > ways to measure the similarity of two or more vectors in a
> > > multidimensional space.
> > >
> > > Some common similarity/distance measures:
> > >
> > > 1. Euclidean distance <https://en.wikipedia.org/wiki/Euclidean_distance>:
> > > The length of the line between two data points.
> > >
> > > 2. Manhattan distance <https://en.wikipedia.org/wiki/Taxicab_geometry>:
> > > Also known as Manhattan length, rectilinear distance, L1 distance or L1
> > > norm, city block distance, Minkowski’s L1 distance, or taxi-cab metric.
> > >
> > > 3. Minkowski distance <https://en.wikipedia.org/wiki/Minkowski_distance>:
> > > A generalized metric form of Euclidean distance and Manhattan distance.
> > >
> > > 4. Cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>:
> > > The cosine of the angle between two vectors. For vectors with
> > > non-negative components, the cosine will be between 0 and 1, where 1 is
> > > alike and 0 is not alike.
> > >
> > > 5. Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_index>:
> > > The cardinality of the intersection of sets divided by the cardinality
> > > of the union of the sample sets.
> > >
> > > Each of these metrics is useful in specific data analysis situations.
> > >
> > > In many cases, one also wants to know the similarity between clusters of
> > > points, or a point and a cluster of points. In these cases, the centroid
> > > of a set of points is also a useful metric to have, which can then be
> > > used with the various distance/similarity measurements.
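
A minimal sketch of the centroid idea, assuming the cluster is a table with
one point per row. The names centroid, pts and c are hypothetical and the
data is invented; the distance phrase is the Euclidean one from Raul's reply:

```j
centroid =: +/ % #                    NB. sum of the rows divided by the number of rows
pts      =: 3 3 $ 1 0 0 0 1 0 0 0 1   NB. hypothetical cluster of three points

c =: centroid pts                     NB. 0.333333 0.333333 0.333333
1 0 0 +/&.:*:@:- c                    NB. Euclidean distance from a point to the centroid, about 0.816497
```

The same c can be passed to any of the other distance or similarity verbs in
place of a single data point.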
> > >
> > > Is there any essay or blog covering these common metrics using the J
> > > language? It would seem that J is perfectly suited for calculating these
> > > metrics, but I haven't been able to find anything much on this topic on
> > > the J software site. I thought I would ask on this forum, before I go off
> > > to see what my rather rudimentary J skills can come up with.
> > >
> > > Skip
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
