Charikar is the definitive reference for this method. See [1] Charikar, M., "Similarity estimation techniques from rounding algorithms." In Proceedings of the 34th ACM Symposium on Theory of Computing (STOC), 2002.
I also created a simple LSH nearest-neighbor method based on this idea (refined, I think), which you can find here: McRee, Symbolic Regression using Nearest Neighbor Indexing <http://portal.acm.org/citation.cfm?id=1830841>. Access to the paper requires ACM membership--let me know if you want me to send you a copy.

The basic idea I introduced was to ignore random projections that do not fall into the TopN (funny how these ideas repeat in different contexts!). Random projections that are not in the TopN act like noise and are not useful, IMHO.

Randy McRee

On Sun, Apr 24, 2011 at 9:41 PM, Ted Dunning <[email protected]> wrote:
> Sounds like a variant of LSH to me.
>
> See the Wikipedia article on LSH with random projections.
>
> On Sun, Apr 24, 2011 at 8:56 PM, Lance Norskog <[email protected]> wrote:
> >
> > I just found this vector distance idea in a technical paper:
> >
> > Create a space defined by X random vectors. For your data vectors,
> > take the cosine distance to each random vector and save the sign of
> > the value as a bit. This gives a bit set of X bits.
> > There could be a different distance measure and algorithm for picking the bit value.
> >
> > The effect is to stop using numerical vectors as a "carrier signal"
> > for the concept of "positions and distances". This is a different,
> > more focused representation. And Hamming distance is somewhat faster
> > than Euclidean :) Of course, picking enough bits is a problem.
> >
> > However, I lost the paper. Does this ring a bell with anyone?
> >
> > --
> > Lance Norskog
> > [email protected]
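To make the mechanics concrete, here is a minimal Python/NumPy sketch of the random-hyperplane signature scheme described in the thread above. This is my own illustration, not code from either paper; the function names are mine, and topn_mask is only my reading of the TopN idea, so treat it as a sketch rather than the reference method.

import numpy as np

def make_hyperplanes(num_bits, dim, seed=0):
    # One random direction (hyperplane normal) per signature bit.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_bits, dim))

def signature(vec, planes):
    # The sign of the dot product equals the sign of the cosine,
    # so no normalization is needed to extract each bit.
    return (planes @ vec) >= 0

def hamming(sig_a, sig_b):
    # Hamming distance between two bit signatures.
    return int(np.count_nonzero(sig_a != sig_b))

def topn_mask(vec, planes, n):
    # Hypothetical sketch of the TopN refinement: keep only the n
    # projections with the largest magnitude, treating the
    # low-magnitude (noisy) bits as unusable.
    proj = planes @ vec
    mask = np.zeros(len(proj), dtype=bool)
    mask[np.argsort(-np.abs(proj))[:n]] = True
    return mask

# Nearby vectors agree on most bits; an opposite vector on almost none.
planes = make_hyperplanes(num_bits=64, dim=100)
rng = np.random.default_rng(1)
a = rng.standard_normal(100)
b = a + 0.1 * rng.standard_normal(100)   # small perturbation of a
print(hamming(signature(a, planes), signature(b, planes)))   # small
print(hamming(signature(a, planes), signature(-a, planes)))  # 64

Charikar's analysis [1] is what makes this work: for two vectors at angle theta, each bit differs with probability theta/pi, so the normalized Hamming distance between signatures is an unbiased estimate of the angular distance.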
