This is the starting point of the way I've always seen people do Locality Sensitive Hashing with floating point vectors. Once you have these bit vectors, you can do minhash stuff on them to complete LSH.
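
For anyone who wants to see it concretely, here is a rough sketch of the bit-vector step as described in the quoted message. It is only an illustration under my own assumptions: the function names (`random_hyperplanes`, `signature`, `hamming`) are made up for this example, the random vectors are drawn from a Gaussian, and the sign of the plain dot product is used because it always matches the sign of the cosine.

```python
import random

def random_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian vectors of dimension dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per random vector: 1 if the dot product is non-negative, else 0.

    The dot product has the same sign as the cosine, so the vectors
    do not need to be normalized first.
    """
    return [1 if sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0 else 0
            for p in planes]

def hamming(sig_a, sig_b):
    """Number of positions where two bit signatures differ."""
    return sum(x != y for x, y in zip(sig_a, sig_b))

planes = random_hyperplanes(num_bits=64, dim=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a
c = [-1.0, -2.0, -3.0]  # opposite direction to a

print(hamming(signature(a, planes), signature(b, planes)))  # same direction: no bits differ
print(hamming(signature(a, planes), signature(c, planes)))  # opposite direction: nearly all bits differ
```

The fraction of differing bits approximates the angle between two vectors divided by pi, so more bits give a finer estimate, which is the "picking enough bits" problem mentioned in the quoted message.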

On Sun, Apr 24, 2011 at 8:56 PM, Lance Norskog <[email protected]> wrote:
> I just found this vector distance idea in a technical paper:
>
> Create a space defined by X random vectors. For you data vectors,
> take the cosine distance to each random vector and save the sign of
> the value as a bit. This gives a bit set of X bits.
> There could be another distance and algorithm for picking the bit value.
>
> The effect is to cease using numerical vectors as a "carrier signal"
> for the concept of "positions and distances". This is a different,
> more focused representation. And, Hamming distance is somewhat faster
> than Euclidean :) Of course, picking enough bits is a problem.
>
> However, I lost the paper. Does this ring a bell with anyone?
>
> --
> Lance Norskog
> [email protected]
