Here is code to create a difference basis and also code to do LSH encoding.
# creates an LSH basis set by picking pairs of vectors at random
# (vectors is a matrix with one data vector per row).
# note that with probability 1/n per pair this will pick the same
# vector both times and the result will be zero. This would be easy
# to fix, but I am lazy so I just live with it.
lshBasis = function(vectors, m) {
  # number of row vectors to sample from (rows, not columns)
  n = nrow(vectors)
  v1 = vectors[ceiling(runif(m) * n), ]
  v2 = vectors[ceiling(runif(m) * n), ]
  return(v1 - v2)
}
# encode a data point into a binary vector based on a given
# LSH basis
lshCode = function(basis, v) {
  if (!is.null(dim(v))) {
    # handle a matrix full of row vectors
    return(basis %*% t(v) > 0)
  } else {
    # handle a single vector
    return(basis %*% v > 0)
  }
}
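A minimal sketch of how the two functions might be used together. It repeats them in compact form so that it runs on its own. The sizes (1000 vectors, 10 dimensions, 16 bits), the fixed seed, the use of sample.int and nrow for picking rows, and the names x, codes, one and bitRate are all assumptions made for illustration.

```r
set.seed(42)  # fixed seed only so the sketch is repeatable

# difference basis: m rows, each the difference of two randomly chosen rows
lshBasis = function(vectors, m) {
  n = nrow(vectors)
  v1 = vectors[sample.int(n, m, replace = TRUE), , drop = FALSE]
  v2 = vectors[sample.int(n, m, replace = TRUE), , drop = FALSE]
  v1 - v2
}

# one bit per basis vector: TRUE where the dot product is positive
lshCode = function(basis, v) {
  if (!is.null(dim(v))) {
    basis %*% t(v) > 0   # matrix of row vectors
  } else {
    basis %*% v > 0      # single vector
  }
}

# hypothetical test data: 1000 row vectors, 10 dimensions, uniform on [0, 1]
x = matrix(runif(1000 * 10), nrow = 1000)

basis = lshBasis(x, 16)       # 16 x 10, one difference vector per bit
codes = lshCode(basis, x)     # 16 x 1000 logical, one column per data point
one = lshCode(basis, x[1, ])  # 16 x 1 logical, code for a single vector

# fraction of data points that get a 1 for each basis vector
bitRate = rowMeans(codes)
```

Looking at bitRate is one quick way to see whether a given basis splits the data at all. A bit that is 0 (or 1) for every data point carries no information.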
On Mon, May 16, 2011 at 4:08 PM, Lance Norskog wrote:
> Please send or post your R code.
>
> On Sun, May 15, 2011 at 10:35 PM, Ted Dunning wrote:
> > I think I was the source of this expectation.
> >
> > And I also think I was wrong.
> >
> > I just did some experiments myself in R and random cut vectors seem to work
> > about as well as non-random ones for positive orthant vectors. For oddly
> > distributed vectors, it still might be good to use difference vectors as a
> > basis for LSH, but I am much less convinced than before.
> >
> > On Sun, May 15, 2011 at 8:54 PM, Lance Norskog wrote:
> >
> >> Test data: 1000 random vectors as samples. All values 0->1, linear
> >> distribution.
> >> This test data gives no negative cosine distances, and so all bits are
> >> 0. This is expected (from previous mail threads).
> >>
> >
>
>
>
> --
> Lance Norskog
>