Here is code to create a difference basis and also code to do LSH encoding.


# creates an LSH basis set by picking pairs of vectors at random
# note that a significant amount of the time this will pick the same
# vector both times and the result will be zero.  This would be easy
# to fix, but I am lazy so I just live with it.
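lshBasis = function(vectors, m) {
  # data points are stored as rows, so sample row indices from 1..nrow
  n = nrow(vectors)
  # drop = FALSE keeps the result a matrix even when m is 1
  v1 = vectors[ceiling(runif(m) * n), , drop = FALSE]
  v2 = vectors[ceiling(runif(m) * n), , drop = FALSE]
  return(v1-v2)
}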

# encode a data point into a binary vector based on a given
# LSH basis
lshCode = function(basis, v) {
  if (!is.null(dim(v))) {
    # handle a matrix full of row vectors
    return(basis %*% t(v) > 0)
  } else {
    # handle a single vector
    return(basis %*% v > 0)
  }
}
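
For anyone who wants to try this, here is a rough sketch of how the two functions might be exercised on positive orthant data. It repeats compact copies of the functions so it runs on its own. The seed, the sizes (1000 points, 10 dimensions, 32 bits) and the names x, diffBasis, randBasis, diffBits and randBits are all made up for illustration, and the Gaussian random basis is just one assumed way to get random cut vectors.

```r
# Sketch only: compact copies of the functions so this runs standalone.
# Assumes data points are stored as rows of a matrix.
lshBasis = function(vectors, m) {
  n = nrow(vectors)
  v1 = vectors[ceiling(runif(m) * n), , drop = FALSE]
  v2 = vectors[ceiling(runif(m) * n), , drop = FALSE]
  v1 - v2
}
lshCode = function(basis, v) {
  if (!is.null(dim(v))) basis %*% t(v) > 0 else basis %*% v > 0
}

set.seed(1)                                     # arbitrary seed, for repeatability
x = matrix(runif(1000 * 10), nrow = 1000)       # 1000 points, all values in (0, 1)

diffBasis = lshBasis(x, 32)                     # 32 difference vectors
randBasis = matrix(rnorm(32 * 10), nrow = 32)   # 32 random cut vectors (assumed Gaussian)

diffBits = lshCode(diffBasis, x)                # 32 x 1000 logical, one column per point
randBits = lshCode(randBasis, x)

# fraction of bits that come out TRUE under each basis
mean(diffBits)
mean(randBits)
```

Each column of the result is the bit code for one data point, so comparing columns (for example by Hamming distance) is how you would compare points.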


On Mon, May 16, 2011 at 4:08 PM, Lance Norskog <[email protected]> wrote:

> Please send or post your R code.
>
> On Sun, May 15, 2011 at 10:35 PM, Ted Dunning <[email protected]>
> wrote:
> > I think I was the source of this expectation.
> >
> > And I also think I was wrong.
> >
> > I just did some experiments myself in R and random cut vectors seem to work
> > about as well as non-random ones for positive orthant vectors.  For oddly
> > distributed vectors, it still might be good to use difference vectors as a
> > basis for LSH, but I am much less convinced than before.
> >
> > On Sun, May 15, 2011 at 8:54 PM, Lance Norskog <[email protected]> wrote:
> >
> >> Test data: 1000 random vectors as samples. All values 0->1, linear
> >> distribution.
> >> This test data gives no negative cosine distances, and so all bits are
> >> 0. This is expected (from previous mail threads).
> >>
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>
