You can do LSH on real-valued vectors - the 1's and 0's are just the
+/- signs of projections onto randomly chosen hyperplanes.
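
A minimal sketch of that sign-of-projection idea in Python (not Mahout code; the dimensions, seed, and perturbation size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_planes = 64, 16

# Each row is the normal vector of one random hyperplane through the origin.
planes = rng.standard_normal((n_planes, dim))

def lsh_signature(vec):
    """Bit i is 1 when vec lies on the positive side of hyperplane i."""
    return tuple((planes @ vec > 0).astype(int))

v = rng.standard_normal(dim)
v_near = v + 0.01 * rng.standard_normal(dim)   # a small perturbation of v

sig_a, sig_b = lsh_signature(v), lsh_signature(v_near)
agree = sum(x == y for x, y in zip(sig_a, sig_b))
```

Vectors at a small angle to each other land on the same side of most hyperplanes, so nearby vectors share most signature bits; the probability that one bit differs is theta/pi for angle theta.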

Ullman's book is a great reference for this, and it also covers how to
choose the parameters.
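
For the min-hash question quoted below, thresholding the ratings into a binary matrix is a common approach. A minimal Python sketch (not Mahout code; the cutoff value is a guess, and the salted hashes just stand in for random permutations):

```python
import random

def binarize(ratings, threshold=4):
    """Keep only items rated at or above the cutoff (the value 4 is a guess)."""
    return {item for item, rating in ratings.items() if rating >= threshold}

def minhash_signature(items, salts):
    """One min-hash per salt; each salted hash simulates a random permutation."""
    return [min(hash((salt, item)) for item in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)

rng = random.Random(0)
salts = [rng.getrandbits(32) for _ in range(200)]

a = set(range(100))        # true Jaccard(a, b) = 50 / 150 = 1/3
b = set(range(50, 150))
est = estimated_jaccard(minhash_signature(a, salts), minhash_signature(b, salts))

liked = binarize({"film1": 5, "film2": 2, "film3": 4})
```

With enough hash functions the estimate concentrates around the true Jaccard similarity of the binarized sets, which is what makes the signature usable for LSH banding afterwards.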

On Wed, Apr 13, 2011 at 12:43 AM, ke xie <[email protected]> wrote:

> Ok, I will try to implement a non-distributed one first. Actually I have
> a Python version now.
>
> But I have a problem. When doing min-hash, the matrix entries should be
> either 1 or 0 before applying the hash functions. Then what about rating
> data? If the matrix is filled with numbers from 1 to 5, should we convert
> them using some threshold, setting an entry to 1 if its rating is above
> the threshold?
>
> This is the reference I read about LSH. check it out (chapter 3)
> http://infolab.stanford.edu/~ullman/mmds.html
>
> On Wed, Apr 13, 2011 at 3:25 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Sure.
> >
> > LSH is a fine candidate for parallelism and scaling.
> >
> > I would recommend starting small and testing as you go rather than
> > leaping into a parallelized full-fledged implementation.  Look for
> > other open-source implementations of LSH algorithms.
> >
> > Be warned that parameter selection for LSH can be pretty tricky (so I
> > hear, anyway).  You should pick a reasonable and realistic test problem
> > so that you can experiment with it.
> >
> >
> > On Wed, Apr 13, 2011 at 12:19 AM, ke xie <[email protected]> wrote:
> >
> >> Can we implement one and contribute it to the Mahout project? Any
> >> suggestions?
> >>
> >
> >
>
>
> --
> Name: Ke Xie   Eddy
> Research Group of Information Retrieval
> State Key Laboratory of Intelligent Technology and Systems
> Tsinghua University
>
