2012/1/23 Mathieu Blondel <[email protected]>:
> On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel
> <[email protected]> wrote:
>
>> LSH is just using binary thresholded random projections in a 32 (or 64
>> or 128...) dim space. That leads to 32-bit (or 64-bit...) vectors
>> castable as integers, and to doing Hamming radius queries instead of
>> Euclidean queries in that boolean space.
>
> So this is only one instantiation of LSH, right? I thought that LSH is
> a family of algorithms and that there exist different algorithms to
> support different metrics (cosine similarity, Hamming distance,
> Jaccard index, ...).

My understanding is that the H in LSH is all about using compact
binary codes to look up similar codes, either through exact collisions
or by using the Hamming distance to extend the collision radius,
usually with a maximum distance of 2 in practice.
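
To make the idea concrete, here is a minimal sketch of that scheme,
assuming Gaussian random hyperplanes and a plain linear scan over the
codes (a real index would bucket the codes in hash tables instead).
The names make_planes, hash_code, hamming and radius_query are
hypothetical helpers for illustration, not part of scikit-learn.

```python
import numpy as np

N_BITS = 32  # code length; 64 or 128 would work the same way


def make_planes(n_features, n_bits=N_BITS, seed=0):
    """Draw one random hyperplane (normal vector) per output bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, n_features))


def hash_code(x, planes):
    """Project x on each hyperplane, threshold at 0, pack the bits into an int."""
    bits = planes @ x > 0
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code


def hamming(a, b):
    """Number of bit positions where the two integer codes differ."""
    return bin(a ^ b).count("1")


def radius_query(query_code, codes, radius=2):
    """Indices of all codes within the given Hamming radius (linear scan)."""
    return [i for i, c in enumerate(codes) if hamming(query_code, c) <= radius]
```

With radius=0 this reduces to exact collisions; radius=2 corresponds to
the "max distance of 2" mentioned above.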

People might have other definitions, but they are not reflected in
the Wikipedia article, where collisions are part of both definitions.
With the Jaccard index and cosine similarity you won't get collisions
unless you apply some kind of binary thresholding to the output.
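
For the cosine case, a rough sketch of what such a thresholding can
look like: taking the sign of Gaussian random projections gives bits
that differ between two vectors with probability theta / pi, where
theta is the angle between them, so the fraction of differing bits
yields an estimate of the cosine similarity. The function names below
are hypothetical and only meant as an illustration.

```python
import numpy as np


def sign_bits(x, planes):
    """Binary thresholding: one bit per random hyperplane."""
    return planes @ x > 0


def estimated_cosine(x, y, planes):
    """Estimate cos(x, y) from the fraction of bits on which x and y disagree."""
    frac_diff = np.mean(sign_bits(x, planes) != sign_bits(y, planes))
    # Each bit differs with probability theta / pi, so theta ~ pi * frac_diff.
    return np.cos(np.pi * frac_diff)
```

The estimate gets tighter as the number of hyperplanes grows; with only
32 bits it is quite coarse, which is fine when the codes are only used
to find candidate neighbors.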

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
