2012/1/23 Mathieu Blondel <[email protected]>:
> On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel
> <[email protected]> wrote:
>
>> LSH is just using binary thresholded random projections in a 32 (or 64
>> or 128...) dim space. That leads to 32bit (or 64bit...) vectors
>> castable as integers and doing Hamming radius queries instead of
>> Euclidean queries in that boolean space.
>
> So this is only one instantiation of LSH, right? I thought that LSH is
> a family of algorithms and that there exist different algorithms to
> support different metrics (cosine similarity, hamming distance,
> jaccard index, ...).

My understanding is that the H in LSH is all about using compact binary
codes to look up similar codes through collisions, or using the Hamming
distance to extend the collision radius, usually with a max distance of
2 in practice.

People might have other definitions, but they are not reflected in the
Wikipedia article, where collisions are part of both definitions. With
the Jaccard index and cosine similarity you won't get collisions unless
you apply some kind of binary thresholding to the output.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
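
For illustration, here is a rough sketch of the scheme discussed in this
message: sign-thresholded random projections packed into an integer code,
followed by a Hamming radius query. It is only a minimal pure-Python
sketch under stated assumptions, not scikit-learn code. The names
make_hyperplanes, binary_code, hamming and radius_query are hypothetical,
and the Gaussian hyperplanes and the linear scan over all codes are
simplifications chosen for readability.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space.

    Assumption: Gaussian directions; other random projections would also work.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(vector, hyperplanes):
    """Threshold each random projection at zero and pack the bits into an int."""
    code = 0
    for plane in hyperplanes:
        projection = sum(w * x for w, x in zip(plane, vector))
        code = (code << 1) | (1 if projection >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def radius_query(query_code, codes, max_distance=2):
    """Return indices of codes within max_distance bits of query_code.

    Assumption: a brute-force scan; a real index would avoid visiting
    every stored code.
    """
    return [i for i, c in enumerate(codes)
            if hamming(query_code, c) <= max_distance]


if __name__ == "__main__":
    planes = make_hyperplanes(n_bits=32, dim=5, seed=1)
    points = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(100)]
    codes = [binary_code(p, planes) for p in points]
    # Querying with a stored point always returns at least that point,
    # since its code collides with itself at distance 0.
    print(radius_query(codes[0], codes, max_distance=2))
```

An exact collision is the special case max_distance=0. Raising the radius
to 1 or 2 trades some precision for recall, which matches the "max
distance of 2 in practice" remark above.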
