Hi Daniel,
These alternative strategies of LSH are a good way to start using the
existing implementations of Scikit-learn. But why do you say the standard
algorithms are impractical?
Maheshakya
On Wed, Mar 5, 2014 at 1:31 AM, Daniel Vainsencher <
[email protected]> wrote:
> Hi scikit-learn, Maheshakya,
>
> In my experience implementing and using LSH, the vanilla algorithms are
> very impractical. Two extensions that help a lot (and combine nicely) are:
> - Multi-probe LSH [1], where hash values neighboring the target's hash are
> also used.
> - LSH Forest [2], where fixed length hashes are replaced by length bounded
> hashes; the hash used is just long enough to get sufficiently few
> candidates for the target's nearest neighbor.
>
> Together they make it feasible to limit the number of independent copies
> of the index to a reasonable number. Incidentally, the LSH Forest approach
> may be implemented using binary trees.
>
> Daniel Vainsencher
>
> [1] www.cs.princeton.edu/cass/papers/mp*lsh*_vldb07.pdf
> [2] www2005.org/docs/p651.pdf
>
>
> On 03/04/2014 06:09 PM, Maheshakya Wijewardena wrote:
>
> Hi,
>
> Currently, unsupervised neighbor search is implemented in
> sklearn.neighbors. There are three algorithms used to perform neighbor
> search.
>
> - kd tree
> - ball tree
> - brute force method
>
> kd tree and ball tree are implemented separately as cython
> implementations. Both of them use binary tree(binary_tree.pxi). In the
> NeighborBass class, above implementations are used.
>
> As Locality Sensitivity Hashing will also be used in approximating
> nearest neighbors, its' implementation should also be a part of
> sklearn.neighbors. But Locality sensitivity hashing algorithms do not
> involve binary tree implementation. Therefore it has to be implemented
> separately and be used in sklearn.neighbors as another algorithm.Is this
> approach correct?
>
> I'm trying to implement a base LSH class and the hashing algorithms as
> cython implementations. To approximate neighbor search using those
> algorithms, multiple hash tables will be created with those hash functions
> so, for that an efficient way to store is required.
>
> These are some of the issues I found disturbing while I was trying to
> implement LSH to approximate neighbor search. Is this an appropriate
> approach to implement this or are there any other better ways?
>
>
>
> Regards,
> Maheshakya,
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering.
> University of Moratuwa,
> Sri Lanka
>
>
> ------------------------------------------------------------------------------
> Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
> With Perforce, you get hassle-free workflows. Merge that actually works.
> Faster operations. Version large binaries. Built-in WAN optimization and the
> freedom to use Git, Perforce or both. Make the move to
> Perforce.http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>
>
>
> _______________________________________________
> Scikit-learn-general mailing
> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> ------------------------------------------------------------------------------
> Subversion Kills Productivity. Get off Subversion & Make the Move to
> Perforce.
> With Perforce, you get hassle-free workflows. Merge that actually works.
> Faster operations. Version large binaries. Built-in WAN optimization and
> the
> freedom to use Git, Perforce or both. Make the move to Perforce.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
--
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries. Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general