> > I would also rather focus on non-binary representations.
Even when using Random Projection method for hashing, only sign of the result of dot product is considered. So that, in that situation also, there will be a binary representation( or +1s and -1s). What is your idea about this method? Nearest neighbor search has been implemented in Scikit-learn in sklearn.neighbors. In unsupervised.py, NeighborsBase class is used and NeighborsBase (in base.py) contains following methods to perform the search. - brute - a brute force linear search - kd_tree - KD tree search - ball_tree - binary tree search So we can add LSH based search as another algorithm type in NearestNeighbors. In order to perform neighbor search using LSH, those hashing methods should be implemented separately(In another file). There will be multiple hash tables built by concatenating hash functions. Here, I notice an issue. As we generated a significantly large number of hash tables, there must be a way to store them efficiently. Is there a way to do this in the Scikit-learn way? This part will also have to be implemented outside the NeighborBase class. The logic for performing the search using computed computed hash tables should be included in the NeighborBase. This is my basic opinion on how to implement LSH based neighbor search in Scikit-learn. Your feedback and suggestions for improvements are welcome. [?] Regards, Maheshakya. On Thu, Feb 27, 2014 at 12:28 AM, Andy <t3k...@gmail.com> wrote: > On 02/26/2014 10:13 AM, Maheshakya Wijewardena wrote: > > The method "Bit sampling for Hamming distance" is already included in > "brute" algorithm as the metric "hamming" in Nearest neighbor search. > Hence, I think that does not need to be implemented as a LSH algorithm > > I would also rather focus on non-binary representations. > There is no efficient way to work with binary data in numpy afaik -- at > least none that is supported in sklearn. > > I'm very interested in this project but unfortunately I don't have the > time to mentor. > > Cheers, > Andy > > > On Wed, Feb 26, 2014 at 12:46 AM, Maheshakya Wijewardena < > pmaheshak...@gmail.com> wrote: > >> Approximating Nearest neighbor search is one of the application of >> locality sensitive hashing.There are five major methods. >> >> - Bit sampling for Hamming distance >> - Min-wise independent permutations >> - Nilsimsa Hash >> - Random projection >> - Stable distributions >> >> Bit sampling method is fairly straight forward. A reference for the >> implementation of Random projection method can be taken from *lshash >> <https://pypi.python.org/pypi/lshash>* library. >> I'm looking forward to see comments for this from prospective mentors >> of this project. >> >> Thank you. >> Maheshakya. >> >> >> >> On Tue, Feb 25, 2014 at 8:24 AM, Maheshakya Wijewardena < >> pmaheshak...@gmail.com> wrote: >> >>> Hi, >>> I have looked into this project idea. I have studied this method and I >>> like to discuss further on this. >>> I would like to know who the mentors for this project are and to get >>> some insight on how to begin. >>> >>> Regards, >>> Maheshakya, >>> -- >>> Undergraduate, >>> Department of Computer Science and Engineering, >>> Faculty of Engineering. >>> University of Moratuwa, >>> Sri Lanka >>> >> >> >> >> -- >> Undergraduate, >> Department of Computer Science and Engineering, >> Faculty of Engineering. >> University of Moratuwa, >> Sri Lanka >> > > > > -- > Undergraduate, > Department of Computer Science and Engineering, > Faculty of Engineering. > University of Moratuwa, > Sri Lanka > > > ------------------------------------------------------------------------------ > Flow-based real-time traffic analytics software. Cisco certified tool. > Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer > Customize your own dashboards, set traffic alerts and generate reports. > Network behavioral analysis & security monitoring. All-in-one > tool.http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk > > > > _______________________________________________ > Scikit-learn-general mailing > listScikit-learn-general@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/scikit-learn-general > > > > > ------------------------------------------------------------------------------ > Flow-based real-time traffic analytics software. Cisco certified tool. > Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer > Customize your own dashboards, set traffic alerts and generate reports. > Network behavioral analysis & security monitoring. All-in-one tool. > > http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk > _______________________________________________ > Scikit-learn-general mailing list > Scikit-learn-general@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general > > -- Undergraduate, Department of Computer Science and Engineering, Faculty of Engineering. University of Moratuwa, Sri Lanka
<<330.png>>
------------------------------------------------------------------------------ Flow-based real-time traffic analytics software. Cisco certified tool. Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer Customize your own dashboards, set traffic alerts and generate reports. Network behavioral analysis & security monitoring. All-in-one tool. http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general