>
> I would also rather focus on non-binary representations.

Even with the random projection method for hashing, only the sign of each
dot product is kept. So in that case, too, the result is a binary
representation (bits, or equivalently +1s and -1s). What do you think of
this method?
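
To make the point concrete, here is a minimal sketch of sign-based random
projection hashing in plain Python. The helper names (make_hyperplanes,
sign_hash, hamming) are hypothetical and only for illustration; they are not
part of scikit-learn or lshash.

```python
import random


def make_hyperplanes(n_bits, n_features, seed=0):
    """Draw n_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_bits)]


def sign_hash(x, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is >= 0, else 0."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(p != q for p, q in zip(a, b))


planes = make_hyperplanes(n_bits=16, n_features=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]      # same direction as a
c = [-1.0, -2.0, -3.0]   # opposite direction to a

code_a = sign_hash(a, planes)
code_b = sign_hash(b, planes)
code_c = sign_hash(c, planes)
```

Only the signs survive, so a and b (same direction) get the same code, while
c (opposite direction) differs from a in every bit. The output is binary even
though the input is real-valued.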

Nearest neighbor search is implemented in scikit-learn in
sklearn.neighbors. The NearestNeighbors class in unsupervised.py is built
on NeighborsBase (in base.py), which supports the following algorithms to
perform the search.

   - brute  - a brute force linear search
   - kd_tree - KD tree search
   - ball_tree - ball tree search

So we can add LSH based search as another algorithm type in
NearestNeighbors.

To perform neighbor search using LSH, the hashing methods should be
implemented separately (in another file). Multiple hash tables will be
built, each by concatenating several hash functions. Here I notice an
issue: since we will generate a large number of hash tables, there must be
a way to store them efficiently. Is there a way to do this in the
scikit-learn way? This part will also have to be implemented outside the
NeighborsBase class.
The logic for performing the search using the computed hash tables should
be included in NeighborsBase.
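
A rough sketch of the structure I have in mind, in plain Python. The class
name SignLSHIndex and its methods are hypothetical, not existing scikit-learn
API. Storing each table as a dict of buckets is just one assumption about
storage, and sign random projection is used as the example hash family.

```python
import random
from collections import defaultdict


class SignLSHIndex:
    """Hypothetical multi-table LSH index (illustration only)."""

    def __init__(self, n_tables, n_bits, n_features, seed=0):
        rng = random.Random(seed)
        # Each table has its own n_bits hyperplanes; the concatenated
        # sign bits form that table's bucket key.
        self.tables = []
        for _ in range(n_tables):
            planes = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                      for _ in range(n_bits)]
            self.tables.append((planes, defaultdict(list)))
        self.data = []

    @staticmethod
    def _key(x, planes):
        return tuple(sum(w * v for w, v in zip(p, x)) >= 0 for p in planes)

    def fit(self, X):
        """Store the points and fill every table's buckets with indices."""
        self.data = [list(x) for x in X]
        for planes, buckets in self.tables:
            buckets.clear()
            for i, x in enumerate(self.data):
                buckets[self._key(x, planes)].append(i)
        return self

    def candidates(self, q):
        """Union of the buckets that q falls into, over all tables."""
        found = set()
        for planes, buckets in self.tables:
            found.update(buckets.get(self._key(q, planes), ()))
        return found

    def query(self, q, k=1):
        """Rank the candidates by exact squared Euclidean distance."""
        def dist2(i):
            return sum((a - b) ** 2 for a, b in zip(self.data[i], q))
        return sorted(self.candidates(q), key=dist2)[:k]


X = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, -1.0]]
index = SignLSHIndex(n_tables=4, n_bits=3, n_features=2).fit(X)
nearest = index.query([1.0, 0.0], k=1)
```

Here the hashing and table storage live in their own class, and the search
step only looks up candidate buckets and ranks them by exact distance.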

This is my basic opinion on how to implement LSH based neighbor search in
Scikit-learn. Your feedback and suggestions for improvements are welcome.

Regards,
Maheshakya.


On Thu, Feb 27, 2014 at 12:28 AM, Andy <t3k...@gmail.com> wrote:

>  On 02/26/2014 10:13 AM, Maheshakya Wijewardena wrote:
>
> The method "Bit sampling for Hamming distance" is already included in the
> "brute" algorithm as the metric "hamming" in nearest neighbor search.
> Hence, I think it does not need to be implemented as an LSH algorithm.
>
> I would also rather focus on non-binary representations.
> There is no efficient way to work with binary data in numpy afaik -- at
> least none that is supported in sklearn.
>
> I'm very interested in this project but unfortunately I don't have the
> time to mentor.
>
> Cheers,
> Andy
>
>
> On Wed, Feb 26, 2014 at 12:46 AM, Maheshakya Wijewardena <
> pmaheshak...@gmail.com> wrote:
>
>> Approximate nearest neighbor search is one of the applications of
>> locality sensitive hashing. There are five major methods.
>>
>>    - Bit sampling for Hamming distance
>>    - Min-wise independent permutations
>>    - Nilsimsa Hash
>>    - Random projection
>>    - Stable distributions
>>
>> The bit sampling method is fairly straightforward. A reference for the
>> implementation of the random projection method can be taken from the
>> *lshash <https://pypi.python.org/pypi/lshash>* library.
>>  I'm looking forward to seeing comments on this from prospective mentors
>> of this project.
>>
>>  Thank you.
>>  Maheshakya.
>>
>>
>>
>>  On Tue, Feb 25, 2014 at 8:24 AM, Maheshakya Wijewardena <
>> pmaheshak...@gmail.com> wrote:
>>
>>> Hi,
>>> I have looked into this project idea. I have studied this method and
>>> would like to discuss it further.
>>>  I would like to know who the mentors for this project are and to get
>>> some insight on how to begin.
>>>
>>>  Regards,
>>> Maheshakya,
>>> --
>>> Undergraduate,
>>> Department of Computer Science and Engineering,
>>> Faculty of Engineering.
>>> University of Moratuwa,
>>>  Sri Lanka
>>>
>>
>
> ------------------------------------------------------------------------------
> Flow-based real-time traffic analytics software. Cisco certified tool.
> Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer
> Customize your own dashboards, set traffic alerts and generate reports.
> Network behavioral analysis & security monitoring. All-in-one tool.
> http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>


-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka
