Emanuele,
This is exciting!  I've played with Cover Trees in the past, and found 
similar to you that the insertion time is very slow, and query time is 
comparable to that of Ball Tree.  From my reading and brief 
experimentation on the subject, it seems that cover trees seem to be 
nice in that you can analytically prove worst-case run-time bounds, but 
they are not necessarily better than other algorithms in practice 
(though I'd be happy to be proven wrong).

I found that the larges overhead in construction was the need to 
maintain priority queues of unused nodes as the tree is constructed.  If 
this aspect can be streamlined, there could certainly be room for 
improvement.

I'm looking forward to seeing your progress on this!
  Jake

Emanuele Olivetti wrote:
> On 01/04/2012 04:22 PM, Mathias Verbeke wrote:
>   
>> Dear all,
>>
>> I just started working with Scikit Learn and I'm currently using the Nearest 
>> Neighbors
>> module. In the documentation is stated that it currently only supports the 
>> Euclidean
>> distance metric, and I was wondering if it would be easy to extend it with 
>> other
>> distance metrics? Since it uses the scipy.sparse matrices as input, I was 
>> thinking about
>> the distance metrics in scipy.distance.spatial. Would that be possible, or 
>> were there
>> certain considerations to only allow for Euclidean distance?
>>
>>
>>     
>
> Hi,
>
> Recently I started to investigate cover trees [0] which are another
> data structure like the ball tree or kd-tree with some interesting
> properties. I stumbled upon PyCoverTree which is a pure python
> implementation of cover trees which I am tweaking to make
> it more usable (e.g. initially it worked only on Python >=v2.7) and
> eventually faster:
>
>      https://github.com/emanuele/PyCoverTree
>
> This algorithm might be of interest to you because it is like the BallTree
> class but it accepts a generic distance functionm thus non-vectorial input
> too. Of course it is (much) slower than the BallTree during insertion.
> But if that is not a big problem, then queries are not that much slower :-)
>
> Time permitting I'd like to improve PyCoverTree and make it sklearn-compliant.
> But do not expect too much in the short term ;-)
>
> Note: the current LICENSE.txt of PyCoverTree says "all rights reserved"
> but I am in the process of getting an appropriate open source license
> from the original authors. If there is interest I'll post news here.
>
> Best,
>
> Emanuele
>
> [0]: http://hunch.net/~jl/projects/cover_tree/cover_tree.html
>
> ------------------------------------------------------------------------------
> Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
> infrastructure or vast IT resources to deliver seamless, secure access to
> virtual desktops. With this all-in-one solution, easily deploy virtual 
> desktops for less than the cost of PCs and save 60% on VDI infrastructure 
> costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>   

------------------------------------------------------------------------------
Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
infrastructure or vast IT resources to deliver seamless, secure access to
virtual desktops. With this all-in-one solution, easily deploy virtual 
desktops for less than the cost of PCs and save 60% on VDI infrastructure 
costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to