It would be very straightforward to extend neighbors to use any
Minkowski distance (see
https://github.com/scikit-learn/scikit-learn/issues/351).
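For reference, the Minkowski distance of order p is
(sum_i |x_i - y_i|^p)^(1/p); p = 1 gives the Manhattan distance and
p = 2 the Euclidean one. A minimal NumPy sketch, assuming dense 1-D
inputs (the function name `minkowski` is made up here and is not a
scikit-learn API):

```python
import numpy as np


def minkowski(x, y, p=2.0):
    """Minkowski distance (sum_i |x_i - y_i|**p) ** (1/p).

    p=1 is Manhattan, p=2 is Euclidean, p=inf is the Chebyshev limit.
    It satisfies the triangle inequality (is a true metric) only for p >= 1.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for the Minkowski distance to be a metric")
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        # Limit p -> infinity: the largest coordinate difference.
        return float(diff.max())
    return float(np.sum(diff ** p) ** (1.0 / p))


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(minkowski(a, b, p=1))       # 7.0
print(minkowski(a, b, p=2))       # 5.0
print(minkowski(a, b, p=np.inf))  # 4.0
```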
BallTree does not yet work for arbitrary distance metrics, but this is 
not due to any inherent limitation: it will work for any metric which 
satisfies the triangle inequality.
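The reason the triangle inequality is all that is needed: if every point
x in a ball satisfies d(x, c) <= r, then d(q, x) >= d(q, c) - r, so a
whole ball can be skipped once that lower bound exceeds the best distance
found so far. A rough NumPy sketch of just that pruning test (the helper
names are hypothetical, not BallTree internals):

```python
import numpy as np


def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))


def chebyshev(x, y):
    return float(np.max(np.abs(x - y)))


def ball_lower_bound(dist, query, center, radius):
    """Smallest possible dist(query, x) for any x inside the ball.

    Triangle inequality: dist(q, c) <= dist(q, x) + dist(x, c) <= dist(q, x) + radius,
    hence dist(q, x) >= dist(q, c) - radius (clamped at 0).
    """
    return max(dist(query, center) - radius, 0.0)


def can_prune(dist, query, center, radius, best_so_far):
    """True when no point in the ball can beat the current best distance."""
    return ball_lower_bound(dist, query, center, radius) > best_so_far


# Build one ball around random points and compare the bound with the true
# nearest distance, for two different metrics.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
center = points.mean(axis=0)
query = np.array([5.0, 5.0, 5.0])
for dist in (euclidean, chebyshev):
    radius = max(dist(pt, center) for pt in points)  # every point lies in the ball
    bound = ball_lower_bound(dist, query, center, radius)
    nearest = min(dist(query, pt) for pt in points)
    print(dist.__name__, bound, nearest)  # bound never exceeds nearest
```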

I've been thinking about how to extend BallTree to work with other
metrics: for speed, it should use C-style function pointers to compute
the distance (Python functions are too slow).  This is difficult because
different distance metrics have different ancillary data associated with
them (e.g. the parameter p for the Minkowski distance, the matrix V for
the Mahalanobis distance, etc.).  Using C++-style class abstraction, this
sort of thing could be accomplished fairly easily and readably.  But that
doesn't feel "cythony" to me.
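One way to picture the function-pointer approach, sketched in plain
Python rather than Cython: every metric shares one signature
(x, y, params), and each metric gets its own small parameter record for
its ancillary data, much as a C implementation might pair a function
pointer with a pointer to a parameter struct. All names here
(`MinkowskiParams`, `MahalanobisParams`, `MetricSpec`, etc.) are
hypothetical, and V is assumed to be a covariance matrix whose inverse is
precomputed once.

```python
from dataclasses import dataclass

import numpy as np

# Every distance function has the same signature (x, y, params) -> float,
# mirroring a C function pointer plus a pointer to a metric-specific struct.


@dataclass(frozen=True)
class MinkowskiParams:
    p: float  # exponent; p >= 1 for a true metric


@dataclass(frozen=True)
class MahalanobisParams:
    VI: np.ndarray  # inverse of the covariance matrix V, computed once up front


def euclidean_dist(x, y, params):
    d = x - y
    return float(np.sqrt(d @ d))  # params unused


def minkowski_dist(x, y, params):
    return float(np.sum(np.abs(x - y) ** params.p) ** (1.0 / params.p))


def mahalanobis_dist(x, y, params):
    d = x - y
    return float(np.sqrt(d @ params.VI @ d))


class MetricSpec:
    """Hypothetical holder for a (function, params) pair; a tree would store
    one of these and call .dist() in its inner loop."""

    def __init__(self, func, params=None):
        self.func = func
        self.params = params

    def dist(self, x, y):
        return self.func(np.asarray(x, dtype=float), np.asarray(y, dtype=float), self.params)


V = np.array([[2.0, 0.0], [0.0, 0.5]])
metrics = {
    "euclidean": MetricSpec(euclidean_dist),
    "minkowski p=3": MetricSpec(minkowski_dist, MinkowskiParams(p=3.0)),
    "mahalanobis": MetricSpec(mahalanobis_dist, MahalanobisParams(VI=np.linalg.inv(V))),
}
for name, m in metrics.items():
    print(name, m.dist([1.0, 1.0], [0.0, 0.0]))
```

The calling code never needs to know which metric it holds, which is the
property a Cython tree would want from a function pointer.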

If someone has a good idea about how one could specify these distance
metrics from Python code, with optional ancillary parameters, and
convert these specifications into code for fast distance computation
within Cython, I think Mathias's suggestion could be accomplished with a
bit of effort.
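On the Python side, one possible shape for such a specification is a
metric name plus keyword parameters, validated once and turned into a
fast vectorized kernel. The sketch below is hypothetical
(`make_pairwise_metric` is not an existing scikit-learn or SciPy
function) and uses NumPy broadcasting as a stand-in for generated Cython
code; it builds an (n_X, n_Y, n_features) temporary, so it is only meant
for small inputs.

```python
import numpy as np


def _check_params(name, params, allowed):
    unknown = set(params) - set(allowed)
    if unknown:
        raise TypeError(f"unexpected parameters for {name!r}: {sorted(unknown)}")


def make_pairwise_metric(name, **params):
    """Turn (metric name, optional keyword parameters) into a function that
    returns the (n_X, n_Y) distance matrix. Validation and any expensive
    preprocessing (e.g. inverting V) happen once, here, not per distance."""
    if name == "euclidean":
        _check_params(name, params, ())

        def kernel(diff):
            return np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))

    elif name == "minkowski":
        _check_params(name, params, ("p",))
        p = float(params.get("p", 2.0))
        if p < 1:
            raise ValueError("p must be >= 1")

        def kernel(diff):
            return np.sum(np.abs(diff) ** p, axis=-1) ** (1.0 / p)

    elif name == "mahalanobis":
        _check_params(name, params, ("V",))
        if "V" not in params:
            raise TypeError("'mahalanobis' requires the covariance matrix V")
        VI = np.linalg.inv(np.asarray(params["V"], dtype=float))

        def kernel(diff):
            # sum over k, l of diff[i, j, k] * VI[k, l] * diff[i, j, l]
            return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, VI, diff))

    else:
        raise ValueError(f"unknown metric {name!r}")

    def pairwise(X, Y):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        Y = np.atleast_2d(np.asarray(Y, dtype=float))
        diff = X[:, None, :] - Y[None, :, :]  # shape (n_X, n_Y, n_features)
        return kernel(diff)

    return pairwise


dist = make_pairwise_metric("minkowski", p=1)
print(dist([[0, 0], [1, 1]], [[3, 4]]))  # Manhattan distances 7.0 and 5.0
```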
   Jake

Olivier Grisel wrote:
> 2012/1/4 Mathias Verbeke <[email protected]>:
>   
>> Dear all,
>>
>> I just started working with scikit-learn and I'm currently using the Nearest
>> Neighbors module. The documentation states that it currently only
>> supports the Euclidean distance metric, and I was wondering whether it would be
>> easy to extend it with other distance metrics. Since it uses
>> scipy.sparse matrices as input, I was thinking about the distance metrics in
>> scipy.spatial.distance.
>>     
>
> scipy.spatial.distance does not work on scipy.sparse matrices, only on
> numpy arrays AFAIK. The kNN classifier only works with sparse matrices
> in "bruteforce" mode, as BallTree and kd-tree do not work with
> scipy.sparse matrices either.
>
>   
>> Would that be possible, or were there certain
>> considerations that led to allowing only the Euclidean distance?
>>     
>
> It would be great to make this pluggable indeed. This should be quite
> easy for the brute force mode. For the ball tree mode, that will
> require diving into the Cython code and reading the reference paper to
> check whether any assumptions about the metric are made (or just
> asking Jake :).
>
>   
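To illustrate Olivier's point that brute force is the easy case, here is
a minimal sketch of a brute-force neighbor search that accepts any Python
callable as the metric. It assumes dense NumPy input (not scipy.sparse),
the name `brute_force_kneighbors` is hypothetical, and because nothing is
pruned, the metric does not even need to satisfy the triangle inequality:

```python
import numpy as np


def brute_force_kneighbors(X_train, X_query, k, metric):
    """Return (distances, indices) of the k nearest rows of X_train for each
    row of X_query, evaluating metric(a, b) -> float on every pair."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Full (n_query, n_train) distance matrix, one Python call per pair.
    D = np.array([[metric(q, x) for x in X_train] for q in X_query])
    # Stable sort so ties keep the training-set order.
    idx = np.argsort(D, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(D, idx, axis=1), idx


def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))


def squared_euclidean(a, b):
    # Not a metric (violates the triangle inequality), but fine for brute force.
    return float(np.sum((a - b) ** 2))


X_train = [[0, 0], [1, 0], [0, 2], [3, 3]]
for metric in (manhattan, squared_euclidean):
    dist, idx = brute_force_kneighbors(X_train, [[0, 0]], k=2, metric=metric)
    print(metric.__name__, idx, dist)
```

The price is O(n_query * n_train) metric calls in pure Python, which is
the slowness Jake mentions above; a tree only pays off when the metric
allows pruning.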

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
