It would be very straightforward to extend neighbors to use any Minkowski distance (see https://github.com/scikit-learn/scikit-learn/issues/351 ). BallTree does not yet work for arbitrary distance metrics, but this is not due to any inherent limitation: it will work for any metric that satisfies the triangle inequality.
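As a rough illustration of the brute-force case, here is a minimal NumPy sketch of a Minkowski distance with a tunable order p. The function names (minkowski_distances, brute_force_neighbors) are made up for this example and are not part of the scikit-learn API. The check on p reflects the fact that the Minkowski distance only satisfies the triangle inequality for p >= 1.

```python
import numpy as np


def minkowski_distances(X, q, p=2):
    """Minkowski distance of order p from each row of X to the query point q.

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    if p < 1:
        # For p < 1 the triangle inequality fails, so this is not a true metric.
        raise ValueError("p must be >= 1 for a valid metric")
    diff = np.abs(np.asarray(X, dtype=float) - np.asarray(q, dtype=float))
    if np.isinf(p):
        # Limit p -> infinity: Chebyshev (maximum coordinate) distance.
        return diff.max(axis=1)
    return (diff ** p).sum(axis=1) ** (1.0 / p)


def brute_force_neighbors(X, q, k=1, p=2):
    """Return indices and distances of the k rows of X closest to q."""
    d = minkowski_distances(X, q, p)
    idx = np.argsort(d, kind="stable")[:k]
    return idx, d[idx]


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
    idx, dist = brute_force_neighbors(X, [0.0, 0.0], k=2, p=1)
    print(idx, dist)
```

This only covers the brute-force search; it says nothing about how the tree code would receive the metric.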
I've been thinking about how to extend BallTree to work with other metrics: for speed, it should use C-style function pointers to compute the distance (Python functions are too slow). This is difficult because different distance metrics have different ancillary data associated with them (e.g. integer p for Minkowski distance, matrix V for Mahalanobis distance, etc.). Using C++-style class abstraction, this sort of thing could be accomplished fairly easily and readably, but that doesn't feel "cythony" to me. If someone has a good idea about how one could specify these distance metrics from Python code, with optional ancillary parameters, and convert these specifications into code for fast distance computation within Cython, I think Mathias' suggestion could be accomplished with a bit of effort.

Jake

Olivier Grisel wrote:
> 2012/1/4 Mathias Verbeke:
>
>> Dear all,
>>
>> I just started working with Scikit Learn and I'm currently using the Nearest
>> Neighbors module. In the documentation is stated that it currently only
>> supports the Euclidean distance metric, and I was wondering if it would be
>> easy to extend it with other distance metrics? Since it uses the
>> scipy.sparse matrices as input, I was thinking about the distance metrics in
>> scipy.distance.spatial.
>>
>
> scipy.spatial.distance does not work on scipy.sparse matrices, only on
> numpy arrays AFAIK. The kNN classifier only works with sparse matrices
> with the "bruteforce" mode as BallTree and kd-tree do not work with
> scipy.sparse matrices either.
>
>
>> Would that be possible, or were there certain
>> considerations to only allow for Euclidean distance?
>>
>
> Would be great to make this pluggable indeed. This should be quite
> easy for the brute force mode. For the ball tree mode that will
> require to dive into the cython code and read the reference paper to
> check whether any assumption on the metrics is used or not (or just
> ask Jake :).
