Mathias, I'm glad you're excited to work on this! I think starting with just the Minkowski p-distance is a good idea, and it would be a great way for you to gain experience with the code. I'd do the following:
- In sklearn/neighbors/base.py, add a parameter `p` to the NeighborsBase class. This class holds the basic parameters needed for all distance computations.
- Also in this class, pass this parameter `p` to the BallTree when it is constructed.
- Next, in the places in this file where the distances are actually computed, use the parameter `p` to make sure the correct distance metric is returned. For BallTree, this is handled when the tree is constructed. For the KD tree, `p` is passed as a parameter when querying for the nearest neighbors. For brute force, you'll need to modify the code to use the Minkowski distance rather than the Euclidean distance.
- Next, the rest of the neighbors module needs to be modified to accept a keyword `p` with default value 2, and to pass it to the _init_params function.
- Finally, some tests and examples should be created.

If you'd like to start on this, go ahead and fork scikit-learn on GitHub and start a new branch. That way I can look over your code if you have questions! If anything is unclear, feel free to email me off-list or to get in touch with me on GitHub: my username is jakevdp.

Jake

Mathias Verbeke wrote:
> Hi,
>
> First, thanks for all the answers! Waauw, really interesting
> discussion. I have only basic Python skills, and never programmed in
> Cython (together with a lot of time constraints, as most of you
> probably), but I would like to give it a try to add new distance
> metrics to the brute force method. Would it be possible to give some
> pointers to what should need to be done/changed to add e.g. the
> keyword p as was mentioned in Jake's first reply?
>
> Cheers and thanks,
>
> Mathias
>
>
> On Thu, Jan 5, 2012 at 5:33 PM, Jacob VanderPlas wrote:
>
> Here's a small example I coded up that shows how I envision including
> multiple distance metrics in BallTree
>
> https://gist.github.com/1565998
>
> The idea is that you create functions to compute distance which
> expose C function pointers, so that the ball tree cython code can call
> these without python overhead. I'd be curious to hear people's thoughts.
> Jake
>
> Gael Varoquaux wrote:
> > On Wed, Jan 04, 2012 at 07:59:04AM -0800, Jacob VanderPlas wrote:
> >
> >> If someone has a good idea about how one could specify these distance
> >> metrics from python code, with optional ancillary parameters, and
> >> convert these specifications into code for fast distance computation
> >> within cython,
> >>
> >
> > How about using small Cython classes with one or two methods? This is
> > what ended up working well for the decision trees (see _tree.pyx).
> >
> > Gael
> >
> >
> > ------------------------------------------------------------------------------
> > Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
> > infrastructure or vast IT resources to deliver seamless, secure access to
> > virtual desktops. With this all-in-one solution, easily deploy virtual
> > desktops for less than the cost of PCs and save 60% on VDI infrastructure
> > costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
> > _______________________________________________
> > Scikit-learn-general mailing list
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
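A minimal sketch of what the brute-force branch in the steps at the top of this message has to compute once `p` is threaded through: the Minkowski distance d(x, y) = (sum_i |x_i - y_i|^p)^(1/p). It reduces to the Euclidean distance for p = 2 (the proposed default) and to the Manhattan distance for p = 1. The function name `minkowski_kneighbors` and its signature are hypothetical and are not the scikit-learn API; the sketch assumes NumPy and p >= 1 (p = infinity is treated as its Chebyshev limit).

```python
import numpy as np


def minkowski_kneighbors(X, Y, n_neighbors=1, p=2):
    """Brute-force k-nearest-neighbor search under the Minkowski p-distance.

    Hypothetical helper for illustration only (not scikit-learn's API).
    For each row of Y, return the distances to and the indices of its
    `n_neighbors` closest rows of X, sorted from nearest to farthest.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for the Minkowski distance to be a metric")
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # All pairwise absolute differences, shape (n_queries, n_samples, n_features).
    diff = np.abs(Y[:, np.newaxis, :] - X[np.newaxis, :, :])
    if np.isinf(p):
        # p -> infinity limit: the Chebyshev (maximum) distance.
        dist = diff.max(axis=-1)
    else:
        # Minkowski distance: (sum_i |x_i - y_i|**p) ** (1/p).
        dist = (diff ** p).sum(axis=-1) ** (1.0 / p)
    # Indices of the n_neighbors smallest distances for each query row.
    ind = np.argsort(dist, axis=1)[:, :n_neighbors]
    return np.take_along_axis(dist, ind, axis=1), ind
```

For example, with X = [[0, 0], [3, 4], [1, 1]] and the single query [0, 0], the neighbors come back in the order 0, 2, 1 for p = 1 (distances 0, 2, 7) as well as for p = 2 (distances 0, about 1.41, 5). Broadcasting every pairwise difference at once keeps the sketch short, but it uses memory proportional to n_queries * n_samples * n_features, so a real implementation would probably process the queries in chunks.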

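Gael's suggestion of small classes with one or two methods, combined with Jake's idea of handing the tree code a distance function it can call directly, can be sketched in plain Python as below. The names `DistanceMetric`, `MinkowskiDistance`, `ChebyshevDistance` and `nearest` are hypothetical and are not scikit-learn API. In the Cython version discussed in the thread, `dist` would presumably be a cdef method or C function pointer so that the search loop avoids per-call Python overhead; this sketch only shows the shape of the design, not its speed.

```python
class DistanceMetric:
    """Hypothetical base class: one small object per distance metric."""

    def dist(self, x, y):
        raise NotImplementedError


class MinkowskiDistance(DistanceMetric):
    def __init__(self, p=2):
        if p < 1:
            raise ValueError("p must be >= 1")
        # Ancillary parameter stored on the metric object itself.
        self.p = p

    def dist(self, x, y):
        # (sum_i |x_i - y_i|**p) ** (1/p)
        return sum(abs(a - b) ** self.p for a, b in zip(x, y)) ** (1.0 / self.p)


class ChebyshevDistance(DistanceMetric):
    def dist(self, x, y):
        # Largest coordinate-wise difference (the p -> infinity limit).
        return max(abs(a - b) for a, b in zip(x, y))


def nearest(points, query, metric):
    """Brute-force 1-nearest-neighbor search that only relies on metric.dist."""
    return min(range(len(points)), key=lambda i: metric.dist(points[i], query))
```

The search routine never needs to know which metric it is using, so supporting a new metric (with its own parameters) means adding one small class rather than touching the search code.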