Mathias,
I'm glad you're excited to work on this!  I think starting with just the 
Minkowski p-distance is a good idea, and it would be a 
great way for you to gain experience with the code.  I'd do the following:

- In sklearn/neighbors/base.py, add a parameter `p` to the NeighborsBase 
class.  This class holds the basic parameters needed for all distance 
computations. 
- Also in this class, `p` needs to be passed to BallTree when the tree 
is constructed.
- Next, in the places in this file where the distances are actually 
computed, make use of `p` to ensure that the correct distance metric is 
used.  For BallTree, this is handled when the tree is constructed.  For 
KDTree, `p` is passed as a parameter when finding the nearest 
neighbors.  For brute, you'll need to modify the code to use the 
Minkowski distance rather than the Euclidean distance.
- Next, the rest of the neighbors module needs to be modified to accept a 
keyword `p` with default value 2, and pass it to the _init_params function.
- Finally, some tests and examples should be created.
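
To make the brute-force step concrete, here is a rough sketch in plain Python of what a Minkowski p-distance and a brute-force neighbor search could look like.  The function names (`minkowski_distance`, `brute_kneighbors`) are made up for illustration and are not the names used in sklearn/neighbors/base.py; the real code would operate on numpy arrays rather than lists.

```python
import heapq


def minkowski_distance(x, y, p=2):
    """Minkowski p-distance between two equal-length sequences.

    p=1 is the Manhattan distance, p=2 the Euclidean distance, and
    p=inf the Chebyshev (maximum coordinate difference) distance.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def brute_kneighbors(X, query, k=1, p=2):
    """Return the k nearest rows of X to query as (distance, index) pairs.

    Hypothetical helper: computes every distance, then keeps the k smallest.
    """
    dists = ((minkowski_distance(row, query, p), i) for i, row in enumerate(X))
    return heapq.nsmallest(k, dists)
```

With the default `p=2` this gives the same neighbors as the Euclidean distance, so existing behavior would be unchanged for users who never pass `p`.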

If you'd like to start on this, go ahead and fork scikit-learn on GitHub 
and start a new branch.  This has the advantage that I can look over 
your code if you have questions!  If anything is unclear, feel free to 
email me off-list or get in touch with me on GitHub: my username 
is jakevdp.
   Jake

Mathias Verbeke wrote:
> Hi,
>
> First, thanks for all the answers! Wow, really interesting 
> discussion. I have only basic Python skills and have never programmed 
> in Cython (and, like most of you probably, I have a lot of time 
> constraints), but I would like to try adding new distance 
> metrics to the brute-force method. Could you give me some 
> pointers on what needs to be done or changed to add, e.g., the 
> keyword p that was mentioned in Jake's first reply?
>
> Cheers and thanks,
>
> Mathias
>
>
> On Thu, Jan 5, 2012 at 5:33 PM, Jacob VanderPlas wrote:
>
>     Here's a small example I coded up that shows how I envision including
>     multiple distance metrics in BallTree
>
>     https://gist.github.com/1565998
>
>     The idea is that you create functions to compute distance which
>     expose C function pointers, so that the ball tree Cython code can
>     call these without Python overhead.  I'd be curious to hear
>     people's thoughts.
>       Jake
>
>     Gael Varoquaux wrote:
>     > On Wed, Jan 04, 2012 at 07:59:04AM -0800, Jacob VanderPlas wrote:
>     >
>     >> If someone has a good idea about how one could specify these
>     >> distance metrics from python code, with optional ancillary
>     >> parameters, and convert these specifications into code for fast
>     >> distance computation within cython,
>     >>
>     >
>     > How about using small Cython classes with one or two methods?
>     > This is what ended up working well for the decision trees (see
>     > _tree.pyx).
>     >
>     > Gael
>     >
>     >
>     
> ------------------------------------------------------------------------------
>     > Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need
>     a complex
>     > infrastructure or vast IT resources to deliver seamless, secure
>     access to
>     > virtual desktops. With this all-in-one solution, easily deploy
>     virtual
>     > desktops for less than the cost of PCs and save 60% on VDI
>     infrastructure
>     > costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
>     > _______________________________________________
>     > Scikit-learn-general mailing list
>     > [email protected]
>     <mailto:[email protected]>
>     > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>     >
>
>
>

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
