Hi Jacob,
I went through the code. With algorithm='brute', the 'fit' method in nearest
neighbors does not do any distance calculations; it only initializes the
class variables. The user-defined metric is first called in 'kneighbors'. In
that case this is probably not a bug.
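
To make the point concrete, here is a toy brute-force searcher. It is only a sketch, not scikit-learn's code: the class name ToyBruteNN and the call log are made up for illustration, and the call counts are those of the toy, not of scikit-learn. Its fit only stores the data, so the metric is first called in kneighbors.

```python
# Toy sketch only; ToyBruteNN is a hypothetical class, not scikit-learn's implementation.
calls = []  # records every (x, y) pair passed to the metric


def tan(x, y):
    # Same constant "metric" as in the thread, but it logs its arguments instead of printing.
    calls.append((tuple(x), tuple(y)))
    return 1


class ToyBruteNN:
    def __init__(self, metric, n_neighbors=1):
        self.metric = metric
        self.n_neighbors = n_neighbors

    def fit(self, X):
        # Brute force has no index to build: just remember the data.
        self._fit_X = [list(row) for row in X]
        return self

    def kneighbors(self, Q):
        # Every query point is compared against every stored point here.
        distances, indices = [], []
        for q in Q:
            d = [self.metric(q, x) for x in self._fit_X]
            order = sorted(range(len(d)), key=d.__getitem__)[: self.n_neighbors]
            indices.append(order)
            distances.append([d[i] for i in order])
        return distances, indices


X = [[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

nbrs = ToyBruteNN(metric=tan).fit(X)
calls_after_fit = len(calls)      # the metric is not touched by fit

distances, indices = nbrs.kneighbors(X)
calls_after_query = len(calls)    # 3 queries x 3 stored points in this toy
```

A breakpoint inside tan would therefore only be hit from kneighbors, which matches what I saw with algorithm='brute'.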
--
sp
On Wed, Feb 24, 2016 at 12:26 AM, Jacob Vanderplas <
jake...@cs.washington.edu> wrote:
>> I have been experimenting with the above code. I have noticed the
>> following things:
>>
>>
>> 1. If we set algorithm = 'brute', the algorithm does not enter the
>> function tan during the fit method, i.e., a breakpoint at the print
>> statement is never hit during fit. It does, however, use this function
>> in the kneighbors method. I think one cannot use the user-defined
>> metric with 'brute'.
>>
> This sounds like a bug – can you open an issue?
>
>>
>> 2. On the other hand, if we set algorithm = 'ball_tree', execution
>> does go through the tan function during the fit method. But if you look
>> at the values of x and y at that point, they differ from the values of x
>> and y that you entered. Clearly, the ball_tree algorithm is doing
>> something unexpected. I don't think it is using the defined metric tan
>> for making the tree.
>>
> Probably related to the bug fixed here:
> https://github.com/scikit-learn/scikit-learn/pull/6288
>
>
>>
>>
>>
>> --
>> sp
>>
>> On Thu, Jan 14, 2016 at 2:42 AM, Sebastian Raschka <se.rasc...@gmail.com>
>>> wrote:
>>>
>>>> I guess I got it now! This behavior (see below) is indeed a bit strange:
>>>>
>>>> from sklearn.neighbors import NearestNeighbors
>>>> import numpy as np
>>>>
>>>> X = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0,
>>>> 1.0, 1.0]])
>>>>
>>>> def tan(x, y):
>>>>     print(y)
>>>>     return 1
>>>>
>>>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
>>>> metric=tan).fit(X)
>>>> distances, indices = nbrs.kneighbors(X)
>>>>
>>>> [ 0.51786272 0.53042315 0.87815766 0.90239616 0.34253599 0.98631925
>>>> 0.29768794 0.36593595 0.28956526 0.24720931]
>>>> [ 1. 0. 1. 1.]
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>> [ 0.66666667 0.33333333 1. 0.66666667]
>>>> [ 1. 0. 1. 1.]
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>> [ 0.66666667 0.33333333 1. 0.66666667]
>>>> [ 1. 0. 1. 1.]
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>> [ 0.66666667 0.33333333 1. 0.66666667]
>>>> [ 1. 0. 1. 1.]
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>>
>>>>
>>>> It seems to be due to the partitioning via the ball tree algorithm; I
>>>> am not sure if this is intended. It would be nice to get some feedback on
>>>> this ...
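
One possible reading of the output above, offered as an assumption rather than a statement about scikit-learn's internals: the vector [0.66666667 0.33333333 1. 0.66666667] is exactly the column-wise mean of the three rows of X. That would be consistent with a ball tree that summarizes each node by a centroid and a radius measured with the user metric, so the metric receives a point that is not in the data. The helper name node_summary below is hypothetical, and this sketch does not explain the 10-element random vector printed first.

```python
# Sketch under an assumption: a ball-tree node is summarized by the mean of its
# points (centroid) plus a radius measured with the user metric.
X = [[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]


def tan(x, y):
    # Constant "metric" from the thread, without the print.
    return 1


def node_summary(points, metric):
    # Column-wise mean of the points in the node.
    n = len(points)
    centroid = [sum(col) / n for col in zip(*points)]
    # Radius: largest metric value between the centroid and any point.
    radius = max(metric(centroid, p) for p in points)
    return centroid, radius


centroid, radius = node_summary(X, tan)
print([round(c, 8) for c in centroid])
# prints [0.66666667, 0.33333333, 1.0, 0.66666667]
```

If that reading is right, the unfamiliar values are the tree's own bookkeeping, and the metric is still the user-supplied one.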
>>>>
>>>> Switching to "brute" seems to return the expected results:
>>>>
>>>> from sklearn.neighbors import NearestNeighbors
>>>> import numpy as np
>>>>
>>>> X = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0,
>>>> 1.0, 1.0]])
>>>>
>>>> def tan(x, y):
>>>>     print(y)
>>>>     return 1
>>>>
>>>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
>>>> metric=tan).fit(X)
>>>> distances, indices = nbrs.kneighbors(X)
>>>>
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>> [ 1. 1. 1. 1.]
>>>> [ 1. 0. 1. 1.]
>>>> [ 0. 0. 1. 0.]
>>>> [ 1. 1. 1. 1.]
>>>>
>>>>
>>>>
>>>>
>>
>> ------------------------------------------------------------------------------
>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> Monitor end-to-end web transactions and take corrective actions now
>> Troubleshoot faster and improve end-user experience. Signup Now!
>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>