Hi Jacob,

I went through the code. With algorithm='brute', the 'fit' method in nearest
neighbors does not do any distance calculations; it only initializes the
instance attributes. In that case, the metric not being called during fit is
probably not a bug.
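
For anyone who wants to check this without a debugger, here is a small sketch
(my own, not taken from the scikit-learn sources) that counts how often a
user-defined metric is called during fit and during kneighbors. The names
count_metric_calls and counting_euclidean are made up for illustration, and I
use a plain Euclidean distance instead of the constant-returning tan. I would
expect zero calls during fit with 'brute'; the ball_tree counts may differ
between scikit-learn versions, so I only print them.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])


def count_metric_calls(algorithm):
    """Return (calls during fit, calls during kneighbors) for one algorithm."""
    calls = {"n": 0}

    def counting_euclidean(x, y):
        # Hypothetical stand-in for the thread's `tan`: counts each call and
        # returns a real distance instead of a constant.
        calls["n"] += 1
        return float(np.sqrt(np.sum((x - y) ** 2)))

    nbrs = NearestNeighbors(n_neighbors=1, algorithm=algorithm,
                            metric=counting_euclidean)
    nbrs.fit(X)
    during_fit = calls["n"]
    nbrs.kneighbors(X)
    during_query = calls["n"] - during_fit
    return during_fit, during_query


brute_fit, brute_query = count_metric_calls("brute")
tree_fit, tree_query = count_metric_calls("ball_tree")
print("brute:     fit =", brute_fit, " kneighbors =", brute_query)
print("ball_tree: fit =", tree_fit, " kneighbors =", tree_query)
```

If the brute fit count is zero while the kneighbors count is not, the metric is
only being deferred to query time, which matches what I saw in the code.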

--
sp

On Wed, Feb 24, 2016 at 12:26 AM, Jacob Vanderplas <
jake...@cs.washington.edu> wrote:

>> I have been experimenting with the above code and have noticed the
>> following things:
>>
>>
>>    1. If we set algorithm = 'brute', execution never enters the function
>>    tan during the fit method, i.e., a breakpoint at the print statement is
>>    not hit. The function is used, however, when calling the kneighbors
>>    method. I think one cannot use the user-defined metric with 'brute'.
>>
> This sounds like a bug – can you open an issue?
>
>>
>>    2. On the other hand, if we set algorithm = 'ball_tree', execution
>>    does go through the tan function during the fit method. But the values
>>    of x and y seen at that point differ from the values that were entered.
>>    Clearly, the ball_tree algorithm is doing something unexpected. I don't
>>    think it is using the defined metric tan for making the tree.
>>
> Probably related to the bug fixed here:
> https://github.com/scikit-learn/scikit-learn/pull/6288
>
>
>>
>>
>>
>> --
>> sp
>>
>> On Thu, Jan 14, 2016 at 2:42 AM, Sebastian Raschka <se.rasc...@gmail.com>
>>> wrote:
>>>
>>>> I guess I got it now! This behavior (see below) is indeed a bit strange:
>>>>
>>>> from sklearn.neighbors import NearestNeighbors
>>>> import numpy as np
>>>>
>>>> X = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0,
>>>> 1.0, 1.0]])
>>>>
>>>> def tan(x, y):
>>>>     print(y)
>>>>     return 1
>>>>
>>>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
>>>> metric=tan).fit(X)
>>>> distances, indices = nbrs.kneighbors(X)
>>>>
>>>> [ 0.51786272  0.53042315  0.87815766  0.90239616  0.34253599  0.98631925
>>>>   0.29768794  0.36593595  0.28956526  0.24720931]
>>>> [ 1.  0.  1.  1.]
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>> [ 0.66666667  0.33333333  1.          0.66666667]
>>>> [ 1.  0.  1.  1.]
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>> [ 0.66666667  0.33333333  1.          0.66666667]
>>>> [ 1.  0.  1.  1.]
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>> [ 0.66666667  0.33333333  1.          0.66666667]
>>>> [ 1.  0.  1.  1.]
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>>
>>>>
>>>> It seems to be due to the partitioning via the ball tree algorithm; I
>>>> am not sure if this is intended. It would be nice to get some feedback on
>>>> this ...
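
One observation on the extra vector in the output above: [0.66666667
0.33333333 1. 0.66666667] is not a row of X, but it is exactly the column-wise
mean of the three rows. My guess (an assumption; I have not traced it in the
source) is that the ball tree evaluates the metric between each point and the
centroid of its node while building the tree, which would explain why the
metric sees a vector that was never entered. The arithmetic itself is easy to
check:

```python
import numpy as np

X = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

# Column-wise mean of the three input rows, i.e. their centroid.
centroid = X.mean(axis=0)
print(centroid)

# Compare against the unexpected vector printed by the metric.
print(np.allclose(centroid, [2 / 3, 1 / 3, 1.0, 2 / 3]))
```

This does not account for the 10-element vector printed first, which has a
different length from the rows of X.
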
>>>>
>>>> Switching to "brute" seems to return the expected results:
>>>>
>>>> from sklearn.neighbors import NearestNeighbors
>>>> import numpy as np
>>>>
>>>> X = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0,
>>>> 1.0, 1.0]])
>>>>
>>>> def tan(x, y):
>>>>     print(y)
>>>>     return 1
>>>>
>>>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
>>>> metric=tan).fit(X)
>>>> distances, indices = nbrs.kneighbors(X)
>>>>
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>> [ 1.  1.  1.  1.]
>>>> [ 1.  0.  1.  1.]
>>>> [ 0.  0.  1.  0.]
>>>> [ 1.  1.  1.  1.]
>>>>
>>>>
>>>>
>>>>
>>
>> ------------------------------------------------------------------------------
>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> Monitor end-to-end web transactions and take corrective actions now
>> Troubleshoot faster and improve end-user experience. Signup Now!
>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
