I have been experimenting with the code above and noticed the following:
1. If we set algorithm='brute', execution never enters the metric function
tan during fit, i.e., a breakpoint at the print statement is not hit while
the fit method runs. The function is, however, called by the kneighbors
method.
2. I think the user-defined metric cannot be used with 'brute'.
3. On the other hand, if we set algorithm='ball_tree', execution does go
through the tan function during fit. But if you inspect the values of x and
y at that point, some of them differ from the rows of X that were passed in.
4. Clearly, the ball_tree algorithm is doing something odd here; I don't
think it is using the defined metric tan to build the tree. A small sketch
to check this is included right after this list.
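
A quick way to check points 1 and 3 is to record every (x, y) pair that gets
handed to the metric and then compare each argument against the rows of X.
Here is a minimal sketch of that idea; the calls list, the fit_calls counter
and the np.allclose comparison are just my own instrumentation, not anything
from the NearestNeighbors API:

from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

calls = []  # every (x, y) pair the estimator passes to the metric

def tan(x, y):
    # store copies, since the arrays may be reused internally
    calls.append((np.array(x, copy=True), np.array(y, copy=True)))
    return 1

nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
                        metric=tan).fit(X)
fit_calls = len(calls)  # > 0 means the metric was already used during fit
distances, indices = nbrs.kneighbors(X)

print("metric calls during fit:", fit_calls)
print("metric calls during kneighbors:", len(calls) - fit_calls)

# flag any argument that is not one of the original rows of X
for x, y in calls:
    for v in (x, y):
        if not any(np.allclose(v, row) for row in X):
            print("argument that is not a row of X:", v)

Running the same sketch with algorithm='brute' should, if point 1 is right,
report zero metric calls during fit.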
--
sp
On Thu, Jan 14, 2016 at 2:42 AM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> I guess I got it now! This behavior (see below) is indeed a bit strange:
>>
>> from sklearn.neighbors import NearestNeighbors
>> import numpy as np
>>
>> X = np.array([[1.0, 0.0, 1.0, 1.0],
>>               [0.0, 0.0, 1.0, 0.0],
>>               [1.0, 1.0, 1.0, 1.0]])
>>
>> def tan(x, y):
>>     print(y)
>>     return 1
>>
>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
>>                         metric=tan).fit(X)
>> distances, indices = nbrs.kneighbors(X)
>>
>> [ 0.51786272 0.53042315 0.87815766 0.90239616 0.34253599 0.98631925 0.29768794 0.36593595 0.28956526 0.24720931]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>>
>>
>> It seems to be due to the partitioning done by the ball tree algorithm; I
>> am not sure whether this is intended. It would be nice to get some
>> feedback on this ...
>>
>> Switching to "brute" seems to return the expected results:
>>
>> from sklearn.neighbors import NearestNeighbors
>> import numpy as np
>>
>> X = np.array([[1.0, 0.0, 1.0, 1.0],
>>               [0.0, 0.0, 1.0, 0.0],
>>               [1.0, 1.0, 1.0, 1.0]])
>>
>> def tan(x, y):
>>     print(y)
>>     return 1
>>
>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
>>                         metric=tan).fit(X)
>> distances, indices = nbrs.kneighbors(X)
>>
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 1. 1. 1. 1.]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]