I have been experimenting with the code above and noticed the following:
1. If we set algorithm='brute', execution never enters the metric function
tan during fit, i.e., a breakpoint at the print statement is not hit while
the fit method runs. The function is, however, called by the kneighbors
method.
2. I think the user-defined metric cannot be used with 'brute'.
3. On the other hand, if we set algorithm='ball_tree', execution does go
through the tan function during fit. But if you inspect the values of x and
y at that point, some of them differ from the rows of X that were passed in.
4. Clearly, the ball_tree algorithm is doing something odd here; I don't
think it is using the defined metric tan to build the tree. A small sketch
to check this is included right after this list.
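
A quick way to check points 1 and 3 is to record every (x, y) pair that gets
handed to the metric and then compare each argument against the rows of X.
Here is a minimal sketch of that idea; the calls list, the fit_calls counter
and the np.allclose comparison are just my own instrumentation, not anything
from the NearestNeighbors API:

from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

calls = []  # every (x, y) pair the estimator passes to the metric

def tan(x, y):
    # store copies, since the arrays may be reused internally
    calls.append((np.array(x, copy=True), np.array(y, copy=True)))
    return 1

nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
                        metric=tan).fit(X)
fit_calls = len(calls)  # > 0 means the metric was already used during fit
distances, indices = nbrs.kneighbors(X)

print("metric calls during fit:", fit_calls)
print("metric calls during kneighbors:", len(calls) - fit_calls)

# flag any argument that is not one of the original rows of X
for x, y in calls:
    for v in (x, y):
        if not any(np.allclose(v, row) for row in X):
            print("argument that is not a row of X:", v)

Running the same sketch with algorithm='brute' should, if point 1 is right,
report zero metric calls during fit.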
--
sp
On Thu, Jan 14, 2016 at 2:42 AM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> I guess I got it now! This behavior (see below) is indeed a bit strange:
>>
>> from sklearn.neighbors import NearestNeighbors
>> import numpy as np
>>
>> X = np.array([[1.0, 0.0, 1.0, 1.0],
>>               [0.0, 0.0, 1.0, 0.0],
>>               [1.0, 1.0, 1.0, 1.0]])
>>
>> def tan(x, y):
>>     print(y)
>>     return 1
>>
>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree',
>>                         metric=tan).fit(X)
>> distances, indices = nbrs.kneighbors(X)
>>
>> [ 0.51786272 0.53042315 0.87815766 0.90239616 0.34253599 0.98631925 0.29768794 0.36593595 0.28956526 0.24720931]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 0.66666667 0.33333333 1. 0.66666667]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>>
>>
>> It seems to be due to the partitioning done by the ball tree algorithm; I
>> am not sure whether this is intended. It would be nice to get some
>> feedback on this ...
>>
>> Switching to "brute" seems to return the expected results:
>>
>> from sklearn.neighbors import NearestNeighbors
>> import numpy as np
>>
>> X = np.array([[1.0, 0.0, 1.0, 1.0],
>>               [0.0, 0.0, 1.0, 0.0],
>>               [1.0, 1.0, 1.0, 1.0]])
>>
>> def tan(x, y):
>>     print(y)
>>     return 1
>>
>> nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
>>                         metric=tan).fit(X)
>> distances, indices = nbrs.kneighbors(X)
>>
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]
>> [ 1. 1. 1. 1.]
>> [ 1. 0. 1. 1.]
>> [ 0. 0. 1. 0.]
>> [ 1. 1. 1. 1.]