Re: [Scikit-learn-general] k-NN user defined distance

Sebastian Raschka Tue, 12 Jan 2016 10:34:12 -0800

Isn't the Tanimoto coefficient a continuous number between 0 and 1?

> How are the scalars occurring in the X vector?

You are returning a fraction ->  "return float(c)/(a1 + b1 - c)" 
instead of the count; maybe that's a misunderstanding?
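
For what it's worth, here is a minimal sketch of a Tanimoto-based metric for binary vectors. It is an illustration, not the code from this thread: the names tanimoto_similarity and tanimoto_distance are made up for the example, and treating two all-zero vectors as identical is an assumption. Note that an elementwise x == y comparison also counts positions where both entries are 0, so the sketch counts shared 1's with a logical AND instead. Since the metric parameter expects a distance (smaller means closer), the sketch returns 1 minus the similarity.

```python
import numpy as np

def tanimoto_similarity(x, y):
    """Tanimoto (Jaccard) similarity of two binary vectors."""
    x = np.asarray(x).astype(bool)
    y = np.asarray(y).astype(bool)
    both = np.count_nonzero(x & y)    # positions where x and y are both 1
    either = np.count_nonzero(x | y)  # positions where at least one is 1
    if either == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 1.0
    return both / either

def tanimoto_distance(x, y):
    """Distance form: 0 for identical vectors, 1 for no shared 1's."""
    return 1.0 - tanimoto_similarity(x, y)

x = np.array([1., 0., 0., 1.])
y = np.array([1., 0., 1., 0.])
print(tanimoto_similarity(x, y))  # one shared 1 out of three positions with any 1
print(tanimoto_distance(x, y))
```

The denominator here equals a1 + b1 - c from the quoted formula, because the number of positions with at least one 1 is the count of 1's in x plus the count in y minus the shared ones.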




> On Jan 12, 2016, at 1:24 PM, A neuman <themagenta...@gmail.com> wrote:
> 
> The custom metric is just calculating the Tanimoto coefficient.
> 
>     a=x.tolist()
>     b=y.tolist()
> 
>     # count positions where both x and y are 1 (x==y would also count matching 0's)
>     c=np.count_nonzero((x==1) & (y==1))
>     a1=a.count(1.0)
>     b1=b.count(1.0)
> 
>     return float(c)/(a1 + b1 - c)
> 
> 
> So I'm just counting the 1's in x and the 1's in y.
> 
> c is the number of positions where the 1's match (matching means at the same
> index): x=[1,0,0,1] and y=[1,0,1,0] would match only on the first 1.
> 
> So in general all the data samples are 1's and 0's. How are the scalars 
> occurring in the X vector? There should be no scalars in the X vector. Or am I 
> misunderstanding something?
> 
> best,
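
A guess at what might be going on, offered as an assumption and not something confirmed in this thread: tree-based neighbor searches may call the metric on vectors that are not rows of the data, for example a probe vector used to check the callable, or the centroid of a tree node (which would look like column averages of 0/1 data). If that is the cause, algorithm='brute' might be worth trying, since a brute-force search compares data rows directly. The sketch below uses made-up random binary data and a hypothetical metric, binary_check_distance, that records whether it ever receives a non-binary vector.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

seen_non_binary = []  # filled by the metric if it gets a non-0/1 vector

def binary_check_distance(x, y):
    """Tanimoto-style distance that also records non-binary inputs."""
    for v in (x, y):
        if not np.all((v == 0) | (v == 1)):
            seen_non_binary.append(v.copy())
    xb, yb = x.astype(bool), y.astype(bool)
    either = np.count_nonzero(xb | yb)
    if either == 0:
        return 0.0  # assumption: two all-zero vectors are at distance 0
    return 1.0 - np.count_nonzero(xb & yb) / either

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(40, 12)).astype(float)  # made-up binary data
labels = rng.randint(0, 2, size=40)                 # made-up class labels

clf = KNeighborsClassifier(n_neighbors=3, algorithm='brute',
                           metric=binary_check_distance)
clf.fit(X, labels)
clf.predict(X[:5])

print(len(seen_non_binary))  # how many non-binary vectors the metric saw
```

Swapping algorithm='brute' for algorithm='ball_tree' in the same script is a quick way to check whether the tree search is what introduces the fractional vectors in a given scikit-learn version.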
> 
> 
> 
> 
> On 12 January 2016 at 19:18, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> Hi, I am not sure how your custom metric works, but would np.where(x >= 
> 0.5, 1., 0.) work in your case?
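
To make the np.where suggestion concrete, here is a small sketch with made-up values. It maps every entry at or above 0.5 to 1.0 and everything else to 0.0, so the result contains only 1's and 0's. Whether rounding is appropriate for the vectors the metric receives is a separate question that this example does not settle.

```python
import numpy as np

# Made-up vector with fractional entries, similar in spirit to the printed arrays
x = np.array([0.6371319, 0.30214217, 0.5, 0.02358491])

# Entries >= 0.5 become 1.0, all others become 0.0
x_binary = np.where(x >= 0.5, 1., 0.)
print(x_binary)
```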
> 
>> On Jan 12, 2016, at 1:08 PM, A neuman <themagenta...@gmail.com> wrote:
>> 
>> Sorry, what I wrote wasn't right:
>> X:
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567] 
>> 
>> Y:
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567] 
>> 
>> X:
>> [ 0.          0.          0.          0.02358491  0.00471698  0.          0.
>>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>>   0.00943396  0.     .............................52358491  0.53773585
>>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>>   0.63679245  0.8490566   0.71698113  0.02358491] 
>> 
>> Y:
>> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
>>   1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  1.  0.
>>   1.  1.  1.  1.  0.] 
>> 
>> and so on..
>> 
>> but X should also contain only 1's and 0's.
>> 
>> best,
>> 
>> On 12 January 2016 at 19:04, A neuman <themagenta...@gmail.com> wrote:
>> Hey, 
>> 
>> I have another problem:
>> 
>> if I'm using my own metric, x and y contain more than just my samples.
>> I'm using 10-fold CV with a k-NN classifier.
>> My attributes are only 1's and 0's, but if I print them out, I get:
>> 
>> KNeighborsClassifier(metric=myFunc)
>> 
>> def myFunc(x, y):
>>     # print both arguments to see what the metric receives
>>     print(x, '\n')
>>     print(y, '\n')
>> 
>> I cut some values due to the size.
>> 
>> That's for x:
>> 
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567] 
>> 
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567] 
>> 
>> [ 0.          0.          0.          0.02358491  0.00471698  0.          0.
>>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>>   0.00943396  0.     .............................52358491  0.53773585
>>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>>   0.63679245  0.8490566   0.71698113  0.02358491] 
>> 
>> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
>>   1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  1.  0.
>>   1.  1.  1.  1.  0.] 
>> 
>> 
>> 
>> and for y
>> 
>> [ 0.          0.          0.          0.02358491  0.00471698  0.          0.
>>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>>   0.          ..........
>>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>>   0.63679245  0.8490566   0.71698113  0.02358491] 
>> 
>> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  0.
>>   0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  1.  0.  0.  1.  0.  1.  1.  0.  0.  0.  0.  0.  1.  0.  0.  1.  0.
>>   0.  1.  1.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.  0.
>>   0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  0.  1.  1.
>>   0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.  0.
>>   0.  0.  0.  0.  1.  0.  0.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  0.
>>   0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.  0.  1.  1.  1.  1.
>>   1.  1.  1.  1.  0.] 
>> 
>> 
>> The problem is, I have to count the occurrences of 0's and 1's in x and y. 
>> If there are other arrays with values like 0.636..., I don't get the right 
>> result. So in general, I only want the arrays with 1's and 0's.
>> 
>> best,
>> 
>> On 9 January 2016 at 03:58, A neuman <themagenta...@gmail.com> wrote:
>> Ah, that helped me a lot!!!
>> 
>> So I just write my own function that returns a scalar. This function is 
>> then passed as the metric parameter of the k-NN estimator.
>> 
>> Thank you!!!
>> 
>> 
>> On 9 January 2016 at 03:41, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>> You just need a "regular" Python function that outputs a scalar. For 
>> example, consider the following:
>> 
>> >>> from sklearn.neighbors import NearestNeighbors
>> >>> import numpy as np
>> >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>> >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
>> >>> distances, indices = nbrs.kneighbors(X)
>> >>> distances
>> array([[ 0.        ,  1.        ],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.41421356],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.41421356]])
>> 
>> (note that I am using the NearestNeighbors class here, but the same applies 
>> to the KNeighborsClassifier)
>> 
>> For example, to compute the distances between samples as Euclidean distance 
>> (the default) you could just define a Python function
>> 
>> >>> def eucldist(x, y):
>> ...     return np.sqrt(np.sum((x-y)**2))
>> ...
>> >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree',
>> ...                         metric=eucldist).fit(X)
>> >>> distances, indices = nbrs.kneighbors(X)
>> >>> distances
>> array([[ 0.        ,  1.        ],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.41421356],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.        ],
>>        [ 0.        ,  1.41421356]])
>> 
>> (Alternatively, you could provide it as a lambda function.)
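
The lambda variant could look like the sketch below. It reuses the toy X from the example above and the same Euclidean formula as eucldist; nothing else is assumed.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Same Euclidean distance as eucldist, written inline as a lambda
nbrs = NearestNeighbors(
    n_neighbors=2,
    algorithm='ball_tree',
    metric=lambda x, y: np.sqrt(np.sum((x - y) ** 2)),
).fit(X)

distances, indices = nbrs.kneighbors(X)
print(distances)
```

A named function is usually easier to debug and reuse than a lambda, so the lambda form is mostly a convenience for short one-off metrics.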
>> 
>> Best,
>> Sebastian
>> 
>> > On Jan 8, 2016, at 9:19 PM, A neuman <themagenta...@gmail.com> wrote:
>> >
>> > Hello everyone,
>> >
>> > I want to use the KNeighborsClassifier with my own distances.
>> >
>> > The documentation says the following:
>> >
>> > [callable] : a user-defined function which accepts an array of distances, 
>> > and returns an array of the same shape containing the weights.
>> >
>> > I just don't know what the array should look like.
>> >
>> > For example, if I have 100 samples, does the array have size 100*100,
>> > so that for every sample there is a distance to the other 99 samples?
>> >
>> > [[0.4, 0.2, ...],[0.3,0.1,...]........[0.9,0.6,...]]   Something like this?
>> >
>> > I would appreciate your help.
>> >
>> > best,
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> > Monitor end-to-end web transactions and take corrective actions now
>> > Troubleshoot faster and improve end-user experience. Signup Now!
>> > http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> 
>> 
> 
> 
