Re: [Scikit-learn-general] k-NN user defined distance

A neuman Tue, 12 Jan 2016 10:25:57 -0800

The custom metric just calculates the Tanimoto coefficient:

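    # x and y are 1-D NumPy arrays of 0's and 1's
    a = x.tolist()
    b = y.tolist()

    # c: number of positions where both x and y are 1
    c = np.count_nonzero((x == 1) & (y == 1))
    a1 = a.count(1.0)   # number of 1's in x
    b1 = b.count(1.0)   # number of 1's in y

    # Tanimoto coefficient: shared 1's / (1's in x + 1's in y - shared 1's)
    return float(c) / (a1 + b1 - c)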


So I'm just counting the 1's in x and the 1's in y.

c is the number of positions where the 1's match (matching means at the
same index): x=[1,0,0,1] and y=[1,0,1,0] would match only on the first 1.
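
As a minimal sketch of the calculation described above (not necessarily the exact function used in this thread), the coefficient for that example pair can be worked out as follows. The names `tanimoto_coefficient` and `tanimoto_distance` are made up for illustration, and returning 1.0 for two all-zero vectors is an assumption. Since the metric passed to k-NN is treated as a distance (smaller means closer), the sketch also shows one common conversion, 1 minus the coefficient.

```python
import numpy as np


def tanimoto_coefficient(x, y):
    """Shared 1's divided by the number of positions where x or y is 1."""
    x = np.asarray(x) == 1
    y = np.asarray(y) == 1
    both = np.count_nonzero(x & y)    # 1 in x and 1 in y at the same index
    either = np.count_nonzero(x | y)  # equals a1 + b1 - c from the message
    if either == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 1.0
    return float(both) / either


def tanimoto_distance(x, y):
    """Assumed conversion to a distance: identical vectors give 0."""
    return 1.0 - tanimoto_coefficient(x, y)


x = [1, 0, 0, 1]
y = [1, 0, 1, 0]
print(tanimoto_coefficient(x, y))  # one shared 1, three positions with a 1
print(tanimoto_distance(x, y))
```

For the pair above there is one shared 1 and three positions holding a 1 in either vector, so the coefficient is 1/3.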

In general, all the data samples are 1's and 0's. How do non-binary
values end up in the X vector? There should be none in the X vector.
Or am I misunderstanding something?
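
One way to see which calls receive non-binary input would be a small check at the top of the metric. This is only a debugging sketch: `is_binary` is a made-up helper name, not part of scikit-learn, and it does not explain where the other arrays come from.

```python
import numpy as np


def is_binary(v):
    """Return True if every entry of v is exactly 0 or 1."""
    v = np.asarray(v)
    return bool(np.all((v == 0) | (v == 1)))


print(is_binary([0.0, 1.0, 1.0, 0.0]))       # a sample made of 0's and 1's
print(is_binary([0.6371319, 0.54557285]))    # values like those printed below
```

Printing the shape of x and y next to this check inside the metric may also help to tell the unexpected arrays apart from the real samples.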

best,




On 12 January 2016 at 19:18, Sebastian Raschka <[email protected]> wrote:

> Hi, I am not sure how your custom metric works, but would a np.where(x >=
> 0.5, 1., 0.) work in your case?
>
> On Jan 12, 2016, at 1:08 PM, A neuman <[email protected]> wrote:
>
> Sorry, what I wrote was not right:
> X:
> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>   0.52445514  0.63379164  0.71873681  0.55008567]
>
> Y:
> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>   0.52445514  0.63379164  0.71873681  0.55008567]
>
> X:
> [ 0.          0.          0.          0.02358491  0.00471698  0.
> 0.
>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>   0.00943396  0.     .............................52358491  0.53773585
>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>   0.63679245  0.8490566   0.71698113  0.02358491]
>
> Y:
> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
>   1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
>   0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  1.  0.
>   1.  1.  1.  1.  0.]
>
> and so on..
>
> but X should also contain only 1's and 0's.
>
> best,
>
> On 12 January 2016 at 19:04, A neuman <[email protected]> wrote:
>
>> Hey,
>>
>> I have another problem.
>>
>> When I use my own metric, x and y do not contain only my samples.
>> I'm using 10-fold CV with the k-NN classifier.
>> My attributes are only 1's and 0's, but when I print them out, I get:
>>
>> KNeighborsClassifier(metric=myFunc)
>>
>> def myFunc(x,y):
>>
>>     print x,'\n'
>>     print y,'\n'
>>
>> I cut some values because of the size.
>>
>> That's for x:
>>
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567]
>>
>> [ 0.6371319   0.54557285  0.30214217  0.14690307  0.49778446  0.89183238
>>   0.52445514  0.63379164  0.71873681  0.55008567]
>>
>> [ 0.          0.          0.          0.02358491  0.00471698  0.
>> 0.
>>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>>   0.00943396  0.     .............................52358491  0.53773585
>>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>>   0.63679245  0.8490566   0.71698113  0.02358491]
>>
>> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
>>   1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  1.  0.
>>   1.  1.  1.  1.  0.]
>>
>>
>>
>> and for y
>>
>> [ 0.          0.          0.          0.02358491  0.00471698  0.
>> 0.
>>   0.          0.          0.00471698  0.00471698  0.00471698  0.02830189
>>   0.          ..........
>>   0.63207547  0.51886792  0.66037736  0.75        0.57075472  0.59433962
>>   0.63679245  0.8490566   0.71698113  0.02358491]
>>
>> [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  0.
>>   0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
>>   0.  1.  0.  0.  1.  0.  1.  1.  0.  0.  0.  0.  0.  1.  0.  0.  1.  0.
>>   0.  1.  1.  0.  0.  1.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.  0.
>>   0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  1.  1.  0.  0.  1.  1.
>>   0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.  0.
>>   0.  0.  0.  0.  1.  0.  0.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  0.
>>   0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.  0.  1.  1.  1.  1.
>>   1.  1.  1.  1.  0.]
>>
>>
>> The problem is that I have to count the occurrences of 0's and 1's in x
>> and y. If other arrays show up, like 0.636..., I don't get the right
>> result. So in general, I only want the arrays with 1's and 0's.
>>
>> best,
>>
>> On 9 January 2016 at 03:58, A neuman <[email protected]> wrote:
>>
>>> Ah, that helped me a lot!
>>>
>>> So I just write my own function that returns a scalar. This function is
>>> passed as the metric parameter of the k-NN classifier.
>>>
>>> Thank you!!!
>>>
>>>
>>> On 9 January 2016 at 03:41, Sebastian Raschka <[email protected]>
>>> wrote:
>>>
>>>> You just need a "regular" Python function that outputs a scalar.
>>>> For example, consider the following:
>>>>
>>>> >>> from sklearn.neighbors import NearestNeighbors
>>>> >>> import numpy as np
>>>> >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>>> >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
>>>> >>> distances, indices = nbrs.kneighbors(X)
>>>> >>> distances
>>>> array([[ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.41421356],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.41421356]])
>>>>
>>>> (note that I am using the NearestNeighbors class here, but the same
>>>> applies to the KNeighborsClassifier)
>>>>
>>>> For example, to compute the distances between samples as Euclidean
>>>> distance (the default) you could just define a Python function
>>>>
>>>> >>> def eucldist(x, y):
>>>> ...    return np.sqrt(np.sum((x-y)**2))
>>>> >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree',
>>>> ...                         metric=eucldist).fit(X)
>>>> >>> distances, indices = nbrs.kneighbors(X)
>>>> >>> distances
>>>> array([[ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.41421356],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.        ],
>>>>        [ 0.        ,  1.41421356]])
>>>>
>>>> (alternatively, you could provide it as a lambda function)
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> > On Jan 8, 2016, at 9:19 PM, A neuman <[email protected]> wrote:
>>>> >
>>>> > Hello everyone,
>>>> >
>>>> > I want to use the KNeighborsClassifier with my own distances.
>>>> >
>>>> > The documentation says the following:
>>>> >
>>>> > [callable] : a user-defined function which accepts an array of
>>>> > distances, and returns an array of the same shape containing the weights.
>>>> >
>>>> > I just don't know what the array should look like.
>>>> >
>>>> > For example, if I have 100 samples, does the array have size 100*100,
>>>> > so that for every sample there is a distance to the other 99 samples?
>>>> >
>>>> > Something like [[0.4, 0.2, ...],[0.3,0.1,...]........[0.9,0.6,...]]?
>>>> >
>>>> > I would appreciate your help.
>>>> >
>>>> > best,
>>>> >
>>>> >
>>>> >
>>>> ------------------------------------------------------------------------------
>>>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>> > Monitor end-to-end web transactions and take corrective actions now
>>>> > Troubleshoot faster and improve end-user experience. Signup Now!
>>>> >
>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>>>> > _______________________________________________
>>>> > Scikit-learn-general mailing list
>>>> > [email protected]
>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>
>


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
