> for i in range(len(predicted)):
>     auc.append(predicted[i][0])
This is the source of the error. predict_proba returns a matrix (a NumPy array, to be precise) of shape (n_samples, n_classes); in your case n_classes = 2. The cell at a given row and column is the probability that the sample corresponding to that row belongs to the class corresponding to that column. You are keeping only column 0 (which per se is not a problem, since each row sums to 1), which means your auc list contains probabilities of class 0: the higher the probability, the more likely the sample belongs to class 0.

Now, the documentation <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html> says (emphasis mine):

    y_score : array, shape = [n_samples] or [n_samples, n_classes]
        Target scores, can either be probability estimates of the *positive*
        class, confidence values, or binary decisions.

Class 0 is not considered positive in any way.

TL;DR:
1. Use column 1 of predict_proba, not column 0.
2. You can just do auc = predicted[:, 1] instead of that loop. Vectorized operations are more concise and much faster.

On Wed, Aug 5, 2015 at 11:54 AM, Herbert Schulz <hrbrt....@gmail.com> wrote:

> Maybe I didn't explain it very well, sorry.
>
> I just have 1 column as a target. The last "post" I did was just a
> conversion of all 0's to 1's and all 1's to 0's. But the auc and the
> expected values come from the same data, which is converted. So actually
> it should be
>
> auc is [0.9777752710670069, 0.01890450385597026, 0.0059624156214325846,
> 0.05391726570661811]
> expected is [0.0, 1.0, 1.0, 1.0]
>
> i.e. for the auc, values 2-4 should be something like 0.97... and the
> first value 0.01...
>
> predicted = clf.predict_proba(X_test)
> predi = []
> classi = []
>
> for i in range(len(predicted)):
>     auc.append(predicted[i][0])
>
> print "auc is", auc
> print "expected is", y_test
> roc = metrics.roc_auc_score(y_test, auc)
>
> print roc
>
> So there should be a failure in my data preprocessing, or?
>
> Or can I just turn the expected vector?
> I think that would be a good idea if I'm using the normal data.
>
> best
>
> On 4 August 2015 at 17:38, Andreas Mueller <t3k...@gmail.com> wrote:
>
>> You should select the other column from predict_proba for auc.
>>
>> On 08/04/2015 10:54 AM, Herbert Schulz wrote:
>>
>> Thanks for the answer!
>>
>> Hmm, it's possible. I'll just make a little example:
>>
>> auc is [0.9777752710670069, 0.01890450385597026, 0.0059624156214325846,
>> 0.05391726570661811]
>> expected is [0.0, 1.0, 1.0, 1.0]
>>
>> But this is already with changed values; in the test set I set every
>> value 0->1 and 1 to 0.
>>
>> So there is the mistake? It seems that I should "turn" the expected
>> vector y_test?
>>
>> On 4 August 2015 at 16:36, Artem <barmaley....@gmail.com> wrote:
>>
>>> Hi Herbert,
>>>
>>> The worst value for AUC is 0.5, actually. Having values close to 0
>>> means that you can get a value close to 1 by just flipping your
>>> predictions (predict class 1 when you think it's 0 and vice versa).
>>> Are you sure you didn't confuse classes somewhere along the line?
>>> (You might have chosen the wrong column from predict_proba's result,
>>> for example.)
>>>
>>> On Tue, Aug 4, 2015 at 4:51 PM, Herbert Schulz <hrbrt....@gmail.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> I'm computing the AUC for some data...
>>>>
>>>> The classification target is 1 or 0, and I have a lot of 0's (5600)
>>>> and just 700 1's as a target.
>>>>
>>>> My AUC is about 0.097...
>>>>
>>>> where y_test is a vector containing 1's and 0's and auc contains the
>>>> predict_proba values:
>>>>
>>>> roc = metrics.roc_auc_score(y_test, auc)
>>>>
>>>> Actually this value seems way too bad, because my balanced accuracy
>>>> is about 0.77... I thought that I'm maybe doing something wrong.
>>>>
>>>> report:
>>>>
>>>>              precision    recall  f1-score   support
>>>>
>>>>         0.0       0.95      0.91      0.93       537
>>>>         1.0       0.49      0.63      0.55        73
>>>>
>>>> avg / total       0.89      0.88      0.88       610
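Artem's fix can be sketched end to end. This is a minimal, hypothetical reconstruction, not Herbert's actual pipeline: the data is synthetic (make_classification with an imbalance roughly like the one described) and the import paths are the current scikit-learn ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for Herbert's data (~8:1 ratio of 0's to 1's).
X, y = make_classification(n_samples=2000, weights=[0.89, 0.11], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
predicted = clf.predict_proba(X_test)  # shape (n_samples, 2); each row sums to 1

# Column 1 holds P(class == 1) -- the positive-class scores roc_auc_score
# expects. No Python loop needed: slice the column in one vectorized step.
scores = predicted[:, 1]
roc = roc_auc_score(y_test, scores)
print("auc is", roc)
```

With the correct column, the AUC lands well above 0.5 instead of near 0.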
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
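Artem's point that an AUC near 0 just means the classes were flipped can be checked numerically: ROC AUC satisfies roc_auc_score(y, 1 - p) == 1 - roc_auc_score(y, p), so Herbert's 0.097 would become about 0.903 once the correct column is used. A small sketch using the four probabilities quoted in the thread (rounded here for brevity):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0.0, 1.0, 1.0, 1.0])
# The four values Herbert posted -- taken from column 0, i.e. P(class == 0).
p_class0 = np.array([0.97777527, 0.01890450, 0.00596242, 0.05391727])

auc_flipped = roc_auc_score(y_true, p_class0)      # scores rank class 0 highest
auc_correct = roc_auc_score(y_true, 1 - p_class0)  # complement = P(class == 1)

print(auc_flipped, auc_correct)  # -> 0.0 1.0 on this tiny example
# The two are always complementary:
assert np.isclose(auc_correct, 1 - auc_flipped)
```

Because AUC is rank-based, taking the complement of the scores reverses every pairwise ranking, which is why a score far below 0.5 is a sign of swapped class columns rather than a genuinely bad model.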