Re: [scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on matplotlib

2016-12-14 Thread Jacob Schreiber
To make a proper ROC curve you need to test all possible thresholds, not
just a subset of them. You can do this easily in sklearn.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

...  ...

# predict_proba returns one column per class; ROC needs the positive-class scores
y_score = clf.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
plt.plot(fpr, tpr, label='AUC = %.3f' % auc)
plt.legend(loc='lower right')
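In full, a minimal end-to-end sketch (the dataset, classifier settings, and train/test split here are illustrative stand-ins, not from the original post; scoring a held-out set avoids the inflated AUC you get from evaluating on training data):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the original poster's dataset is not available)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score on held-out data; scoring the training set inflates the AUC
y_score = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)

plt.plot([0, 1], [0, 1], 'r--')  # chance line
plt.plot(fpr, tpr, label='AUC = %.3f' % roc_auc_score(y_test, y_score))
plt.legend(loc='lower right')
plt.show()
```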

On Wed, Dec 14, 2016 at 8:52 AM, Stuart Reynolds wrote:

> [quoted reply snipped; Stuart's message appears in full in its own thread entry below]
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on matplotlib

2016-12-14 Thread Stuart Reynolds
You're looking at a tiny subset of the possible cutoff thresholds for this
classifier. Lower thresholds will give a higher TPR at the expense of a
higher FPR. Usually, AUC is computed as the integral of this curve over the
whole range of FPRs (from zero to one).

If you have your classifier output probabilities or activations, the
maximum and minimum of these values will tell you what the largest and
smallest thresholds should be. scikit-learn also has a function to directly
receive the activations and true classes and compute the AUC and the
TPR/FPR curve.
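To make "test every possible threshold" concrete, here is a small sketch (scores and labels made up purely for illustration) showing that sweeping each distinct score as a cutoff reproduces what `roc_curve` computes:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up activations and true classes, purely for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# Manually test every distinct score as a cutoff, from highest to lowest
fpr_manual, tpr_manual = [], []
for t in sorted(set(scores), reverse=True):
    pred = scores >= t
    tpr_manual.append((pred & (y_true == 1)).sum() / (y_true == 1).sum())
    fpr_manual.append((pred & (y_true == 0)).sum() / (y_true == 0).sum())

# roc_curve performs the same sweep (prepending the (0, 0) endpoint)
fpr, tpr, thresholds = roc_curve(y_true, scores, drop_intermediate=False)
```

The manual loop and `roc_curve` agree point for point; in practice you would just call `roc_curve` directly on the true labels and scores.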

On Wed, Dec 14, 2016 at 5:12 AM, Dale T Smith wrote:

> [quoted reply snipped; Dale's reply and the original post appear in full below]


[scikit-learn] Renaming subject lines if you get a digest

2016-12-14 Thread Dale T Smith
Please rename subjects if you use the digest – now the thread is not complete 
in the archive. Others will have a harder time benefitting from answers.


__
Dale T. Smith | Macy's Systems and Technology | IFS eCom CSE Data Science
5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.sm...@macys.com

From: scikit-learn 
[mailto:scikit-learn-bounces+dale.t.smith=macys@python.org] On Behalf Of 
Graham Arthur Mackenzie
Sent: Tuesday, December 13, 2016 5:02 PM
To: scikit-learn@python.org
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 9, Issue 42

Thanks for the speedy and helpful responses!

Actually, the thrust of my question was, "I'm assuming the fit() method for all 
three modules work the same way, so how come the example code for DTs differs 
from NB, SVMs?" Since you seem to be saying that it'll work either way, I'm 
assuming there's no real reason behind it, which was my suspicion, but just 
wanted to have it confirmed, as the inconsistency was conspicuous.

Thanks!
GAM

ps, My apologies if this is the improper way to respond to responses. I am 
receiving the Digest rather than individual messages, so this was the best I 
could think to do...

On Tue, Dec 13, 2016 at 12:38 PM, wrote:
Send scikit-learn mailing list submissions to
scikit-learn@python.org

To subscribe or unsubscribe via the World Wide Web, visit
https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
scikit-learn-requ...@python.org

You can reach the person managing the list at
scikit-learn-ow...@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."


Today's Topics:

   1. Why do DTs have a different fit protocol than NB and SVMs?
  (Graham Arthur Mackenzie)
   2. Re: Why do DTs have a different fit protocol than NB and
  SVMs? (Jacob Schreiber)
   3. Re: Why do DTs have a different fit protocol than NB and
  SVMs? (Stuart Reynolds)
   4. Re: Why do DTs have a different fit protocol than NB and
  SVMs? (Vlad Niculae)


--

Message: 1
Date: Tue, 13 Dec 2016 12:14:43 -0800
From: Graham Arthur Mackenzie
To: scikit-learn@python.org
Subject: [scikit-learn] Why do DTs have a different fit protocol than
NB and SVMs?

Hello All,

I hope this is the right way to ask a question about documentation.

In the doc for Decision Trees, the fit statement is assigned back to the
classifier:

clf = clf.fit(X, Y)

Whereas, for Naive Bayes and Support Vector Machines, it's just:

clf.fit(X, Y)

I assumed this was a typo, but thought I should try and verify such before
proceeding under that assumption. I appreciate any feedback you can provide.

Thank You and Be Well,
Graham


--

Message: 2
Date: Tue, 13 Dec 2016 12:23:00 -0800
From: Jacob Schreiber
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Why do DTs have a different fit protocol
than NB and SVMs?

The fit method returns the object itself, so regardless of which way you do
it, it will work. The reason the fit method returns itself is so that you
can chain methods, like "preds = clf.fit(X, y).predict(X)"
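For example (iris used purely as a stand-in dataset), both spellings yield identical predictions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# fit() returns the estimator itself, so reassigning it is harmless...
clf = DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, y)
preds_a = clf.predict(X)

# ...and it is also what makes method chaining possible
preds_b = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
```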

On Tue, Dec 13, 2016 at 12:14 PM, Graham Arthur Mackenzie <graham.arthur.macken...@gmail.com> wrote:

> [quoted message snipped; it appears in full above]

Re: [scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on matplotlib

2016-12-14 Thread Dale T Smith
I think you need to look at the examples.


__
Dale T. Smith | Macy's Systems and Technology | IFS eCom CSE Data Science
5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.sm...@macys.com

From: scikit-learn 
[mailto:scikit-learn-bounces+dale.t.smith=macys@python.org] On Behalf Of 
Debabrata Ghosh
Sent: Wednesday, December 14, 2016 3:13 AM
To: Scikit-learn user and developer mailing list
Subject: [scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on 
matplotlib

[quoted message snipped; the original post appears in full below]


[scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on matplotlib

2016-12-14 Thread Debabrata Ghosh
Hi All,
  I have run the scikit-learn Random Forest Classifier
against a dataset, and here are my TPR and FPR at various thresholds:

[image: Inline image 1]

Further, I have plotted the above values in matplotlib and am getting a
very low AUC. Here is my matplotlib code. Could you please help me
interpret the graph? Is my model OK, or is there something wrong? I would
appreciate a quick response.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
plt.title('Receiver Operating Characteristic')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
fpr = [0.0002337345394340, 0.0001924870472260, 0.0001626973851550,
       0.950977673794, 0.721826427097, 0.538505429739,
       0.389557119386, 0.263523933702, 0.137490748018]

tpr = [0.196736382441, 0.189841415766, 0.181222707424,
       0.170555108608, 0.164348925411, 0.157894736842,
       0.151344518501, 0.144104803493, 0.132383360147]

roc_auc = metrics.auc(fpr, tpr)

plt.plot([0, 1], [0, 1],'r--')
plt.plot(fpr, tpr, 'bo-', label = 'AUC = %0.9f' % roc_auc)
plt.legend(loc = 'lower right')

plt.show()

[image: Inline image 2]