Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-17 Thread Kaliss

Thanks to you both
-- 
View this message in context: 
http://www.nabble.com/Calculate-Specificity-and-Sensitivity-for-a-given-threshold-value-tp20481633p20541110.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread Kaliss

Hi list,


I'm new to R and I'm currently using the ROCR package.
My input data look like this:

DIAGNOSIS   SCORE
1   0.387945
1   0.50405
1   0.435667
1   0.358057
1   0.583512
1   0.387945
1   0.531795
1   0.527148
0   0.526397
0   0.372935
1   0.861097

And I run the following simple code:
d <- read.table(inputFile, header=TRUE)
pred <- prediction(d$SCORE, d$DIAGNOSIS)
perf <- performance(pred, "tpr", "fpr")
plot(perf)

Building the curve works fine.
My question is: can I get the specificity and the sensitivity at
a given score threshold, e.g. 0.5? How do I compute this?

Thank you in advance
-- 
View this message in context: 
http://www.nabble.com/Calculate-Specificity-and-Sensitivity-for-a-given-threshold-value-tp20481633p20481633.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread Frank E Harrell Jr

Beware of the utility/loss function you are implicitly assuming with 
this approach.  It is quite oversimplified.  In clinical practice the 
cost of a false positive or false negative (which comes from a cost 
function and the simple forward probability of a positive diagnosis, 
e.g., from a basic logistic regression model if you start with a cohort 
study) varies with the type of patient being diagnosed.


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread Pierre-Jean-EXT.Breton

Hi Frank,

Thank you for your answer. 
In fact, I'm not using this for clinical research.
I am currently testing several scoring methods and I'd like
to know which one is the most effective and which threshold
value I should apply to discriminate positives from negatives.
So, any ideas for my problem?

Pierre-Jean



Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread N. Lapidus

Hi Pierre-Jean,

Sensitivity (Se) and specificity (Sp) are calculated at the cutoffs stored in
the x.values slot of the performance objects for Se and Sp.

For example, let's generate the performance objects for Se and Sp:
sens <- performance(pred, "sens")
spec <- performance(pred, "spec")

Now you have access to:
sens@x.values # (or spec@x.values), which is the list of cutoffs
sens@y.values # for the corresponding Se
spec@y.values # for the corresponding Sp

You can, for example, sum up this information in a table:
(SeSp <- cbind(sens@x.values[[1]], sens@y.values[[1]],
               spec@y.values[[1]]))

You can also write a function to give Se and Sp for a specific cutoff, but
you will have to define what to do for cutoffs not stored in the list. For
example, the following function keeps the closest stored cutoff to give the
corresponding Se and Sp (but this is not always the best solution; you may
want to define your own way to interpolate):

se.sp <- function (cutoff, pred){
  sens <- performance(pred, "sens")
  spec <- performance(pred, "spec")
  # index of the stored cutoff closest to the requested one
  num.cutoff <- which.min(abs(sens@x.values[[1]] - cutoff))
  return(list(Cutoff=sens@x.values[[1]][num.cutoff],
              Sensitivity=sens@y.values[[1]][num.cutoff],
              Specificity=spec@y.values[[1]][num.cutoff]))
}

se.sp(.5, pred)
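
If you only need Se and Sp at one fixed cutoff, they can also be computed
directly from the scores without ROCR. The following is a minimal base-R
sketch, not part of ROCR: the helper name se_sp_at is made up for
illustration, the data frame retypes the sample rows posted at the start of
the thread, and it assumes a case is called positive when SCORE >= cutoff.

```r
# Sample rows retyped from the original post (11 cases, 2 of them negative).
d <- data.frame(
  DIAGNOSIS = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1),
  SCORE = c(0.387945, 0.50405, 0.435667, 0.358057, 0.583512, 0.387945,
            0.531795, 0.527148, 0.526397, 0.372935, 0.861097)
)

# Hypothetical helper: Se and Sp at a single cutoff.
# Assumption: a case is predicted positive when score >= cutoff.
se_sp_at <- function(score, truth, cutoff) {
  predicted_pos <- score >= cutoff
  c(sensitivity = sum(predicted_pos & truth == 1) / sum(truth == 1),
    specificity = sum(!predicted_pos & truth == 0) / sum(truth == 0))
}

res <- se_sp_at(d$SCORE, d$DIAGNOSIS, 0.5)
print(res)
```

On these rows, 5 of the 9 positives score at least 0.5 and 1 of the 2
negatives scores below it, so Se = 5/9 and Sp = 1/2. With only two negatives
the specificity estimate is of course very coarse.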

Hope this helps,

Nael




Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread Frank E Harrell Jr

[EMAIL PROTECTED] wrote:

Hi Frank,

Thank you for your answer. 
In fact, I don't use this for clinical research practice.

I am currently testing several scoring methods and I'd like
to know which one is the most effective and which threshold
value I should apply to discriminate positives and negatives.
So, any idea for my problem ?


The use of thresholds gets in the way of finding a good solution because 
you will have predictor values in the gray zone.  I tend to rank 
methods by the most sensitive index available such as the log likelihood 
in the binary logistic model.  You can extend ordinary logistic models 
to allow for nonlinear effects on the log odds scale using regression 
splines.
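
A rough sketch of what such a comparison could look like, under stated
assumptions: the data are simulated (not the poster's), the names score_a
and score_b are invented, and natural splines from the splines package
(ns) stand in for whichever regression spline basis one prefers.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(1)
n <- 200
score_a <- runif(n)   # simulated score that carries signal
score_b <- runif(n)   # simulated score that is pure noise
y <- rbinom(n, 1, plogis(-2 + 4 * score_a))

# Binary logistic models, each allowing a nonlinear effect of the score
# on the log odds scale.
fit_a <- glm(y ~ ns(score_a, df = 3), family = binomial)
fit_b <- glm(y ~ ns(score_b, df = 3), family = binomial)

# Higher (less negative) log likelihood indicates the better-fitting score.
ll <- c(a = as.numeric(logLik(fit_a)), b = as.numeric(logLik(fit_b)))
print(ll)
```

Both models use the same number of spline degrees of freedom, so the log
likelihoods are directly comparable; no threshold is involved at any point.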


Frank



--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Calculate Specificity and Sensitivity for a given threshold value

2008-11-13 Thread Frank E Harrell Jr

N. Lapidus wrote:

Hi Pierre-Jean,

Sensitivity (Se) and specificity (Sp) are calculated for cutoffs stored in
the performance x.values of your prediction for Se and Sp:

For example, let's generate the performance for Se and Sp:
sens <- performance(pred, "sens")
spec <- performance(pred, "spec")

Now you have access to:
sens@x.values # (or spec@x.values), which is the list of cutoffs
sens@y.values # for the corresponding Se
spec@y.values # for the corresponding Sp

You can, for example, sum up this information in a table:
(SeSp <- cbind(sens@x.values[[1]], sens@y.values[[1]],
               spec@y.values[[1]]))

You can also write a function to give Se and Sp for a specific cutoff, but
you will have to define what to do for cutoffs not stored in the list. For
example, the following function keeps the closest stored cutoff to give the
corresponding Se and Sp (but this is not always the best solution; you may
want to define your own way to interpolate):

se.sp <- function (cutoff, pred){
  sens <- performance(pred, "sens")
  spec <- performance(pred, "spec")
  num.cutoff <- which.min(abs(sens@x.values[[1]] - cutoff))
  return(list(Cutoff=sens@x.values[[1]][num.cutoff],
              Sensitivity=sens@y.values[[1]][num.cutoff],
              Specificity=spec@y.values[[1]][num.cutoff]))


That is a biased procedure (like how stepwise regression results in 
overfitting).  It also uses a strange loss function.  The bootstrap 
would need to be used to penalize for the uncertainty in the cutoff. 
You are also assuming that a cutoff exists, which is a major assumption.
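
To give a feel for the uncertainty in a data-chosen cutoff, here is a small
base-R sketch on simulated data. It is only an illustration, not a full
optimism-corrected procedure: the selection rule (maximising Se + Sp - 1,
i.e. Youden's J) and the helper name best_cutoff are assumptions introduced
here, and the scores are simulated rather than taken from the thread.

```r
set.seed(42)
n <- 200
truth <- rbinom(n, 1, 0.5)
# Simulated scores in (0, 1); positives tend to score higher.
score <- plogis(rnorm(n, mean = ifelse(truth == 1, 0.5, -0.5)))

# Hypothetical selection rule: the cutoff maximising Se + Sp - 1.
best_cutoff <- function(score, truth) {
  cutoffs <- sort(unique(score))
  j <- vapply(cutoffs, function(k) {
    pos <- score >= k
    mean(pos[truth == 1]) + mean(!pos[truth == 0]) - 1
  }, numeric(1))
  cutoffs[which.max(j)]
}

# Re-select the cutoff on 200 bootstrap resamples of the rows.
boot_cutoffs <- replicate(200, {
  idx <- sample(n, replace = TRUE)
  best_cutoff(score[idx], truth[idx])
})

# Spread of the selected cutoff across resamples.
print(quantile(boot_cutoffs, c(0.025, 0.5, 0.975)))
```

A wide interval here means the "optimal" cutoff is poorly determined by the
data, so Se and Sp reported at that cutoff on the same data will tend to
look better than they would on new cases.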


Frank


}

se.sp(.5, pred)

Hope this helps,

Nael






--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
