[R] Logistic regression model + precision/recall

2007-01-24 Thread nitin jindal
Hi,

I am using the logistic regression function lrm from the Design package.

Until now I have been using the area under the curve (AUC) to test my model.
But now I have to calculate precision/recall of the model on test cases.
For lrm, precision and recall would simply be defined with the help of the 2
terms below:
True Positive (TP) - Number of test cases where class 1 is given probability
>= 0.5.
False Positive (FP) - Number of test cases where class 0 is given
probability >= 0.5.

Precision = TP / (TP + FP)
Recall = TP / (Number of Positive Samples in test data)

Any help is appreciated.

I can write long code with for loops and all, but is there any built-in
function, or just a few commands, that would do the task?

regards,
Nitin
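
A minimal sketch of the definitions above in base R, with no loops. The vectors prob and truth are made-up illustration data, not from the thread; for an lrm fit the probabilities would presumably come from something like predict(fit, newdata, type="fitted"), which is an assumption here. The 0.5 cutoff is only the example value from the post.

```r
# Hypothetical test-set data: predicted probability of class 1, and true labels.
prob  <- c(0.92, 0.81, 0.65, 0.55, 0.48, 0.30, 0.22, 0.10)
truth <- c(1,    1,    0,    1,    1,    0,    0,    0)

cutoff <- 0.5                          # example threshold only
pred   <- as.integer(prob >= cutoff)   # predicted class at this cutoff

tp <- sum(pred == 1 & truth == 1)      # true positives
fp <- sum(pred == 1 & truth == 0)      # false positives
fn <- sum(pred == 0 & truth == 1)      # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)            # tp + fn = number of positive samples

# Cross-tabulation of predicted vs. true class for a quick visual check.
print(table(predicted = pred, actual = truth))
```

Changing cutoff and re-running the last few lines shows how precision and recall trade off against each other.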

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression model + precision/recall

2007-01-24 Thread nitin jindal
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> Why 0.5?

The probability cutoff has to be adjusted by trial and error. I just
mentioned 0.5 as an example.

> Those are improper scoring rules that can be tricked.  If the outcome is
> rare (say 0.02 incidence) you could just predict that no one will have
> the outcome and be correct 0.98 of the time.  I suggest validating the
> model for discrimination (e.g., AUC) and calibration.

I just have to calculate precision/recall for a rare outcome. If the positive
outcome is rare (say 0.02 incidence) and I predict it to be negative all
the time, my recall would be 0, which is bad. So, precision and recall can
take care of skewed data.

Frank
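
The arithmetic behind both points in this exchange can be checked with a small made-up example: 1000 cases at 0.02 incidence and a rule that always predicts the negative class. The sample size is an assumption chosen only to make the numbers round.

```r
# Hypothetical data: 1000 cases, 20 positives (0.02 incidence).
n     <- 1000
truth <- c(rep(1, 20), rep(0, 980))
pred  <- rep(0, n)                   # always predict "no outcome"

accuracy <- mean(pred == truth)      # proportion classified correctly
tp       <- sum(pred == 1 & truth == 1)
recall   <- tp / sum(truth == 1)     # no positives are ever found
precision <- tp / sum(pred == 1)     # 0/0: undefined, R returns NaN

print(c(accuracy = accuracy, recall = recall, precision = precision))
```

Accuracy is 0.98 while recall is 0, and precision is undefined because nothing is predicted positive.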




Re: [R] Logistic regression model + precision/recall

2007-01-24 Thread nitin jindal
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> nitin jindal wrote:
>
> Using a cutoff is not a good idea unless the utility (loss) function is
> discontinuous and is the same for every subject (in the medical field
> utilities are almost never constant).  And if you are using the data to
> find the cutoff, this will require bootstrapping to penalize for the
> cutoff not being pre-specified.

Thanks for this info. If I still have to use a cutoff, I will do bootstrapping.
I don't know of any alternative for computing precision/recall for a logistic
regression model.

> No, that is not clear.  The overall classification error would only be
> 0.02 in that case.  It is true though that one of the two conditional
> probabilities would not be good.

I forgot to mention that for my data, the overall classification error is
non-significant. I am only interested in precision/recall for the rare outcome.

nitin


> Frank
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
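
One way to read "bootstrapping to penalize for the cutoff not being pre-specified" is an optimism bootstrap in which the cutoff search is repeated inside every resample. The sketch below is only an illustration under several assumptions that are not in the thread: simulated data, glm in place of lrm, the F1 score as the single summary of precision and recall, and a fixed grid of candidate cutoffs. All function names are hypothetical helpers.

```r
set.seed(1)
n   <- 400
x   <- rnorm(n)
dat <- data.frame(x = x, y = rbinom(n, 1, plogis(-2 + 1.5 * x)))  # simulated

# F1 score (harmonic mean of precision and recall) at a given cutoff.
f1_at <- function(p, y, cutoff) {
  pred <- p >= cutoff
  tp   <- sum(pred & y == 1)
  if (tp == 0) return(0)
  prec <- tp / sum(pred)
  rec  <- tp / sum(y == 1)
  2 * prec * rec / (prec + rec)
}

# Pick the grid cutoff with the best F1 on the data it is given.
best_cutoff <- function(p, y, grid = seq(0.05, 0.95, by = 0.05)) {
  grid[which.max(sapply(grid, function(ct) f1_at(p, y, ct)))]
}

# Fit the model AND choose the cutoff, so both steps get resampled together.
fit_and_tune <- function(d) {
  fit <- glm(y ~ x, family = binomial, data = d)
  list(fit = fit,
       cutoff = best_cutoff(predict(fit, type = "response"), d$y))
}

full     <- fit_and_tune(dat)
apparent <- f1_at(predict(full$fit, type = "response"), dat$y, full$cutoff)

B <- 200
optimism <- replicate(B, {
  boot <- dat[sample(n, replace = TRUE), ]
  m    <- fit_and_tune(boot)
  f1_boot <- f1_at(predict(m$fit, type = "response"), boot$y, m$cutoff)
  f1_orig <- f1_at(predict(m$fit, newdata = dat, type = "response"),
                   dat$y, m$cutoff)
  f1_boot - f1_orig        # how much the resample flatters itself
})

corrected <- apparent - mean(optimism)
print(c(apparent = apparent, corrected = corrected))
```

The gap between apparent and corrected is the estimated penalty for having tuned both the model and the cutoff on the same data.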




Re: [R] Logistic regression model + precision/recall

2007-01-24 Thread nitin jindal
Hi.

Thanks a lot. I will try that.

nitin

On 1/24/07, Tobias Sing [EMAIL PROTECTED] wrote:

> Maybe ROCR might help you.
> You can visualize the prec/rec trade-off across the range of all cutoffs.
> Assuming your numerical predictions are in scores and the true class
> labels are in classes:
>
> library(ROCR)
> pred <- prediction(scores, classes)
> perf <- performance(pred, "rec", "prec")
> plot(perf)
>
> HTH,
>   Tobias


> --
> Tobias Sing
> Computational Biology and Applied Algorithmics
> Max Planck Institute for Informatics
> Saarbrucken, Germany
> Phone: +49 681 9325 315
> Fax: +49 681 9325 399
> http://www.tobiassing.net
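
For readers without ROCR, the same precision/recall trade-off across cutoffs can be sketched in base R. The scores and classes vectors below are made-up illustration data, and the sketch assumes there are no tied scores; it is not a reimplementation of ROCR.

```r
# Hypothetical predictions and true labels (no tied scores assumed).
scores  <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
classes <- c(1,   1,   0,   1,   0,   0,   1,   0)

# Sort cases from highest to lowest score; using each score in turn as the
# cutoff (predict positive when score >= cutoff) gives running TP/FP counts.
ord <- order(scores, decreasing = TRUE)
tp  <- cumsum(classes[ord] == 1)
fp  <- cumsum(classes[ord] == 0)

pr <- data.frame(
  cutoff    = scores[ord],
  precision = tp / (tp + fp),
  recall    = tp / sum(classes == 1)
)
print(pr)
```

Calling plot(pr$recall, pr$precision, type = "b") on the result draws the curve.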




[R] Divergence or Singularity

2007-01-23 Thread nitin jindal
Hi,

I am running a logistic regression model and then cross-validating it. But
10-fold cross-validation returns the following error message:

Divergence or singularity in 5 samples

I tried reading the definitions of divergence and singularity, but
could not understand them in the context of my error.

Could someone please explain it to me in simpler words?

This also raises another question: is this the right place to ask questions
about general models like logistic regression, etc. (nothing fancy, just
simple questions)?

regards,
Nitin
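
In simpler words, and only as a hedged illustration: "singularity" usually means that some predictor carries no usable information in a given resample (for example it is constant, or an exact copy of another), and "divergence" usually means the coefficient estimates keep growing without settling, which typically happens when a predictor separates the two classes perfectly. Both become more likely when a fold is small or the outcome is rare. The sketch uses glm on made-up data rather than lrm, so the messages differ from the one quoted above.

```r
# Singularity: x2 is constant in this hypothetical training fold,
# so its coefficient cannot be estimated.
train <- data.frame(
  y  = c(0, 1, 0, 1, 0, 1, 1, 0),
  x1 = c(0.2, 1.1, -0.5, 0.9, 0.1, -0.3, 1.4, 0.6),
  x2 = rep(0, 8)
)
fit_sing <- glm(y ~ x1 + x2, family = binomial, data = train)
print(coef(fit_sing))          # the x2 coefficient is reported as NA

# Divergence: x1 separates the classes perfectly (all y = 1 have x1 > 0),
# so the likelihood keeps improving as the slope grows without bound.
sep <- data.frame(
  y  = c(0, 0, 0, 0, 1, 1, 1, 1),
  x1 = c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)
)
fit_div <- suppressWarnings(glm(y ~ x1, family = binomial, data = sep))
print(coef(fit_div))           # slope is very large instead of converging
```

Checking each training fold for constant predictors and for perfectly separating predictors is one way to find which samples trigger the message.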



[R] logistic regression model + Cross-Validation

2007-01-21 Thread nitin jindal
Hi,

I am trying to cross-validate a logistic regression model.
I am using the logistic regression function lrm from the Design package.

library(Design)
f <- lrm(cy ~ x1 + x2, x=TRUE, y=TRUE)
val <- validate.lrm(f, method="cross", B=5)

My class variable cy has values 0 and 1.

The val object gives me indices like slope and AUC. But I also need
the vector of predicted values of the class variable cy for each record during
cross-validation, so that I can manually look at the results. So, is there
any way to get the probabilities assigned to each class?

regards,
Nitin



Re: [R] logistic regression model + Cross-Validation

2007-01-21 Thread nitin jindal
If validate.lrm does not have this option, does any other function have it?
I will certainly look into your advice on cross-validation. Thanks.

nitin

On 1/21/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> nitin jindal wrote:
>> Hi,
>>
>> I am trying to cross-validate a logistic regression model.
>> I am using logistic regression model (lrm) of package Design.
>>
>> f <- lrm(cy ~ x1 + x2, x=TRUE, y=TRUE)
>> val <- validate.lrm(f, method="cross", B=5)
>
> val <- validate(f, ...)   # .lrm not needed
>
>> My class cy has values 0 and 1.
>>
>> val variable will give me indicators like slope and AUC. But, I also need
>> the vector of predicted values of class variable cy for each record while
>> cross-validation, so that I can manually look at the results. So, is there
>> any way to get those probabilities assigned to each class.
>>
>> regards,
>> Nitin
>
> No, validate.lrm does not have that option.  Manually looking at the
> results will not be easy when you do enough cross-validations.  A single
> 5-fold cross-validation does not provide accurate estimates.  Either use
> the bootstrap or repeat k-fold cross-validation between 20 and 50 times.
> k is often 10 but the optimum value may not be 10.  Code for averaging
> repeated cross-validations is in
> http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
> along with simulations of bootstrap vs. a few cross-validation methods
> for binary logistic models.
>
> Frank
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
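
Since validate does not return per-record predictions, one option is to run the folds by hand. The sketch below is not the code from the PDF linked above; it is a minimal illustration that assumes simulated data, glm in place of lrm, k = 10, and 20 repeats, and it averages each record's out-of-fold probability across repeats (that averaging is a choice made here, not something stated in the thread).

```r
set.seed(42)
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
dat <- data.frame(cy = rbinom(n, 1, plogis(-0.5 + x1 - 0.5 * x2)), x1, x2)

k    <- 10     # folds per cross-validation
reps <- 20     # number of repeated cross-validations

# oof[i, r] = predicted probability for record i in repeat r, from a model
# that did not see record i during fitting.
oof <- matrix(NA_real_, nrow = n, ncol = reps)

for (r in seq_len(reps)) {
  fold <- sample(rep(seq_len(k), length.out = n))   # random fold labels
  for (j in seq_len(k)) {
    test <- fold == j
    fit  <- glm(cy ~ x1 + x2, family = binomial, data = dat[!test, ])
    oof[test, r] <- predict(fit, newdata = dat[test, ], type = "response")
  }
}

cv_prob <- rowMeans(oof)    # one cross-validated probability per record
print(head(data.frame(cy = dat$cy, cv_prob)))
```

The cv_prob vector lines up with the rows of dat, so it can be inspected record by record or passed to any precision/recall calculation.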

