[R] Logistic regression model + precision/recall
Hi,

I am using the logistic regression model lrm (package Design). Right now I am using the area under the curve (AUC) to test my model, but I now have to calculate the precision/recall of the model on test cases. For lrm, precision and recall would simply be defined with the help of the two terms below:

True Positive (TP) - number of test cases where class 1 is given probability >= 0.5.
False Positive (FP) - number of test cases where class 0 is given probability >= 0.5.

Precision = TP / (TP + FP)
Recall = TP / (number of positive samples in test data)

Any help is appreciated. I can write long code with for loops and all, but is there any built-in function, or just a few commands, that would do the task?

regards,
Nitin

[[alternative HTML version deleted]]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
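The definitions in the question translate almost directly into a few vectorised base R commands, with no loops needed. The following is only a minimal sketch: the names `prec_recall`, `prob` and `truth` are made up for illustration, and the data are a toy example. With lrm, the predicted probabilities would presumably come from something like `predict(f, newdata, type = "fitted")`.

```r
# Precision and recall at a fixed cutoff, in base R.
# `prob` = predicted probabilities, `truth` = observed 0/1 labels
# (both hypothetical names, toy data below).
prec_recall <- function(prob, truth, cutoff = 0.5) {
  pred_pos <- prob >= cutoff                 # cases classified as class 1
  tp <- sum(pred_pos & truth == 1)           # true positives
  fp <- sum(pred_pos & truth == 0)           # false positives
  c(precision = tp / (tp + fp),              # NaN if nothing is predicted positive
    recall    = tp / sum(truth == 1))
}

prob  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
truth <- c(1,   0,   1,   1,   0,   0)
res <- prec_recall(prob, truth)
print(res)
```

Here three cases are at or above 0.5, two of them truly positive, and there are three positives in total, so both precision and recall come out as 2/3.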
Re: [R] Logistic regression model + precision/recall
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> Why 0.5?

The probability cutoff has to be adjusted by trial and error. I just mentioned 0.5 as an example.

> Those are improper scoring rules that can be tricked. If the outcome is rare (say 0.02 incidence) you could just predict that no one will have the outcome and be correct 0.98 of the time. I suggest validating the model for discrimination (e.g., AUC) and calibration.

I just have to calculate precision/recall for a rare outcome. If the positive outcome is rare (say 0.02 incidence) and I predict it to be negative all the time, my recall would be 0, which is bad. So precision and recall can take care of skewed data.

> Frank
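The two points made in this exchange can be checked side by side on simulated data. This is only an illustrative sketch: the 2% incidence is taken from the example in the thread, and the data are generated, not real.

```r
set.seed(42)
n <- 10000
truth <- rbinom(n, 1, 0.02)   # simulated rare outcome, about 2% incidence
pred  <- rep(0, n)            # predict "negative" for every case

accuracy <- mean(pred == truth)                              # close to 0.98
recall   <- sum(pred == 1 & truth == 1) / sum(truth == 1)    # exactly 0

print(c(accuracy = accuracy, recall = recall))
```

The trivial classifier is right about 98% of the time, yet it recovers none of the positive cases, which is the situation both posters are describing from different angles.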
Re: [R] Logistic regression model + precision/recall
On 1/24/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> nitin jindal wrote:
>
> Using a cutoff is not a good idea unless the utility (loss) function is discontinuous and is the same for every subject (in the medical field utilities are almost never constant). And if you are using the data to find the cutoff, this will require bootstrapping to penalize for the cutoff not being pre-specified.

Thanks for this info. If I still have to use a cutoff, I will do bootstrapping. I don't know of any alternative for computing precision/recall for a logistic regression model.

> No, that is not clear. The overall classification error would only be 0.02 in that case. It is true though that one of the two conditional probabilities would not be good.

I forgot to mention that for my data, the overall classification error is not significant. I am only interested in precision/recall for the rare outcome.

nitin

> Frank
>
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University
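One way to read the bootstrapping advice is as an optimism correction: repeat the whole procedure, including the choice of cutoff, in each bootstrap resample, and measure how much better the chosen cutoff looks on the resample than on the original data. The sketch below is an assumption-laden illustration, not the method from the thread: it uses `glm` on simulated data instead of lrm, picks the cutoff by maximising F1 (the harmonic mean of precision and recall), and all function and variable names are hypothetical.

```r
set.seed(123)
n <- 300
x <- rnorm(n)
dat <- data.frame(x = x, y = rbinom(n, 1, plogis(-1 + 1.2 * x)))  # simulated data

# F1 at a given cutoff; defined as 0 when there are no true positives
f1_at <- function(prob, truth, cutoff) {
  pred_pos <- prob >= cutoff
  tp <- sum(pred_pos & truth == 1)
  fp <- sum(pred_pos & truth == 0)
  fn <- sum(!pred_pos & truth == 1)
  if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
}

cutoffs <- seq(0.05, 0.95, by = 0.05)

# Fit on `train`, choose the F1-maximising cutoff on `train`,
# then report F1 at that cutoff on both `train` and `test`.
fit_and_score <- function(train, test) {
  fit <- glm(y ~ x, family = binomial, data = train)
  p_train <- predict(fit, newdata = train, type = "response")
  p_test  <- predict(fit, newdata = test,  type = "response")
  f1_train <- sapply(cutoffs, function(cc) f1_at(p_train, train$y, cc))
  best <- cutoffs[which.max(f1_train)]
  c(cutoff = best,
    train_f1 = max(f1_train),
    test_f1  = f1_at(p_test, test$y, best))
}

# Apparent performance: cutoff chosen and evaluated on the same data
apparent <- fit_and_score(dat, dat)

# Optimism: how much the resample flatters its own data-chosen cutoff
B <- 200
optimism <- replicate(B, {
  boot <- dat[sample(n, replace = TRUE), ]
  s <- fit_and_score(boot, dat)
  unname(s["train_f1"] - s["test_f1"])
})

corrected_f1 <- unname(apparent["train_f1"]) - mean(optimism)
print(c(apparent = unname(apparent["train_f1"]), corrected = corrected_f1))
```

The key design point is that the cutoff search sits inside `fit_and_score`, so it is redone in every resample; choosing the cutoff once and bootstrapping only the evaluation would not penalise the search.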
Re: [R] Logistic regression model + precision/recall
Hi.

Thanks a lot. I will try that.

nitin

On 1/24/07, Tobias Sing [EMAIL PROTECTED] wrote:

> Maybe ROCR might help you. You can visualize the prec/rec trade-off across the range of all cutoffs. Assuming your numerical predictions are in scores and the true class labels are in classes:
>
>   library(ROCR)
>   pred <- prediction(scores, classes)
>   perf <- performance(pred, 'rec', 'prec')
>   plot(perf)
>
> HTH,
> Tobias
>
> --
> Tobias Sing
> Computational Biology and Applied Algorithmics
> Max Planck Institute for Informatics
> Saarbrucken, Germany
> Phone: +49 681 9325 315
> Fax: +49 681 9325 399
> http://www.tobiassing.net
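Beyond the plot, the ROCR performance object also carries the numbers themselves, so precision and recall can be tabulated per cutoff. The following is a sketch on simulated `scores` and `classes` (made-up data); it assumes the ROCR package is installed and skips the ROCR part otherwise. Precision is put on the y axis and recall on the x axis here, which is the reverse of the call quoted above.

```r
set.seed(1)
classes <- rbinom(200, 1, 0.2)                        # simulated 0/1 labels
scores  <- plogis(-1.5 + 2 * classes + rnorm(200))    # simulated predictions

if (requireNamespace("ROCR", quietly = TRUE)) {
  pred <- ROCR::prediction(scores, classes)
  perf <- ROCR::performance(pred, measure = "prec", x.measure = "rec")

  # One row per cutoff; precision may be NaN where no case is predicted positive
  pr <- data.frame(cutoff    = perf@alpha.values[[1]],
                   recall    = perf@x.values[[1]],
                   precision = perf@y.values[[1]])
  print(head(pr))
}
```

From `pr`, the row for any cutoff of interest can be looked up directly instead of being read off the curve.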
[R] Divergence or Singularity
Hi,

I am running a logistic regression model and then cross-validating it. But 10-fold cross-validation returns the following error message:

Divergence or singularity in 5 samples

I tried reading the definitions of divergence and singularity, but could not understand them in the context of my error. Could someone please explain it to me in simpler words?

This also raises another question: is this the right place to ask questions about general models like logistic regression (nothing fancy, just simple questions)?

regards,
Nitin
[R] logistic regression model + Cross-Validation
Hi,

I am trying to cross-validate a logistic regression model. I am using the logistic regression model (lrm) of package Design.

  f <- lrm(cy ~ x1 + x2, x=TRUE, y=TRUE)
  val <- validate.lrm(f, method="cross", B=5)

My class cy has values 0 and 1. The val variable will give me indices like slope and AUC. But I also need the vector of predicted values of the class variable cy for each record during cross-validation, so that I can manually look at the results. So, is there any way to get the probabilities assigned to each class?

regards,
Nitin
Re: [R] logistic regression model + Cross-Validation
If validate.lrm does not have this option, does any other function have it? I will certainly look into your advice on cross-validation. Thanks.

nitin

On 1/21/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:

> nitin jindal wrote:
>> Hi,
>>
>> I am trying to cross-validate a logistic regression model. I am using the logistic regression model (lrm) of package Design.
>>
>>   f <- lrm(cy ~ x1 + x2, x=TRUE, y=TRUE)
>>   val <- validate.lrm(f, method="cross", B=5)
>
>   val <- validate(f, ...)   # .lrm not needed
>
>> My class cy has values 0 and 1. The val variable will give me indices like slope and AUC. But I also need the vector of predicted values of the class variable cy for each record during cross-validation, so that I can manually look at the results. So, is there any way to get the probabilities assigned to each class?
>>
>> regards,
>> Nitin
>
> No, validate.lrm does not have that option. Manually looking at the results will not be easy when you do enough cross-validations. A single 5-fold cross-validation does not provide accurate estimates. Either use the bootstrap or repeat k-fold cross-validation between 20 and 50 times. k is often 10, but the optimum value may not be 10. Code for averaging repeated cross-validations is in http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf along with simulations of the bootstrap vs. a few cross-validation methods for binary logistic models.
>
> Frank
>
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University
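Since validate.lrm does not return per-record predictions, one option is to run the folds by hand. The sketch below is a hedged illustration rather than the code from the linked PDF: it uses base R `glm` in place of lrm, simulated data with the same variable names as in the question (cy, x1, x2), and a hypothetical helper `cv_predict`. Each record is predicted by a model that did not see it, and the k-fold pass is repeated 20 times in line with the advice above.

```r
set.seed(2007)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$cy <- rbinom(n, 1, plogis(-0.5 + dat$x1 - 0.5 * dat$x2))  # simulated outcome

# One pass of k-fold cross-validation.
# Returns a vector with one out-of-fold predicted probability per record.
cv_predict <- function(data, k = 10) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  p <- numeric(nrow(data))
  for (i in seq_len(k)) {
    held_out <- fold == i
    fit <- glm(cy ~ x1 + x2, family = binomial, data = data[!held_out, ])
    p[held_out] <- predict(fit, newdata = data[held_out, ], type = "response")
  }
  p
}

reps  <- 20
p_mat <- replicate(reps, cv_predict(dat, k = 10))  # n x reps matrix of predictions
p_avg <- rowMeans(p_mat)                           # average over repetitions

print(head(cbind(cy = dat$cy, p_avg = round(p_avg, 3))))
```

Each column of `p_mat` is one complete cross-validation, so individual repetitions remain available for inspection alongside the averaged probabilities in `p_avg`.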