Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Dear List and Frank,

I have calculated the log-odds for my models, but maybe I am missing something: I do not understand how this helps for a categorical factor. All the examples I have seen involve continuous factors, where moving from one value to another shows either an increase or a decrease, not, as in my case, a change of category.

Furthermore, this gives the values for each factor independently of the others. How do I get the log-odds for the entire model? I appreciate I may be trying to put things in boxes again, but I am not: I am happy to report the log-odds of moving from one response level to the next, but I would like them for all the factors together, not independently.

John

                                        Low High Diff. Effect S.E. Lower Upper
WO Woody:Non_woody                        1    2    NA   0.28 0.16 -0.04  0.60
 Odds Ratio                               1    2    NA   1.32   NA  0.96  1.82
PD Abiotic:Biotic                         2    1    NA  -1.21 0.13 -1.47 -0.96
 Odds Ratio                               2    1    NA   0.30   NA  0.23  0.38
ALT All:Low                               3    1    NA   0.47 0.19  0.11  0.84
 Odds Ratio                               3    1    NA   1.60   NA  1.11  2.31
ALT High:Low                              3    2    NA  -0.07 0.14 -0.35  0.21
 Odds Ratio                               3    2    NA   0.93   NA  0.70  1.24
ALT Mid:Low                               3    4    NA   0.39 0.15  0.10  0.67
 Odds Ratio                               3    4    NA   1.48   NA  1.11  1.96
REG Two_plus:One                          1    2    NA  -0.59 0.13 -0.84 -0.34
 Odds Ratio                               1    2    NA   0.55   NA  0.43  0.72
BIO Arctic:Subtropical/Tropical           4    1    NA  -1.02 0.81 -2.61  0.58
 Odds Ratio                               4    1    NA   0.36   NA  0.07  1.78
BIO Boreal:Subtropical/Tropical           4    2    NA  -1.21 0.81 -2.79  0.37
 Odds Ratio                               4    2    NA   0.30   NA  0.06  1.44
BIO Mediterranean:Subtropical/Tropical    4    3    NA  -1.89 0.48 -2.83 -0.95
 Odds Ratio                               4    3    NA   0.15   NA  0.06  0.39
BIO Temperate:Subtropical/Tropical        4    5    NA  -0.09 0.16 -0.41  0.23
 Odds Ratio                               4    5    NA   0.91   NA  0.66  1.26

On 3 Oct 2010, at 15:29, Frank Harrell wrote:
> You still seem to be hung up on making arbitrary classifications. [...]

--
View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2953220.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
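For a categorical factor, each row of John's summary table compares the first-named level with the second (for example Woody versus Non_woody), so the "Effect" is a log odds ratio for that switch of category rather than for a one-unit increase. The base-R sketch below shows that the "Odds Ratio" rows are just the exponentiated "Effect" rows, and how effects for several factors combine for one species profile. It assumes a main-effects proportional odds model with no interactions between the factors; a confidence interval for the combined effect would also need the covariance between the estimates, which the table does not show.

```r
# Log odds ratios ("Effect") and confidence limits for two rows of the
# summary table above.
effects <- data.frame(
  contrast = c("WO Woody:Non_woody", "PD Abiotic:Biotic"),
  effect   = c(0.28, -1.21),
  lower    = c(-0.04, -1.47),
  upper    = c(0.60, -0.96)
)

# Exponentiating a log odds ratio and its limits gives the "Odds Ratio" rows.
effects$OR       <- exp(effects$effect)
effects$OR_lower <- exp(effects$lower)
effects$OR_upper <- exp(effects$upper)
print(cbind(effects["contrast"],
            round(effects[, c("OR", "OR_lower", "OR_upper")], 2)))

# Assuming a main-effects model (no WO x PD interaction), effects add on the
# log-odds scale: Woody AND Abiotic versus Non_woody AND Biotic, with every
# other factor held at the same level.
combined_log_or <- sum(effects$effect)
combined_or     <- exp(combined_log_or)
combined_or
```

Under the proportional odds assumption this combined odds ratio applies equally to every cut-point Y>=j, which is one way of stating "the effect of all the factors together" for a particular pair of species profiles.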
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
I may be missing a point, but the proportional odds model easily gives you odds ratios for Y>=j (independent of j by the PO assumption). Other options include examining a rank correlation between the linear predictor and Y, or (if Y is numeric and the spacings between categories are meaningful) getting the predicted mean Y (see Mean.lrm in the R rms package, a replacement for the Design package).

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
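The claim that the odds ratio for Y>=j does not depend on j can be checked numerically. This base-R sketch takes the first two rows of the y>=2 ... y>=6 probabilities that John posts from predict(model1, type="fitted") in his message below and computes, at each cut-point, the log odds ratio of species 2 versus species 1; qlogis() is base R's logit function.

```r
# P(Y >= j), j = 2..6, for species 1 and 2 (from predict(model1, type = "fitted"))
cum1 <- c(0.502220616, 0.410236021, 0.2892270912, 0.2191420568, 0.1774250519)
cum2 <- c(0.745221699, 0.668501579, 0.5412223837, 0.4486151612, 0.3847379442)

# Log odds of reaching each cut-point; qlogis(p) = log(p / (1 - p))
log_or <- qlogis(cum2) - qlogis(cum1)
names(log_or) <- paste0("y>=", 2:6)

round(log_or, 3)       # about 1.064 at every cut-point
round(exp(log_or), 2)  # so a single odds ratio summarises species 2 vs species 1
```

Because the difference is constant across cut-points, one number describes how much more likely species 2 is to be in a higher risk category, whichever threshold is used.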
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
You still seem to be hung up on making arbitrary classifications. Instead, look at tendencies using odds ratios or rank correlation measures. My book Regression Modeling Strategies covers this.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Thanks Frank and Greg, this makes a lot more sense to me now. I appreciate you are both very busy, but I was wondering if I could trouble you for one last piece of advice, as my data are a little complicated for a first effort at R, let alone modelling!

The response is on a scale from 1 to 6 and indicates extinction risk, 1 being least concern and 6 being critical; hence I am using an ordinal model. The six factors are categorical, for example:

FRUIT TYPE - fleshy/dry
HABITAT - terrestrial, aquatic, epiphyte, etc.

The question I am asking is: how do different combinations of factors affect extinction risk?

Based on what you have both said, I have called

    > predict(model1, type="fitted")

Would this be the best way of predicting the probability of falling into each response category?

            y>=2        y>=3         y>=4         y>=5         y>=6
    1  0.502220616 0.410236021 0.2892270912 0.2191420568 0.1774250519
    2  0.745221699 0.668501579 0.5412223837 0.4486151612 0.3847379442
    3  0.720381333 0.639796647 0.5095814746 0.4174618165 0.3551631876
    4  0.752321112 0.676811675 0.5505781183 0.4579680710 0.3937100283
    5  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
    6  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
    7  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
    8  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
    9  0.526291649 0.433739868 0.3094355120 0.2360800803 0.1919312111
    10 0.751069260 0.675342871 0.5489179746 0.4563036514 0.3921102030

I have 100 species for which I know the factors, and I want to predict their response. So should I do the above, using the newdata argument, and present the probabilities as above rather than trying to classify the species?

I tried polr, and it "classified" each response as either 1 or 6, i.e. never 2, 3, 4 or 5. So did calling predict(model1, type="fitted.ind"), where the probabilities of being 1 or 6 far outweighed those of 2, 3, 4 and 5 (below). Could this just be that my model is not powerful enough to discriminate effectively, as I know that result is incorrect (Brier score 2.01, AUC 66.9)?

       EXTINCTION=1 EXTINCTION=2 EXTINCTION=3 EXTINCTION=4 EXTINCTION=5 EXTINCTION=6
    1     0.4977794 0.0919845942  0.121008930  0.070085034 0.0417170048 0.1774250519
    2     0.2547783 0.0767201200  0.127279196  0.092607223 0.0638772170 0.3847379442
    3     0.2796187 0.0805846862  0.130215173  0.092119658 0.0622986289 0.3551631876
    4     0.2476789 0.0755094367  0.126233557  0.092610047 0.0642580427 0.3937100283
    5     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
    6     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
    7     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
    8     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
    9     0.4737084 0.0925517814  0.124304356  0.073355432 0.0441488692 0.1919312111
    10    0.2489307 0.0757263892  0.126424896  0.092614323 0.0641934484 0.3921102030

Thanks very much for any advice given,
John

On 1 Oct 2010, at 23:13, Frank Harrell wrote:
> Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). [...]
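The two tables above are linked by simple differencing: P(Y = j) = P(Y >= j) - P(Y >= j+1), with P(Y >= 1) = 1 and P(Y >= 7) = 0. The base-R sketch below, using four of John's rows, reproduces the fitted.ind values and illustrates why a forced classification lands on 1 or 6: the two end categories collect the tail probability, so the single most probable category jumps between the ends of the scale even when the distribution is spread out. The mean category is included as one smoother summary; reading it as meaningful assumes the 1 to 6 codes are equally spaced.

```r
# P(Y >= j), j = 2..6, for species 1, 2, 5 and 9 (predict(model1, type = "fitted"))
cum <- rbind(
  "1" = c(0.502220616, 0.410236021, 0.2892270912, 0.2191420568, 0.1774250519),
  "2" = c(0.745221699, 0.668501579, 0.5412223837, 0.4486151612, 0.3847379442),
  "5" = c(0.824388319, 0.763956402, 0.6543788296, 0.5663098186, 0.5008981585),
  "9" = c(0.526291649, 0.433739868, 0.3094355120, 0.2360800803, 0.1919312111)
)

# Pad with P(Y >= 1) = 1 and P(Y >= 7) = 0, then difference adjacent columns:
# P(Y = j) = P(Y >= j) - P(Y >= j + 1)
cum_full <- cbind(1, cum, 0)
ind <- cum_full[, 1:6] - cum_full[, 2:7]
colnames(ind) <- paste0("EXTINCTION=", 1:6)
round(ind, 4)             # agrees with the type = "fitted.ind" table

# Most probable single category: what a forced classification reports
mode_cat <- apply(ind, 1, which.max)
mode_cat                  # only ever 1 or 6 for these species

# Mean category on the 1-6 scale; identical to 1 + rowSums(cum)
mean_cat <- drop(ind %*% 1:6)
mean_cat
```

Reporting the whole row of probabilities (or a summary such as the mean) keeps the information that a 1-or-6 label throws away, for example that species 1 still has about an 18% chance of being in category 6.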
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). Those cannot be translated into action without subject-specific utility functions. Classification during the analysis or publication stage is not necessary.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
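The step from probabilities to an action can be sketched as an expected-utility calculation. In the base-R example below, only the probabilities for species 1 come from John's fitted.ind output; the two actions and all the utility values are hypothetical placeholders invented for illustration, standing in for the subject-specific utilities that would have to come from the decision maker rather than from the model.

```r
# P(Y = j), j = 1..6, for species 1 from predict(model1, type = "fitted.ind")
p <- c(0.4977794, 0.0919845942, 0.121008930, 0.070085034, 0.0417170048, 0.1774250519)

# HYPOTHETICAL utilities (invented for illustration, not from the thread):
# rows are actions, columns the true extinction-risk category 1..6.
utility <- rbind(
  no_action  = c( 0,  0, -1, -3, -6, -10),
  prioritise = c(-2, -2, -1,  0,  0,   0)
)

# Expected utility of each action = sum over categories of utility * P(Y = j)
expected_utility <- drop(utility %*% p)
expected_utility
best_action <- names(which.max(expected_utility))
best_action
```

With these made-up numbers, prioritising wins for species 1 even though its single most likely category is 1 (least concern); someone who weighed the costs differently could reach the opposite decision from the same probabilities.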
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
I have this discussion fairly often with the doctors I work with. The issue is that you can certainly predict from a model, but you can predict on different scales.

Consider the simpler case of just two outcomes (disease yes/no). Say you have four patients whose disease status you want to predict from their symptoms using a model. On the probability scale, patient A is predicted to have a 5% chance of yes, patient B 49%, patient C 51% and patient D 95%. If we collapse this to just a yes/no prediction, we will treat A and B the same (prediction NO) and C and D the same (prediction YES). But does it really make sense to treat B and C so differently (they are only 2 percentage points apart) while treating them the same as A or D? If I were one of the patients, I would want to know whether my probability of disease was 51% or 95%, not just "yes". With 3 groups, wouldn't you want to know the difference between 33%, 33%, 34% and 2%, 8%, 90%?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of peterfran...@me.com
> Sent: Friday, October 01, 2010 8:23 AM
> To: Frank Harrell
> Cc: r-help@r-project.org
> Subject: Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
>
> The reason I am trying to assign them is because I have a data set [...]
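Greg's four-patient example can be reproduced in a few lines of base R. The 0.5 cut-off below is the implicit rule behind collapsing to yes/no; the sketch simply makes visible which distinctions that rule discards, and shows that his two three-group probability vectors both end up with the same "most likely group".

```r
# Predicted probabilities of disease for Greg's four patients
p <- c(A = 0.05, B = 0.49, C = 0.51, D = 0.95)

# Collapsing to a yes/no prediction with a 0.5 cut-off
forced <- ifelse(p >= 0.5, "yes", "no")
forced                   # A and B get "no", C and D get "yes"

# The labels hide how different the probabilities are
unname(p["C"] - p["B"])  # 0.02 apart, yet different labels
unname(p["B"] - p["A"])  # 0.44 apart, yet the same label

# Three groups: both probability vectors pick group 3 as most likely
flat  <- c(0.33, 0.33, 0.34)
sharp <- c(0.02, 0.08, 0.90)
c(which.max(flat), which.max(sharp))
```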
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
The reason I am trying to assign them is that I have a data set for which I have arrived at the most likely model describing the data, and now I have another data set where I know the factors but not the response.

Therefore, surely I need to assign the predicted values to a response in order to say something like: "Based on the model, I believe unknown 1 is good, whereas unknown 2 is very good", etc.?

Maybe I am missing something or using the wrong approach, but I thought the main purpose of using the predict function on new data was to "predict" the response?

Peter

On 1 Oct 2010, at 14:51, Frank Harrell wrote:
> Why assign them at all? [...]
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Why assign them at all? Is this a "forced choice at gunpoint" problem? Remember what probabilities mean.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Frank,

That's great, thanks for the advice. I appreciate that the Brier score, AUC, etc. are better methods of validation and discrimination, but when it comes to predictions for new data:

    > d <- data.frame(x1=c(.1,.5), x2=c(.5,.15))
    > predict(f, d, type="fitted.ind")
         y=good  y=better    y=best
    1 0.3199710 0.3560355 0.3239935
    2 0.4153257 0.3437086 0.2409657
    > # predict mean(y) using codes 1,2,3
    > predict(f, d, type='mean', codes=TRUE)
           1        2
    2.004022 1.825640

How do I use this information to assign the observations with these x1 and x2 values to a category on the response scale (good, better, best)?

Thanks,
John

On 1 Oct 2010, at 12:14, Frank Harrell wrote:
> John,
> Don't conclude that one category is the most probable when its probability of being equaled or exceeded is a maximum. [...]
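The type='mean' numbers are not a category assignment: they are the probability-weighted average of the codes 1, 2, 3. The base-R check below, using only the fitted.ind values in John's output, reproduces them. It also shows that the most probable category for each row has a probability below 0.5, which is the situation in which Frank warns that classifying may be dangerous.

```r
# Probabilities from predict(f, d, type = "fitted.ind") for the two rows of d
probs <- rbind(
  c(0.3199710, 0.3560355, 0.3239935),
  c(0.4153257, 0.3437086, 0.2409657)
)
colnames(probs) <- c("good", "better", "best")

# With codes = TRUE the categories are scored 1, 2, 3; the predicted mean is
# the probability-weighted average of those scores.
mean_y <- drop(probs %*% 1:3)
mean_y                   # about 2.004 and 1.826, as in the type = 'mean' output

# Largest single-category probability for each row: neither reaches 0.5
apply(probs, 1, max)
```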
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
John,

Don't conclude that one category is the most probable when its probability of being equaled or exceeded is a maximum; the first category would always be the winner if that were the case. When you say y=best, remember that you are dealing with a probability model. Nothing is forcing you to classify an observation, and unless the category's probability is high, doing so may be dangerous. You might do well to consider a smoother approach, such as the generalized ROC area (C-index) or its related rank correlation measure Dxy. There are also odds ratios.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
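Both of Frank's points can be made concrete in base R. The first part uses the first row of John's fitted.ind output (good, better, best) to show that the exceedance probability P(Y >= j) is always largest for the first category, so "pick the largest" on that scale is meaningless. The second part uses the standard relation Dxy = 2(C - 0.5); reading John's "AUC 66.9" as a C index of 0.669 is an assumption.

```r
# Individual probabilities for the first row of predict(f, d, type = "fitted.ind")
p <- c(good = 0.3199710, better = 0.3560355, best = 0.3239935)

# P(Y >= j) is the reverse cumulative sum of the individual probabilities
p_ge <- rev(cumsum(rev(p)))
p_ge

names(which.max(p_ge))  # "good": P(Y >= first category) is 1 for every observation
names(which.max(p))     # "better": the single most probable category, at only 0.356

# Somers' Dxy is a linear rescaling of the C index (generalized ROC area)
c_index <- 0.669        # assumption: John's "AUC 66.9" read as C = 0.669
dxy <- 2 * (c_index - 0.5)
dxy
```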