Thanks Frank and Greg,

This makes a lot more sense to me now. I appreciate you are both very busy, but I was wondering if I could trouble you for one last piece of advice, as my data is a little complicated for a first effort at R, let alone modelling!
The response is on a scale from 1 to 6 indicating extinction risk (1 being least concern and 6 being critical), hence the ordinal model. The factors (6) are categorical, for example:

  FRUIT TYPE - fleshy/dry
  HABITAT    - terrestrial, aquatic, epiphyte, etc.

The question I am asking is: how do different combinations of factors affect extinction risk?

Based on what you have both said, I have called

> predict(model1, type="fitted")

Would this be the best way of predicting the probability of falling into each response category?

           y>=2        y>=3         y>=4         y>=5         y>=6
1   0.502220616 0.410236021 0.2892270912 0.2191420568 0.1774250519
2   0.745221699 0.668501579 0.5412223837 0.4486151612 0.3847379442
3   0.720381333 0.639796647 0.5095814746 0.4174618165 0.3551631876
4   0.752321112 0.676811675 0.5505781183 0.4579680710 0.3937100283
5   0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
6   0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
7   0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
8   0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
9   0.526291649 0.433739868 0.3094355120 0.2360800803 0.1919312111

I have 100 species for which I know the factors and whose response I want to predict. Should I do the above with the newdata argument and present the probabilities as above, rather than trying to classify the species?

I tried polr, and that "classified" every response as either 1 or 6, i.e. no 2, 3, 4 or 5. Calling predict(model1, type="fitted.ind") gave a similar picture, with the probabilities of being 1 or 6 far outweighing those of 2, 3, 4 and 5 (below). I know that is incorrect, so this may just be that my model is not powerful enough to discriminate effectively (Brier score 2.01, AUC 66.9)?
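For reference, the two kinds of output can be related by hand. The sketch below assumes, as the column labels y>=2 ... y>=6 suggest, that type="fitted" returns the cumulative probabilities P(Y >= j); the individual probabilities are then successive differences. It uses base R only, with rows 1, 2, 5 and 9 copied from the output above. The helper name cum_to_ind is made up for illustration and is not part of any package.

```r
# Cumulative probabilities P(Y >= j), j = 2..6, copied from rows 1, 2, 5 and 9
# of the predict(model1, type="fitted") output quoted in this message.
cum <- rbind(
  "1" = c(0.502220616, 0.410236021, 0.2892270912, 0.2191420568, 0.1774250519),
  "2" = c(0.745221699, 0.668501579, 0.5412223837, 0.4486151612, 0.3847379442),
  "5" = c(0.824388319, 0.763956402, 0.6543788296, 0.5663098186, 0.5008981585),
  "9" = c(0.526291649, 0.433739868, 0.3094355120, 0.2360800803, 0.1919312111)
)
colnames(cum) <- paste0("y>=", 2:6)

# Hypothetical helper (not from rms or MASS):
# P(Y = j) = P(Y >= j) - P(Y >= j + 1), taking P(Y >= 1) = 1 and P(Y >= 7) = 0.
cum_to_ind <- function(cum) {
  padded <- cbind(1, cum, 0)
  ind <- padded[, -ncol(padded), drop = FALSE] - padded[, -1, drop = FALSE]
  colnames(ind) <- paste0("EXTINCTION=", seq_len(ncol(ind)))
  ind
}

ind <- cum_to_ind(cum)
print(round(ind, 7))

# Most probable category for each of these species
modal <- apply(ind, 1, which.max)
print(modal)
```

Under that assumption the differences agree with the corresponding rows of the type="fitted.ind" table, and the most probable category for each of these four rows is either 1 or 6. That would be consistent with a rule that always picks the single most likely category never returning 2 to 5, even though those categories carry non-trivial probability.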
   EXTINCTION=1 EXTINCTION=2 EXTINCTION=3 EXTINCTION=4 EXTINCTION=5 EXTINCTION=6
1     0.4977794 0.0919845942  0.121008930  0.070085034 0.0417170048 0.1774250519
2     0.2547783 0.0767201200  0.127279196  0.092607223 0.0638772170 0.3847379442
3     0.2796187 0.0805846862  0.130215173  0.092119658 0.0622986289 0.3551631876
4     0.2476789 0.0755094367  0.126233557  0.092610047 0.0642580427 0.3937100283
5     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
6     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
7     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
8     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
9     0.4737084 0.0925517814  0.124304356  0.073355432 0.0441488692 0.1919312111
10    0.2489307 0.0757263892  0.126424896  0.092614323 0.0641934484 0.3921102030

(Row 10 of the type="fitted" output, columns y>=2 to y>=6: 0.751069260 0.675342871 0.5489179746 0.4563036514 0.3921102030)

Thanks very much for any advice given,

John

On 1 Oct 2010, at 23:13, Frank Harrell wrote:

Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). Those cannot be translated into action without subject-specific utility functions. Classification during the analysis or publication stage is not necessary.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2951976.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.