Hi Ben and list,

It predicted 756 out of 1115 correct, with another 150 very close. So I guess that is not too bad (68%), but should it be higher than that, given that the species I am testing the model on are the same ones that were used to build it?
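As a rough illustration of how a hit rate like 756/1115 can be computed and compared with a naive baseline, here is a minimal sketch in R. The data are simulated and the factor levels are made up (only the names THREAT, HAB and WO echo the model in the original post), so this is not the actual analysis.

```r
# Simulated stand-in for the real data; all values here are hypothetical.
set.seed(1)
n <- 200
traits <- data.frame(
  HAB = factor(sample(c("forest", "grassland", "wetland"), n, replace = TRUE)),
  WO  = factor(sample(c("woody", "herbaceous"), n, replace = TRUE))
)

# Generate a 0/1 response whose probability depends on the predictors
lin <- -0.5 + 1.0 * (traits$HAB == "forest") + 0.8 * (traits$WO == "woody")
traits$THREAT <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(THREAT ~ HAB + WO, data = traits, family = binomial)

# Fitted probabilities, then a hard 0/1 call at the 0.5 cutoff
p_hat <- predict(fit, type = "response")
pred  <- as.integer(p_hat > 0.5)

# Cross-tabulate observed against predicted and compute the hit rate
conf_mat <- table(observed = traits$THREAT, predicted = pred)
accuracy <- mean(pred == traits$THREAT)

# Baseline: always guess the more common class
baseline <- max(mean(traits$THREAT), 1 - mean(traits$THREAT))

print(conf_mat)
cat("accuracy:", round(accuracy, 3), " baseline:", round(baseline, 3), "\n")
```

Because the same rows are used to fit and to score the model, a hit rate computed this way tends to be optimistic; scoring a held-out subset of species would probably give a fairer estimate.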
One interesting point is that it got the cutoffs correct even when the predictions were the opposite of the real values, for example:

SPECIES    1 2 3 4 5 6 7 8 9 10
REAL       Y Y Y Y Y N N N N N
PREDICTED  N N N N N Y Y Y Y Y

This happened very often. I have different levels of threat, so maybe that will allow predictions on a finer scale. Would the same method I have used below work if I ran a GLM with gaussian rather than binomial and then used the predict function?

Thanks for your help,
Chris

On 26 Aug 2010, at 20:37, Ben Bolker wrote:

On 10-08-26 12:39 PM, Chris Mcowen wrote:
> Dear List,
>
> I am trying to predict the extinction risk of a species based on its life history. I will detail my method below and would welcome comments as to why the results are not as I expected.
>
>
> First I fit my model -
>
>> model1 <- glm(THREAT ~ HAB*BS + FR + WO + SEA + PD, data=traits, family="binomial")
>
> Where THREAT is TRUE (1) / FALSE (0).
>
> Where BS, FR etc are factors with multiple levels.
>
>
> I then predicted the probability of a species being threatened or not using
>
>> print(predict(model1, type = "response"))
>
> example output:-
>
>          1          2          3          4          5          6          7
> 0.44659200 0.65221495 0.71357243 0.71357243 0.71357243 0.71357243 0.71357243
>          8          9         10         11         12         13         14
> 0.71357243 0.65221495 0.65221495 0.65221495 0.65221495 0.65221495 0.65221495
>
> I interpret this as species 1 has a 45% chance (probability) of being threatened etc....
>
> I then wanted to see how this relates to the "true" threat level, so I looked at species 1 and it was classed as threatened, which disagrees with the predict results, although marginally. In fact most of the predict results do not agree with the "real" threat level; some species have a probability of 0.17, which to me says they are non-threatened, but in "real" they are classed as threatened.
>
> This is important because if these are not matching, at least most of the time, then how can I confidently predict the response of a species when I don't know its "real" response?

How bad is the mismatch? With a probability of 0.44 you don't have much information either way -- not surprising if the species is listed as threatened *or* not threatened. It's hard to say without more detail: if it's really true that making predictions by rounding up or down (i.e. pred prob > 0.5 -> 1) gives you more misses than hits, then something sounds screwy. You shouldn't do worse than 50% correct guessing at random ...

(I note that many of the entries are giving identical probabilities -- these points have identical sets of predictors, presumably)

_______________________________________________
R-sig-ecology mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
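On the REAL/PREDICTED pattern above, where the calls come out exactly reversed: one possible cause (an assumption, nothing in the thread confirms it) is the coding of the response. When the response of a binomial glm is a factor, R treats the first level as "failure" and models the probability of the other level(s), so a different level order gives the complementary probabilities. The sketch below uses hypothetical toy data, with made-up names x, y_a and y_b, to show the effect.

```r
# Hypothetical toy data: larger x raises the probability of being threatened.
set.seed(42)
n <- 500
x <- rnorm(n)
threatened <- rbinom(n, size = 1, prob = plogis(2 * x)) == 1

# Two codings of the same response, differing only in level order
y_a <- factor(ifelse(threatened, "Y", "N"), levels = c("N", "Y"))  # "Y" is success
y_b <- factor(ifelse(threatened, "Y", "N"), levels = c("Y", "N"))  # "N" is success

fit_a <- glm(y_a ~ x, family = binomial)
fit_b <- glm(y_b ~ x, family = binomial)

p_a <- predict(fit_a, type = "response")  # estimates P(threatened)
p_b <- predict(fit_b, type = "response")  # estimates P(not threatened)

# The two sets of fitted probabilities are complements (up to numerical error)
print(max(abs(p_a + p_b - 1)))

# Misreading p_b as P(threatened) reverses the calls at the 0.5 cutoff
acc_a         <- mean((p_a > 0.5) == threatened)
acc_b_misread <- mean((p_b > 0.5) == threatened)
cat("correct coding:", acc_a, " misread coding:", acc_b_misread, "\n")
```

Running levels() on the response, or checking which value is coded as 1, before interpreting predict(..., type = "response") is a quick way to rule this out.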
