I guess I should not have been so quick to make that conclusion since it seems that 74% of the values in the holdout set are FALSE so simply guessing FALSE for each one would give us 74% accuracy:
> table(DD[201:254]) FALSE TRUE 40 14 > 40/54 [1] 0.7407407 On 8/9/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote: > Perhaps you don't really need to predict the precise count. > Maybe its good enough to predict whether the count is above > or below average. In that case the model is 74% correct on a > holdout sample of the last 54 points based on a model of the > first 200 points. > > > # create model on first 200 and predict on rest > > DD <- data$D > mean(data$D) > > mod <- glm(DD ~., data[-1], family = binomial, subset = 1:200) > > tab <- table(predict(mod, data[201:254,-1], type = "resp") > .5, > > DD[201:254]) > > sum(tab * diag(2)) / sum(tab) > [1] 0.7407407 > > > On 8/9/07, Matthew and Kim Bowser <[EMAIL PROTECTED]> wrote: > > Dear all, > > > > I am attempting to explain patterns of arthropod family richness > > (count data) using a regression model. It seems to be able to do a > > pretty good job as an explanatory model (i.e. demonstrating > > relationships between dependent and independent variables), but it has > > systematic problems as a predictive model: It is biased high at low > > observed values of family richness and biased low at high observed > > values of family richness (see attached pdf). I have tried diverse > > kinds of reasonable regression models mostly as in Zeileis, et al. > > (2007), as well as transforming my variables, both with only small > > improvements. > > > > Do you have suggestions for making a model that would perform better > > as a predictive model? > > > > Thank you for your time. > > > > Sincerely, > > > > Matthew Bowser > > > > STEP student > > USFWS Kenai National Wildlife Refuge > > Soldotna, Alaska, USA > > > > M.Sc. student > > University of Alaska Fairbanks > > Fairbankse, Alaska, USA > > > > Reference > > > > Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for > > count data in R. Technical Report 53, Department of Statistics and > > Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL > > http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. > > > > Code > > > > `data` <- > > structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, > > 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, > > 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, > > 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, > > 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, > > 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, > > 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, > > 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, > > 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, > > 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, > > 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, > > 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, > > 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, > > 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, > > 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, > > 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, > > 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, > > 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, > > 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, > > 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, > > 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, > > 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, > > 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, > > 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, > > 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, > > 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, > > 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, > > 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, > > 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, > > 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, > > 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, > > 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, > > 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, > > 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, > > 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, > > 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, > > 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, > > 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, > > 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, > > 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, > > 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, > > 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, > > 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, > > 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318, > > 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2, > > 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38, > > 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288, > > 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162, > > 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306, > > 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208, > > 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25, > > 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462, > > 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275, > > 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135, > > 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325, > > 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047, > > -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0, > > 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467, > > -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228, > > 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376, > > -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277, > > 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20, > > 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9, > > 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16, > > 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18, > > 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15, > > 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17, > > 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20, > > 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12, > > 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10, > > 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24, > > 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0, > > 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24, > > 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35, > > 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17, > > 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9, > > 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19, > > 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3, > > 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4, > > 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2, > > 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5, > > 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15, > > 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9, > > 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5, > > 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3, > > 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2, > > 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4, > > 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7, > > 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5, > > 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2, > > 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5, > > 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16, > > 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4, > > 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5, > > 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6, > > 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4, > > 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4, > > 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4, > > 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9, > > 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, > > 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, > > 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, > > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, > > 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, > > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, > > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0), > > Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, > > 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day", "NDVI", > > "VegS", "T", "Hemlock", "Alpine", "Snow"), row.names = as.integer(c(NA, > > 254)), class = "data.frame") > > > > > > #Running a regression. > > library(MASS) > > fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = > > data) > > summary(fit, correlation = FALSE) > > > > Call: > > glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine + > > Snow, data = data, init.theta = 11.3494468596771, link = log) > > > > Deviance Residuals: > > Min 1Q Median 3Q Max > > -3.7451 -0.7196 -0.1958 0.5389 2.7096 > > > > Coefficients: > > Estimate Std. Error z value Pr(>|z|) > > (Intercept) -2.882684 0.929598 -3.101 0.001929 ** > > Day 0.020325 0.005540 3.669 0.000244 *** > > NDVI 1.353361 0.221471 6.111 9.91e-10 *** > > VegS 0.016731 0.004931 3.393 0.000691 *** > > T 0.074189 0.009491 7.817 5.42e-15 *** > > Hemlock -0.588858 0.174980 -3.365 0.000765 *** > > Alpine -0.452199 0.099296 -4.554 5.26e-06 *** > > Snow -1.902610 0.735708 -2.586 0.009707 ** > > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > (Dispersion parameter for Negative Binomial(11.3494) family taken to be 1) > > > > Null deviance: 515.16 on 253 degrees of freedom > > Residual deviance: 278.89 on 246 degrees of freedom > > AIC: 1300.1 > > > > Number of Fisher Scoring iterations: 1 > > > > > > Theta: 11.35 > > Std. Err.: 2.71 > > > > 2 x log-likelihood: -1282.075 > > > > > > #Plotting observed versus predicted values. > > pdf(file="ObsVPred.pdf", width=4, height=4, family="Times", pointsize=11) > > par(mar = c(5,5,1,1), pch=1) > > plot(data$D, fit$fitted.values, main="", > > ylab=expression(italic(D)[predicted]), > > xlab=expression(italic(D)[observed])) > > abline(a=0,b=1, lty=2) > > lines(lowess(data$D, fit$fitted.values)) > > dev.off() > > > > > > #This appears to be a decent explanatory model, but as a predictive > > model it is systematically biased. It is biased high at low observed > > values of D and biased low at high values observed values of D. > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.