Re: [R] negative binomial family glm R and STATA
(par = param.ini, fn = minuslogL, hessian = hessian, ...) ## Results param - res$par if(is.null(unlist(fixpar)) == FALSE) param[fixpar[[1]]] - fixpar[[2]] H - NA ; H.singularity - NA ; varparam - NA if(hessian == TRUE){ H - res$hessian if(is.null(unlist(fixpar))) { H.singularity - ifelse(qr(H)$rank nrow(H), TRUE, FALSE) if(!H.singularity) varparam - qr.solve(H) } else{ idparam - 1:(nb.b + nb.phi) idestim - idparam[-fixpar[[1]]] Hr - H[-fixpar[[1]], -fixpar[[1]]] H.singularity - ifelse(qr(Hr)$rank nrow(Hr), TRUE, FALSE) if(!H.singularity) { Vr - solve(Hr) ; dimnames(Vr) - list(idestim, idestim) varparam - matrix(0, nrow = nrow(H), ncol = ncol(H)) ; varparam[idestim, idestim] - Vr } } } if(is.null(unlist(fixpar))) nbpar - length(param) else nbpar - length(param[-fixpar[[1]]]) logL.max - sum(dpois(x = y, lambda = y, log = TRUE)) logL - -res$value dev - -2 * (logL - logL.max) df.residual - length(y) - nbpar iterations - res$counts code - res$convergence res - list( link = link, formula = formula, random = random, param = param, H = H, H.singularity = H.singularity, varparam = varparam, logL = logL, logL.max = logL.max, dev = dev, nbpar = nbpar, df.residual = df.residual, iterations = iterations, code = code, param.ini = param.ini ) res } -- Matthieu Lesnoff International Livestock Research Institute (ILRI) Lab. 8 Old Naivasha Road PO BOX 30709 00100 GPO Nairobi, Kenya Tel: Off: (+254) 20 422 3000 (ext. 4801) Res: (+254) 20 422 3134 Mob: (+254) 725 785 570 Sec: (+254) 20 422 3013 Fax: (+254) 20 422 3001 Email: [EMAIL PROTECTED] -- -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Giraudoux Sent: samedi 6 janvier 2007 14:00 To: r-help@stat.math.ethz.ch Cc: Bertrand SUDRE Subject: [R] negative binomial family glm R and STATA Dear Lister, I am facing a strange problem fitting a GLM of the negative binomial family. Actually, I tried to estimate theta (the scale parameter) through glm.nb from MASS and could get convergence only relaxing the convergence tolerance to 1e-3. With warning messages: glm1-glm.nb(nbcas~.,data=zonesdb4,control=glm.control(epsilon = 1e-3)) There were 25 warnings (use warnings() to see them) warnings() Warning messages: 1: iteration limit reached in: theta.ml(Y, mu, n, w, limit = control$maxit, trace = control$trace... 2: NaNs produced in: sqrt(1/i) etc The estimate of theta was: 0.0939. When trying to compute confidence interval then, I got this message: confint(glm1a) Waiting for profiling to be done... Error in profile.glm(object, which = parm, alpha = (1 - level)/4, trace = trace) : profiling has found a better solution, so original fit had not converged Moreover, trying some other solutions by hand (without warnings produced, here) with glm( family=negative.binomial(1)), I found that theta = 0.7 lead to a much lower AIC (AIC= 1073) than theta = 1 (AIC=1211). Facing such unstable results my first reaction has been to forget about fitting a negative binomial model on this data set. The reader will find the dataset in a dumped format below for trials. A friend of mine tried the same with STATA and got the following result without any warning from STATA : . glm nbcas pop area v_100khab gares ports axe_routier lacs, family(nbinomial) link(log) eform Iteration 0: log likelihood = -616.84031 Iteration 1: log likelihood = -599.77767 Iteration 2: log likelihood = -597.22486 Iteration 3: log likelihood = -597.14784 Iteration 4: log likelihood = -597.14778 Iteration 5: log likelihood = -597.14778 Generalized linear models No. of obs =92 Optimization : ML Residual df =84 Scale parameter = 1 Deviance = 597.0007978(1/df) Deviance = 7.107152 Pearson = 335.6135273(1/df) Pearson = 3.995399 Variance function: V(u) = u+(1)u^2 [Neg. Binomial] Link function: g(u) = ln(u)[Log] AIC = 13.15539 Log likelihood = -597.1477759BIC = 217.1706 -- | OIM nbcas |IRR Std. Err. zP|z| [95% Conf. Interval
[R] negative binomial family glm R and STATA
Dear Patrick Try the package gamlss which allow to fit the negative binomial distrbution. For example with your data I am getting #--- ga1-gamlss(nbcas~.,data=zonesdb4,family=NBI) GAMLSS-RS iteration 1: Global Deviance = 817.9027 GAMLSS-RS iteration 2: Global Deviance = 817.9025 ga1 Family: c(NBI, Negative Binomial type I) Fitting method: RS() Call: gamlss(formula = nbcas ~ ., family = NBI, data = zonesdb4) Mu Coefficients: (Intercept) pop AreaV_100kHab1gares1 3.204e+00 1.114e-05 1.354e-05 9.144e-01 7.946e-01 ports1 axe_routier1 axe_routier2 lacs1 -1.730e+00 1.989e-01NA 3.042e+00 Sigma Coefficients: (Intercept) 2.313 Degrees of Freedom for the fit: 9 Residual Deg. of Freedom 83 Global Deviance: 817.902 AIC: 835.902 SBC: 858.599 #-- Note that the AIC: 835.902 is similar to your fitted model using glm.nb which is AIC: 836.2. The coefficients are not identical but this is not suprissing when you are using x-variables with extreme values as pop and Area. The profile function for sigma can be found using prof.dev(ga1,sigma, min=7, max=16, step=0.1, type=l) Your discrepancy with STATA come from the fact that in STATA you are fitting the model with sigma fixed to 1. You can see that by fitting the same model in GAMLSS. ga2-gamlss(nbcas~.,data=zonesdb4,family=NBI, sigma.fix=T, sigma.start=1) GAMLSS-RS iteration 1: Global Deviance = 1194.299 GAMLSS-RS iteration 2: Global Deviance = 1194.298 This is similar to the log likelihod you are getting in STATA. i.e. -2*-597.14778= 1194.296. You can also use the stepGAIC() function to simplify your model. For example ga2-stepGAIC(ga1) will result to a model with only pop and lacs in the mdel. Note also the the Negative binomial type II fits better to you data. ga3-gamlss(nbcas~.,data=zonesdb4,family=NBII) GAMLSS-RS iteration 1: Global Deviance = 804.5682 ... GAMLSS-RS iteration 10: Global Deviance = 804.4995 Thanks Mikis __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] negative binomial family glm R and STATA
Dear Lister, I am facing a strange problem fitting a GLM of the negative binomial family. Actually, I tried to estimate theta (the scale parameter) through glm.nb from MASS and could get convergence only relaxing the convergence tolerance to 1e-3. With warning messages: glm1-glm.nb(nbcas~.,data=zonesdb4,control=glm.control(epsilon = 1e-3)) There were 25 warnings (use warnings() to see them) warnings() Warning messages: 1: iteration limit reached in: theta.ml(Y, mu, n, w, limit = control$maxit, trace = control$trace... 2: NaNs produced in: sqrt(1/i) etc The estimate of theta was: 0.0939. When trying to compute confidence interval then, I got this message: confint(glm1a) Waiting for profiling to be done... Error in profile.glm(object, which = parm, alpha = (1 - level)/4, trace = trace) : profiling has found a better solution, so original fit had not converged Moreover, trying some other solutions by hand (without warnings produced, here) with glm( family=negative.binomial(1)), I found that theta = 0.7 lead to a much lower AIC (AIC= 1073) than theta = 1 (AIC=1211). Facing such unstable results my first reaction has been to forget about fitting a negative binomial model on this data set. The reader will find the dataset in a dumped format below for trials. A friend of mine tried the same with STATA and got the following result without any warning from STATA : . glm nbcas pop area v_100khab gares ports axe_routier lacs, family(nbinomial) link(log) eform Iteration 0: log likelihood = -616.84031 Iteration 1: log likelihood = -599.77767 Iteration 2: log likelihood = -597.22486 Iteration 3: log likelihood = -597.14784 Iteration 4: log likelihood = -597.14778 Iteration 5: log likelihood = -597.14778 Generalized linear models No. of obs =92 Optimization : ML Residual df =84 Scale parameter = 1 Deviance = 597.0007978(1/df) Deviance = 7.107152 Pearson = 335.6135273(1/df) Pearson = 3.995399 Variance function: V(u) = u+(1)u^2 [Neg. Binomial] Link function: g(u) = ln(u)[Log] AIC = 13.15539 Log likelihood = -597.1477759BIC = 217.1706 -- | OIM nbcas |IRR Std. Err. zP|z| [95% Conf. Interval] -+ pop | 1.11 1.82e-06 6.02 0.000 1.07 1.14 area | 1.14 .244 0.57 0.569 .661 1.62 v_100khab | 2.485394 .7924087 2.86 0.004 1.330485 4.642806 gares | 2.185483 .7648255 2.23 0.025 1.100686 4.339418 ports | .1805793.100423-3.08 0.002 .0607158 .5370744 axe_routier |.828243 .2258397-0.69 0.489 .4853532 1.413376 lacs | 20.50183 12.17126 5.09 0.000 6.404161 65.63311 Has somebody an idea about (1) why the AIC values given are so different between softwares (R = 1211, STATA= 13.15) for the same model and (2) what can explain so different behaviour between R and STATA ? Here below the data.frame: zonesdb4 - structure(list(nbcas = as.integer(c(318, 0, 42, 3011, 6, 911, 45, 273, 0, 0, 89, 122, 407, 83, 0, 1844, 58, 0, 0, 0, 0, 8926, 0, 0, 0, 0, 108, 0, 13, 1884, 0, 0, 0, 0, 963, 0, 199, 735, 0, 2182, 971, 0, 65, 0, 7927, 30, 0, 186, 0, 1363, 808, 0, 0, 0, 0, 135, 0, 1338, 0, 0, 488, 0, 344, 0, 0, 454, 4808, 0, 692, 0, 0, 24, 1301, 0, 0, 474, 228, 0, 0, 98, 44, 0, 0, 0, 1562, 375, 327, 0, 0, 0, 0, 0)), pop = as.integer(c(247215, 55709, 63625, 253153, 51789, 142806, 129839, 95799, 129996, 8, 76043, 267232, 153200, 136333, 264888, 198244, 233600, 89152, 128085, 71803, 81911, 122523, 149806, 122470, 50979, 160773, 80700, 56146, 226965, 245322, 165768, 215129, 46843, 108471, 108690, 188724, 161794, 226965, 95850, 156326, 145291, 51789, 218808, 53189, 245854, 152047, 146509, 243765, 65012, 226830, 66742, 144762, 93858, 73793, 123107, 189793, 91013, 135212, 67487, 105050, 194903, 206606, 62169, 96832, 145570, 167062, 1598576, 146509, 103928, 118334, 91509, 295644, 139650, 106980, 66529, 126126, 257341, 56973, 33793, 164072, 145225, 155638, 131100, 100880, 245482, 166213, 127365, 113713, 57540, 78571, 62499, 81916)), Area = c(10027.1, 9525.3, 638.646, 861.486, 4966.32, 11973, 1823.89, 1327.45, 789.595, 4892.38, 638.908, 15959.8, 2036.56, 7397.62, 4626.03, 10237.5, 9823.24, 4253.59, 2448.78, 12280.2, 910.972, 16365, 2041.92, 4343.46, 1081.42, 1601.11, 4664.47, 335.865, 2818.68, 7348.1, 1148.41, 265.158, 14883.6,