Re: [R] negative binomial family glm R and STATA

2007-01-07 Thread Lesnoff, Matthieu \(ILRI\)
(par = param.ini, fn = minuslogL, hessian = hessian,
...)
## Results
param - res$par
if(is.null(unlist(fixpar)) == FALSE) param[fixpar[[1]]] -
fixpar[[2]]
H - NA ; H.singularity - NA ; varparam - NA
if(hessian == TRUE){
H - res$hessian
if(is.null(unlist(fixpar))) {
H.singularity - ifelse(qr(H)$rank  nrow(H), TRUE, FALSE)
if(!H.singularity) varparam - qr.solve(H)
}
else{
idparam - 1:(nb.b + nb.phi)
idestim - idparam[-fixpar[[1]]]
Hr - H[-fixpar[[1]], -fixpar[[1]]]
H.singularity - ifelse(qr(Hr)$rank  nrow(Hr), TRUE, FALSE)
if(!H.singularity) {
Vr - solve(Hr) ; dimnames(Vr) - list(idestim, idestim)
varparam - matrix(0, nrow = nrow(H), ncol = ncol(H)) ;
varparam[idestim, idestim] - Vr
}
}
}
if(is.null(unlist(fixpar))) nbpar - length(param) else nbpar -
length(param[-fixpar[[1]]])
logL.max - sum(dpois(x = y, lambda = y, log = TRUE))
logL - -res$value
dev - -2 * (logL - logL.max)
df.residual - length(y) - nbpar 
iterations - res$counts
code - res$convergence
res - list(
link = link, formula = formula, random = random, param = param, 
H = H, H.singularity = H.singularity, varparam = varparam, logL
= logL, logL.max = logL.max, 
dev = dev, nbpar = nbpar, df.residual = df.residual, iterations
= iterations, 
code = code, param.ini = param.ini
)
res
}
  

--
Matthieu Lesnoff
International Livestock Research Institute (ILRI)
Lab. 8
Old Naivasha Road
PO BOX 30709
00100 GPO Nairobi, Kenya
Tel:   Off: (+254) 20 422 3000 (ext. 4801)
   Res: (+254) 20 422 3134
   Mob: (+254) 725 785 570
   Sec: (+254) 20 422 3013 
Fax: (+254) 20 422 3001
Email: [EMAIL PROTECTED] 
-- 



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Patrick Giraudoux
 Sent: samedi 6 janvier 2007 14:00
 To: r-help@stat.math.ethz.ch
 Cc: Bertrand SUDRE
 Subject: [R] negative binomial family glm R and STATA
 
 Dear Lister,
 
 I am facing a strange problem fitting a GLM of the negative 
 binomial family. Actually, I tried to estimate theta (the 
 scale parameter) through glm.nb from MASS and could get 
 convergence only relaxing the convergence tolerance to 1e-3. 
 With warning messages:
 
  
 glm1-glm.nb(nbcas~.,data=zonesdb4,control=glm.control(epsilon
  = 1e-3)) There were 25 warnings (use warnings() to see them) 
   warnings() Warning messages:
 1: iteration limit reached in: theta.ml(Y, mu, n, w, limit = 
 control$maxit, trace = control$trace...
 2: NaNs produced in: sqrt(1/i)
 
 etc
 
 The estimate of theta was: 0.0939. When trying to compute 
 confidence interval then, I got this message:
 
   confint(glm1a)
 Waiting for profiling to be done...
 Error in profile.glm(object, which = parm, alpha = (1 - 
 level)/4, trace = trace) :
 profiling has found a better solution, so original 
 fit had not converged
 
 Moreover, trying some other solutions by hand (without 
 warnings produced, here) with glm( 
 family=negative.binomial(1)),  I found that theta = 0.7 
 lead to a much lower AIC (AIC= 1073) than theta = 1 (AIC=1211).
 
 Facing such unstable results my first reaction has been to 
 forget about fitting a negative binomial model on this data 
 set. The reader will find the dataset in a dumped format 
 below for trials.
 
 A friend of mine tried the same with STATA and got the 
 following result without any warning  from STATA :
 
 . glm nbcas pop area v_100khab gares ports axe_routier lacs,
 family(nbinomial) link(log) eform
  
 Iteration 0:   log likelihood = -616.84031 
 Iteration 1:   log likelihood = -599.77767 
 Iteration 2:   log likelihood = -597.22486 
 Iteration 3:   log likelihood = -597.14784 
 Iteration 4:   log likelihood = -597.14778 
 Iteration 5:   log likelihood = -597.14778 
  
 Generalized linear models  No. of obs  
 =92
 Optimization : ML  Residual df 
 =84
Scale parameter 
 = 1
 Deviance =  597.0007978(1/df) Deviance =  
 7.107152
 Pearson  =  335.6135273(1/df) Pearson  =  
 3.995399
  
 Variance function: V(u) = u+(1)u^2 [Neg. Binomial]
 Link function: g(u) = ln(u)[Log]
  
AIC =  
 13.15539
 Log likelihood   = -597.1477759BIC =  
 217.1706
  
 --
  
 
  | OIM
nbcas |IRR   Std. Err.  zP|z| [95% Conf. 
 Interval

[R] negative binomial family glm R and STATA

2007-01-07 Thread Mikis Stasinopoulos
Dear  Patrick

Try the package gamlss which allow to fit the negative binomial 
distrbution. For example  with your data I am getting
#---
ga1-gamlss(nbcas~.,data=zonesdb4,family=NBI)
GAMLSS-RS iteration 1: Global Deviance = 817.9027
GAMLSS-RS iteration 2: Global Deviance = 817.9025
 ga1

Family:  c(NBI, Negative Binomial type I)
Fitting method: RS()

Call:  gamlss(formula = nbcas ~ ., family = NBI, data = zonesdb4)

Mu Coefficients:
 (Intercept)   pop  AreaV_100kHab1gares1 
   3.204e+00 1.114e-05 1.354e-05 9.144e-01 7.946e-01 
  ports1  axe_routier1  axe_routier2 lacs1 
  -1.730e+00 1.989e-01NA 3.042e+00 
Sigma Coefficients:
(Intercept) 
  2.313 

 Degrees of Freedom for the fit: 9 Residual Deg. of Freedom   83
Global Deviance: 817.902
AIC: 835.902
SBC: 858.599
#--

Note that the AIC: 835.902 is similar to your fitted model using  glm.nb 
which is  AIC: 836.2.

The coefficients are not identical but this is not suprissing when you 
are using x-variables with extreme values as pop and Area.
The profile function for sigma can be found using

prof.dev(ga1,sigma, min=7, max=16, step=0.1, type=l)

Your discrepancy with STATA come from the fact that in STATA you are 
fitting the model with sigma fixed to 1.
You can see that by fitting the same model in  GAMLSS.

   ga2-gamlss(nbcas~.,data=zonesdb4,family=NBI, sigma.fix=T, 
sigma.start=1)
GAMLSS-RS iteration 1: Global Deviance = 1194.299
GAMLSS-RS iteration 2: Global Deviance = 1194.298

This  is similar  to  the  log likelihod you are  getting in STATA. i.e. 
-2*-597.14778= 1194.296.

You can also use the stepGAIC() function to simplify your model. For  
example

 ga2-stepGAIC(ga1)

will result to a model with only pop and lacs in the mdel.

Note also the the Negative binomial type II fits better to you data.

   ga3-gamlss(nbcas~.,data=zonesdb4,family=NBII)
GAMLSS-RS iteration 1: Global Deviance = 804.5682
...
GAMLSS-RS iteration 10: Global Deviance = 804.4995


Thanks

Mikis

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] negative binomial family glm R and STATA

2007-01-06 Thread Patrick Giraudoux
Dear Lister,

I am facing a strange problem fitting a GLM of the negative binomial 
family. Actually, I tried to estimate theta (the scale parameter) 
through glm.nb from MASS and could get convergence only relaxing the 
convergence tolerance to 1e-3. With warning messages:

 glm1-glm.nb(nbcas~.,data=zonesdb4,control=glm.control(epsilon = 1e-3))
There were 25 warnings (use warnings() to see them)
  warnings()
Warning messages:
1: iteration limit reached in: theta.ml(Y, mu, n, w, limit = 
control$maxit, trace = control$trace...
2: NaNs produced in: sqrt(1/i)

etc

The estimate of theta was: 0.0939. When trying to compute confidence 
interval then, I got this message:

  confint(glm1a)
Waiting for profiling to be done...
Error in profile.glm(object, which = parm, alpha = (1 - level)/4, trace 
= trace) :
profiling has found a better solution, so original fit had not 
converged

Moreover, trying some other solutions by hand (without warnings 
produced, here) with glm( family=negative.binomial(1)),  I found 
that theta = 0.7 lead to a much lower AIC (AIC= 1073) than theta = 1 
(AIC=1211).

Facing such unstable results my first reaction has been to forget about 
fitting a negative binomial model on this data set. The reader will find 
the dataset in a dumped format below for trials.

A friend of mine tried the same with STATA and got the following result 
without any warning  from STATA :

. glm nbcas pop area v_100khab gares ports axe_routier lacs, 
family(nbinomial) link(log) eform
 
Iteration 0:   log likelihood = -616.84031 
Iteration 1:   log likelihood = -599.77767 
Iteration 2:   log likelihood = -597.22486 
Iteration 3:   log likelihood = -597.14784 
Iteration 4:   log likelihood = -597.14778 
Iteration 5:   log likelihood = -597.14778 
 
Generalized linear models  No. of obs  
=92
Optimization : ML  Residual df 
=84
   Scale parameter 
= 1
Deviance =  597.0007978(1/df) Deviance =  
7.107152
Pearson  =  335.6135273(1/df) Pearson  =  
3.995399
 
Variance function: V(u) = u+(1)u^2 [Neg. Binomial]
Link function: g(u) = ln(u)[Log]
 
   AIC =  
13.15539
Log likelihood   = -597.1477759BIC =  
217.1706
 
-- 

 | OIM
   nbcas |IRR   Std. Err.  zP|z| [95% Conf. 
Interval]
-+ 

 pop |   1.11   1.82e-06 6.02   0.000 1.07
1.14
area |   1.14   .244 0.57   0.569 .661
1.62
   v_100khab |   2.485394   .7924087 2.86   0.004 1.330485
4.642806
   gares |   2.185483   .7648255 2.23   0.025 1.100686
4.339418
   ports |   .1805793.100423-3.08   0.002 .0607158
.5370744
 axe_routier |.828243   .2258397-0.69   0.489 .4853532
1.413376
lacs |   20.50183   12.17126 5.09   0.000 6.404161
65.63311


Has somebody an idea about (1) why the AIC values given are so different 
between softwares (R = 1211, STATA= 13.15) for the same model and (2) 
what can explain so different behaviour between R and STATA ?

Here below the data.frame:


zonesdb4 -
structure(list(nbcas = as.integer(c(318, 0, 42, 3011, 6, 911,
45, 273, 0, 0, 89, 122, 407, 83, 0, 1844, 58, 0, 0, 0, 0, 8926,
0, 0, 0, 0, 108, 0, 13, 1884, 0, 0, 0, 0, 963, 0, 199, 735, 0,
2182, 971, 0, 65, 0, 7927, 30, 0, 186, 0, 1363, 808, 0, 0, 0,
0, 135, 0, 1338, 0, 0, 488, 0, 344, 0, 0, 454, 4808, 0, 692,
0, 0, 24, 1301, 0, 0, 474, 228, 0, 0, 98, 44, 0, 0, 0, 1562,
375, 327, 0, 0, 0, 0, 0)), pop = as.integer(c(247215, 55709,
63625, 253153, 51789, 142806, 129839, 95799, 129996, 8, 76043,
267232, 153200, 136333, 264888, 198244, 233600, 89152, 128085,
71803, 81911, 122523, 149806, 122470, 50979, 160773, 80700, 56146,
226965, 245322, 165768, 215129, 46843, 108471, 108690, 188724,
161794, 226965, 95850, 156326, 145291, 51789, 218808, 53189,
245854, 152047, 146509, 243765, 65012, 226830, 66742, 144762,
93858, 73793, 123107, 189793, 91013, 135212, 67487, 105050, 194903,
206606, 62169, 96832, 145570, 167062, 1598576, 146509, 103928,
118334, 91509, 295644, 139650, 106980, 66529, 126126, 257341,
56973, 33793, 164072, 145225, 155638, 131100, 100880, 245482,
166213, 127365, 113713, 57540, 78571, 62499, 81916)), Area = c(10027.1,
9525.3, 638.646, 861.486, 4966.32, 11973, 1823.89, 1327.45, 789.595,
4892.38, 638.908, 15959.8, 2036.56, 7397.62, 4626.03, 10237.5,
9823.24, 4253.59, 2448.78, 12280.2, 910.972, 16365, 2041.92,
4343.46, 1081.42, 1601.11, 4664.47, 335.865, 2818.68, 7348.1,
1148.41, 265.158, 14883.6,