[R] negative binomial family glm R and STATA

Patrick Giraudoux Sat, 06 Jan 2007 03:05:06 -0800

Dear Lister,

I am facing a strange problem fitting a GLM of the negative binomial 
family. Actually, I tried to estimate theta (the scale parameter) 
through glm.nb from MASS and could get convergence only relaxing the 
convergence tolerance to 1e-3. With warning messages:


 glm1<-glm.nb(nbcas~.,data=zonesdb4,control=glm.control(epsilon = 1e-3))
There were 25 warnings (use warnings() to see them)
 > warnings()
Warning messages:
1: iteration limit reached in: theta.ml(Y, mu, n, w, limit = 
control$maxit, trace = control$trace >   ...
2: NaNs produced in: sqrt(1/i)

etc....

The estimate of theta was: 0.0939. When trying to compute confidence 
interval then, I got this message:

 > confint(glm1a)
Waiting for profiling to be done...
Error in profile.glm(object, which = parm, alpha = (1 - level)/4, trace 
= trace) :
        profiling has found a better solution, so original fit had not 
converged

Moreover, trying some other solutions "by hand" (without warnings 
produced, here) with glm(.... family=negative.binomial(1)....),  I found 
that theta = 0.7 lead to a much lower AIC (AIC= 1073) than theta = 1 
(AIC=1211).

Facing such unstable results my first reaction has been to forget about 
fitting a negative binomial model on this data set. The reader will find 
the dataset in a dumped format below for trials.

A friend of mine tried the same with STATA and got the following result 
without any warning  from STATA :

. glm nbcas pop area v_100khab gares ports axe_routier lacs, 
family(nbinomial) link(log) eform
 
Iteration 0:   log likelihood = -616.84031 
Iteration 1:   log likelihood = -599.77767 
Iteration 2:   log likelihood = -597.22486 
Iteration 3:   log likelihood = -597.14784 
Iteration 4:   log likelihood = -597.14778 
Iteration 5:   log likelihood = -597.14778 
 
Generalized linear models                          No. of obs      
=        92
Optimization     : ML                              Residual df     
=        84
                                                   Scale parameter 
=         1
Deviance         =  597.0007978                    (1/df) Deviance =  
7.107152
Pearson          =  335.6135273                    (1/df) Pearson  =  
3.995399
 
Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
 
                                                   AIC             =  
13.15539
Log likelihood   = -597.1477759                    BIC             =  
217.1706
 
------------------------------------------------------------------------------ 

             |                 OIM
       nbcas |        IRR   Std. Err.      z    P>|z|     [95% Conf. 
Interval]
-------------+---------------------------------------------------------------- 

         pop |   1.000011   1.82e-06     6.02   0.000     1.000007    
1.000014
        area |   1.000014   .0000244     0.57   0.569     .9999661    
1.000062
   v_100khab |   2.485394   .7924087     2.86   0.004     1.330485    
4.642806
       gares |   2.185483   .7648255     2.23   0.025     1.100686    
4.339418
       ports |   .1805793    .100423    -3.08   0.002     .0607158    
.5370744
 axe_routier |    .828243   .2258397    -0.69   0.489     .4853532    
1.413376
        lacs |   20.50183   12.17126     5.09   0.000     6.404161    
65.63311


Has somebody an idea about (1) why the AIC values given are so different 
between softwares (R = 1211, STATA= 13.15) for the same model and (2) 
what can explain so different behaviour between R and STATA ?

Here below the data.frame:


zonesdb4 <-
structure(list(nbcas = as.integer(c(318, 0, 42, 3011, 6, 911,
45, 273, 0, 0, 89, 122, 407, 83, 0, 1844, 58, 0, 0, 0, 0, 8926,
0, 0, 0, 0, 108, 0, 13, 1884, 0, 0, 0, 0, 963, 0, 199, 735, 0,
2182, 971, 0, 65, 0, 7927, 30, 0, 186, 0, 1363, 808, 0, 0, 0,
0, 135, 0, 1338, 0, 0, 488, 0, 344, 0, 0, 454, 4808, 0, 692,
0, 0, 24, 1301, 0, 0, 474, 228, 0, 0, 98, 44, 0, 0, 0, 1562,
375, 327, 0, 0, 0, 0, 0)), pop = as.integer(c(247215, 55709,
63625, 253153, 51789, 142806, 129839, 95799, 129996, 66668, 76043,
267232, 153200, 136333, 264888, 198244, 233600, 89152, 128085,
71803, 81911, 122523, 149806, 122470, 50979, 160773, 80700, 56146,
226965, 245322, 165768, 215129, 46843, 108471, 108690, 188724,
161794, 226965, 95850, 156326, 145291, 51789, 218808, 53189,
245854, 152047, 146509, 243765, 65012, 226830, 66742, 144762,
93858, 73793, 123107, 189793, 91013, 135212, 67487, 105050, 194903,
206606, 62169, 96832, 145570, 167062, 1598576, 146509, 103928,
118334, 91509, 295644, 139650, 106980, 66529, 126126, 257341,
56973, 33793, 164072, 145225, 155638, 131100, 100880, 245482,
166213, 127365, 113713, 57540, 78571, 62499, 81916)), Area = c(10027.1,
9525.3, 638.646, 861.486, 4966.32, 11973, 1823.89, 1327.45, 789.595,
4892.38, 638.908, 15959.8, 2036.56, 7397.62, 4626.03, 10237.5,
9823.24, 4253.59, 2448.78, 12280.2, 910.972, 16365, 2041.92,
4343.46, 1081.42, 1601.11, 4664.47, 335.865, 2818.68, 7348.1,
1148.41, 265.158, 14883.6, 3698.58, 12444.6, 1711.45, 15462,
10419.5, 13119.2, 1276.14, 872.91, 19291.4, 6719.82, 8505.53,
13219.8, 13069, 5212.03, 3924.42, 791.219, 881.281, 10038.5,
9101.94, 7925.71, 943.062, 9888, 20585.3, 4600.35, 3258.27, 11813.4,
130.184, 10644.3, 1925.89, 1892.88, 3833.6, 350.3, 7154.79, 2800.63,
559.986, 3152, 7095.39, 6058.3, 113.225, 5067.66, 1293.05, 15109.8,
4111.54, 94.5213, 4012.91, 1468.02, 10651.3, 8541.69, 1806.28,
20166.3, 1110.75, 2026.98, 21114.4, 2041.51, 17740.9, 16627.5,
15266.1, 525.467, 371.132), V_100kHab = structure(as.integer(c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
1, 2, 1, 1, 1, 1, 2)), .Label = c("0", "1"), class = "factor"),
    gares = structure(as.integer(c(2, 2, 1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,
    1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1,
    1, 1, 1, 1, 1, 1, 1)), .Label = c("0", "1"), class = "factor"),
    ports = structure(as.integer(c(2, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
    2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1,
    2, 1, 1, 1, 1, 1, 1)), .Label = c("0", "1"), class = "factor"),
    axe_routier = structure(as.integer(c(2, 3, 3, 3, 2, 2, 2,
    3, 2, 3, 2, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3, 2,
    3, 2, 3, 3, 3, 2, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2, 3,
    2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 2, 2, 3, 3, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 2, 3, 3,
    3, 2, 3, 3, 3, 2, 2, 3, 3)), .Label = c("0", "1", "2"), class = 
"factor"),
    lacs = structure(as.integer(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
    2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1,
    2, 1, 1, 1, 1, 1, 1)), .Label = c("0", "1"), class = "factor")), 
.Names = c("nbcas",
"pop", "Area", "V_100kHab", "gares", "ports", "axe_routier",
"lacs"), row.names = c("Ankoro", "Dilolo", "Tshitenge", "Tshitshimbi",
"Tshofa", "Tshudi Loto", "Vanga Kete", "Wikong", "Djalo Djeka",
"Fungurume", "Gandajika", "Kabalo", "Kabeya Kamuanga", "Kabinda",
"Kabondo Dianda", "Kabongo", "Kafakumba", "Bena Dibele", "Kafubu",
"Kalamba", "Kalambayi Kabanga", "Kalemie", "Kalenda", "Kalonda Est",
"Kamalondo", "Kamana", "Kambove", "Kamiji", "Bibanga", "Kamina",
"Kampemba", "Kanda Kanda", "Kansimba", "Kanzenze", "Kapanga",
"Kapolowe", "Kasaji", "Kasenga", "Katako Kombe", "Katuba", "Kenya",
"Kiambi", "Kikula", "Kilela Balanda", "Kilwa", "Kinda", "Bukenya",
"Kinkondja", "Kipushi", "Kisanga", "Kitenge", "Kole", "Kongolo",
"Likasi", "Lodja", "Lomela", "Lualaba", "Butumba", "Lubao", "Lubilanji",
"Lubudi", "Lubumbashi", "Ludimbi Lukula", "Lukafu", "Lukelenge",
"Lusambo", "Cilundu", "Makota", "Malemba Nkulu", "Manika", "Manono",
"Miabi", "Minga", "Muene Ditu", "Mufunga Sampwe", "Mukanga",
"Mukumbi", "Mulongo", "Mulumba", "Mutshatsha", "Nyemba", "Dilala",
"Nyunzu", "Panda", "Pania Mutombo", "Pweto", "Rwashi", "Sakania",
"Sandoa", "Songa", "Tshamilemba", "Tshilenge"), class = "data.frame", 
na.action = structure(as.integer(c(8,
34, 40, 41, 45, 71, 73, 79, 80, 83, 84, 85, 86, 93)), .Names = c("Wembo 
Nyama",
"Kaniama", "Kasansa", "Bukama", "Kayamba", "Luputa", "Lwamba",
"Mbuji mayi", "Mbulula", "Mitwaba", "Moba", "Dikungu Tshumbe",
"Mpokolo", "Mumbunda"), class = "omit"))


        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] negative binomial family glm R and STATA

Reply via email to