Re: [R] weights vs. offset (negative binomial regression)

2023-11-02 Thread peter dalgaard
I think it is more clear-cut than so, at least if the Poisson situation is 
something to go by. 

There, you can do either of these and get equivalent results

> fit.lung <- glm(cases ~ age + city,  offset=log(pop), 
+ family=poisson, data=lungcancer)
> fit.lung2 <- glm(cases/pop ~ age + city,  weights=pop, 
+ family=poisson, data=lungcancer)
There were 12 warnings (use warnings() to see them)

(Except for the warnings about non-integer responses, which have annoyed some 
epidemiologists trying to work with non-log link fuctions.)

The point is that you need to convert to rates on the LHS and then compensate 
for the fact that this has a smaller variance when the population is larger. 
Counts on the LHS combined with weights wouldn't be right. So I would expect 
that the weighted version of the OP's code should model Catch/Effort, although 
I'm not quite sure how glm.nb reacts to non-integer responses.

-pd


> On 31 Oct 2023, at 17:59 , Ben Bolker  wrote:
> 
>  [Please keep r-help in the cc: list]
> 
>  I don't quite know how to interpret the difference between specifying effort 
> as an offset vs. as weights; I would have to spend more time thinking about 
> it/working through it than I have available at the moment.
> 
>   I don't know that specifying effort as weights is *wrong*, but I don't know 
> that it's right or what it is doing: if I were the reviewer of a paper (for 
> example) I would require you to explain what the difference is and convince 
> me that it was appropriate. (Furthermore, "I want to do it this way because 
> it gives me significant effects" is automatically suspicious.)
> 
>  This would be a good question for CrossValidated 
> (https://stats.stackexchange.com), you could try posting it there (I would be 
> interested in the answer!)
> 
>  cheers
>Ben Bolker
> 
> 
> On 2023-10-30 8:19 p.m., 유준택 wrote:
>> Dear Mr. Bolker,
>> Thank you for the fast response.
>> I also know that a poisson (or negative binomial ) regression of glm is  
>> generally modelled using an offset variable.
>> In this case, when a weights term instead of the offset is used, this gave 
>> me significant coefficients of covariance.
>> I understand that the weights function for exponential family distributions 
>> in glm affects the variance of response variable.
>> I was just wondering whether my first model is a completely wrong model and 
>> the use of offset variable is valid in the case that
>> response variable is  not proportional to offset variable such as my dataset.
>> Sincerely,
>> Joon-Taek
>> 2023년 10월 29일 (일) 오전 3:25, Ben Bolker > >님이 작성:
>>Using an offset of log(Effort) as in your second model is the more
>>standard way to approach this problem; it corresponds to assuming that
>>catch is strictly proportional to effort. Adding log(Effort) as a
>>covariate (as illustrated below) tests whether a power-law model (catch
>>propto (Effort)^(b+1), b!=0) is a better description of the data.  (In
>>this case it is not, although the confidence intervals on b are very
>>wide, indicating that we have very little information -- this is not
>>surprising since the proportional range of effort is very small
>>(246-258) in this data set.
>>In general you should *not* check overdispersion of the raw data
>>(i.e., the *marginal distribution* of the data, you should check
>>overdispersion of a fitted (e.g. Poisson) model, as below.
>>cheers
>> Ben Bolker
>>edata <- data.frame(Catch, Effort, xx1, xx2, xx3)
>>## graphical exploration
>>library(ggplot2); theme_set(theme_bw())
>>library(tidyr)
>>edata_long <- edata |> pivot_longer(names_to="var", cols =-c("Catch",
>>"Effort"))
>>ggplot(edata_long, aes(value, Catch)) +
>>  geom_point(alpha = 0.2, aes(size = Effort)) +
>>  facet_wrap(~var, scale="free_x") +
>>  geom_smooth(method = "glm", method.args = list(family =
>>"quasipoisson"))
>>#
>>library(MASS)
>>g1 <- glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata)
>>g2 <- update(g1, . ~ . + log(Effort))
>>g0 <- glm(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata,
>>family = poisson)
>>performance::check_overdispersion(g0)
>>summary(g1)
>>summary(g2)
>>options(digits = 3)
>>confint(g2)
>>summary(g1)
>>On 2023-10-28 3:30 a.m., 유준택 wrote:
>> > Colleagues,
>> >
>> >
>> >
>> > I have a dataset that includes five variables.
>> >
>> > - Catch: the catch number counted in some species (ind.)
>> >
>> > - Effort: fishing effort (the number of fishing vessels)
>> >
>> > - xx1, xx2, xx3: some environmental factors
>> >
>> > As an overdispersion test on the “Catch” variable, I modeled with
>>negative
>> > binomial distribution using a GLM. The “Effort” variable showed a
>>gradually
>> > decreasing trend during 

Re: [R] weights vs. offset (negative binomial regression)

2023-10-31 Thread Ben Bolker

  [Please keep r-help in the cc: list]

  I don't quite know how to interpret the difference between specifying 
effort as an offset vs. as weights; I would have to spend more time 
thinking about it/working through it than I have available at the moment.


   I don't know that specifying effort as weights is *wrong*, but I 
don't know that it's right or what it is doing: if I were the reviewer 
of a paper (for example) I would require you to explain what the 
difference is and convince me that it was appropriate. (Furthermore, "I 
want to do it this way because it gives me significant effects" is 
automatically suspicious.)


  This would be a good question for CrossValidated 
(https://stats.stackexchange.com), you could try posting it there (I 
would be interested in the answer!)


  cheers
Ben Bolker


On 2023-10-30 8:19 p.m., 유준택 wrote:

Dear Mr. Bolker,

Thank you for the fast response.

I also know that a poisson (or negative binomial ) regression of glm is  
generally modelled using an offset variable.


In this case, when a weights term instead of the offset is used, this 
gave me significant coefficients of covariance.
I understand that the weights function for exponential family 
distributions in glm affects the variance of response variable.


I was just wondering whether my first model is a completely wrong model 
and the use of offset variable is valid in the case that
response variable is  not proportional to offset variable such as my 
dataset.


Sincerely,


Joon-Taek

2023년 10월 29일 (일) 오전 3:25, Ben Bolker >님이 작성:


    Using an offset of log(Effort) as in your second model is the more
standard way to approach this problem; it corresponds to assuming that
catch is strictly proportional to effort. Adding log(Effort) as a
covariate (as illustrated below) tests whether a power-law model (catch
propto (Effort)^(b+1), b!=0) is a better description of the data.  (In
this case it is not, although the confidence intervals on b are very
wide, indicating that we have very little information -- this is not
surprising since the proportional range of effort is very small
(246-258) in this data set.

    In general you should *not* check overdispersion of the raw data
(i.e., the *marginal distribution* of the data, you should check
overdispersion of a fitted (e.g. Poisson) model, as below.

    cheers
     Ben Bolker


edata <- data.frame(Catch, Effort, xx1, xx2, xx3)

## graphical exploration

library(ggplot2); theme_set(theme_bw())
library(tidyr)
edata_long <- edata |> pivot_longer(names_to="var", cols =-c("Catch",
"Effort"))
ggplot(edata_long, aes(value, Catch)) +
      geom_point(alpha = 0.2, aes(size = Effort)) +
      facet_wrap(~var, scale="free_x") +
      geom_smooth(method = "glm", method.args = list(family =
"quasipoisson"))
#

library(MASS)
g1 <- glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata)
g2 <- update(g1, . ~ . + log(Effort))
g0 <- glm(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata,
            family = poisson)
performance::check_overdispersion(g0)
summary(g1)
summary(g2)
options(digits = 3)
confint(g2)
summary(g1)



On 2023-10-28 3:30 a.m., 유준택 wrote:
 > Colleagues,
 >
 >
 >
 > I have a dataset that includes five variables.
 >
 > - Catch: the catch number counted in some species (ind.)
 >
 > - Effort: fishing effort (the number of fishing vessels)
 >
 > - xx1, xx2, xx3: some environmental factors
 >
 > As an overdispersion test on the “Catch” variable, I modeled with
negative
 > binomial distribution using a GLM. The “Effort” variable showed a
gradually
 > decreasing trend during the study period. I was able to get the
results I
 > wanted when considered “Effort” function as a weights function in the
 > negative binomial regression as follows:
 >
 >
 >
 > library(qcc)
 >
 >
Catch=c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3)
 >
 >

Effort=c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,252,252,252,248,246,246,246,246,246,246)
 >
 >

xx1=c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5)
 >
 >

xx2=c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4)
 >
 >

xx3=c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,400,150,145,160,460,66)
 >
 > #
 >
 > edata <- data.frame(Catch, Effort, xx1, xx2, xx3)
 >
 > #
 >
 > qcc.overdispersion.test(edata$Catch, type="poisson")
 >
 > #
 >
 > summary(glm.nb(Catch~xx1+xx2+xx3, weights=Effort, data=edata))
 >
 > summary(glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata))
 >
  

Re: [R] weights vs. offset (negative binomial regression)

2023-10-28 Thread Ben Bolker
  Using an offset of log(Effort) as in your second model is the more 
standard way to approach this problem; it corresponds to assuming that 
catch is strictly proportional to effort. Adding log(Effort) as a 
covariate (as illustrated below) tests whether a power-law model (catch 
propto (Effort)^(b+1), b!=0) is a better description of the data.  (In 
this case it is not, although the confidence intervals on b are very 
wide, indicating that we have very little information -- this is not 
surprising since the proportional range of effort is very small 
(246-258) in this data set.


  In general you should *not* check overdispersion of the raw data 
(i.e., the *marginal distribution* of the data, you should check 
overdispersion of a fitted (e.g. Poisson) model, as below.


  cheers
   Ben Bolker


edata <- data.frame(Catch, Effort, xx1, xx2, xx3)

## graphical exploration

library(ggplot2); theme_set(theme_bw())
library(tidyr)
edata_long <- edata |> pivot_longer(names_to="var", cols =-c("Catch", 
"Effort"))

ggplot(edata_long, aes(value, Catch)) +
geom_point(alpha = 0.2, aes(size = Effort)) +
facet_wrap(~var, scale="free_x") +
geom_smooth(method = "glm", method.args = list(family = 
"quasipoisson"))

#

library(MASS)
g1 <- glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata)
g2 <- update(g1, . ~ . + log(Effort))
g0 <- glm(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata,
  family = poisson)
performance::check_overdispersion(g0)
summary(g1)
summary(g2)
options(digits = 3)
confint(g2)
summary(g1)



On 2023-10-28 3:30 a.m., 유준택 wrote:

Colleagues,



I have a dataset that includes five variables.

- Catch: the catch number counted in some species (ind.)

- Effort: fishing effort (the number of fishing vessels)

- xx1, xx2, xx3: some environmental factors

As an overdispersion test on the “Catch” variable, I modeled with negative
binomial distribution using a GLM. The “Effort” variable showed a gradually
decreasing trend during the study period. I was able to get the results I
wanted when considered “Effort” function as a weights function in the
negative binomial regression as follows:



library(qcc)

Catch=c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3)

Effort=c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,252,252,252,248,246,246,246,246,246,246)

xx1=c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5)

xx2=c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4)

xx3=c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,400,150,145,160,460,66)

#

edata <- data.frame(Catch, Effort, xx1, xx2, xx3)

#

qcc.overdispersion.test(edata$Catch, type="poisson")

#

summary(glm.nb(Catch~xx1+xx2+xx3, weights=Effort, data=edata))

summary(glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata))



I am not sure the application of the weights function to the negative
binomial regression is correct. Also I wonder if there is a better way
doing this. Can anyone help?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] weights vs. offset (negative binomial regression)

2023-10-28 Thread 유준택
Colleagues,



I have a dataset that includes five variables.

- Catch: the catch number counted in some species (ind.)

- Effort: fishing effort (the number of fishing vessels)

- xx1, xx2, xx3: some environmental factors

As an overdispersion test on the “Catch” variable, I modeled with negative
binomial distribution using a GLM. The “Effort” variable showed a gradually
decreasing trend during the study period. I was able to get the results I
wanted when considered “Effort” function as a weights function in the
negative binomial regression as follows:



library(qcc)

Catch=c(25,2,7,6,75,5,1,4,66,15,9,25,40,8,7,4,36,11,1,14,141,9,74,38,126,3)

Effort=c(258,258,258,258,258,258,258,254,252,252,252,252,252,252,252,252,252,252,252,248,246,246,246,246,246,246)

xx1=c(0.8,0.5,1.2,0.5,1.1,1.1,1.0,0.6,0.9,0.5,1.2,0.6,1.2,0.7,1.0,0.6,1.6,0.7,0.8,0.6,1.7,0.9,1.1,0.5,1.4,0.5)

xx2=c(1.7,1.6,2.7,2.6,1.5,1.5,2.8,2.5,1.7,1.9,2.2,2.4,1.6,1.4,3.0,2.4,1.4,1.5,2.2,2.3,1.7,1.7,1.9,1.9,1.4,1.4)

xx3=c(188,40,2,10,210,102,117,14,141,28,48,15,220,115,10,14,320,20,3,10,400,150,145,160,460,66)

#

edata <- data.frame(Catch, Effort, xx1, xx2, xx3)

#

qcc.overdispersion.test(edata$Catch, type="poisson")

#

summary(glm.nb(Catch~xx1+xx2+xx3, weights=Effort, data=edata))

summary(glm.nb(Catch~xx1+xx2+xx3+offset(log(Effort)), data=edata))



I am not sure the application of the weights function to the negative
binomial regression is correct. Also I wonder if there is a better way
doing this. Can anyone help?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.