In the book Modern Applied Statistics with S, 4th edition, 2002, by
Venables and Ripley, there is a function logitreg on page 445, which
does provide the weighted logistic regression I asked, judging by the
loss function. And interesting enough, logitreg provides the same
coefficients as glm in the example I provided earlier, even with weights
< 1. Also for residual deviance, logitreg yields the same number as glm.
Unless I misunderstood something, I am convinced that glm is a
valid tool for weighted logistic regression despite the description on
weights and somehow questionable logLik value in the case of non-integer
weights < 1. Perhaps this is a bold claim: the description of weights
can be modified and logLik can be updated as well.
The stackexchange inquiry I provided is what I feel interesting, not the
link in that post. Sorry for the confusion.
On Sat, Aug 29, 2020 at 10:18 AM John Smith <jsw...@gmail.com
<mailto:jsw...@gmail.com>> wrote:
Thanks for very insightful thoughts. What I am trying to achieve
with the weights is actually not new, something like
https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances.
I thought my inquiry was not too strange, and I could utilize some
existing codes. It is just an optimization problem at the end of
day, or not? Thanks
On Sat, Aug 29, 2020 at 9:02 AM John Fox <j...@mcmaster.ca
<mailto:j...@mcmaster.ca>> wrote:
Dear John,
On 2020-08-29 1:30 a.m., John Smith wrote:
> Thanks Prof. Fox.
>
> I am curious: what is the model estimated below?
Nonsense, as Peter explained in a subsequent response to your
prior posting.
>
> I guess my inquiry seems more complicated than I thought:
with y being 0/1, how to fit weighted logistic regression with
weights <1, in the sense of weighted least squares? Thanks
What sense would that make? WLS is meant to account for
non-constant
error variance in a linear model, but in a binomial GLM, the
variance is
purely a function for the mean.
If you had binomial (rather than binary 0/1) observations (i.e.,
binomial trials exceeding 1), then you could account for
overdispersion,
e.g., by introducing a dispersion parameter via the quasibinomial
family, but that isn't equivalent to variance weights in a LM,
rather to
the error-variance parameter in a LM.
I guess the question is what are you trying to achieve with the
weights?
Best,
John
>
>> On Aug 28, 2020, at 10:51 PM, John Fox <j...@mcmaster.ca
<mailto:j...@mcmaster.ca>> wrote:
>>
>> Dear John
>>
>> I think that you misunderstand the use of the weights
argument to glm() for a binomial GLM. From ?glm: "For a binomial
GLM prior weights are used to give the number of trials when the
response is the proportion of successes." That is, in this case
y should be the observed proportion of successes (i.e., between
0 and 1) and the weights are integers giving the number of
trials for each binomial observation.
>>
>> I hope this helps,
>> John
>>
>> John Fox, Professor Emeritus
>> McMaster University
>> Hamilton, Ontario, Canada
>> web: https://socialsciences.mcmaster.ca/jfox/
>>
>>> On 2020-08-28 9:28 p.m., John Smith wrote:
>>> If the weights < 1, then we have different values! See an
example below.
>>> How should I interpret logLik value then?
>>> set.seed(135)
>>> y <- c(rep(0, 50), rep(1, 50))
>>> x <- rnorm(100)
>>> data <- data.frame(cbind(x, y))
>>> weights <- c(rep(1, 50), rep(2, 50))
>>> fit <- glm(y~x, data, family=binomial(), weights/10)
>>> res.dev <http://res.dev> <- residuals(fit, type="deviance")
>>> res2 <- -0.5*res.dev <http://res.dev>^2
>>> cat("loglikelihood value", logLik(fit), sum(res2), "\n")
>>>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard
<pda...@gmail.com <mailto:pda...@gmail.com>> wrote:
>>>> If you don't worry too much about an additive constant,
then half the
>>>> negative squared deviance residuals should do. (Not quite
sure how weights
>>>> factor in. Looks like they are accounted for.)
>>>>
>>>> -pd
>>>>
>>>>> On 25 Aug 2020, at 17:33 , John Smith <jsw...@gmail.com
<mailto:jsw...@gmail.com>> wrote:
>>>>>
>>>>> Dear R-help,
>>>>>
>>>>> The function logLik can be used to obtain the maximum
log-likelihood
>>>> value
>>>>> from a glm object. This is an aggregated value, a
summation of individual
>>>>> log-likelihood values. How do I obtain individual values?
In the
>>>> following
>>>>> example, I would expect 9 numbers since the response has
length 9. I
>>>> could
>>>>> write a function to compute the values, but there are lots of
>>>>> family members in glm, and I am trying not to reinvent
wheels. Thanks!
>>>>>
>>>>> counts <- c(18,17,15,20,10,20,25,13,12)
>>>>> outcome <- gl(3,1,9)
>>>>> treatment <- gl(3,3)
>>>>> data.frame(treatment, outcome, counts) # showing data
>>>>> glm.D93 <- glm(counts ~ outcome + treatment, family
= poisson())
>>>>> (ll <- logLik(glm.D93))
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org <mailto:R-help@r-project.org>
mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>
>>>> --
>>>> Peter Dalgaard, Professor,
>>>> Center for Statistics, Copenhagen Business School
>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>> Phone: (+45)38153501
>>>> Office: A 4.23
>>>> Email: pd....@cbs.dk <mailto:pd....@cbs.dk> Priv:
pda...@gmail.com <mailto:pda...@gmail.com>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>> [[alternative HTML version deleted]]
>>> ______________________________________________
>>> R-help@r-project.org <mailto:R-help@r-project.org> mailing
list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained,
reproducible code.