Dear Hadley,

> -----Original Message-----
> From: hadley wickham [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 09, 2007 2:21 AM
> To: John Fox
> Cc: [email protected]
> Subject: Re: [R] Weighted least squares
>
> Thanks John,
>
> That's just the explanation I was looking for. I had hoped that
> there would be a built-in way of dealing with them in R, but
> obviously not.
>
> Given that explanation, it still seems to me that the way R
> calculates n is suboptimal, as demonstrated by my second example:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> The weights are only very slightly different, but the estimates of
> residual standard error are quite different (20 vs 14 in my run).
>

Observations with 0 weight are literally excluded, while those with
very small weight (relative to others) are retained but don't
contribute much to the fit. Consequently you get very similar
coefficients but different numbers of observations, and hence
different residual degrees of freedom, which is what drives the
difference in the residual standard errors.

I hope this helps,
 John

> Hadley
>
> On 5/8/07, John Fox <[EMAIL PROTECTED]> wrote:
> > Dear Hadley,
> >
> > I think that the problem is that the term "weights" has different
> > meanings, which, although they are related, are not quite the same.
> >
> > The weights used by lm() are (inverse-)"variance weights," reflecting
> > the variances of the errors, with observations that have low-variance
> > errors therefore being accorded greater weight in the resulting WLS
> > regression.
> > What you have are sometimes called "case weights," and I'm unaware of
> > a general way of handling them in R, although you could regenerate the
> > unaggregated data. As you discovered, you get the same coefficients
> > with case weights as with variance weights, but different standard
> > errors.
> > Finally, there are "sampling weights," which are inversely
> > proportional to the probability of selection; these are accommodated
> > by the survey package.
> >
> > To complicate matters, this terminology isn't entirely standard.
> >
> > I hope this helps,
> > John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of hadley
> > > wickham
> > > Sent: Tuesday, May 08, 2007 5:09 AM
> > > To: R Help
> > > Subject: [R] Weighted least squares
> > >
> > > Dear all,
> > >
> > > I'm struggling with weighted least squares, where something that I
> > > had assumed to be true appears not to be the case.
> > > Take the following data set as an example:
> > >
> > > df <- data.frame(x = runif(100, 0, 100))
> > > df$y <- df$x + 1 + rnorm(100, sd=15)
> > >
> > > I had expected that:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > > summary(lm(y ~ x, data=rbind(df,df)))
> > >
> > > would be equivalent, but they are not. I suspect the difference is
> > > in how the degrees of freedom are calculated - I had expected it to
> > > be sum(weights), but it seems to be sum(weights > 0). This seems
> > > unintuitive to me:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> > > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> > >
> > > What am I missing? And what is the usual way to do a linear
> > > regression when you have aggregated data?
> > >
> > > Thanks,
> > >
> > > Hadley
> > >
> > > ______________________________________________
> > > [email protected] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
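
For anyone reading this thread later, here is a minimal sketch of the
behaviour discussed above, reusing the simulated data from the original
post. The seed, the object names (fit_w, fit_dup, fit_zero, fit_tiny,
fit_case, fit_casew, expanded) and the integer case weights w are
assumptions added for illustration and do not come from the original
messages. The last part follows the suggestion of regenerating the
unaggregated data, which only works when the case weights are
non-negative integers.

```r
set.seed(1)  # assumption: a seed is added so the run is repeatable

df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

# Every observation weighted 2 (treated by lm() as variance weights)
# versus every row literally duplicated.
fit_w   <- lm(y ~ x, data = df, weights = rep(2, 100))
fit_dup <- lm(y ~ x, data = rbind(df, df))

# The coefficients agree, but the residual degrees of freedom do not.
all.equal(coef(fit_w), coef(fit_dup))
df.residual(fit_w)    # 100 - 2 = 98
df.residual(fit_dup)  # 200 - 2 = 198

# Zero weights remove observations from the count; tiny weights do not.
fit_zero <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_tiny <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))
df.residual(fit_zero)  # 50 - 2 = 48
df.residual(fit_tiny)  # 100 - 2 = 98

# Hypothetical integer case weights: expand the data so that each row
# appears as many times as its weight says, then fit without weights.
w <- rep(c(1, 3), each = 50)
expanded  <- df[rep(seq_len(nrow(df)), times = w), ]
fit_case  <- lm(y ~ x, data = expanded)
fit_casew <- lm(y ~ x, data = df, weights = w)

# Same coefficients again; the standard errors differ because the
# expanded fit counts sum(w) observations rather than nrow(df).
all.equal(coef(fit_case), coef(fit_casew))
df.residual(fit_case)   # 200 - 2 = 198
df.residual(fit_casew)  # 100 - 2 = 98
```

Comparing summary(fit_case) with summary(fit_casew) shows the standard
errors moving while the estimates stay put, which is the distinction
between case weights and variance weights drawn in the reply above.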
