Dear Hamed, I don't think that anyone has picked up on this problem.
What's peculiar about your weights is that several are 0 within rounding error but not exactly 0: > head(df) y x weight 1 1.5115614 0.5520924 2.117337e-34 2 -0.6365313 -0.1259932 2.117337e-34 3 0.3778278 0.4209538 4.934135e-31 4 3.0379232 1.4031545 2.679495e-24 5 1.5364652 0.4607686 2.679495e-24 6 -2.3772787 -0.7396358 6.244160e-21 I can reproduce the results that you report: > (mod.1 <- lm(y ~ x, data=df)) Call: lm(formula = y ~ x, data = df) Coefficients: (Intercept) x -0.04173 2.03790 > max(resid(mod.1)) [1] 1.14046 > (mod.2 <- lm(y ~ x, data=df, weights=weight)) Call: lm(formula = y ~ x, data = df, weights = weight) Coefficients: (Intercept) x -0.05786 1.96087 > max(resid(mod.2)) [1] 36.84939 But the problem disappears when the tiny nonzero weight are set to 0: > df2 <- df > df2$weight <- zapsmall(df2$weight) > head(df2) y x weight 1 1.5115614 0.5520924 0 2 -0.6365313 -0.1259932 0 3 0.3778278 0.4209538 0 4 3.0379232 1.4031545 0 5 1.5364652 0.4607686 0 6 -2.3772787 -0.7396358 0 > (mod.3 <- update(mod.2, data=df2)) Call: lm(formula = y ~ x, data = df2, weights = weight) Coefficients: (Intercept) x -0.05786 1.96087 > max(resid(mod.3)) [1] 1.146663 I don't know exactly why this happens, but suspect numerical instability produced by the near-zero weights, which are smaller than the machine double-epsilon > .Machine$double.neg.eps [1] 1.110223e-16 The problem also disappears, e.g., if the tiny weight are set to 1e-15 rather than 0. I hope this helps, John ----------------------------------------------------------------- John Fox Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: https://socialsciences.mcmaster.ca/jfox/ > -----Original Message----- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hamed Ha > Sent: Tuesday, September 11, 2018 8:39 AM > To: r-help@r-project.org > Subject: [R] Problem with lm.resid() when weights are provided > > Dear R Help Team. > > I get some weird results when I use the lm function with weight. The issue can > be reproduced by the example below: > > > The input data is (weights are intentionally designed to reflect some > structures in the data) > > > > df > y x weight > 1.51156139 0.55209240 2.117337e-34 > -0.63653132 -0.12599316 2.117337e-34 > 0.37782776 0.42095384 4.934135e-31 > 3.03792318 1.40315446 2.679495e-24 > 1.53646523 0.46076858 2.679495e-24 > -2.37727874 -0.73963576 6.244160e-21 > 0.37183065 0.20407468 1.455107e-17 > -1.53917553 -0.95519361 1.455107e-17 > 1.10926675 0.03897129 3.390908e-14 > -0.37786333 -0.17523593 3.390908e-14 > 2.43973603 0.97970095 7.902000e-11 > -0.35432394 -0.03742559 7.902000e-11 > 2.19296613 1.00355263 4.289362e-04 > 0.49845532 0.34816207 4.289362e-04 > 1.25005260 0.76306225 5.000000e-01 > 0.84360691 0.45152356 5.000000e-01 > 0.29565993 0.53880068 5.000000e-01 > -0.54081334 -0.28104525 5.000000e-01 > 0.83612836 -0.12885659 9.995711e-01 > -1.42526769 -0.87107631 9.999998e-01 > 0.10204789 -0.11649899 1.000000e+00 > 1.14292898 0.37249631 1.000000e+00 > -3.02942081 -1.28966997 1.000000e+00 > -1.37549764 -0.74676145 1.000000e+00 > -2.00118016 -0.55182759 1.000000e+00 > -4.24441674 -1.94603608 1.000000e+00 > 1.17168144 1.00868008 1.000000e+00 > 2.64007761 1.26333069 1.000000e+00 > 1.98550114 1.18509599 1.000000e+00 > -0.58941683 -0.61972416 9.999998e-01 > -4.57559611 -2.30914920 9.995711e-01 > -0.82610544 -0.39347576 9.995711e-01 > -0.02768220 0.20076910 9.995711e-01 > 0.78186399 0.25690215 9.995711e-01 > -0.88314153 -0.20200148 5.000000e-01 > -4.17076452 -2.03547588 5.000000e-01 > 0.93373070 0.54190626 4.289362e-04 > -0.08517734 0.17692491 4.289362e-04 > -4.47546619 -2.14876688 4.289362e-04 > -1.65509103 -0.76898087 4.289362e-04 > -0.39403030 -0.12689705 4.289362e-04 > 0.01203300 -0.18689898 1.841442e-07 > -4.82762639 -2.31391121 1.841442e-07 > -0.72658380 -0.39751171 3.397282e-14 > -2.35886866 -1.01082109 0.000000e+00 > -2.03762707 -0.96439902 0.000000e+00 > 0.90115123 0.60172286 0.000000e+00 > 1.55999194 0.83433953 0.000000e+00 > 3.07994058 1.30942776 0.000000e+00 > 1.78871462 1.10605530 0.000000e+00 > > > > Running simple linear model returns: > > > lm(y~x,data=df) > > Call: > lm(formula = y ~ x, data = df) > > Coefficients: > (Intercept) x > -0.04173 2.03790 > > and > > max(resid(lm(y~x,data=df))) > [1] 1.14046 > > > *HOWEVER if I use the weighted model then:* > > lm(formula = y ~ x, data = df, weights = df$weights) > > Coefficients: > (Intercept) x > -0.05786 1.96087 > > and > > max(resid(lm(y~x,data=df,weights=df$weights))) > [1] 60.91888 > > > as you see, the estimation of the coefficients are nearly the same but the > resid() function returns a giant residual (I have some cases where the value > is > much much higher). Further, if I calculate the residuals by simply > predict(lm(y~x,data=df,weights=df$weights))-df$y then I get the true value for > the residuals. > > > Thanks. > > Please do not hesitate to contact me for more details. > Regards, > Hamed. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.