Raffa, I ran this on a macOS machine and got the result you expected. I added a call to sessionInfo() for your information.
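
Since the same script gives sensible estimates here, it might be worth cross-checking lm() on your machine against the closed-form simple-regression estimates, which, as far as I know, are computed by cov() and var() without going through the BLAS/LAPACK libraries listed by sessionInfo(). The following is only a minimal sketch of that idea, not a diagnosis: the seed value and the names b0, b1, and fit are my additions and are not part of your original code.

```r
set.seed(1)                          # arbitrary seed (my addition) so runs are repeatable
N <- 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean = 0, sd = 1)
yvar <- 1 + 2 * xvar + e

# Closed-form estimates for a one-predictor regression
b1 <- cov(xvar, yvar) / var(xvar)    # slope
b0 <- mean(yvar) - b1 * mean(xvar)   # intercept

# Estimates from lm() for the same data
fit <- unname(coef(lm(yvar ~ xvar)))

# Show both sets of estimates side by side
print(rbind(closed_form = c(b0, b1), lm = fit))

# TRUE when both routes agree to numerical tolerance
print(isTRUE(all.equal(fit, c(b0, b1))))
```

If the two rows differ noticeably on your Kubuntu machine, that would suggest the problem lies in how lm() is being computed there rather than in your script. My own session output follows.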

> rm(list=ls())
> N = 30000
> xvar <- runif(N, -10, 10)
> e <- rnorm(N, mean=0, sd=1)
> yvar <- 1 + 2*xvar + e
> plot(xvar,yvar)
> lmMod <- lm(yvar~xvar)
> print(summary(lmMod))

Call:
lm(formula = yvar ~ xvar)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2407 -0.6738 -0.0031  0.6822  4.0619

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.0059022  0.0057370   175.3   <2e-16 ***
xvar        2.0005811  0.0009918  2017.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9937 on 29998 degrees of freedom
Multiple R-squared:  0.9927,    Adjusted R-squared:  0.9927
F-statistic: 4.069e+06 on 1 and 29998 DF,  p-value: < 2.2e-16

> domain <- seq(min(xvar), max(xvar)) # define a vector of x values to feed into model
> lines(domain, predict(lmMod, newdata = data.frame(xvar=domain))) # add regression line, using `predict` to generate y-values
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.0

R. Mark Sharp, Ph.D.
Data Scientist and Biomedical Statistical Consultant
7526 Meadow Green St.
San Antonio, TX 78251
mobile: 210-218-2868
rmsh...@me.com

> On May 25, 2019, at 7:38 AM, Raffa <raffamai...@gmail.com> wrote:
>
> I have the following code:
>
> ```
>
> rm(list=ls())
> N = 30000
> xvar <- runif(N, -10, 10)
> e <- rnorm(N, mean=0, sd=1)
> yvar <- 1 + 2*xvar + e
> plot(xvar,yvar)
> lmMod <- lm(yvar~xvar)
> print(summary(lmMod))
> domain <- seq(min(xvar), max(xvar)) # define a vector of x values to feed into model
> lines(domain, predict(lmMod, newdata = data.frame(xvar=domain))) # add regression line, using `predict` to generate y-values
>
> ```
>
> I expected the coefficients to be something similar to [1,2]. Instead R
> keeps throwing at me random numbers that are not statistically
> significant and don't fit the model, and I have 20k observations. For
> example
>
> ```
>
> Call:
> lm(formula = yvar ~ xvar)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -21.384  -8.908   1.016  10.972  23.663
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.0007145  0.0670316   0.011    0.991
> xvar        0.0168271  0.0116420   1.445    0.148
>
> Residual standard error: 11.61 on 29998 degrees of freedom
> Multiple R-squared:  7.038e-05,    Adjusted R-squared:  3.705e-05
> F-statistic: 2.112 on 1 and 29998 DF,  p-value: 0.1462
>
> ```
>
>
> The strange thing is that the code works perfectly for N=200 or N=2000.
> It's only for larger N that this thing happens (for example, N=20000). I
> have tried to ask for example in CrossValidated
> <https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model>
> but the code works for them. Any help?
>
> I am running R 3.6.0 on Kubuntu 19.04
>
> Best regards
>
> Raffaele
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.