[R] Calculation of extremely low p-values (in lm)

2012-12-03 Thread Sindri
Dear R-users

Please excuse me if this topic has been covered before, but I was unable to
find anything relevant by searching

I am currently doing a comparison of two biological variables that have a
highly significant linear relationship.   I know that the p-value of linear
regression is not so interesting in itself, but this particular value does
raise a question.

How does R calculate (extremely low) p-values for linear regression?  

For my data I got a p-value on the order of 10^-9 and a reviewer commented
on this.  I tried to run the same analysis in both SAS and Sigmastat to be
sure that I was doing it right, but both these programs only return a
p-value of p  0.0001
Since I am unable to reproduce my results in another statistics program, it
would be nice to be able to explain this unusally low p-value to the
reviewers.


This problem can be illustrated with the following made-up data:

x_var-c(0.149,0.178,0.3474,0.167,0.121,0.182,0.176,0.448,0.091,0.083,0.090,0.407,0.378,0.132,0.227,0.172,0.088,0.392,0.425,0.150,0.319,0.190,0.171,0.290,0.214,0.431,0.193)

y_var-c(0.918,0.394,0.131,0.9084,0.916,0.934,0.928,0.279,0.830,0.927,0.964,0.323,0.097,0.914,0.614,0.790,0.984,0.530,0.207,0.858,0.408,0.919,0.869,0.347,0.834,0.276,0.940)

fit-lm(y_var~x_var)

 summary(fit)

Call:
lm(formula = y_var ~ x_var)

Residuals:
 Min   1Q   Median   3Q  Max 
-0.39152 -0.06027  0.00933  0.10024  0.22711 

Coefficients:
Estimate Std. Error t value Pr(|t|)
(Intercept)  1.186960.06394  18.562 3.90e-16 ***
x_var   -2.255290.24788  -9.098 2.08e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.1503 on 25 degrees of freedom
Multiple R-squared: 0.768,  Adjusted R-squared: 0.7588 
F-statistic: 82.78 on 1 and 25 DF,  p-value: 2.083e-09 


With kind regards,
Sindri Traustason



-
-
Sindri Traustason
Glostrup Hospital Ophthalmology Research Dept.
Copenhagen, Demark

--
View this message in context: 
http://r.789695.n4.nabble.com/Calculation-of-extremely-low-p-values-in-lm-tp4651823.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of extremely low p-values (in lm)

2012-12-03 Thread Robert Baer

On 12/3/2012 6:20 AM, Sindri wrote:

Dear R-users

Please excuse me if this topic has been covered before, but I was unable to
find anything relevant by searching

I am currently doing a comparison of two biological variables that have a
highly significant linear relationship.   I know that the p-value of linear
regression is not so interesting in itself, but this particular value does
raise a question.

How does R calculate (extremely low) p-values for linear regression?

For my data I got a p-value on the order of 10^-9 and a reviewer commented
on this.  I tried to run the same analysis in both SAS and Sigmastat to be
sure that I was doing it right, but both these programs only return a
p-value of p  0.0001
Since I am unable to reproduce my results in another statistics program, it
would be nice to be able to explain this unusally low p-value to the
reviewers.
This is a matter of you understanding that the p-value is an area under 
a probability density curve.  R is simply printing out the actual area 
in a tail of some distribution.  The other statistical program is making 
the assumption that you are using the p-value to compare to a cutoff 
alpha value that is (in most fields) never set much below p0.001.  If p 
 alpha the hypothesis test crowd , would choose to reject NULL  
hypothesis, so the other statistics programs take the attitude --  why 
provide more detail?.  R chooses to give you the actual number and let 
you do what you will with it.  You could probably benefit from reviewing 
hypothesis testing in a basic statistics book if this is not clear.


Note that 10e-9 is indeed less than 0.0001, so the programs don't 
disagree.  R just provides more detail.




This problem can be illustrated with the following made-up data:

x_var-c(0.149,0.178,0.3474,0.167,0.121,0.182,0.176,0.448,0.091,0.083,0.090,0.407,0.378,0.132,0.227,0.172,0.088,0.392,0.425,0.150,0.319,0.190,0.171,0.290,0.214,0.431,0.193)

y_var-c(0.918,0.394,0.131,0.9084,0.916,0.934,0.928,0.279,0.830,0.927,0.964,0.323,0.097,0.914,0.614,0.790,0.984,0.530,0.207,0.858,0.408,0.919,0.869,0.347,0.834,0.276,0.940)

fit-lm(y_var~x_var)


summary(fit)

Call:
lm(formula = y_var ~ x_var)

Residuals:
  Min   1Q   Median   3Q  Max
-0.39152 -0.06027  0.00933  0.10024  0.22711

Coefficients:
 Estimate Std. Error t value Pr(|t|)
(Intercept)  1.186960.06394  18.562 3.90e-16 ***
x_var   -2.255290.24788  -9.098 2.08e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1503 on 25 degrees of freedom
Multiple R-squared: 0.768,  Adjusted R-squared: 0.7588
F-statistic: 82.78 on 1 and 25 DF,  p-value: 2.083e-09


With kind regards,
Sindri Traustason



-
-
Sindri Traustason
Glostrup Hospital Ophthalmology Research Dept.
Copenhagen, Demark

--
View this message in context: 
http://r.789695.n4.nabble.com/Calculation-of-extremely-low-p-values-in-lm-tp4651823.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
__
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksille College of Osteopathic Medicine
A. T. Still University of Health Sciences
Kirksville, MO 63501 USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculation of extremely low p-values (in lm)

2012-12-03 Thread Rui Barradas

Hello,

It's easy to see what's going on by reading the sources, to be open 
source is one of the strong points of R, we know exactly how the values 
are computed. A reviewer might like to have an explanation of what R does.
The op could check with Friedrich Leisch's Creating R Packages: A 
Tutorial, it's running example on S3 classes is precisely the linear 
model. The relevant functions and the way to call them are as follows. 
Note that the p-values are computed using the distribution function, 
pt(), that gives the area under the density, and that the returned 
values are multiplied by two, since the test is two-sided. I've edited 
the code a bit, to give an other way of computing the p-values. The 
results are the same as the results of R's summary.lm() in package stats 
and the code is easy to follow.



linmodEst - function(x, y){
## compute QR-decomposition of x
qx - qr(x)
## compute (x'x)^(-1) x'y
coef - solve.qr(qx, y)
## degrees of freedom and standard deviation of residuals
df - nrow(x) - ncol(x)
sigma2 - sum((y - x %*% coef)^2)/df
## compute sigma^2 * (x'x)^-1
vcov - sigma2 * chol2inv(qx$qr)
colnames(vcov) - rownames(vcov) - colnames(x)
list(coefficients = coef,
vcov = vcov,
sigma = sqrt(sigma2),
df = df)
}

summary.linmod - function(object, ...){
se - sqrt(diag(object$vcov))
tval - coef(object) / se
TAB - cbind(Estimate = coef(object),
StdErr = se,
t.value = tval,
p.value = 2*pt(-abs(tval), df=object$df),
p.value2 = 2*pt(abs(tval), df=object$df, lower.tail = FALSE))
res - list(call=object$call,
coefficients=TAB)
#class(res) - summary.linmod
res
}


mod - linmodEst(cbind(Const = 1, x_var), y_var)
summary.linmod(mod)


Hope this helps,

Rui Barradas


Em 03-12-2012 14:26, Robert Baer escreveu:

On 12/3/2012 6:20 AM, Sindri wrote:

Dear R-users

Please excuse me if this topic has been covered before, but I was 
unable to

find anything relevant by searching

I am currently doing a comparison of two biological variables that 
have a
highly significant linear relationship.   I know that the p-value of 
linear
regression is not so interesting in itself, but this particular value 
does

raise a question.

How does R calculate (extremely low) p-values for linear regression?

For my data I got a p-value on the order of 10^-9 and a reviewer 
commented
on this.  I tried to run the same analysis in both SAS and Sigmastat 
to be

sure that I was doing it right, but both these programs only return a
p-value of p  0.0001
Since I am unable to reproduce my results in another statistics 
program, it

would be nice to be able to explain this unusally low p-value to the
reviewers.
This is a matter of you understanding that the p-value is an area 
under a probability density curve.  R is simply printing out the 
actual area in a tail of some distribution.  The other statistical 
program is making the assumption that you are using the p-value to 
compare to a cutoff alpha value that is (in most fields) never set 
much below p0.001.  If p  alpha the hypothesis test crowd , would 
choose to reject NULL  hypothesis, so the other statistics programs 
take the attitude --  why provide more detail?.  R chooses to give 
you the actual number and let you do what you will with it.  You could 
probably benefit from reviewing hypothesis testing in a basic 
statistics book if this is not clear.


Note that 10e-9 is indeed less than 0.0001, so the programs don't 
disagree.  R just provides more detail.




This problem can be illustrated with the following made-up data:

x_var-c(0.149,0.178,0.3474,0.167,0.121,0.182,0.176,0.448,0.091,0.083,0.090,0.407,0.378,0.132,0.227,0.172,0.088,0.392,0.425,0.150,0.319,0.190,0.171,0.290,0.214,0.431,0.193) 



y_var-c(0.918,0.394,0.131,0.9084,0.916,0.934,0.928,0.279,0.830,0.927,0.964,0.323,0.097,0.914,0.614,0.790,0.984,0.530,0.207,0.858,0.408,0.919,0.869,0.347,0.834,0.276,0.940) 



fit-lm(y_var~x_var)


summary(fit)

Call:
lm(formula = y_var ~ x_var)

Residuals:
  Min   1Q   Median   3Q  Max
-0.39152 -0.06027  0.00933  0.10024  0.22711

Coefficients:
 Estimate Std. Error t value Pr(|t|)
(Intercept)  1.186960.06394  18.562 3.90e-16 ***
x_var   -2.255290.24788  -9.098 2.08e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1503 on 25 degrees of freedom
Multiple R-squared: 0.768,Adjusted R-squared: 0.7588
F-statistic: 82.78 on 1 and 25 DF,  p-value: 2.083e-09


With kind regards,
Sindri Traustason



-
-
Sindri Traustason
Glostrup Hospital Ophthalmology Research Dept.
Copenhagen, Demark

--
View this message in context: 
http://r.789695.n4.nabble.com/Calculation-of-extremely-low-p-values-in-lm-tp4651823.html

Sent from the R help mailing list archive at Nabble.com.