Re: [R] lm fails on some large input

2019-04-18 Thread Dingyuan Wang

The final goal is to make two lines and find the intersection point.
I don't want to argue more about the reason.

The tol suggestion is reasonable, and I'll take that.
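
For the stated goal of intersecting two fitted lines, a minimal sketch follows. The data, the shared `offset`, and the helper `intersect_lines()` are hypothetical and not from the thread. It assumes both lines are fitted against the same shifted time variable, so the `tol` issue does not arise in the first place.

```r
# Hypothetical data: two tracks observed at the same Unix timestamps.
t_raw  <- 1506705700 + c(0, 10, 20, 30, 40)
offset <- min(t_raw)        # shared reference time (an assumption)
tc     <- t_raw - offset    # small, well-conditioned predictor

x1 <- 2 + 3 * tc            # first line, exact for illustration
x2 <- 10 - 1 * tc           # second line

fit1 <- lm(x1 ~ tc)
fit2 <- lm(x2 ~ tc)

# Solve a1 + b1 * t = a2 + b2 * t for t, then shift back to Unix time.
intersect_lines <- function(f1, f2, offset = 0) {
  a <- unname(coef(f1))
  b <- unname(coef(f2))
  if (isTRUE(all.equal(a[2], b[2]))) stop("lines are parallel")
  tc_star <- (b[1] - a[1]) / (a[2] - b[2])
  c(time = offset + tc_star, value = a[1] + a[2] * tc_star)
}

p <- intersect_lines(fit1, fit2, offset)
print(p, digits = 15)
```

Both fits must use the same offset; otherwise the two intercepts refer to different time origins and cannot be compared.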

2019/4/19 4:12, Jeff Newmiller:

The fact that you think x~y is interchangeable with y~x suggests to me that you 
will have a difficult time convincing R Core that this is a bug. I recommend 
that you take at least an upper division college course in linear regression 
first.

On April 18, 2019 9:35:55 AM PDT, Dingyuan Wang  wrote:

I just want to make a line out of timestamps vs some coordinates, so y~x
or x~y doesn't matter.

Yes, I know the answer. When trying R, I'm surprised that R can't solve
that either. I first noticed that PostgreSQL can't solve it, and found
that they fixed that in pg 12.

https://www.postgresql.org/message-id/153313051300.1397.9594490737341194671%40wrigleys.postgresql.org

Therefore I come to ask whether someone knows how to fix this in R, or I
must submit it as a bug?

2019/4/18 23:24, Michael Dewey:

Perhaps subtract 1506705766 from y?

Saying some other software does it well implies you know what the
_correct_ answer is here but I would question what that means with
this sort of data-set.

On 17/04/2019 07:26, Dingyuan Wang wrote:

Hi,

This input doesn't have any interesting properties except y is unix
time. Spreadsheets can do this well.
Is this a bug that lm can't do x ~ y?

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

  > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
101.632, 108.928, 94.08)
  > y = c(1506705739.385, 1506705766.895, 1506705746.293,
1506705761.873, 1506705734.743, 1506705735.351, 1506705756.26,
1506705761.307, 1506705747.372)
  > m = lm(x ~ y)
  > summary(m)

Call:
lm(formula = x ~ y)

Residuals:
     Min       1Q   Median       3Q      Max
-27.0222 -14.9902  -0.6542  14.1938  29.1698

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   94.734      6.511   14.55 4.88e-07 ***
y                 NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.53 on 8 degrees of freedom

  > summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1687 -1.3345 -0.9466  1.3826  2.6551

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.885 on 7 degrees of freedom
Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] lm fails on some large input

2019-04-18 Thread Fox, John
Dear Dingyuan Wang,

But your question was answered clearly earlier in this thread (I forget by 
whom), showing that lm() provides the solution to the regression of x on y if 
the criterion for singularity is tightened:

> lm(x ~ y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y  
      94.73           NA  

> lm(x ~ y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y  
 -2.403e+09    1.595e+00  

Best,
 John



Re: [R] lm fails on some large input

2019-04-18 Thread Jeff Newmiller
The fact that you think x~y is interchangeable with y~x suggests to me that you 
will have a difficult time convincing R Core that this is a bug. I recommend 
that you take at least an upper division college course in linear regression 
first.



Re: [R] lm fails on some large input

2019-04-18 Thread Dingyuan Wang
I just want to make a line out of timestamps vs some coordinates, so y~x 
or x~y doesn't matter.

Yes, I know the answer. When trying R, I'm surprised that R can't solve 
that either. I first noticed that PostgreSQL can't solve it, and found 
that they fixed that in pg 12.

https://www.postgresql.org/message-id/153313051300.1397.9594490737341194671%40wrigleys.postgresql.org

Therefore I come to ask whether someone knows how to fix this in R, or I 
must submit it as a bug?




Re: [R] lm fails on some large input

2019-04-18 Thread Jeff Newmiller
I make a general rule not to stick time values into numerical analysis 
algorithms without first subtracting a reasonable epoch (to obtain difftime) 
and then using as.numeric.POSIXt with the units argument set explicitly so the 
analysis uses numeric values that I can interpret. While the explicit use of 
difftime function does something similar, if any other operations are performed 
on it the units could change again before the inevitable conversion to numeric 
occurs somewhere down the line so I think taking responsibility for the numeric 
conversion myself is less likely to leave surprises.
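
The rule above can be sketched as follows. The timestamps mirror the ones used elsewhere in this thread, but the `tz = "UTC"` setting and the variable names are assumptions made for illustration. The `units` argument is honoured when `as.numeric()` is applied to the difftime that results from subtracting two POSIXct values.

```r
# Illustrative data: ten one-second steps, as in the example in this thread.
when  <- as.POSIXct("2017-09-29 18:22:01", tz = "UTC") + 0:9
epoch <- as.POSIXct("2017-09-29 00:00:00", tz = "UTC")  # chosen reference time
measurement <- log2(1:10)

# when - epoch is a difftime whose automatic units here are hours;
# converting explicitly pins the scale to seconds.
elapsed_secs <- as.numeric(when - epoch, units = "secs")

fit <- lm(measurement ~ elapsed_secs)
coef(fit)
```

Because the predictor is now a plain numeric vector in known units, later operations cannot silently change the scale of the slope.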


Re: [R] lm fails on some large input

2019-04-18 Thread Fox, John
Dear Peter,

> -Original Message-
> From: peter dalgaard [mailto:pda...@gmail.com]
> Sent: Thursday, April 18, 2019 12:23 PM
> To: Fox, John 
> Cc: Michael Dewey ; Dingyuan Wang
> ; r-help@r-project.org
> Subject: Re: [R] lm fails on some large input
> 
> Um, you need to reverse y and x there. The question was about lm(x ~ y)
> 

Good catch! I missed that in the original posting, and lm() does indeed produce 
the LS solution for the regression of y on x. And, as I'd have expected, the 
naïve approach also fails for the regression of x on y:

> Y <- cbind(1, y)
> b <- solve(t(Y) %*% Y) %*% t(Y) %*% x
Error in solve.default(t(Y) %*% Y) : 
  system is computationally singular: reciprocal condition number = 6.19587e-35

resolving the mystery.

Thanks,
 John
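
A short sketch of why the normal equations break down for x on y, and how centering changes the picture. The names `X_raw`, `X_ctr` and `b_ctr` are illustrative only; the data are the ones from the original post.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293,
       1506705761.873, 1506705734.743, 1506705735.351,
       1506705756.26, 1506705761.307, 1506705747.372)

X_raw <- cbind(1, y)             # intercept plus raw Unix time
X_ctr <- cbind(1, y - mean(y))   # intercept plus centered time

# Reciprocal condition numbers of X'X: far below machine precision
# for the raw design, unremarkable after centering.
rcond(crossprod(X_raw))
rcond(crossprod(X_ctr))

# The normal equations are solvable once the predictor is centered.
b_ctr <- solve(crossprod(X_ctr), crossprod(X_ctr, x))
b_ctr
```

Forming X'X squares the condition number of X, which is why this route fails sooner than a QR decomposition of X itself.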


Re: [R] lm fails on some large input

2019-04-18 Thread Berry, Charles


> On Apr 18, 2019, at 8:24 AM, Michael Dewey  wrote:
> 
> Perhaps subtract 1506705766 from y?

Good advice. Some further notes follow.

One can specify `tol` to have a smaller than default value

e.g.

  m2 <- lm(x ~ y, tol=1e-12)

which is accurate:

  plot(y,x)
  abline(coef=coef(m2))
 

Users of numerical procedures need to be mindful of the default settings of the 
algorithms they use.

As is well known, too large a default tolerance for convergence of an 
optimization algorithm can lead to seriously wrong results. There is an example 
described here:

https://science.sciencemag.org/content/296/5575/1945/tab-pdf

One might quibble with the choice of tol=1e-7 (the default in lm.fit), since 
64-bit floating point will support much smaller values. However, there are usually 
statistical issues surrounding the fitting of highly collinear variables.

So, `tol = 1e-07` seems more like a feature than a bug.

HTH,

Chuck
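
To see roughly why the default trips on this data, here is a sketch under one assumption about the check: that the LINPACK-based routine behind `qr()` and `lm()` treats a column as negligible once its norm, after earlier columns have been swept out, falls below `tol` times its original norm. The variable names are illustrative.

```r
y <- c(1506705739.385, 1506705766.895, 1506705746.293,
       1506705761.873, 1506705734.743, 1506705735.351,
       1506705756.26, 1506705761.307, 1506705747.372)

norm_raw   <- sqrt(sum(y^2))              # norm of the y column as given
norm_after <- sqrt(sum((y - mean(y))^2))  # norm left after removing the intercept
ratio      <- norm_after / norm_raw

ratio            # between 1e-10 and 1e-7 for this data
ratio < 1e-7     # TRUE: below the default tolerance
ratio < 1e-10    # FALSE: above the tightened tolerance

qr(cbind(1, y))$rank               # rank reported with the default tol
qr(cbind(1, y), tol = 1e-10)$rank  # rank reported with the smaller tol
```

The ratio depends only on how far the timestamps are from zero relative to their spread, which is why subtracting a nearby constant from y also resolves the problem.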



Re: [R] lm fails on some large input

2019-04-18 Thread William Dunlap via R-help
This sort of data arises quite easily if you deal with time/dates around
now.  E.g.,

> d <- data.frame(
+ when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
+ measurement = log2(1:10))
> coef(lm(data=d, measurement ~ when))
       (Intercept)               when
2.1791061114716954                 NA
> as.numeric(d$when)[1:2]
[1] 1506734521 1506734522

There are problems with the time units (seconds vs. hours) if you subtract
off a time, because the units of -.POSIXt depend on the data:

> coef(lm(data=d, measurement ~ I(when - min(when))))
        (Intercept) I(when - min(when))
0.68327571513124297 0.33240675474232279
> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
                                (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
                       -21978.3837546251634                          1196.6643170736229

Hence you have to use difftime and specify the units:

> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
                                                      (Intercept)
                                          -2.1978383754612696e+04
difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
                                           3.3240675474248449e-01
> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
                              (Intercept) difftime(when, min(when), units = "secs")
                      0.68327571513124297                       0.33240675474232279



Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [R] lm fails on some large input

2019-04-18 Thread peter dalgaard
Um, you need to reverse y and x there. The question was about lm(x ~ y)

> X <- cbind(1, y)
> solve(crossprod(X))
Error in solve.default(crossprod(X)) : 
  system is computationally singular: reciprocal condition number = 6.19587e-35

Actually, lm can QR perfectly OK, but it gets caught by its singularity 
detection:

> qr <- qr(X, tol=1e-10)
> qr # without the tol bit, you get same thing but $rank == 1
$qr
 y
 [1,] -3.000 -4.520117e+09
 [2,]  0.333 -3.426530e+01
 [3,]  0.333 -2.947103e-02
 [4,]  0.333  4.252164e-01
 [5,]  0.333 -3.665468e-01
 [6,]  0.333 -3.488029e-01
 [7,]  0.333  2.614064e-01
 [8,]  0.333  4.086982e-01
 [9,]  0.333  2.018556e-03

$rank
[1] 2

$qraux
[1] 1.33 1.571779

$pivot
[1] 1 2

attr(,"class")
[1] "qr"
> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 
> 108.928, 94.08)
> qr.coef(qr,x)
  y 
-2.403345e+09  1.595099e+00 

> lm(x~y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y  
      94.73           NA  

> lm(x~y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y  
 -2.403e+09    1.595e+00  

> lm(x~I(y-mean(y)))

Call:
lm(formula = x ~ I(y - mean(y)))

Coefficients:
   (Intercept)  I(y - mean(y))  
        94.734           1.595  
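
The three fits above can be reproduced in one script. This is only a minimal sketch using the data from the original post; the names fit_default, fit_tol, fit_centered and norm_ratio are illustrative, and the description of the tolerance check in the comments is my reading of how lm's pivoting QR behaves, not a quote from its documentation.

```r
# Data from the original post
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Default tolerance: y is treated as collinear with the intercept column,
# so its coefficient comes back as NA
fit_default <- lm(x ~ y)

# Smaller tolerance, passed through lm() to lm.fit() as in the call above
fit_tol <- lm(x ~ y, tol = 1e-10)

# Centering y removes the near-collinearity with the intercept column
fit_centered <- lm(x ~ I(y - mean(y)))

print(coef(fit_default))
print(coef(fit_tol))
print(coef(fit_centered))

# Assumption: the QR step drops a column when its norm, after the earlier
# columns have been swept out, falls below tol times its original norm.
# For y that ratio is far below the default tol of 1e-7.
norm_ratio <- sqrt(sum((y - mean(y))^2)) / sqrt(sum(y^2))
print(norm_ratio)
```

The slope is the same in the last two fits; only the intercept changes, because centering moves the point at which the intercept is evaluated from y = 0 to y = mean(y).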


> On 18 Apr 2019, at 17:56 , Fox, John  wrote:
> 
> Dear Michael and Dingyuan Wang,
> 
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
>> Dewey
>> Sent: Thursday, April 18, 2019 11:25 AM
>> To: Dingyuan Wang ; r-help@r-project.org
>> Subject: Re: [R] lm fails on some large input
>> 
>> Perhaps subtract 1506705766 from y?
>> 
>> Saying some other software does it well implies you know what the _correct_
>> answer is here but I would question what that means with this sort of data-
>> set.
> 
> It's rather an interesting problem, though, because the naïve computation of 
> the LS solution works:
> 
> plot(x, y)
> X <- cbind(1, x)
> b <- solve(t(X) %*% X) %*% t(X) %*% y
> b
> abline(b)
> 
> That surprised me, because I expected that lm() computation, using the QR 
> decomposition, would be more numerically stable.
> 
> Best,
> John
> 
> -
> John Fox
> Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> Web: https://socialsciences.mcmaster.ca/jfox/
> 

Re: [R] lm fails on some large input

2019-04-18 Thread Fox, John
Dear Michael and Dingyuan Wang,

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
> Dewey
> Sent: Thursday, April 18, 2019 11:25 AM
> To: Dingyuan Wang ; r-help@r-project.org
> Subject: Re: [R] lm fails on some large input
> 
> Perhaps subtract 1506705766 from y?
> 
> Saying some other software does it well implies you know what the _correct_
> answer is here but I would question what that means with this sort of data-
> set.

It's rather an interesting problem, though, because the naïve computation of 
the LS solution works:

plot(x, y)
X <- cbind(1, x)
b <- solve(t(X) %*% X) %*% t(X) %*% y
b
abline(b)

That surprised me, because I expected that lm() computation, using the QR 
decomposition, would be more numerically stable.
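
A possible way to look at this is to compare how well conditioned the cross-product matrix is in each direction. The sketch below assumes the x and y from the original post; X_x, X_y, b_normal, b_lm, rc_x and rc_y are names introduced here for illustration only. It uses the base R function rcond(), which estimates the reciprocal condition number.

```r
# Data from the original post
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

X_x <- cbind(1, x)  # design matrix for y ~ x, the direction computed above
X_y <- cbind(1, y)  # design matrix for x ~ y, the direction lm() gives up on

# Normal equations for y ~ x, written with crossprod() instead of t(X) %*% X
b_normal <- solve(crossprod(X_x), crossprod(X_x, y))
b_lm <- coef(lm(y ~ x))
print(cbind(normal_equations = as.vector(b_normal), lm = unname(b_lm)))

# Reciprocal condition number estimates of X'X in both directions;
# values below .Machine$double.eps are what solve() refuses to invert
rc_x <- rcond(crossprod(X_x))
rc_y <- rcond(crossprod(X_y))
print(c(y_on_x = rc_x, x_on_y = rc_y, eps = .Machine$double.eps))
```

With x as the predictor, both columns of the design matrix are of moderate size, so the naive computation and lm() agree. With the Unix timestamps as the predictor, the second column is almost a multiple of the first, and the cross-product matrix is numerically singular.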

Best,
 John

-
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/





Re: [R] lm fails on some large input

2019-04-18 Thread Michael Dewey

Perhaps subtract 1506705766 from y?

Saying some other software does it well implies you know what the 
_correct_ answer is here but I would question what that means with this 
sort of data-set.
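
The shift suggested above could be sketched like this, assuming the x and y from the original post. The names offset, t_rel, fit_shift and predict_x are hypothetical and only used for illustration.

```r
# Data from the original post
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

offset <- 1506705766   # the constant suggested above
t_rel <- y - offset    # seconds relative to that reference time

# With the shifted predictor, lm() no longer flags the column as singular
fit_shift <- lm(x ~ t_rel)
b <- coef(fit_shift)
print(b)

# Predict x for a timestamp on the original scale by applying the same shift
predict_x <- function(t) b[[1]] + b[[2]] * (t - offset)
print(predict_x(y))
```

Shifting the timestamps changes only the intercept, which then refers to the reference time rather than to the Unix epoch. The slope is unaffected.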



--
Michael
http://www.dewey.myzen.co.uk/home.html



[R] lm fails on some large input

2019-04-18 Thread Dingyuan Wang

Hi,

This input doesn't have any interesting properties except that y is Unix 
time. Spreadsheets handle this fine.

Is it a bug that lm can't fit x ~ y?

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 
101.632, 108.928, 94.08)
> y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873, 
1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307, 
1506705747.372)

> m = lm(x ~ y)
> summary(m)

Call:
lm(formula = x ~ y)

Residuals:
     Min       1Q   Median       3Q      Max 
-27.0222 -14.9902  -0.6542  14.1938  29.1698 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   94.734      6.511   14.55 4.88e-07 ***
y                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.53 on 8 degrees of freedom

> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1687 -1.3345 -0.9466  1.3826  2.6551 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.885 on 7 degrees of freedom
Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
