Re: [R] Regression with factor having 1 level

2016-03-11 Thread peter dalgaard

> On 11 Mar 2016, at 23:48 , David Winsemius wrote:
> 
>> On Mar 11, 2016, at 2:07 PM, peter dalgaard wrote:
>> 
>>> On 11 Mar 2016, at 17:56 , David Winsemius wrote:
>>> 
>>>> On Mar 11, 2016, at 12:48 AM, peter dalgaard wrote:
>>>> 
>>>>> On 11 Mar 2016, at 08:25 , David Winsemius wrote:
>>>>> 
>>>>>> ...
>>>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>>>>>>> x3=rnorm(10))
>>>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>>> contrasts can be applied
>>>>> 
>>>>> Yes, and the error appears to come from `model.matrix`:
>>>>> 
>>>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>> contrasts can be applied only to factors with 2 or more levels
>>>> 
>>>> Actually not. The above is because you use an explicit factor(x2). The 
>>>> actual smoking gun is this line in lm()
>>>> 
>>>> mf$drop.unused.levels <- TRUE
>>> 
>>> It's possible that modifying model.matrix to allow single-level factors 
>>> would then bump up against that check, but at the moment the traceback() 
>>> from an error generated with data that has a single-level factor and no 
>>> call to factor in the formula still implicates code in model.matrix:
>> 
>> You're missing the point: model.matrix has a beef with 1-level factors, not 
>> with 2-level factors of which one level happens to be absent, which is what 
>> this thread was originally about. It is lm that via model.frame with 
>> drop.unused.levels=TRUE converts the latter factors to the former.
> 
> I guess I did miss the point. Apologies for being obtuse. I thought that a 
> one-level factor would have been "aliased out" when model.matrix "realized" 
> that it was collinear with the intercept. (Further apologies for my 
> projection of cognitive capacities onto a machine.) Are you saying it remains 
> desirable that an error be thrown rather than reporting an NA for 
> coefficients and issuing a warning?
> 

For the moment I was just analyzing where this came from. Intuitively I'd be 
leaning in the opposite direction -- dropping factor levels automatically is 
usually a bad thing.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression with factor having 1 level

2016-03-11 Thread David Winsemius

> On Mar 11, 2016, at 2:07 PM, peter dalgaard wrote:
> 
>> On 11 Mar 2016, at 17:56 , David Winsemius wrote:
>> 
>>> On Mar 11, 2016, at 12:48 AM, peter dalgaard wrote:
>>> 
>>>> On 11 Mar 2016, at 08:25 , David Winsemius wrote:
>>>> 
>>>>> ...
>>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>>>>>> x3=rnorm(10))
>>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>> contrasts can be applied
>>>> 
>>>> Yes, and the error appears to come from `model.matrix`:
>>>> 
>>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>> contrasts can be applied only to factors with 2 or more levels
>>> 
>>> Actually not. The above is because you use an explicit factor(x2). The 
>>> actual smoking gun is this line in lm()
>>> 
>>> mf$drop.unused.levels <- TRUE
>> 
>> It's possible that modifying model.matrix to allow single level factors 
>> would then bump up against that check, but at the moment the traceback() 
>> from an error generated with data that has a single level factor and no call 
>> to factor in the formula still implicates code in model.matrix:
> 
> You're missing the point: model.matrix has a beef with 1-level factors, not 
> with 2-level factors of which one level happens to be absent, which is what 
> this thread was originally about. It is lm that via model.frame with 
> drop.unused.levels=TRUE converts the latter factors to the former.
> 

I guess I did miss the point. Apologies for being obtuse. I thought that a 
one-level factor would have been "aliased out" when model.matrix "realized" 
that it was collinear with the intercept. (Further apologies for my projection 
of cognitive capacities onto a machine.) Are you saying it remains desirable 
that an error be thrown rather than reporting an NA for coefficients and 
issuing a warning?

-- 
David.


> -pd 
> 
> 
>> 
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>> contrasts can be applied only to factors with 2 or more levels
>>> traceback()
>> 5: stop("contrasts can be applied only to factors with 2 or more levels")
>> 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>> 3: model.matrix.default(mt, mf, contrasts)
>> 2: model.matrix(mt, mf, contrasts)
>> 1: lm(y ~ x1 + x2 + x3, dfrm)
>> 
>> -- 
>> David.
>> 
>>> 
>>> which someone must have thought was a good idea at some point
>>> 
>>> model.matrix itself is quite happy to leave factors alone and let 
>>> subsequent code sort out any singularities, e.g.
>>> 
>>>> model.matrix(y~x1+x2, data=df[1:2,])
>>>   (Intercept) x1 x2B
>>> 1           1  1   0
>>> 2           1  1   0
>>> attr(,"assign")
>>> [1] 0 1 2
>>> attr(,"contrasts")
>>> attr(,"contrasts")$x2
>>> [1] "contr.treatment"
>>> 
>>>>> model.matrix(y~x1+x2+x3, dfrm)
>>>>    (Intercept)          x1 x2TRUE         x3
>>>> 1            1  0.04887847      1 -0.4199628
>>>> 2            1 -1.04786688      1  1.3947923
>>>> 3            1 -0.34896007      1 -2.1873666
>>>> 4            1 -0.08866061      1  0.1204129
>>>> 5            1     -0.4366      1 -1.6631057
>>>> 6            1 -0.83449110      1  1.1631801
>>>> 7            1 -0.67887823      1  0.3207544
>>>> 8            1 -1.12206068      1  0.6012040
>>>> 9            1  0.05116683      1  0.3598696
>>>> 10           1  1.74413583      1  0.3608478
>>>> attr(,"assign")
>>>> [1] 0 1 2 3
>>>> attr(,"contrasts")
>>>> attr(,"contrasts")$x2
>>>> [1] "contr.treatment"
 

David Winsemius
Alameda, CA, USA


Re: [R] Regression with factor having 1 level

2016-03-11 Thread peter dalgaard

> On 11 Mar 2016, at 17:56 , David Winsemius wrote:
> 
>> On Mar 11, 2016, at 12:48 AM, peter dalgaard wrote:
>> 
>>> On 11 Mar 2016, at 08:25 , David Winsemius wrote:
>>> 
>>>> ...
>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>>>>> x3=rnorm(10))
>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>> contrasts can be applied
>>> 
>>> Yes, and the error appears to come from `model.matrix`:
>>> 
>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>> contrasts can be applied only to factors with 2 or more levels
>> 
>> Actually not. The above is because you use an explicit factor(x2). The 
>> actual smoking gun is this line in lm()
>> 
>> mf$drop.unused.levels <- TRUE
> 
> It's possible that modifying model.matrix to allow single level factors would 
> then bump up against that check, but at the moment the traceback() from an 
> error generated with data that has a single level factor and no call to 
> factor in the formula still implicates code in model.matrix:

You're missing the point: model.matrix has a beef with 1-level factors, not 
with 2-level factors of which one level happens to be absent, which is what 
this thread was originally about. It is lm that via model.frame with 
drop.unused.levels=TRUE converts the latter factors to the former.
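
The mechanism can be checked directly. What follows is only a minimal sketch, assuming Robert's example data from earlier in the thread; the object names mf_keep, mf_drop, mm_keep and mm_drop are made up for illustration. It builds the model frame with and without drop.unused.levels=TRUE and hands each one to model.matrix.

```r
# Robert's example data: level "B" of x2 only occurs in the row where x1 is NA
df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))

# model.frame() called directly: unused levels are kept by default
mf_keep <- model.frame(y ~ x1 + x2, data = df, na.action = na.exclude)
nlevels(mf_keep$x2)   # "A" and "B" are both still levels

# model.frame() with the setting lm() applies: the unused level is dropped
mf_drop <- model.frame(y ~ x1 + x2, data = df, na.action = na.exclude,
                       drop.unused.levels = TRUE)
nlevels(mf_drop$x2)   # only "A" is left

# model.matrix() accepts the first frame but stops on the second one
mm_keep <- model.matrix(attr(mf_keep, "terms"), mf_keep)
mm_drop <- tryCatch(model.matrix(attr(mf_drop, "terms"), mf_drop),
                    error = function(e) conditionMessage(e))
```

In the first case the x2B column of mm_keep is all zeros; in the second, mm_drop holds the text of the contrasts error instead of a matrix.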

-pd 


> 
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>  contrasts can be applied only to factors with 2 or more levels
>> traceback()
> 5: stop("contrasts can be applied only to factors with 2 or more levels")
> 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
> 3: model.matrix.default(mt, mf, contrasts)
> 2: model.matrix(mt, mf, contrasts)
> 1: lm(y ~ x1 + x2 + x3, dfrm)
> 
> -- 
> David.
> 
>> 
>> which someone must have thought was a good idea at some point
>> 
>> model.matrix itself is quite happy to leave factors alone and let subsequent 
>> code sort out any singularities, e.g.
>> 
>>> model.matrix(y~x1+x2, data=df[1:2,])
>>   (Intercept) x1 x2B
>> 1           1  1   0
>> 2           1  1   0
>> attr(,"assign")
>> [1] 0 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$x2
>> [1] "contr.treatment"
>> 
>>>> model.matrix(y~x1+x2+x3, dfrm)
>>>    (Intercept)          x1 x2TRUE         x3
>>> 1            1  0.04887847      1 -0.4199628
>>> 2            1 -1.04786688      1  1.3947923
>>> 3            1 -0.34896007      1 -2.1873666
>>> 4            1 -0.08866061      1  0.1204129
>>> 5            1     -0.4366      1 -1.6631057
>>> 6            1 -0.83449110      1  1.1631801
>>> 7            1 -0.67887823      1  0.3207544
>>> 8            1 -1.12206068      1  0.6012040
>>> 9            1  0.05116683      1  0.3598696
>>> 10           1  1.74413583      1  0.3608478
>>> attr(,"assign")
>>> [1] 0 1 2 3
>>> attr(,"contrasts")
>>> attr(,"contrasts")$x2
>>> [1] "contr.treatment"
>>> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Regression with factor having 1 level

2016-03-11 Thread David Winsemius

> On Mar 11, 2016, at 12:48 AM, peter dalgaard wrote:
> 
>> On 11 Mar 2016, at 08:25 , David Winsemius wrote:
>> 
>>> ...
>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>>>> x3=rnorm(10))
>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>> contrasts can be applied
>> 
>> Yes, and the error appears to come from `model.matrix`:
>> 
>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>> contrasts can be applied only to factors with 2 or more levels
> 
> Actually not. The above is because you use an explicit factor(x2). The actual 
> smoking gun is this line in lm()
> 
> mf$drop.unused.levels <- TRUE

It's possible that modifying model.matrix to allow single-level factors would 
then bump up against that check, but at the moment the traceback() from an 
error generated with data that has a single-level factor and no call to factor 
in the formula still implicates code in model.matrix:

> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
> traceback()
5: stop("contrasts can be applied only to factors with 2 or more levels")
4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
3: model.matrix.default(mt, mf, contrasts)
2: model.matrix(mt, mf, contrasts)
1: lm(y ~ x1 + x2 + x3, dfrm)

-- 
David.

> 
> which someone must have thought was a good idea at some point
> 
> model.matrix itself is quite happy to leave factors alone and let subsequent 
> code sort out any singularities, e.g.
> 
>> model.matrix(y~x1+x2, data=df[1:2,])
>   (Intercept) x1 x2B
> 1           1  1   0
> 2           1  1   0
> attr(,"assign")
> [1] 0 1 2
> attr(,"contrasts")
> attr(,"contrasts")$x2
> [1] "contr.treatment"
> 
>>> model.matrix(y~x1+x2+x3, dfrm)
>>    (Intercept)          x1 x2TRUE         x3
>> 1            1  0.04887847      1 -0.4199628
>> 2            1 -1.04786688      1  1.3947923
>> 3            1 -0.34896007      1 -2.1873666
>> 4            1 -0.08866061      1  0.1204129
>> 5            1     -0.4366      1 -1.6631057
>> 6            1 -0.83449110      1  1.1631801
>> 7            1 -0.67887823      1  0.3207544
>> 8            1 -1.12206068      1  0.6012040
>> 9            1  0.05116683      1  0.3598696
>> 10           1  1.74413583      1  0.3608478
>> attr(,"assign")
>> [1] 0 1 2 3
>> attr(,"contrasts")
>> attr(,"contrasts")$x2
>> [1] "contr.treatment"
>> 

David Winsemius
Alameda, CA, USA



Re: [R] Regression with factor having 1 level

2016-03-11 Thread Robert McGehee
Hi,
In case this is helpful for anyone, I think I've coded a satisfactory
function answering my problem (of handling formulas containing 1-level
factors) by hacking liberally at the model.matrix code to remove any
model terms for which the contrast fails. As it's a problem I've come
across a lot (since my data frames have factors and lots of missing
values), adding support for 1-level factors might be a nice item for
the R Wishlist. I suppose a key question is, does anyone ever _want_
to see the error "contrasts can be applied only to factors with 2 or
more levels", or should the contrasts function just add a column of
all zeros (or ones) to the design matrix and let the modelling
functions handle that the same way they handle any other zero-variance
term?

Anyway, my function below:

lmresid <- function(formula, data) {
    mf <- model.frame(formula, data=data, na.action=na.exclude)
    omit <- attr(mf, "na.action")
    t <- terms(mf)
    contr.funs <- as.character(getOption("contrasts"))
    namD <- names(mf)
    ## Treat character columns as factors, as model.matrix.default does
    for (i in namD) if (is.character(mf[[i]]))
        mf[[i]] <- factor(mf[[i]])
    isF <- vapply(mf, function(x) is.factor(x) || is.logical(x), NA)
    isF[1] <- FALSE    # skip the response (first column)
    isOF <- vapply(mf, is.ordered, NA)
    for (nn in namD[isF])
        if (is.null(attr(mf[[nn]], "contrasts"))) {
            noCntr <- try(contrasts(mf[[nn]]) <- contr.funs[1 + isOF[nn]],
                          silent=TRUE)
            if (inherits(noCntr, "try-error")) {
                ## Remove term from model on error
                mf[[nn]] <- NULL
                t <- terms(update(t, as.formula(paste("~ . -", nn))), data=mf)
            }
        }
    ans <- .External2(stats:::C_modelmatrix, t, mf)
    r   <- .lm.fit(ans, mf[[1]])$residuals
    stats:::naresid.exclude(omit, r)
}

## Note that lmresid now returns the same values as resid with the
## 1-level factor removed.
df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
x2=factor(c("A","A","A","A","B")))
lmresid(y~x1+x2, data=df)
resid(lm(y~x1, data=df, na.action=na.exclude))
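
For comparison, something similar might be achievable without calling unexported internals. The sketch below is not the function posted above: lmresid_simple is a hypothetical name, and the approach (build the model frame with unused levels dropped, remove every term that involves a factor with fewer than two levels, then refit with lm) is an assumption about what is wanted. The model frame gets built twice, so it may well be slower on large data.

```r
# Hypothetical helper, not from the thread: residuals from lm() after
# removing any term that involves a factor left with fewer than 2 levels.
lmresid_simple <- function(formula, data) {
  # Model frame as lm() would see it: incomplete rows removed and
  # unused factor levels dropped
  mf <- model.frame(formula, data = data, na.action = na.omit,
                    drop.unused.levels = TRUE)
  # Variables that are factors with fewer than two levels
  one_level <- vapply(mf, function(x) is.factor(x) && nlevels(x) < 2L,
                      logical(1))
  bad <- names(mf)[one_level]
  fac <- attr(attr(mf, "terms"), "factors")   # variables (rows) x terms (cols)
  if (length(bad) > 0L && is.matrix(fac)) {
    # Terms that involve any of the offending variables
    hit <- colSums(fac[bad, , drop = FALSE] > 0) > 0
    if (any(hit)) {
      rhs <- paste("~ . -", paste(colnames(fac)[hit], collapse = " - "))
      formula <- update(formula, as.formula(rhs))
    }
  }
  # Refit on the reduced formula; na.exclude pads the residuals with NA
  resid(lm(formula, data = data, na.action = na.exclude))
}

df <- data.frame(y = c(0, 2, 4, 6, 8), x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))
r <- lmresid_simple(y ~ x1 + x2, data = df)
```

With the example data, r should agree with resid(lm(y~x1, data=df, na.action=na.exclude)).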

--Robert

PS, Peter, wasn't sure if you also meant to add comments, but they
didn't come through.


On Fri, Mar 11, 2016 at 3:40 AM, peter dalgaard  wrote:
>
>> On 11 Mar 2016, at 02:03 , Robert McGehee  wrote:
>>
>>> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
>> x2=factor(c("A","A","A","A","B")))
>>> resid(lm(y~x1+x2, data=df, na.action=na.exclude))
>



Re: [R] Regression with factor having 1 level

2016-03-11 Thread peter dalgaard
The one you cite must have been due to fat-fingering (send instead of delete), 
but there was a later followup to David, w/copy to r-help.

-pd

On 11 Mar 2016, at 16:03 , Robert McGehee  wrote:

> 
> PS, Peter, wasn't sure if you also meant to add comments, but they
> didn't come through.
> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Regression with factor having 1 level

2016-03-11 Thread peter dalgaard

> On 11 Mar 2016, at 02:03 , Robert McGehee  wrote:
> 
>> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
> x2=factor(c("A","A","A","A","B")))
>> resid(lm(y~x1+x2, data=df, na.action=na.exclude))

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Regression with factor having 1 level

2016-03-11 Thread peter dalgaard

> On 11 Mar 2016, at 08:25 , David Winsemius  wrote:
>> 
...
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>>> x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>> contrasts can be applied
> 
> Yes, and the error appears to come from `model.matrix`:
> 
>> model.matrix(y~x1+factor(x2)+x3, dfrm)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>  contrasts can be applied only to factors with 2 or more levels
> 

Actually not. The above is because you use an explicit factor(x2). The actual 
smoking gun is this line in lm():

mf$drop.unused.levels <- TRUE

which someone must have thought was a good idea at some point.

model.matrix itself is quite happy to leave factors alone and let subsequent 
code sort out any singularities, e.g.

> model.matrix(y~x1+x2, data=df[1:2,])
  (Intercept) x1 x2B
1           1  1   0
2           1  1   0
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"
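
What "subsequent code" does with such a matrix can be sketched as follows. This is only an illustration under assumptions: it uses the df from Robert's example, calls lm.fit directly, and the names mf, X and fit are made up here.

```r
df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))

# Keep the complete cases but do NOT drop the now-unused level "B"
mf <- model.frame(y ~ x1 + x2, data = df, na.action = na.omit)
X  <- model.matrix(attr(mf, "terms"), mf)   # the x2B column is all zeros

# lm.fit() treats the zero column as aliased and reports NA for it
fit <- lm.fit(X, model.response(mf))
fit$coefficients   # x2B is NA, intercept and x1 are estimated
fit$rank           # 2, although X has 3 columns
fit$residuals
```

The residuals are the same as those from regressing y on x1 alone over the four complete rows.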



>> model.matrix(y~x1+x2+x3, dfrm)
>    (Intercept)          x1 x2TRUE         x3
> 1            1  0.04887847      1 -0.4199628
> 2            1 -1.04786688      1  1.3947923
> 3            1 -0.34896007      1 -2.1873666
> 4            1 -0.08866061      1  0.1204129
> 5            1     -0.4366      1 -1.6631057
> 6            1 -0.83449110      1  1.1631801
> 7            1 -0.67887823      1  0.3207544
> 8            1 -1.12206068      1  0.6012040
> 9            1  0.05116683      1  0.3598696
> 10           1  1.74413583      1  0.3608478
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$x2
> [1] "contr.treatment"
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Regression with factor having 1 level

2016-03-10 Thread David Winsemius

> On Mar 10, 2016, at 5:45 PM, Nordlund, Dan (DSHS/RDA) <nord...@dshs.wa.gov> 
> wrote:
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David
>> Winsemius
>> Sent: Thursday, March 10, 2016 4:39 PM
>> To: Robert McGehee
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Regression with factor having1 level
>> 
>> 
>>> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcge...@gmail.com>
>> wrote:
>>> 
>>> Hello R-helpers,
>>> I'd like a function that given an arbitrary formula and a data frame
>>> returns the residual of the dependent variable, and maintains all NA values.
>> 
>> What does "maintains all NA values" actually mean?
>>> 
>>> Here's an example that will give me what I want if my formula is
>>> y~x1+x2+x3 and my data frame is df:
>>> 
>>> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
>>> 
>>> Here's the catch, I do not want my function to ever fail due to a
>>> factor with only one level. A one-level factor may appear because 1)
>>> the user passed it in, or 2) (more common) only one factor in a term
>>> is left after na.exclude removes the other NA values.
>>> 
>>> Here is the error I would get
>> 
>> From what code?
>> 
>> 
>>> above if one of the terms was a factor with one level:
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>> contrasts can be applied only to factors with 2 or more levels
>> 
>> Unable to create that error with the actions you describe but do not actually
>> offer in coded form:
>> 
>> 
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>> 
>> Call:
>> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>> 
>> Coefficients:
>> (Intercept)           x1       x2TRUE           x3
>>    -0.16274     -0.30032           NA     -0.09093
>> 
>>> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>>           1           2           3           4           5           6
>> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>>           7           8           9          10
>> -0.05965653 -2.17480605  1.42917190 -0.65103650
>> 
>>> 
>> 
>> 
>>> Instead of giving me an error, I'd like the function to do just what
>>> lm() normally does when it sees a variable with no variance: ignore
>>> the variable (coefficient is NA) and continue to regress out all the
>>> other variables.
>>> Thus if 'x2' is a factor with one level in the above example, I'd
>>> like the function to return the result of:
>>> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
>>> Can anyone provide me a straightforward recommendation for how to do
>>> this?
>>> I feel like it should be easy, but I'm honestly stuck, and my Google
>>> searching for this hasn't gotten anywhere. The key is that I'd like
>>> the solution to be generic enough to work with an arbitrary linear
>>> formula, and not substantially kludgy (like trying every combination of
>>> regression terms until one works), as I'll be running this a lot on
>>> big data sets and don't want my computation time swamped by running
>>> unnecessary regressions or checking for number of factors after
>>> removing NAs.
>>> 
>>> Thanks in advance!
>>> --Robert
>>> 
>>> 
>>> PS. The Google search feature in the R-help archives appears to be down:
>>> http://tolstoy.newcastle.edu.au/R/
>> 
>> It's working for me.
>> 
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
> 
> I agree that what is wanted is not clear.  However, if dfrm is created with 
> x2 as a factor, then you get the error message that the OP mentions when you 
> run the regression.
> 
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
>> x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>  contrasts can be applied

Yes, and the error appears to come from `model.matrix`:

> model.matrix(y~x1+factor(x2)+x3, dfrm)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

> model.matrix(y~x1+x2+x3, dfrm)
   (Intercept)  x1 x2TRUE x3
11  0.0488

Re: [R] Regression with factor having 1 level

2016-03-10 Thread Nordlund, Dan (DSHS/RDA)
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David
> Winsemius
> Sent: Thursday, March 10, 2016 4:39 PM
> To: Robert McGehee
> Cc: r-help@r-project.org
> Subject: Re: [R] Regression with factor having1 level
> 
> 
> > On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcge...@gmail.com>
> wrote:
> >
> > Hello R-helpers,
> > I'd like a function that given an arbitrary formula and a data frame
> > returns the residual of the dependent variable, and maintains all NA values.
> 
> What does "maintains all NA values" actually mean?
> >
> > Here's an example that will give me what I want if my formula is
> > y~x1+x2+x3 and my data frame is df:
> >
> > resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> >
> > Here's the catch, I do not want my function to ever fail due to a
> > factor with only one level. A one-level factor may appear because 1)
> > the user passed it in, or 2) (more common) only one factor in a term
> > is left after na.exclude removes the other NA values.
> >
> > Here is the error I would get
> 
> From what code?
> 
> 
> > above if one of the terms was a factor with one level:
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >  contrasts can be applied only to factors with 2 or more levels
> 
> Unable to create that error with the actions you describe but do not actually
> offer in coded form:
> 
> 
> > dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> > lm(y~x1+x2+x3, dfrm)
> 
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
> 
> Coefficients:
> (Intercept)           x1       x2TRUE           x3
>    -0.16274     -0.30032           NA     -0.09093
> 
> > resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>           1           2           3           4           5           6
> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>           7           8           9          10
> -0.05965653 -2.17480605  1.42917190 -0.65103650
> 
> >
> 
> 
> > Instead of giving me an error, I'd like the function to do just what
> > lm() normally does when it sees a variable with no variance: ignore
> > the variable (coefficient is NA) and continue to regress out all the
> > other variables.
> > Thus if 'x2' is a factor with one level in the above example, I'd
> > like the function to return the result of:
> > resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> > Can anyone provide me a straightforward recommendation for how to do
> > this?
> > I feel like it should be easy, but I'm honestly stuck, and my Google
> > searching for this hasn't gotten anywhere. The key is that I'd like
> > the solution to be generic enough to work with an arbitrary linear
> > formula, and not substantially kludgy (like trying every combination of
> > regression terms until one works), as I'll be running this a lot on
> > big data sets and don't want my computation time swamped by running
> > unnecessary regressions or checking for number of factors after
> > removing NAs.
> >
> > Thanks in advance!
> > --Robert
> >
> >
> > PS. The Google search feature in the R-help archives appears to be down:
> > http://tolstoy.newcastle.edu.au/R/
> 
> It's working for me.
> 
> >
> > [[alternative HTML version deleted]]
> >
> 
> David Winsemius
> Alameda, CA, USA
> 

I agree that what is wanted is not clear.  However, if dfrm is created with x2 
as a factor, then you get the error message that the OP mentions when you run 
the regression.

> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), 
> x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied


Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



Re: [R] Regression with factor having 1 level

2016-03-10 Thread Robert McGehee
Here's an example for clarity:

> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
x2=factor(c("A","A","A","A","B")))
> resid(lm(y~x1+x2, data=df, na.action=na.exclude))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

Note that the x2 factor variable contains two levels, but the "B" level is
excluded in the regression due to the NA value in x1. Hence the error.

Instead of the above error, I would like a function that returns the
residual of the regression without the offending term, which in this case
would be equivalent to:
> resid(lm(y~x1, data=df, na.action=na.exclude))
 1  2  3  4  5
-1  1 -1  1 NA

Note the 5th term returns an NA as there is an NA in the x1 independent
variable, which is what I had meant by "maintains all NA values".
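
The difference this makes can be seen by comparing na.action settings. The following is a minimal sketch using the y and x1 columns of the example above; fit_omit and fit_exclude are illustrative names only.

```r
df <- data.frame(y = c(0, 2, 4, 6, 8), x1 = c(1, 1, 2, 2, NA))

# Both fits use the same four complete rows; they differ only in how
# resid() reports the row that was left out
fit_omit    <- lm(y ~ x1, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x1, data = df, na.action = na.exclude)

length(resid(fit_omit))      # 4: the incomplete row is simply gone
length(resid(fit_exclude))   # 5: one residual per row of df
is.na(resid(fit_exclude))    # TRUE only for row 5
```

With na.exclude the residual vector lines up row for row with df, which is convenient when the residuals are to be written back into the data frame.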

I'm currently leaning towards rewriting model.matrix.default so that it
removes offending terms rather than give an error, but if someone has done
this already (or something more elegant), that would of course be preferred
:)
--Robert

On Thu, Mar 10, 2016 at 7:39 PM, David Winsemius 
wrote:

>
> > On Mar 10, 2016, at 2:00 PM, Robert McGehee  wrote:
> >
> > Hello R-helpers,
> > I'd like a function that given an arbitrary formula and a data frame
> > returns the residual of the dependent variable, and maintains all NA
> > values.
>
> What does "maintains all NA values" actually mean?
> >
> > Here's an example that will give me what I want if my formula is
> > y~x1+x2+x3 and my data frame is df:
> >
> > resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> >
> > Here's the catch, I do not want my function to ever fail due to a factor
> > with only one level. A one-level factor may appear because 1) the user
> > passed it in, or 2) (more common) only one factor in a term is left after
> > na.exclude removes the other NA values.
> >
> > Here is the error I would get
>
> From what code?
>
>
> > above if one of the terms was a factor with
> > one level:
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >  contrasts can be applied only to factors with 2 or more levels
>
> Unable to create that error with the actions you decribe but to not
> actually offer in coded form:
>
>
> > dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> > lm(y~x1+x2+x3, dfrm)
>
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>
> Coefficients:
> (Intercept)   x1   x2TRUE   x3
>-0.16274 -0.30032   NA -0.09093
>
> > resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>   1   2   3   4   5   6
> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>   7   8   9  10
> -0.05965653 -2.17480605  1.42917190 -0.65103650
>
> >
>
>
> > Instead of giving me an error, I'd like the function to do just what lm()
> > normally does when it sees a variable with no variance, ignore the
> variable
> > (coefficient is NA) and continue to regress out all the other variables.
> > Thus if 'x2' is a factor with one variable in the above example, I'd like
> > the function to return the result of:
> > resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> > Can anyone provide me a straight forward recommendation for how to do
> this?
> > I feel like it should be easy, but I'm honestly stuck, and my Google
> > searching for this hasn't gotten anywhere. The key is that I'd like the
> > solution to be generic enough to work with an arbitrary linear formula,
> and
> > not substantially kludgy (like trying ever combination of regressions
> terms
> > until one works) as I'll be running this a lot on big data sets and don't
> > want my computation time swamped by running unnecessary regressions or
> > checking for number of factors after removing NAs.
> >
> > Thanks in advance!
> > --Robert
> >
> >
> > PS. The Google search feature in the R-help archives appears to be down:
> > http://tolstoy.newcastle.edu.au/R/
>
> It's working for me.
>
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression with factor having1 level

2016-03-10 Thread David Winsemius

> On Mar 10, 2016, at 2:00 PM, Robert McGehee  wrote:
> 
> Hello R-helpers,
> I'd like a function that given an arbitrary formula and a data frame
> returns the residual of the dependent variable, and maintains all NA values.

What does "maintains all NA values" actually mean?
> 
> Here's an example that will give me what I want if my formula is y~x1+x2+x3
> and my data frame is df:
> 
> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> 
> Here's the catch, I do not want my function to ever fail due to a factor
> with only one level. A one-level factor may appear because 1) the user
> passed it in, or 2) (more common) only one factor in a term is left after
> na.exclude removes the other NA values.
> 
> Here is the error I would get

From what code?


> above if one of the terms was a factor with
> one level:
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>  contrasts can be applied only to factors with 2 or more levels

Unable to create that error with the actions you describe but do not actually
offer in coded form:


> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm)

Call:
lm(formula = y ~ x1 + x2 + x3, data = dfrm)

Coefficients:
(Intercept)           x1       x2TRUE           x3
   -0.16274     -0.30032           NA     -0.09093

> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
          1           2           3           4           5           6
-0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
          7           8           9          10
-0.05965653 -2.17480605  1.42917190 -0.65103650
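
A constant logical column and a one-level factor appear to be handled differently by lm(), which may be why the example above runs cleanly. The following is a minimal sketch of that contrast; the seed and the object names `fit_logical` and `fit_factor` are arbitrary, and the error is captured with tryCatch() so the script keeps running.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
dfrm <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = TRUE, x3 = rnorm(10))

# Constant logical column: the fit succeeds and the aliased coefficient is NA
fit_logical <- lm(y ~ x1 + x2 + x3, data = dfrm)
print(coef(fit_logical))

# Same data with x2 stored as a one-level factor: the fit is expected to fail
dfrm$x2 <- factor(dfrm$x2)
fit_factor <- tryCatch(lm(y ~ x1 + x2 + x3, data = dfrm),
                       error = function(e) e)
if (inherits(fit_factor, "error")) message(conditionMessage(fit_factor))
```

So the reported error seems to depend on x2 being a factor (or becoming a one-level factor after NA removal), not merely on it being constant.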

> 


> Instead of giving me an error, I'd like the function to do just what lm()
> normally does when it sees a variable with no variance, ignore the variable
> (coefficient is NA) and continue to regress out all the other variables.
> Thus if 'x2' is a factor with one level in the above example, I'd like
> the function to return the result of:
> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> Can anyone provide me a straightforward recommendation for how to do this?
> I feel like it should be easy, but I'm honestly stuck, and my Google
> searching for this hasn't gotten anywhere. The key is that I'd like the
> solution to be generic enough to work with an arbitrary linear formula, and
> not substantially kludgy (like trying every combination of regression terms
> until one works) as I'll be running this a lot on big data sets and don't
> want my computation time swamped by running unnecessary regressions or
> checking for number of factors after removing NAs.
> 
> Thanks in advance!
> --Robert
> 
> 
> PS. The Google search feature in the R-help archives appears to be down:
> http://tolstoy.newcastle.edu.au/R/

It's working for me.

> 

David Winsemius
Alameda, CA, USA



Re: [R] Regression with factor having1 level

2016-03-10 Thread Ben Bolker
Robert McGehee writes:

> 
> Hello R-helpers,
> I'd like a function that given an arbitrary formula and a data frame
> returns the residual of the dependent variable, and maintains all
>  NA values.
> 
> Here's an example that will give me what I want if my formula is y~x1+x2+x3
> and my data frame is df:
> 
> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> 
> Here's the catch, I do not want my function to ever fail due to a factor
> with only one level. A one-level factor may appear because 1) the user
> passed it in, or 2) (more common) only one factor in a term is left after
> na.exclude removes the other NA values.
> 

 [snip to try to make Gmane happy]
> 
> Can anyone provide me a straight forward recommendation for how 
> to do this?

  The only approach I can think of is to screen for single-level factors
yourself and remove these factors from the
formula. It's a little tricky; you can't call model.frame() with a single-level
factor (that's where the error comes from), and you have to strip out NA
values yourself so you can see which factors end up with only a single
level after NA removal.
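
A rough sketch of that screening approach follows. It is only an illustration under stated assumptions: the helper name `resid_drop_single_level` is hypothetical, the formula is assumed to be two-sided and to contain no `.` shorthand, and NA removal is mimicked with complete.cases() over the variables named in the formula.

```r
# Hypothetical helper, not part of base R or of this thread
resid_drop_single_level <- function(formula, data) {
  vars <- all.vars(formula)
  # Rows that survive NA removal across every variable in the formula
  ok   <- complete.cases(data[, vars, drop = FALSE])
  kept <- data[ok, vars, drop = FALSE]

  # Factor or character variables left with fewer than two distinct values
  is_bad <- vapply(kept, function(col) {
    (is.factor(col) || is.character(col)) && length(unique(col)) < 2
  }, logical(1))
  bad <- names(kept)[is_bad]

  # Keep only the terms that do not mention an offending variable
  tt     <- terms(formula)
  labels <- attr(tt, "term.labels")
  uses_bad <- vapply(labels, function(lab) {
    any(all.vars(parse(text = lab)) %in% bad)
  }, logical(1))
  keep <- labels[!uses_bad]
  if (length(keep) == 0) keep <- "1"  # nothing left: fall back to y ~ 1

  new_formula <- reformulate(keep, response = formula[[2]],
                             intercept = attr(tt, "intercept") == 1)
  environment(new_formula) <- environment(formula)
  resid(lm(new_formula, data = data, na.action = na.exclude))
}

# Usage with the data frame posted in the follow-up message
df <- data.frame(y  = c(0, 2, 4, 6, 8),
                 x1 = c(1, 1, 2, 2, NA),
                 x2 = factor(c("A", "A", "A", "A", "B")))
r <- resid_drop_single_level(y ~ x1 + x2, df)
print(r)
```

The screening costs one pass over the relevant columns and a single call to lm(), so no extra regressions are run. Interactions involving an offending factor are dropped along with its main effect, which may or may not be what is wanted.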



[R] Regression with factor having1 level

2016-03-10 Thread Robert McGehee
Hello R-helpers,
I'd like a function that, given an arbitrary formula and a data frame,
returns the residual of the dependent variable and maintains all NA values.

Here's an example that will give me what I want if my formula is y~x1+x2+x3
and my data frame is df:

resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))

Here's the catch: I do not want my function to ever fail due to a factor
with only one level. A one-level factor may appear because 1) the user
passed it in, or 2) (more common) only one level of a factor is left after
na.exclude removes the rows containing NA values.

Here is the error I would get above if one of the terms was a factor with
one level:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

Instead of giving me an error, I'd like the function to do just what lm()
normally does when it sees a variable with no variance: ignore the variable
(coefficient is NA) and continue to regress out all the other variables.
Thus if 'x2' is a factor with one level in the above example, I'd like
the function to return the result of:
resid(lm(y~x1+x3, data=df, na.action=na.exclude))

Can anyone provide me a straightforward recommendation for how to do this?
I feel like it should be easy, but I'm honestly stuck, and my Google
searching for this hasn't gotten anywhere. The key is that I'd like the
solution to be generic enough to work with an arbitrary linear formula, and
not substantially kludgy (like trying every combination of regression terms
until one works), as I'll be running this a lot on big data sets and don't
want my computation time swamped by running unnecessary regressions or
checking the number of factor levels after removing NAs.

Thanks in advance!
--Robert


PS. The Google search feature in the R-help archives appears to be down:
http://tolstoy.newcastle.edu.au/R/
