Re: [R] Names of variables needed in newdata for predict.glm

2018-03-31 Thread David Winsemius

> On Mar 31, 2018, at 8:48 AM, Bendix Carstensen  
> wrote:
> 
> all.vars works fine, EXCEPT that it gives a bit too much.
> I only want the regression variables, but in the following example I also get 
> "k", the variable holding the chosen knots. Is there any machinery to find only 
> the "real" regression variables?
> cheers, Bendix
> 
> library( splines )
> y <- rnorm(100)
> x <- rnorm(100)
> k <- -1:1
> ml <-  lm( y ~ bs(x,knots=k) )
> mg <- glm( y ~ bs(x,knots=k) )
> all.vars(ml$terms)
> all.vars(mg$terms)
> all.vars(mg$formula)

If you can accept the requirement that the "real" regression variables be passed 
in a data argument, then this might succeed:

> dat <- data.frame(x = x, y = y)
> ml <-  lm( y ~ bs(x,knots=k), data=dat )
> all.vars(ml$terms)
[1] "y" "x" "k"
> all.vars(ml$formula)
character(0)
> all.vars(ml$terms)[ all.vars(ml$terms) %in% names(dat)]
[1] "y" "x"
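The filtering step above can be wrapped in a small function. A minimal sketch, assuming the model was fitted with a data= argument and that the same data frame is passed in; `vars_in_data` is a hypothetical name, not an existing R function, and the seed is added only for reproducibility:

```r
library(splines)

# Hypothetical helper (not part of base R): keep only those variables of the
# model formula that are columns of the data frame the model was fitted to.
# Objects found elsewhere, such as the knot vector 'k', are dropped.
vars_in_data <- function(fit, data) {
  v <- all.vars(terms(fit))
  v[v %in% names(data)]
}

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
k <- -1:1
dat <- data.frame(x = x, y = y)
ml <- lm(y ~ bs(x, knots = k), data = dat)
vars_in_data(ml, dat)
# [1] "y" "x"
```

The result follows the order of the formula and still includes the response; applying delete.response() to terms(fit) first would drop it.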

-- 
David.
> 

> 
> From: Marc Girondot 
> Sent: 8 March 2018 06:26
> To: Bendix Carstensen; r-help@r-project.org
> Subject: Re: [R] Names of variables needed in newdata for predict.glm
> 
> Hi,
> 
> Some attempts:
>> names(mi$xlevels)
> [1] "f"
>> all.vars(mi$formula)
> [1] "D" "x" "f" "Y"
>> names(mx$xlevels)
> [1] "f"
>> all.vars(mx$formula)
> [1] "D" "x" "f"
> 
> When the offset is given outside the formula (as the offset= argument), it does 
> not work: Y is not returned.
> 
> Marc
> 
> On 07/03/2018 at 06:20, Bendix Carstensen wrote:
>> I would like to extract the names, modes [numeric/factor] and levels
>> of the variables needed in a data frame supplied as the newdata= argument
>> to predict.glm().
>> 
>> Here is a small example illustrating my troubles; what I want from
>> (both of) the glm objects is the vector c("x","f","Y") and an
>> indication that f is a factor:
>> 
>> library( splines )
>> dd <- data.frame( D = sample(0:1,200,rep=T),
>>   x = abs(rnorm(200)),
>>   f = factor(sample(letters[1:4],200,rep=T)),
>>   Y = runif(200,0.5,10) )
>> mx <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) , offset=log(Y) , 
>> family=poisson, data=dd)
>> mi <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) + offset(log(Y)), 
>> family=poisson, data=dd)
>> 
>> attr(mx$terms,"dataClasses")
>> attr(mi$terms,"dataClasses")
>> mi$xlevels
>> mx$xlevels
>> 
>> ...so far not quite there.
>> 
>> Regards,
>> 
>> Bendix Carstensen
>> 
>> Senior Statistician
>> Steno Diabetes Center
>> Clinical Epidemiology
>> Niels Steensens Vej 2-4
>> DK-2820 Gentofte, Denmark
>> b...@bxc.dk
>> bendix.carsten...@regionh.dk
>> http://BendixCarstensen.com
>> 
>> 
>> 
>> 
>> This e-mail contains confidential information. If you are not the intended 
>> recipient of this e-mail, or if you have received it in error, please 
>> inform the sender of the error by using the reply function. Please also 
>> delete the e-mail immediately without forwarding or copying it.
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law



Re: [R] Names of variables needed in newdata for predict.glm

2018-03-31 Thread Bendix Carstensen
all.vars works fine, EXCEPT that it gives a bit too much.
I only want the regression variables, but in the following example I also get 
"k", the variable holding the chosen knots. Is there any machinery to find only 
the "real" regression variables?
cheers, Bendix

library( splines )
y <- rnorm(100)
x <- rnorm(100)
k <- -1:1
ml <-  lm( y ~ bs(x,knots=k) )
mg <- glm( y ~ bs(x,knots=k) )
all.vars(ml$terms)
all.vars(mg$terms)
all.vars(mg$formula)
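Without a data argument there is no data frame to filter against. One possible workaround, offered only as a heuristic sketch (the name `obs_vars` is hypothetical, and the length test is an assumption, not anything lm() or glm() records), is to keep the formula variables whose length matches the number of rows in the model frame:

```r
library(splines)

# Hypothetical heuristic (not a base R facility): with no data= argument,
# keep the formula variables whose length equals the number of observations
# in the model frame. The knot vector 'k' (length 3) is dropped, but any
# constant that happens to have that length would be kept, and rows dropped
# for NAs break the comparison, so treat the result as a guess.
obs_vars <- function(fit) {
  v <- all.vars(terms(fit))
  env <- environment(terms(fit))   # environment the formula was created in
  n <- nrow(model.frame(fit))
  keep <- vapply(v, function(nm) NROW(get(nm, envir = env)) == n, logical(1))
  v[keep]
}

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
k <- -1:1
ml <- lm(y ~ bs(x, knots = k))
mg <- glm(y ~ bs(x, knots = k))
obs_vars(ml)
obs_vars(mg)
```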


From: Marc Girondot 
Sent: 8 March 2018 06:26
To: Bendix Carstensen; r-help@r-project.org
Subject: Re: [R] Names of variables needed in newdata for predict.glm

Hi,

Some attempts:
 > names(mi$xlevels)
[1] "f"
 > all.vars(mi$formula)
[1] "D" "x" "f" "Y"
 > names(mx$xlevels)
[1] "f"
 > all.vars(mx$formula)
[1] "D" "x" "f"

When the offset is given outside the formula (as the offset= argument), it does 
not work: Y is not returned.

Marc

On 07/03/2018 at 06:20, Bendix Carstensen wrote:
> I would like to extract the names, modes [numeric/factor] and levels
> of the variables needed in a data frame supplied as the newdata= argument
> to predict.glm().
>
> Here is a small example illustrating my troubles; what I want from
> (both of) the glm objects is the vector c("x","f","Y") and an
> indication that f is a factor:
>
> library( splines )
> dd <- data.frame( D = sample(0:1,200,rep=T),
>   x = abs(rnorm(200)),
>   f = factor(sample(letters[1:4],200,rep=T)),
>   Y = runif(200,0.5,10) )
> mx <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) , offset=log(Y) , 
> family=poisson, data=dd)
> mi <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) + offset(log(Y)), 
> family=poisson, data=dd)
>
> attr(mx$terms,"dataClasses")
> attr(mi$terms,"dataClasses")
> mi$xlevels
> mx$xlevels
>
> ...so far not quite there.
>
> Regards,
>
> Bendix Carstensen
>
> Senior Statistician
> Steno Diabetes Center
> Clinical Epidemiology
> Niels Steensens Vej 2-4
> DK-2820 Gentofte, Denmark
> b...@bxc.dk
> bendix.carsten...@regionh.dk
> http://BendixCarstensen.com
>


Re: [R] Names of variables needed in newdata for predict.glm

2018-03-08 Thread Marc Schwartz
Hi Bendix,

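If the model was fitted with a 'data' argument, glm() keeps that data frame in 
the fitted object as its 'data' component (the model frame itself is stored as 
'model' when the 'model' argument is TRUE, the default), so you can see the 
structure of the data used to fit the model by using: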

> str(mx$data)
'data.frame':   200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


> str(mi$data)
'data.frame':   200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


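Here the first column is the response variable, but only because 'dd' was built 
that way: the 'data' component is simply the data frame supplied to glm(), so it 
will also contain any columns that the model does not use.

In both cases, the offset variable 'Y' is included, whether the offset was part 
of the formula or specified as a separate argument, since it is a column of 'dd'.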

You can then process the results as you need from there, such as:

> sapply(mx$data, class)
        D         x         f         Y 
"integer" "numeric"  "factor" "numeric" 
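The factor levels can be read from the same data frame. The model frame, which is what the 'model' argument controls, is stored separately as $model and names its columns after model variables such as "(offset)" rather than raw data columns, so it is less directly useful for building newdata. A minimal sketch, re-creating Bendix's example with a seed added only for reproducibility:

```r
library(splines)

# Bendix's example data; the seed is added here only for reproducibility
set.seed(1)
dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) +
            offset(log(Y)), family = poisson, data = dd)

# $data is the data frame passed to glm(), so factor levels can be read off it
lapply(Filter(is.factor, mx$data), levels)

# $model is the model frame: its columns are model variables, not raw data
names(mx$model)  # the offset= argument shows up as "(offset)"
names(mi$model)  # an offset() term shows up as "offset(log(Y))"
```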


Regards,

Marc Schwartz




> On Mar 8, 2018, at 12:26 AM, Marc Girondot via R-help  
> wrote:
> 
> Hi,
> 
> Some attempts:
> > names(mi$xlevels)
> [1] "f"
> > all.vars(mi$formula)
> [1] "D" "x" "f" "Y"
> > names(mx$xlevels)
> [1] "f"
> > all.vars(mx$formula)
> [1] "D" "x" "f"
> 
> When the offset is given outside the formula (as the offset= argument), it does 
> not work: Y is not returned.
> 
> Marc
> 
> On 07/03/2018 at 06:20, Bendix Carstensen wrote:
>> I would like to extract the names, modes [numeric/factor] and levels
>> of the variables needed in a data frame supplied as the newdata= argument
>> to predict.glm().
>> 
>> Here is a small example illustrating my troubles; what I want from
>> (both of) the glm objects is the vector c("x","f","Y") and an
>> indication that f is a factor:
>> 
>> library( splines )
>> dd <- data.frame( D = sample(0:1,200,rep=T),
>>   x = abs(rnorm(200)),
>>   f = factor(sample(letters[1:4],200,rep=T)),
>>   Y = runif(200,0.5,10) )
>> mx <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) , offset=log(Y) , 
>> family=poisson, data=dd)
>> mi <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) + offset(log(Y)), 
>> family=poisson, data=dd)
>> 
>> attr(mx$terms,"dataClasses")
>> attr(mi$terms,"dataClasses")
>> mi$xlevels
>> mx$xlevels
>> 
>> ...so far not quite there.
>> 
>> Regards,
>> 
>> Bendix Carstensen
>> 
>> Senior Statistician
>> Steno Diabetes Center
>> Clinical Epidemiology
>> Niels Steensens Vej 2-4
>> DK-2820 Gentofte, Denmark
>> b...@bxc.dk
>> bendix.carsten...@regionh.dk
>> http://BendixCarstensen.com




Re: [R] Names of variables needed in newdata for predict.glm

2018-03-07 Thread Marc Girondot via R-help

Hi,

Some attempts:
> names(mi$xlevels)
[1] "f"
> all.vars(mi$formula)
[1] "D" "x" "f" "Y"
> names(mx$xlevels)
[1] "f"
> all.vars(mx$formula)
[1] "D" "x" "f"

When the offset is given outside the formula (as the offset= argument), it does 
not work: Y is not returned.

Marc
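Because glm() keeps an offset= argument only in the fitted object's stored call, not in its terms, one way to recover Y in both cases is to combine the formula's variables with those of fit$call$offset. A minimal sketch, re-creating Bendix's example with a seed added only for reproducibility; `model_vars` is a hypothetical name:

```r
library(splines)

set.seed(1)  # seed added only for reproducibility
dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) +
            offset(log(Y)), family = poisson, data = dd)

# Hypothetical helper: variables in the formula plus those in an offset=
# argument, which is available only as an unevaluated expression in fit$call
model_vars <- function(fit) {
  unique(c(all.vars(formula(fit)), all.vars(fit$call$offset)))
}
model_vars(mx)
model_vars(mi)
# both: [1] "D" "x" "f" "Y"
```

If the offset was passed as a precomputed vector (say offset = off), this returns the name of that vector instead, which may not be a column of the data.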

On 07/03/2018 at 06:20, Bendix Carstensen wrote:

I would like to extract the names, modes [numeric/factor] and levels
of the variables needed in a data frame supplied as the newdata= argument
to predict.glm().

Here is a small example illustrating my troubles; what I want from
(both of) the glm objects is the vector c("x","f","Y") and an
indication that f is a factor:

library( splines )
dd <- data.frame( D = sample(0:1,200,rep=T),
   x = abs(rnorm(200)),
   f = factor(sample(letters[1:4],200,rep=T)),
   Y = runif(200,0.5,10) )
mx <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) , offset=log(Y) , 
family=poisson, data=dd)
mi <- glm( D ~ ns(x,knots=1:2,Bo=c(0,5)) + f:I(x^2) + offset(log(Y)), 
family=poisson, data=dd)

attr(mx$terms,"dataClasses")
attr(mi$terms,"dataClasses")
mi$xlevels
mx$xlevels

...so far not quite there.

Regards,

Bendix Carstensen

Senior Statistician
Steno Diabetes Center
Clinical Epidemiology
Niels Steensens Vej 2-4
DK-2820 Gentofte, Denmark
b...@bxc.dk
bendix.carsten...@regionh.dk
http://BendixCarstensen.com
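The approaches in this thread can be combined into one newdata= specification built from the terms, the stored call and the data frame. This is a minimal sketch, not an established facility: `newdata_spec` is a hypothetical name, it assumes the model was fitted with data = a data frame (which glm() keeps as fit$data), and the seed is added only for reproducibility. For both mx and mi it gives the variables c("x", "f", "Y"), their classes, and the levels of f:

```r
library(splines)

set.seed(1)  # seed added only for reproducibility
dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) +
            offset(log(Y)), family = poisson, data = dd)

# Hypothetical helper: names, classes and factor levels of the variables a
# newdata= frame must supply to predict(). Assumes data = <a data frame>.
newdata_spec <- function(fit, data = fit$data) {
  rhs <- delete.response(terms(fit))         # drop the response D
  v <- unique(c(all.vars(rhs),               # formula variables, incl. offset()
                all.vars(fit$call$offset)))  # variables of an offset= argument
  v <- v[v %in% names(data)]                 # drop non-data objects, e.g. knots
  list(vars    = v,
       classes = vapply(data[v], function(z) class(z)[1], character(1)),
       levels  = lapply(Filter(is.factor, data[v]), levels))
}

spec <- newdata_spec(mx)
spec

# A newdata frame with one row per level of f, with x and Y held fixed
nd <- data.frame(x = 1,
                 f = factor(spec$levels$f, levels = spec$levels$f),
                 Y = 1)
predict(mx, newdata = nd, type = "response")
```

predict.glm() evaluates a stored offset= expression in newdata, so Y must be present in nd even for mx; that is why the helper adds the variables of the offset= argument.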



