Re: [R] Names of variables needed in newdata for predict.glm
> On Mar 31, 2018, at 8:48 AM, Bendix Carstensen wrote:
>
> all.vars works fine, EXCEPT, it gives a bit too much.
> I only want the regression variables, but in the following example I also get
> "k", the variable holding the chosen knots. Any machinery to find only "real"
> regression variables?
> cheers, Bendix
>
> library( splines )
> y <- rnorm(100)
> x <- rnorm(100)
> k <- -1:1
> ml <- lm( y ~ bs(x,knots=k) )
> mg <- glm( y ~ bs(x,knots=k) )
> all.vars(ml$terms)
> all.vars(mg$terms)
> all.vars(mg$formula)

If you accept the requirement that the "real" regression variables be passed in a data argument, then this might succeed:

> ml <- lm( y ~ bs(x,knots=k), data=dat )
> all.vars(ml$terms)
[1] "y" "x" "k"
> all.vars(ml$formula)
character(0)
> all.vars(ml$terms)[ all.vars(ml$terms) %in% names(dat)]
[1] "y" "x"

-- 
David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
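For readers who want to run the suggestion above, here is a self-contained sketch. The post does not show how `dat` was built, so the data frame below is an assumption, as is the `set.seed()` call. The last step is an alternative that was not proposed in the thread: it reads the "predvars" attribute that model.frame() attaches to the terms object, where bs() appears to record its knots as plain numbers, so `k` drops out without needing a data argument. Treat it as a sketch and check it on your own R version.

```r
library(splines)
set.seed(1)  # added for reproducibility; not in the original post

# Assumption: 'dat' holds the "real" regression variables (not shown in the post)
dat <- data.frame(y = rnorm(100), x = rnorm(100))
k <- -1:1                        # knots, deliberately kept outside 'dat'

ml <- lm(y ~ bs(x, knots = k), data = dat)

v <- all.vars(ml$terms)          # "y", "x" and also "k"
real_vars <- v[v %in% names(dat)]  # David's filter: keep only columns of 'dat'

# Alternative (not from the thread): 'predvars' stores the bs() call with the
# knots already evaluated to numbers, so only data variables remain
pred_vars <- all.vars(attr(ml$terms, "predvars"))
print(real_vars)
print(pred_vars)
```

Since glm() builds its model frame through model.frame() as well, the same "predvars" lookup should also apply to the glm fit `mg`.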
Re: [R] Names of variables needed in newdata for predict.glm
all.vars works fine, EXCEPT, it gives a bit too much. I only want the regression variables, but in the following example I also get "k", the variable holding the chosen knots. Any machinery to find only "real" regression variables?

cheers, Bendix

library( splines )
y <- rnorm(100)
x <- rnorm(100)
k <- -1:1
ml <- lm( y ~ bs(x,knots=k) )
mg <- glm( y ~ bs(x,knots=k) )
all.vars(ml$terms)
all.vars(mg$terms)
all.vars(mg$formula)

From: Marc Girondot
Sent: 8 March 2018 06:26
To: Bendix Carstensen; r-help@r-project.org
Subject: Re: [R] Names of variables needed in newdata for predict.glm

Hi,

Some attempts:
> names(mi$xlevels)
[1] "f"
> all.vars(mi$formula)
[1] "D" "x" "f" "Y"
> names(mx$xlevels)
[1] "f"
> all.vars(mx$formula)
[1] "D" "x" "f"

When the offset is specified outside the formula (as in mx), it does not work: Y is missing.

Marc

This e-mail contains confidential information. If you are not the intended recipient of this e-mail, or if you have received it in error, please notify the sender of the error using the reply function. At the same time, please delete the e-mail immediately without forwarding or copying it.
Re: [R] Names of variables needed in newdata for predict.glm
Hi Bendix,

The fitted glm object keeps the data frame passed as its 'data' argument in its 'data' component (the model frame, stored when the 'model' argument to glm() is TRUE, the default, is the separate 'model' component). So you can get the structure of the variables supplied to the model by using:

> str(mx$data)
'data.frame':   200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...

> str(mi$data)
'data.frame':   200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...

In this example the first column is the response variable, D, because it is the first column of dd. In both cases the offset variable 'Y' is included, whether the offset was part of the formula or specified as a separate argument, since 'data' is simply dd; it would also contain any columns of dd that the model does not use.

You can then process the result as needed, for example:

> sapply(mx$data, class)
        D         x         f         Y
"integer" "numeric"  "factor" "numeric"

Regards,

Marc Schwartz


> On Mar 8, 2018, at 12:26 AM, Marc Girondot via R-help wrote:
>
> Hi,
>
> Some attempts:
> > names(mi$xlevels)
> [1] "f"
> > all.vars(mi$formula)
> [1] "D" "x" "f" "Y"
> > names(mx$xlevels)
> [1] "f"
> > all.vars(mx$formula)
> [1] "D" "x" "f"
>
> When the offset is specified outside the formula (as in mx), it does not work: Y is missing.
>
> Marc
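The difference between the two components can be checked directly. This sketch rebuilds Bendix's example (with `set.seed()` added and `replace = TRUE` spelled out, both assumptions of this sketch) and compares `mx$data`, the data frame passed to glm(), with `mx$model`, the model frame. The model frame holds evaluated terms such as the ns() basis, and an offset given through the `offset` argument appears there as a column named "(offset)" rather than as Y. The vector of needed names is typed in by hand, as in Bendix's question.

```r
library(splines)
set.seed(1)  # added for reproducibility; not in the original post

dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) +
            offset(log(Y)), family = poisson, data = dd)

# 'data' is the data frame passed to glm(): the raw variables, Y included
data_cols <- names(mx$data)

# 'model' is the model frame: evaluated terms plus the offset, but no raw Y
model_cols_x <- names(mx$model)   # offset argument stored as "(offset)"
model_cols_i <- names(mi$model)   # formula offset stored as "offset(log(Y))"

# Classes of the variables newdata needs (names typed by hand, as in the post)
needed <- c("x", "f", "Y")
classes <- sapply(mx$data[needed], class)
print(classes)
```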
Re: [R] Names of variables needed in newdata for predict.glm
Hi,

Some attempts:
> names(mi$xlevels)
[1] "f"
> all.vars(mi$formula)
[1] "D" "x" "f" "Y"
> names(mx$xlevels)
[1] "f"
> all.vars(mx$formula)
[1] "D" "x" "f"

When the offset is specified outside the formula (as in mx), it does not work: Y is missing.

Marc

On 07/03/2018 at 06:20, Bendix Carstensen wrote:
> I would like to extract the names, modes [numeric/factor] and levels
> of the variables needed in a data frame supplied as the newdata= argument
> to predict.glm().
>
> Here is a small example illustrating my troubles; what I want from
> (both of) the glm objects is the vector c("x","f","Y") and an
> indication that f is a factor:
>
> library( splines )
> dd <- data.frame( D = sample(0:1,200,replace=TRUE),
>                   x = abs(rnorm(200)),
>                   f = factor(sample(letters[1:4],200,replace=TRUE)),
>                   Y = runif(200,0.5,10) )
> mx <- glm( D ~ ns(x,knots=1:2,Boundary.knots=c(0,5)) + f:I(x^2) , offset=log(Y) ,
>            family=poisson, data=dd)
> mi <- glm( D ~ ns(x,knots=1:2,Boundary.knots=c(0,5)) + f:I(x^2) + offset(log(Y)),
>            family=poisson, data=dd)
>
> attr(mx$terms,"dataClasses")
> attr(mi$terms,"dataClasses")
> mi$xlevels
> mx$xlevels
>
> ...so far not quite there.
>
> Regards,
>
> Bendix Carstensen
>
> Senior Statistician
> Steno Diabetes Center
> Clinical Epidemiology
> Niels Steensens Vej 2-4
> DK-2820 Gentofte, Denmark
> b...@bxc.dk
> bendix.carsten...@regionh.dk
> http://BendixCarstensen.com
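One possible way to get the vector c("x","f","Y") and the factor indication from both fits is sketched below; it is not taken from the thread. `newdata_vars()` is a hypothetical helper: it drops the response from the terms, reads variable names from the "predvars" attribute (so knot objects such as `k` in the later bs() example should not appear), adds any variable used in an `offset =` argument recorded in the call, and looks up the class of each variable in the fitted object's `data` component. It assumes a glm fit; lm fits do not store `data`, so it would need adapting there. The data setup repeats Bendix's example with `set.seed()` added.

```r
library(splines)
set.seed(1)  # added for reproducibility; not in the original post

dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) +
            offset(log(Y)), family = poisson, data = dd)

# Hypothetical helper (not from the thread): class of each raw variable that
# a newdata= data frame must supply for predict() on a glm fit
newdata_vars <- function(fit) {
  tt <- delete.response(fit$terms)        # drop the response (here D)
  # 'predvars' holds the calls used at prediction time; ns()/bs() knots are
  # stored there as numbers, so objects holding knots do not show up
  pv <- attr(tt, "predvars")
  if (is.null(pv)) pv <- attr(tt, "variables")
  vars <- all.vars(pv)
  # An offset passed via the 'offset =' argument is not part of the terms
  off <- fit$call$offset
  if (!is.null(off)) vars <- union(vars, all.vars(off))
  # Evaluate each name in the data used for fitting and report its class
  env <- environment(fit$terms)
  vapply(vars, function(v) class(eval(as.name(v), fit$data, env))[1L],
         character(1))
}

nx <- newdata_vars(mx)
ni <- newdata_vars(mi)
print(nx)
```

Both calls should return the same named vector, with f reported as "factor"; the levels of f are available from mx$xlevels, as Marc Girondot's output shows.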