Re: [R] coef(summary) and plyr
Upon reading the plyr documentation, that was the distinct impression I got, and I'm glad that whatever expectations I had developed regarding plyr were fulfilled. Thanks for the input, Hadley. Maybe this is a cumbersome solution, but it works. And Matthew, I will most definitely look into the data.table package.

library(plyr)
mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample((0:2),replace=TRUE,100))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf,.(fac),function(df) lm(y~x1+x2+x3,data=df)) -> dl

test <- function(a){
  coef(summary(a)) -> lo
  a <- colnames(lo)
  b <- rownames(lo)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in (1:c)){
    d <- rep(paste(a[1:c],b[x],sep=" "))
    e <- paste(c(e,d))
    t <- lo[x,]
    r <- c(r,t)
    names(r) <- e
  }
  return(r)
}
ldply(dl,function(x) test(x)) -> g
g

Regards,
Moleps

On 9. aug. 2010, at 19.55, Hadley Wickham wrote:

> Yes, exactly. It's the output from coef(summary(x)) that makes it look like this isn't happening.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] coef(summary) and plyr
Correction... columns and rows were mixed up, so the loop only worked correctly when the number of rows matched the number of columns.

//M

test <- function(a){
  coef(summary(a)) -> lo
  a <- colnames(lo)
  b <- rownames(lo)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in (1:length(b))){
    d <- rep(paste(a[1:c],b[x],sep=" "))
    e <- paste(c(e,d))
    t <- lo[x,]
    r <- c(r,t)
    names(r) <- e
  }
  return(r)
}
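The loop in test() builds one long named vector, row by row. The same result can be had without an explicit loop. What follows is a minimal base-R sketch: `flatten_coefs` is a hypothetical name (not from the thread), and the input is assumed to be a matrix shaped like `coef(summary(fit))`. Because `as.vector()` reads a matrix column by column, transposing first makes it walk the rows in the same order as test(), and the names are built the same way (statistic, a space, then term).

```r
# Hypothetical helper (not from the thread): flatten a coefficient matrix
# (terms in rows, statistics in columns) into one named vector, row by row
flatten_coefs <- function(lo) {
  # as.vector() reads column by column, so transpose to walk the rows
  v <- as.vector(t(lo))
  # Same naming scheme as test(): statistic name, a space, then term name
  names(v) <- paste(rep(colnames(lo), times = nrow(lo)),
                    rep(rownames(lo), each = ncol(lo)),
                    sep = " ")
  v
}

# Example on a small simulated fit (seed added for reproducibility)
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- d$x1 + 3 * d$x2 + rnorm(20)
fit <- lm(y ~ x1 + x2, data = d)
flat <- flatten_coefs(coef(summary(fit)))
print(head(flat, 5))
```

Assuming plyr is loaded and `dl` is the list of fits from this thread, it could be used as `ldply(dl, function(x) flatten_coefs(coef(summary(x))))`.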
Re: [R] coef(summary) and plyr
On Aug 9, 2010, at 7:51 AM, moleps wrote:

> Dear all,
>
> I'm having trouble getting a list of regression variables back into a dataframe.
>
> mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
> mydf$fac <- factor(sample((0:2),replace=TRUE,100))
> mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
> dlply(mydf,.(fac),function(df) lm(y~x1+x2+x3,data=df)) -> dl
>
> Here I'd like to use ldply(dl,coef(summary)) or something similar, but I can't figure it out...

dfdl <- ldply(dl, function(x) coef(summary(x)) )

Doesn't create a grouping variable, so:

dfdl$group=rep(0:2, each=4)

David Winsemius, MD
West Hartford, CT
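`rep(0:2, each=4)` assumes every model contributes exactly four rows. Since `coef(summary())` on an lm fit omits aliased coefficients, that is not guaranteed in general. Below is a minimal base-R sketch of a more defensive label. It assumes the per-group tables sit in a named list; `tabs` and the toy matrices are hypothetical stand-ins for the per-group `coef(summary(x))` output. Each group name is repeated once per row of its own table.

```r
# Hypothetical stand-ins for per-group coef(summary(x)) tables;
# group "1" has one coefficient fewer than group "0"
tabs <- list(
  "0" = matrix(1:8, nrow = 4,
               dimnames = list(c("(Intercept)", "x1", "x2", "x3"),
                               c("Estimate", "Std. Error"))),
  "1" = matrix(1:6, nrow = 3,
               dimnames = list(c("(Intercept)", "x1", "x2"),
                               c("Estimate", "Std. Error")))
)

# Repeat each group name as many times as its table has rows
group <- rep(names(tabs), times = vapply(tabs, nrow, integer(1)))
print(group)
```

When every table has four rows, this reduces to `rep(names(tabs), each = 4)`.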
Re: [R] coef(summary) and plyr
If you look at the output (as I did), you should see that, despite whatever expectations you have developed regarding plyr, it did not produce a grouping variable:

ldply(dl, function(x) coef(summary(x)) )
   fac    Estimate Std. Error     t value     Pr(>|t|)
1    0  -0.3563418  0.1438322   -2.477483 1.820555e-02
2    1   0.9197772  0.1525900    6.027768 7.097623e-07
3    2   3.0481679  0.1331307   22.896050 1.197920e-22
4    0 -18.7726473  0.1281064 -146.539553 2.125848e-50
5    1  -0.2961841  0.1885210   -1.571093 1.273942e-01
6    2   1.2846496  0.1833394    7.006946 1.277086e-07
7    0   2.9664816  0.1737222   17.076010 2.448612e-16
8    1 -18.7265068  0.2044723  -91.584567 3.048491e-36
9    2   0.3993073  0.1979713    2.016996 5.455569e-02
10   0   0.7657945  0.2477459    3.091048 4.846678e-03
11   1   3.0365005  0.1731814   17.533641 1.470033e-15
12   2 -19.2140081  0.1882448 -102.069256 2.741417e-34
Warning message:
In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded

--
David

On Aug 9, 2010, at 10:11 AM, moleps wrote:

> ldply doesn't need a grouping variable as far as I understand the command.

David Winsemius, MD
West Hartford, CT
Re: [R] coef(summary) and plyr
On Mon, Aug 9, 2010 at 9:29 AM, David Winsemius dwinsem...@comcast.net wrote:

> If you look at the output (as I did), you should see that, despite whatever expectations you have developed regarding plyr, it did not produce a grouping variable:
>
> ldply(dl, function(x) coef(summary(x)) )

cf.

ldply(dl, coef)
  fac (Intercept)        x1       x2        x3
1   0 -0.12051346 1.1391933 3.022287 -19.01828
2   1 -0.08890497 1.0741715 3.219577 -19.14279
3   2 -0.12728421 0.9284263 2.973905 -19.12774

which, as you can see, maintains the original grouping variable fac. The problem is that summary stores the variable names as rownames, which does not make for easy rbinding.

Hadley
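The row-name problem Hadley describes can be illustrated without plyr. The sketch below uses a substitute base-R approach (split(), lapply() and do.call(rbind, ...)). It is not plyr's own mechanism. It regenerates data like the original post's, with a seed added, and copies each table's row names into an ordinary `term` column. It also stores the group label alongside, so nothing is lost when the pieces are stacked.

```r
# Simulated data following the original post (seed added for reproducibility)
set.seed(42)
mydf <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
mydf$fac <- factor(sample(0:2, 100, replace = TRUE))
mydf$y <- mydf$x1 + 0.01 + mydf$x2 * 3 - mydf$x3 * 19 + rnorm(100)

# One lm fit per level of fac (base-R stand-in for dlply)
fits <- lapply(split(mydf, mydf$fac),
               function(df) lm(y ~ x1 + x2 + x3, data = df))

# For each group: copy the row names (terms) into a real column,
# record the group label, and drop the row names
tabs <- lapply(names(fits), function(g) {
  cf <- coef(summary(fits[[g]]))
  data.frame(fac = g, term = rownames(cf), cf,
             row.names = NULL, check.names = FALSE)
})

# Stack the pieces; every identifying value now lives in a column
res <- do.call(rbind, tabs)
print(res)
```

`check.names = FALSE` keeps the original column names such as `Std. Error` and `Pr(>|t|)` instead of converting them to syntactic names.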
Re: [R] coef(summary) and plyr
On Aug 9, 2010, at 10:11 AM, moleps wrote:

> ldply doesn't need a grouping variable as far as I understand the command.

There is one further improvement to consider. When I tried using dlply to tackle a problem on which I had been bashing my head for the last three days, and it gave just the results I had been looking for, I also noticed that the dlply function returns the grouping variable levels in an attribute, "split_labels", which could be unlisted and used as an argument to the rep() call I suggested earlier:

dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)

That might make the results more self-documenting in situations where the grouping levels were more involved than 0:2.

(Now, if I can get rms::Predict to behave as nicely with the plyr functions as did rms::cph, I will be home free.)

--
David.

David Winsemius, MD
West Hartford, CT
Re: [R] coef(summary) and plyr
> There is one further improvement to consider. When I tried using dlply to tackle a problem on which I had been bashing my head for the last three days, and it gave just the results I had been looking for, I also noticed that the dlply function returns the grouping variable levels in an attribute, "split_labels", which could be unlisted and used as an argument to the rep() call I suggested earlier:
>
> dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)
>
> That might make the results more self-documenting in situations where the grouping levels were more involved than 0:2.

That's exactly what dlply does - so you should never have to do that yourself.

Hadley
Re: [R] coef(summary) and plyr
ldply doesn't need a grouping variable as far as I understand the command.

Description
For each element of a list, apply function then combine results into a data frame

Usage
ldply(.data, .fun = NULL, ..., .progress = "none")

regards,
M

On 9. aug. 2010, at 15.33, David Winsemius wrote:

> dfdl <- ldply(dl, function(x) coef(summary(x)) )
>
> Doesn't create a grouping variable, so:
>
> dfdl$group=rep(0:2, each=4)
Re: [R] coef(summary) and plyr
On Aug 9, 2010, at 12:47 PM, Hadley Wickham wrote:

>> dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)
>>
>> That might make the results more self-documenting in situations where the grouping levels were more involved than 0:2.
>
> That's exactly what dlply does - so you should never have to do that yourself.

I'm unclear what you are saying. Are you saying that the plyr function _should_ have examined the objects in that list, determined that there were 4 rows, and properly labeled the rows to indicate which list element they came from?

David Winsemius, MD
West Hartford, CT
Re: [R] coef(summary) and plyr
>> That's exactly what dlply does - so you should never have to do that yourself.
>
> I'm unclear what you are saying. Are you saying that the plyr function _should_ have examined the objects in that list, determined that there were 4 rows, and properly labeled the rows to indicate which list element they came from?

Yes, exactly. It's the output from coef(summary(x)) that makes it look like this isn't happening.

Hadley
Re: [R] coef(summary) and plyr
Another option for consideration:

library(data.table)
mydt = as.data.table(mydf)
mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
     fac X.Intercept.       x1       x2        x3
[1,]   0  -0.16247059 1.130220 2.988769 -19.14719
[2,]   1   0.08224509 1.216673 2.847960 -19.16105
[3,]   2   0.02052320 1.135421 3.134154 -19.22555

mydt[,data.table(coef(summary(lm(y~x1+x2+x3))),keep.rownames=TRUE), by=fac]
      fac          rn     Estimate Std..Error      t.value     Pr...t..
 [1,]   0 (Intercept)  -0.16247059  0.1521507   -1.0678269 2.929087e-01
 [2,]   0          x1   1.13021985  0.1374020    8.2256414 1.079035e-09
 [3,]   0          x2   2.98876920  0.1404903   21.2738533 1.325909e-21
 [4,]   0          x3 -19.14719151  0.1335139 -143.4096890 4.520371e-50
 [5,]   1 (Intercept)   0.08224509  0.2360664    0.3483981 7.313719e-01
 [6,]   1          x1   1.21667349  0.2723201    4.4678058 2.637743e-04
 [7,]   1          x2   2.84796003  0.2232960   12.7541904 9.192555e-11
 [8,]   1          x3 -19.16104669  0.2394431  -80.0233818 1.707058e-25
 [9,]   2 (Intercept)   0.02052320  0.1902526    0.1078734 9.147302e-01
[10,]   2          x1   1.13542085  0.1786333    6.3561559 2.980475e-07
[11,]   2          x2   3.13415398  0.1894404   16.5442781 7.827178e-18
[12,]   2          x3 -19.22554984  0.1708307 -112.5415605 2.536686e-45

http://datatable.r-forge.r-project.org/

Matthew
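For readers who would rather avoid a package dependency, here is a base-R sketch of the first (wide) table above, with one row of coefficients per level of `fac`, comparable to `ldply(dl, coef)`. It uses sapply() over split() as a substitute technique, not data.table's. The simulated `mydf` follows the original post with a seed added, so the numbers will differ from the output shown.

```r
# Simulated data following the original post (seed added for reproducibility)
set.seed(42)
mydf <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
mydf$fac <- factor(sample(0:2, 100, replace = TRUE))
mydf$y <- mydf$x1 + 0.01 + mydf$x2 * 3 - mydf$x3 * 19 + rnorm(100)

# sapply() returns a matrix: terms in rows, groups (levels of fac) in columns
coefs <- sapply(split(mydf, mydf$fac),
                function(df) coef(lm(y ~ x1 + x2 + x3, data = df)))

# Transpose to one row per group and keep the labels as a real column
wide <- data.frame(fac = colnames(coefs), t(coefs),
                   row.names = NULL, check.names = FALSE)
print(wide)
```

With `check.names = FALSE` the intercept column keeps the name `(Intercept)` rather than becoming `X.Intercept.`.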
Re: [R] coef(summary) and plyr
On Mon, Aug 9, 2010 at 4:30 PM, Matthew Dowle mdo...@mdowle.plus.com wrote:

> Another option for consideration:
>
> library(data.table)
> mydt = as.data.table(mydf)
> mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
> mydt[,data.table(coef(summary(lm(y~x1+x2+x3))),keep.rownames=TRUE), by=fac]

That reminds me:

library(reshape)
ldply(dl, function(x) namerows(as.data.frame(coef(summary(x)))))

I think both Matthew and I would agree that row names are a pain.

Hadley