Re: [R] coef(summary) and plyr

2010-08-10 Thread moleps
Upon reading the plyr documentation, that was the distinct impression I got, and 
I'm glad that whatever expectations I had developed regarding plyr were 
fulfilled. Thanks for the input, Hadley.

Maybe this is a cumbersome solution, but it works.

And Matthew, I will most definitely look into the data.table package.

library(plyr)

mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample(0:2, 100, replace=TRUE))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf,.(fac),function(df) lm(y~x1+x2+x3,data=df)) -> dl


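test <- function(a){
  coef(summary(a)) -> lo   # coefficient matrix: one row per model term
  a <- colnames(lo)        # statistic names (Estimate, Std. Error, ...)
  b <- rownames(lo)        # term names ((Intercept), x1, ...)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in (1:c)){
    d <- rep(paste(a[1:c],b[x],sep=""))   # e.g. "Estimatex1"
    e <- paste(c(e,d))
    t <- lo[x,]            # all statistics for one term
    r <- c(r,t)
    names(r) <- e
  }
  return(r)
}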


ldply(dl, function(x) test(x)) -> g
g


Regards,

Moleps


On 9. aug. 2010, at 19.55, Hadley Wickham wrote:

 That's exactly what dlply does - so you should never have to do that
 yourself.
 
 I'm unclear what you are saying. Are you saying that the plyr function
 _should_ have examined the objects in that list and determined that there
 were 4 rows and properly labeled the rows to indicate which list they came
 from?
 
 Yes, exactly.  It's the output from coef(summary(x)) that makes it
 look like this isn't happening.
 
 Hadley
 
 -- 
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coef(summary) and plyr

2010-08-10 Thread moleps
correction...

Columns and rows were mixed up: the loop ran over the number of columns instead 
of the number of rows, so it only gave the complete result when the two were equal.

//M


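test <- function(a){
  coef(summary(a)) -> lo   # coefficient matrix: one row per model term
  a <- colnames(lo)        # statistic names (Estimate, Std. Error, ...)
  b <- rownames(lo)        # term names ((Intercept), x1, ...)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in seq_along(b)){   # loop over the rows (terms), not the columns
    d <- rep(paste(a[1:c],b[x],sep=""))
    e <- paste(c(e,d))
    t <- lo[x,]            # all statistics for one term
    r <- c(r,t)
    names(r) <- e
  }
  return(r)
}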
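The loop in test() grows the value and name vectors one row at a time. As a sketch under stated assumptions (not the poster's method; flatten_coef() is a hypothetical name, and the demo uses the built-in mtcars data rather than the simulated mydf), the same row-by-row flattening, with names built as statistic followed by term exactly as test() does, can be done without a loop:

```r
# Hypothetical helper (not from the thread): flatten a coefficient matrix such
# as coef(summary(fit)) into one named vector, row by row, naming each entry
# "<statistic><term>" the same way test() does.
flatten_coef <- function(m) {
  # t(m) turns each row of m into a column, so as.vector() reads the values
  # term by term: all statistics for the first term, then the second, ...
  vals <- as.vector(t(m))
  # outer() builds every term/statistic name pair in the same layout as m;
  # transposing and flattening gives the names in the same order as vals.
  labels <- outer(rownames(m), colnames(m),
                  function(term, stat) paste0(stat, term))
  names(vals) <- as.vector(t(labels))
  vals
}

# Usage with the built-in mtcars data: one fit, and one output row, per cyl.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ wt, data = d))
wide <- do.call(rbind, lapply(fits, function(f) flatten_coef(coef(summary(f)))))
print(wide)
```

Binding the per-group vectors with do.call(rbind, ...) gives one row per group, with the row names taken from the list names. This assumes every fit has the same terms; if one group dropped a coefficient, the vectors would differ in length and the columns would no longer line up.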



Re: [R] coef(summary) and plyr

2010-08-09 Thread David Winsemius


On Aug 9, 2010, at 7:51 AM, moleps wrote:


Dear all,

I'm having trouble getting a list of regression variables back into a data frame.


mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample((0:2),replace=T,100))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf,.(fac),function(df) lm(y~x1+x2+x3,data=df)) -> dl

Here I'd like to use

ldply(dl, coef(summary)) or something similar, but I can't figure it out...


dfdl <- ldply(dl, function(x) coef(summary(x)))

That doesn't create a grouping variable, so:

dfdl$group=rep(0:2, each=4)


David Winsemius, MD
West Hartford, CT



Re: [R] coef(summary) and plyr

2010-08-09 Thread David Winsemius
If you look at the output (as I did), you should see that, despite 
whatever expectations you have developed regarding plyr, it did 
not produce a grouping variable:


> ldply(dl, function(x) coef(summary(x)))
   fac    Estimate Std. Error     t value     Pr(>|t|)
1    0  -0.3563418  0.1438322   -2.477483 1.820555e-02
2    1   0.9197772  0.1525900    6.027768 7.097623e-07
3    2   3.0481679  0.1331307   22.896050 1.197920e-22
4    0 -18.7726473  0.1281064 -146.539553 2.125848e-50
5    1  -0.2961841  0.1885210   -1.571093 1.273942e-01
6    2   1.2846496  0.1833394    7.006946 1.277086e-07
7    0   2.9664816  0.1737222   17.076010 2.448612e-16
8    1 -18.7265068  0.2044723  -91.584567 3.048491e-36
9    2   0.3993073  0.1979713    2.016996 5.455569e-02
10   0   0.7657945  0.2477459    3.091048 4.846678e-03
11   1   3.0365005  0.1731814   17.533641 1.470033e-15
12   2 -19.2140081  0.1882448 -102.069256 2.741417e-34
12   2 -19.2140081  0.1882448 -102.069256 2.741417e-34
Warning message:
In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded

--
David


On Aug 9, 2010, at 10:11 AM, moleps wrote:

ldply doesn't need a grouping variable, as far as I understand the  
command.


Description

For each element of a list, apply function then combine results into  
a data frame


Usage

ldply(.data, .fun = NULL, ..., .progress = "none")


regards,

M


On 9. aug. 2010, at 15.33, David Winsemius wrote:



On Aug 9, 2010, at 7:51 AM, moleps wrote:


Dear all,

I'm having trouble getting a list of regression variables back  
into a data frame.


mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample((0:2),replace=T,100))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf,.(fac),function(df) lm(y~x1+x2+x3,data=df)) -> dl

Here I'd like to use

ldply(dl, coef(summary)) or something similar, but I can't figure it  
out...


dfdl <- ldply(dl, function(x) coef(summary(x)))

That doesn't create a grouping variable, so:

dfdl$group=rep(0:2, each=4)


David Winsemius, MD
West Hartford, CT





David Winsemius, MD
West Hartford, CT



Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 9:29 AM, David Winsemius dwinsem...@comcast.net wrote:
 If you look at the output (as I did)  you should see that despite whatever
 expectations you have developed regarding plyr, that it did not produce a
 grouping variable:

 ldply(dl, function(x) coef(summary(x)) )
   fac    Estimate Std. Error     t value     Pr(>|t|)
 1    0  -0.3563418  0.1438322   -2.477483 1.820555e-02
 2    1   0.9197772  0.1525900    6.027768 7.097623e-07
 3    2   3.0481679  0.1331307   22.896050 1.197920e-22
 4    0 -18.7726473  0.1281064 -146.539553 2.125848e-50
 5    1  -0.2961841  0.1885210   -1.571093 1.273942e-01
 6    2   1.2846496  0.1833394    7.006946 1.277086e-07
 7    0   2.9664816  0.1737222   17.076010 2.448612e-16
 8    1 -18.7265068  0.2044723  -91.584567 3.048491e-36
 9    2   0.3993073  0.1979713    2.016996 5.455569e-02
 10   0   0.7657945  0.2477459    3.091048 4.846678e-03
 11   1   3.0365005  0.1731814   17.533641 1.470033e-15
 12   2 -19.2140081  0.1882448 -102.069256 2.741417e-34

cf.

> ldply(dl, coef)
  fac (Intercept)        x1       x2        x3
1   0 -0.12051346 1.1391933 3.022287 -19.01828
2   1 -0.08890497 1.0741715 3.219577 -19.14279
3   2 -0.12728421 0.9284263 2.973905 -19.12774

which, as you can see, maintains the original grouping variable, fac.

The problem is that summary stores the variable names as rownames,
which does not make for easy rbinding.

Hadley



Re: [R] coef(summary) and plyr

2010-08-09 Thread David Winsemius


On Aug 9, 2010, at 10:11 AM, moleps wrote:

ldply doesn't need a grouping variable, as far as I understand the  
command.


There is one further improvement to consider. When I tried using dlply  
to tackle a problem on which I had been bashing my head for the last  
three days and it gave just the results I had been looking for, I also  
noticed that the dlply function returns the grouping variable levels  
in an attribute, split_labels, which could be unlisted to use as an  
argument to the rep() call I suggested earlier:


dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)

That might make the results more self-documenting in situations where  
the grouping levels were more involved than 0:2.


(Now, if I can get rms::Predict to behave as nicely with the plyr  
functions as did rms::cph, I will be home free.)

--
David.
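Both rep(0:2, each=4) and the split_labels variant assume that every group contributes exactly four rows. A minimal base-R sketch that drops that assumption (not from the thread; group_labels() is a hypothetical helper, the labels here come from the list's names, which split() sets to the factor levels, and the demo uses the built-in mtcars data):

```r
# Hypothetical helper (not from the thread): derive the group column from the
# names of the list being stacked, instead of hard-coding rep(0:2, each = 4).
# Rows are counted per element, so groups may contribute different numbers of
# rows (e.g. when summary() omits an aliased coefficient in one group).
group_labels <- function(lst) {
  rows_per_group <- vapply(lst, NROW, integer(1))  # rows in each piece
  rep(names(lst), times = rows_per_group)
}

# Usage: stack one coefficient table per cylinder count, then attach labels.
tabs <- lapply(split(mtcars, mtcars$cyl),
               function(d) coef(summary(lm(mpg ~ wt, data = d))))
stacked <- do.call(rbind, tabs)  # a matrix; terms survive only as row names
rownames(stacked) <- NULL        # drop the duplicated term row names
stacked <- as.data.frame(stacked)
stacked$group <- group_labels(tabs)
print(stacked)
```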




Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
 There is one further improvement to consider. When I tried using dlply to
 tackle a problem on which I had been bashing my head for the last three days
 and it gave just the results I had been looking for, I also noticed that the
 dlply function returns the grouping variable levels in an attribute,
 split_labels, which could be unlisted to use as an argument to the rep()
 call I suggested earlier:

 dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)

 That might make the results more self-documenting in situations where the
 grouping levels were more involved than 0:2.

That's exactly what dlply does - so you should never have to do that yourself.

Hadley



Re: [R] coef(summary) and plyr

2010-08-09 Thread moleps
ldply doesn't need a grouping variable, as far as I understand the command.

Description

For each element of a list, apply function then combine results into a data 
frame

Usage

ldply(.data, .fun = NULL, ..., .progress = "none")


regards,

M




Re: [R] coef(summary) and plyr

2010-08-09 Thread David Winsemius


On Aug 9, 2010, at 12:47 PM, Hadley Wickham wrote:

There is one further improvement to consider. When I tried using dlply to
tackle a problem on which I had been bashing my head for the last three days
and it gave just the results I had been looking for, I also noticed that the
dlply function returns the grouping variable levels in an attribute,
split_labels, which could be unlisted to use as an argument to the rep()
call I suggested earlier:

dfdl$group=rep(unlist(attr(dl, "split_labels")), each=4)

That might make the results more self-documenting in situations where the
grouping levels were more involved than 0:2.


That's exactly what dlply does - so you should never have to do that  
yourself.


I'm unclear what you are saying. Are you saying that the plyr function  
_should_ have examined the objects in that list and determined that  
there were 4 rows and properly labeled the rows to indicate which list  
they came from?







David Winsemius, MD
West Hartford, CT



Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
 That's exactly what dlply does - so you should never have to do that
 yourself.

 I'm unclear what you are saying. Are you saying that the plyr function
 _should_ have examined the objects in that list and determined that there
 were 4 rows and properly labeled the rows to indicate which list they came
 from?

Yes, exactly.  It's the output from coef(summary(x)) that makes it
look like this isn't happening.

Hadley



Re: [R] coef(summary) and plyr

2010-08-09 Thread Matthew Dowle


Another option for consideration:

library(data.table)
mydt = as.data.table(mydf)

mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
     fac X.Intercept.       x1       x2        x3
[1,]   0  -0.16247059 1.130220 2.988769 -19.14719
[2,]   1   0.08224509 1.216673 2.847960 -19.16105
[3,]   2   0.02052320 1.135421 3.134154 -19.22555

mydt[,data.table(coef(summary(lm(y~x1+x2+x3))),keep.rownames=TRUE),  by=fac]
     fac          rn     Estimate Std..Error      t.value     Pr...t..
[1,]   0 (Intercept)  -0.16247059  0.1521507   -1.0678269 2.929087e-01
[2,]   0          x1   1.13021985  0.1374020    8.2256414 1.079035e-09
[3,]   0          x2   2.98876920  0.1404903   21.2738533 1.325909e-21
[4,]   0          x3 -19.14719151  0.1335139 -143.4096890 4.520371e-50
[5,]   1 (Intercept)   0.08224509  0.2360664    0.3483981 7.313719e-01
[6,]   1          x1   1.21667349  0.2723201    4.4678058 2.637743e-04
[7,]   1          x2   2.84796003  0.2232960   12.7541904 9.192555e-11
[8,]   1          x3 -19.16104669  0.2394431  -80.0233818 1.707058e-25
[9,]   2 (Intercept)   0.02052320  0.1902526    0.1078734 9.147302e-01
[10,]   2          x1   1.13542085  0.1786333    6.3561559  2.980475e-07
[11,]   2          x2   3.13415398  0.1894404   16.5442781  7.827178e-18
[12,]   2          x3 -19.22554984  0.1708307 -112.5415605  2.536686e-45

http://datatable.r-forge.r-project.org/

Matthew






Re: [R] coef(summary) and plyr

2010-08-09 Thread Hadley Wickham
On Mon, Aug 9, 2010 at 4:30 PM, Matthew Dowle mdo...@mdowle.plus.com wrote:


 Another option for consideration :

 library(data.table)
 mydt = as.data.table(mydf)

 mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
     fac X.Intercept.       x1       x2        x3
 [1,]   0  -0.16247059 1.130220 2.988769 -19.14719
 [2,]   1   0.08224509 1.216673 2.847960 -19.16105
 [3,]   2   0.02052320 1.135421 3.134154 -19.22555

 mydt[,data.table(coef(summary(lm(y~x1+x2+x3))),keep.rownames=TRUE),  by=fac]
     fac          rn     Estimate Std..Error      t.value     Pr...t..
 [1,]   0 (Intercept)  -0.16247059  0.1521507   -1.0678269 2.929087e-01
 [2,]   0          x1   1.13021985  0.1374020    8.2256414 1.079035e-09
 [3,]   0          x2   2.98876920  0.1404903   21.2738533 1.325909e-21
 [4,]   0          x3 -19.14719151  0.1335139 -143.4096890 4.520371e-50
 [5,]   1 (Intercept)   0.08224509  0.2360664    0.3483981 7.313719e-01
 [6,]   1          x1   1.21667349  0.2723201    4.4678058 2.637743e-04
 [7,]   1          x2   2.84796003  0.2232960   12.7541904 9.192555e-11
 [8,]   1          x3 -19.16104669  0.2394431  -80.0233818 1.707058e-25
 [9,]   2 (Intercept)   0.02052320  0.1902526    0.1078734 9.147302e-01
 [10,]   2          x1   1.13542085  0.1786333    6.3561559  2.980475e-07
 [11,]   2          x2   3.13415398  0.1894404   16.5442781  7.827178e-18
 [12,]   2          x3 -19.22554984  0.1708307 -112.5415605  2.536686e-45

That reminds me:

library(reshape)
ldply(dl, function(x) namerows(as.data.frame(coef(summary(x)))))

I think both Matthew and I would agree that row names are a pain.

Hadley
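The namerows() call above turns the row names of each coefficient table into an ordinary column before ldply stacks the pieces. A minimal base-R sketch of the same idea (not from the thread; coef_long() is a hypothetical helper, and the demo fits lm() to the built-in mtcars data grouped by cyl rather than to mydf) that also attaches the group label to each piece itself:

```r
# Hypothetical helper (not from the thread): a base-R analogue of the
# namerows() idea. Each coefficient matrix's row names become a "term"
# column and the list name becomes a "group" column, before stacking.
coef_long <- function(fits) {
  pieces <- Map(function(fit, g) {
    m <- coef(summary(fit))              # one row per model term
    data.frame(group = g,                # recycled to nrow(m)
               term = rownames(m),
               m,
               row.names = NULL,          # do not reuse the term row names
               check.names = FALSE,       # keep "Std. Error", "Pr(>|t|)"
               stringsAsFactors = FALSE)
  }, fits, names(fits))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL                  # rbind() builds names like "4.1"
  out
}

# Usage: one lm() fit per number of cylinders in mtcars.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ wt + hp, data = d))
long <- coef_long(fits)
print(long)
```

Because every piece is labelled before rbind(), nothing depends on row names surviving the stack, and groups with different numbers of terms are still labelled correctly.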
