Thank you all. I'm very happy with this solution. Just two questions: I used mutate() from the plyr package and it gave me an error message; is it a new function, and might my package be old? Is there an extractor for the R-squared?
Thanks again,

Cecília Carmo

________________________________________
From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Wednesday, 3 April 2013 19:01
To: Adams, Jean
Cc: Cecilia Carmo; r-help@r-project.org
Subject: Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data

A few minor improvements to Jean's post suggested inline below.

On 2013-04-03 05:41, Adams, Jean wrote:
> Cecilia,
>
> Thanks for providing a reproducible example.  Excellent.
>
> You could use the ddply() function in the plyr package to fit the model for
> each industry and year, keep the coefficients, and then estimate the fitted
> and residual values.
>
> Jean
>
> library(plyr)
> coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z,
>      data=dat)$coef)
> names(coef) <- c("industry", "year", "b0", "b1", "b2")
> final4 <- merge(final3, coef)
> newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
> newdata2 <- transform(newdata1, residual = Y-Yhat)
> plot(as.factor(newdata2$firm), newdata2$residual)

Suggestion 1:
Use the extractor function coef() and also avoid using the name
of an R function as a variable name:

  Coef <- ddply(...., function(dat) coef(lm(....)))

Suggestion 2:
Use plyr's mutate() to do both transforms at once:

  newdata <- mutate(final4,
                    Yhat = b0 + b1*X + b2*Z,
                    residual = Y-Yhat)

[Or you could use within(), but I now find mutate handier, mainly
because it doesn't 'reverse' the order of the new variables.]

Suggestion 3:
Use the 'data=' argument in the plot:

  boxplot(residual ~ firm, data = newdata)

Peter Ehlers

>
> On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo <cecilia.ca...@ua.pt> wrote:
>
>> Hi R-helpers,
>>
>> My real data is a panel (unbalanced and with gaps in years) of thousands
>> of firms, by year and industry, and with financial information (variables
>> X, Y, Z, for example), the number of firms by year and industry is not
>> always equal, the number of years by industry is not always equal.
>>
>> #reproducible example
>> firm1<-sort(rep(1:10,5),decreasing=F)
>> year1<-rep(2000:2004,10)
>> industry1<-rep(20,50)
>> X<-rnorm(50)
>> Y<-rnorm(50)
>> Z<-rnorm(50)
>> data1<-data.frame(firm1,year1,industry1,X,Y,Z)
>> data1
>> colnames(data1)<-c("firm","year","industry","X","Y","Z")
>>
>> firm2<-sort(rep(11:15,3),decreasing=F)
>> year2<-rep(2001:2003,5)
>> industry2<-rep(30,15)
>> X<-rnorm(15)
>> Y<-rnorm(15)
>> Z<-rnorm(15)
>> data2<-data.frame(firm2,year2,industry2,X,Y,Z)
>> data2
>> colnames(data2)<-c("firm","year","industry","X","Y","Z")
>>
>> firm3<-sort(rep(16:20,4),decreasing=F)
>> year3<-rep(2001:2004,5)
>> industry3<-rep(40,20)
>> X<-rnorm(20)
>> Y<-rnorm(20)
>> Z<-rnorm(20)
>> data3<-data.frame(firm3,year3,industry3,X,Y,Z)
>> data3
>> colnames(data3)<-c("firm","year","industry","X","Y","Z")
>>
>> final1<-rbind(data1,data2)
>> final2<-rbind(final1,data3)
>> final2
>> final3<-final2[order(final2$industry,final2$year),]
>> final3
>>
>> I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year,
>> to obtain the estimates of b0, b1 and b2 by industry and year (for example
>> I need to have the b0 for industry 20 and year 2000, for industry 20 and
>> year 2001...). Then I need to calculate the fitted values and the residuals
>> by firm so I need to keep b0, b1 and b2 in a way that I could do something
>> like
>> newdata1<-transform(final3,Y'=b0+b1.X+b2.Z)
>> newdata2<-transform(newdata1,residual=Y-Y')
>> or another way to keep Y' and the residuals in a dataframe with the
>> columns firm and year.
>>
>> Until now I have been doing this in a very hard way and because I need to do
>> it several times, I need your help to get an easier way.
>>
>> Thank you,
>>
>> Cecília Carmo
>>
>> Universidade de Aveiro
>>
>> Portugal
>>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
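
On the two follow-up questions at the top of this thread (mutate() failing, and an extractor for R-squared), here is a minimal base-R sketch rather than a definitive answer. It assumes only that mutate() may be missing from the installed plyr, so it avoids plyr altogether; the simulated data frame, the helper name fit_group and the column name r2 are made up for illustration and are not from the original posts. As far as I know there is no dedicated extractor function for R-squared in base R; the usual route is the r.squared component of summary() applied to the lm fit.

```r
# Simulated stand-in for final3 (hypothetical data, not the poster's panel)
set.seed(1)
final3 <- data.frame(
  firm     = rep(1:10, each = 5),
  year     = rep(2000:2004, times = 10),
  industry = rep(c(20, 30), each = 25),
  X = rnorm(50), Y = rnorm(50), Z = rnorm(50)
)

# Fit Y ~ X + Z within one industry-year group and return a one-row
# data frame with the coefficients and the R-squared of that fit.
fit_group <- function(dat) {
  fit <- lm(Y ~ X + Z, data = dat)
  b <- coef(fit)                      # extractor, as in Suggestion 1
  data.frame(
    industry = dat$industry[1],
    year     = dat$year[1],
    b0 = b[["(Intercept)"]],
    b1 = b[["X"]],
    b2 = b[["Z"]],
    r2 = summary(fit)$r.squared       # R-squared of this group's model
  )
}

# One fit per industry-year combination, stacked into a single data frame
groups <- split(final3, list(final3$industry, final3$year), drop = TRUE)
Coef <- do.call(rbind, lapply(groups, fit_group))
rownames(Coef) <- NULL

# merge() joins on the shared columns (industry, year)
final4 <- merge(final3, Coef)

# Fitted values and residuals, added one column at a time so the new
# columns keep their natural order without needing mutate()
newdata <- final4
newdata$Yhat     <- with(newdata, b0 + b1 * X + b2 * Z)
newdata$residual <- newdata$Y - newdata$Yhat

head(newdata)
```

If mutate() does work after updating plyr, the last three assignments could be replaced by the mutate() call from Suggestion 2; summary(fit)$adj.r.squared gives the adjusted value in the same way.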