On Nov 25, 2013, at 3:35 PM, Gary Dong wrote:

> Dear R users,
>
> I have a large data set which includes data from 300 cities. I want to run
> a bivariate regression for each city and record the coefficient and the
> adjusted R square.
>
> For example, in the following, I have 10 cities represented by numbers from
> 1 to 10:
>
> x = cumsum(c(0, runif(999, -1, +1)))
> y = cumsum(c(0, runif(999, -1, +1)))
> city = rep(1:10,each=100)
> data<-data.frame(cbind(x,y,city))
>
> I can manually run regressions for each city:
> fit_city1 <- lm(y ~ x,data=subset(data,data$city==1))
> summary(fit_city1)
>
> Obviously, it is very tedious to run 300 regressions. I wonder if there is a
> quicker way to do this. Use a for loop? What I want to see is something like
> this:
>
> City   Coefficient   Adjusted R square
> 1      -0.05         0.36
> 2      -0.12         0.20
> 3      -0.05         0.32
> .....
>

The way to get the most rapid response from this list is to post a dataset that represents the complexity of the problem. Presumably this large dataset is either a dataframe with a column of city entries or a list of dataframes. Why not post the output of dput() applied to an extract of three of the cities, including sufficient rows to allow a regression?
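As an illustration only (it is not part of the original reply), here is a minimal sketch of both ideas using the simulated data from the question: producing a small dput() extract that could be posted, and fitting y ~ x within each city. The names dat, extract, fit_one and results are hypothetical, the seed is added purely for repeatability, and split() with lapply() is just one of several ways to loop over groups.

```r
set.seed(1)  # assumption: seed added only so the sketch is repeatable

# Simulated data as in the question: 10 cities, 100 rows each
x <- cumsum(c(0, runif(999, -1, 1)))
y <- cumsum(c(0, runif(999, -1, 1)))
city <- rep(1:10, each = 100)
dat <- data.frame(x, y, city)

# A small extract suitable for posting: first three cities, five rows each
extract <- do.call(rbind, lapply(split(dat, dat$city)[1:3], head, n = 5))
dput(extract)

# Fit y ~ x on one city's rows; return the slope and the adjusted R-squared
fit_one <- function(d) {
  s <- summary(lm(y ~ x, data = d))
  data.frame(city = d$city[1],
             coefficient = coef(s)["x", "Estimate"],
             adj_r_squared = s$adj.r.squared)
}

# Apply the fit to every city and stack the one-row results
results <- do.call(rbind, lapply(split(dat, dat$city), fit_one))
rownames(results) <- NULL
results
```

With a real 300-city data frame the same pattern should apply unchanged, provided the grouping column is named city and the variables are named x and y; otherwise the formula and column names would need adjusting.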
> [[alternative HTML version deleted]]
>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

This is a plain text list.

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help