Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes.
> -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller > Sent: Monday, May 01, 2006 8:55 AM > To: 'Guojun Zhu'; [email protected] > Subject: Re: [R] efficiency in merging two data frames > > I'm sure you'll get ingenious responses to help you optimize > your R code. I > deal with similar investment data in even larger numbers > (e.g. 10 years of > daily return data for each stock in the Russell 3000), and > prefer reading > and consolidating the data in Python using dictionaries and > lists, then > either piping the data to R in a read statement (read.table("pipe > python...")) or using Rpy to write R data frames directly from Python. > Python is more facile with these basic data manipulations for > hundreds of > thousands or even millions of records, and performance is generally > considerably better. > > Steve Miller > > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Guojun Zhu > Sent: Monday, May 01, 2006 2:35 AM > To: [email protected] > Subject: [R] efficiency in merging two data frames > > I have two data sets about lots of companies' stock > and fiscal data. One is monthly data with about > 144,000 lines, and the other is quaterly with about > 56,000. Each data set takes different company code. > I need to merge these two together. I read both ask > cvs. And the other file with corresponding firm code. > Now I have three data sets. return$PERMNO, > account$GVKEY. id is the data frames of the > corresponding relation and has both id$PERMNO and > id$GVKEY. Also, I need to convert the return's month > into quarter and finally merge two data frames(return > and account). I end up write a short program for > this, but it runs very slow. 15+ minutes. Is there > quick way to do it. Here is my original codes. > > > > id$fy=rep(0,length(id$PERMNO)) > for (i in 1:length(id$PERMNO)) > > id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] > > return$GVKEY=rep(0,length(return$PERMNO)) > return$fyy=rep(0,length(return$PERMNO)) > return$fyq=rep(0,length(return$PERMNO)) > for (i in i:length(return$PERMNO)) { > temp<-id$PERMNO==return$PERMNO[[i]]; > tempmon<-id$fy[temp][[1]]; > if (return$month[[i]]<-tempmon) { > return$fyy[[i]]<-return$year[[i]]; > return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; > } > else{ > return$fyy[[i]]<-return$year[[i]]+1; > return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; > } > return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; > } > > returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by > .y<-c("GVKEY", > "fyy","fyq")) > > ______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
