Hi

and you would better to give up thinking in "for cycles" as a good 
tool for data manipulation and calculation. 

R has usually functions for direct manipulation and calculation with 
vectors or matrices, vhich are far more effective than calculation in 
cycle.

I also recommend you to go through Paul Johnsons Rtips which is set 
of many good ideas how to do some things.

HTH
Petr

On 1 May 2006 at 7:54, Steve Miller wrote:

From:                   "Steve Miller" <[EMAIL PROTECTED]>
To:                     "'Guojun Zhu'" <[EMAIL PROTECTED]>,
        <[email protected]>
Date sent:              Mon, 1 May 2006 07:54:54 -0500
Subject:                Re: [R] efficiency in merging two data frames

> I'm sure you'll get ingenious responses to help you optimize your R
> code. I deal with similar investment data in even larger numbers (e.g.
> 10 years of daily return data for each stock in the Russell 3000), and
> prefer reading and consolidating the data in Python using dictionaries
> and lists, then either piping the data to R in a read statement
> (read.table("pipe python...")) or using Rpy to write R data frames
> directly from Python. Python is more facile with these basic data
> manipulations for hundreds of thousands or even millions of records,
> and performance is generally considerably better.
> 
> Steve Miller 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Guojun Zhu
> Sent: Monday, May 01, 2006 2:35 AM To: [email protected]
> Subject: [R] efficiency in merging two data frames
> 
> I have two data sets about lots of companies' stock
> and fiscal data.  One is monthly data with about
> 144,000 lines, and the other is quaterly with about
> 56,000.  Each data set takes different company code. 
> I need to merge these two together.  I read both ask
> cvs.  And the other file with corresponding firm code.
>  Now I have three data sets. return$PERMNO,
> account$GVKEY.  id is the data frames of the
> corresponding relation and has both id$PERMNO and
> id$GVKEY.  Also, I need to convert the return's month
> into quarter and finally merge two data frames(return
> and account).  I end up write a short program for
> this, but it runs very slow.  15+ minutes.  Is there
> quick way to do it.  Here is my original codes.
> 
> 
> 
> id$fy=rep(0,length(id$PERMNO))
> for (i in 1:length(id$PERMNO))
> 
> id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]]
> 
> return$GVKEY=rep(0,length(return$PERMNO))
> return$fyy=rep(0,length(return$PERMNO))
> return$fyq=rep(0,length(return$PERMNO))
> for (i in i:length(return$PERMNO)) {
>     temp<-id$PERMNO==return$PERMNO[[i]];
>     tempmon<-id$fy[temp][[1]];
>     if (return$month[[i]]<-tempmon) {
>  return$fyy[[i]]<-return$year[[i]];
>  return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3;
>  }
>       else{
>  return$fyy[[i]]<-return$year[[i]]+1;
>  return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3;
>  }
>     return$GVKEY[[i]]<-id$GVKEY[temp][[1]];
> }
> 
> returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("G
> VKEY", "fyy","fyq"))
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
[EMAIL PROTECTED]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to