I think you are looking for the 'data.table'
package.
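
A minimal sketch of what that could look like, using toy data in place
of the real files.  The column names (id, day, stockreturn,
marketreturn) follow the question below; the values are made up for
illustration.  It assumes a reasonably recent data.table where the
`on=` argument is available (older versions would need setkey() on
both tables first).  The idea is an "update join": the market return
is looked up by day and added to the big table by reference, so no
merged copy of the 1.3GB object is built.  The match() lines show the
base R analogue of a hash lookup, which needs no package at all.

```r
library(data.table)

# Toy stand-ins for the two data sets described in the question.
main <- data.table(
  id          = c("A", "A", "B", "B", "C"),
  day         = c(1L, 2L, 1L, 3L, 4L),
  stockreturn = c(0.010, -0.020, 0.005, 0.030, 0.000)
)
aggregate.data <- data.table(
  day          = c(1L, 2L, 3L),
  marketreturn = c(0.002, -0.001, 0.004)
)

# Update join: for each row of aggregate.data, find the rows of main
# with the same day and assign marketreturn there, by reference.
# Days absent from aggregate.data (day 4 here) are left as NA, which
# mirrors merge(..., all.x=TRUE, all.y=FALSE).
main[aggregate.data, on = "day", marketreturn := i.marketreturn]

# Base R analogue of the hash lookup: match() returns, for each day in
# main, the position of that day in aggregate.data (NA if not found).
idx <- match(main$day, aggregate.data$day)
via.match <- aggregate.data$marketreturn[idx]

# The regression can then be run on the augmented table directly.
fit <- lm(stockreturn ~ marketreturn, data = main)
```

Both routes keep the row order of the main data set, and neither
allocates more than one extra column of the main table's length.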

On 09/10/2011 17:31, ivo welch wrote:
Dear R experts---I am struggling with memory and speed issues.  Advice
would be appreciated.

I have a long data set (of financial stock returns, with stock name
and trading day).  All three variables, stock return, id and day, are
irregular.  It is about 1.3GB in object.size (200MB on disk).  Now, I
need to merge the main data set with some aggregate data (e.g., the
S&P500 market rate of return, with a day index) from the same day.
This "market data set" is not a big data set (object.size=300K, 5
columns, 12000 rows).

Let's say my (dumb statistical) plan is to run one grand regression,
where the individual rate of return is y and the market rate of return
is x.  The following should work without a problem:

combined <- merge(main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE)
lm( stockreturn ~ marketreturn, data=combined )

Alas, the merge is neither space-efficient nor fast.  In fact, I run
out of memory on my 16GB Linux machine.  My guess is that by whittling
the data down I could make it work (perhaps merging in chunks and then
rbinding the pieces), but this is painful.

In Perl, I would define a hash with the day as key and the market
return as value, and then loop over the main data set to supplement
it.

Is there a recommended way of doing such tasks in R, either super-fast
(so that I can merge many, many times) or space-efficient (so that I
merge once and store the results)?

sincerely,

/iaw

----
Ivo Welch (ivo.we...@gmail.com)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')
