On Mon, Mar 22, 2010 at 6:37 AM, <jan.su...@biokapital.no> wrote: > Hi > I have a problem in R that I have been trying to solve but without > success. > I am trying to join two tables on two variables : an ID and a date > (optional) that will be common between the two tables > > In SQL (and SAS PROC SQL) I am a frequent user of the "select" command and > I am used to the following nomenclature : > > select a.*, b.c, b.y, b.z from table1 a, table2 b where a.date=b.date and > a.id=b.id > > I tried this in R (using sqldf) but it takes waaaay too long to get the > result. My data sets are >1 gb each.
Try adding indexes to the join columns. I was able to do a join between two 1 million row tables in 13 seconds (under a minute including the time to add the indexes) on a laptop (not a particularly fast machine). > set.seed(1) > n <- 1000000 > DF1 <- data.frame(a = sample(n, n), b1 = runif(n)) > DF2 <- data.frame(a = sample(n, n), b2 = runif(n)) > library(sqldf) > > sqldf() <SQLiteConnection:(2708,1)> > system.time(sqldf("create index ai1 on DF1(a)")) user system elapsed 16.63 0.09 16.85 > system.time(sqldf("create index ai2 on DF2(a)")) user system elapsed 16.06 0.04 16.30 > system.time(sqldf("select * from main.DF1 natural join main.DF2")) user system elapsed 12.72 0.21 12.93 > sqldf() > Is there any way this can be done by merge() or any other more "R-like" > way ? > I have tried making a compound variable with paste(id, as.character(date), > sep="") and merge() on that but since the date more often than not is NA > this doesnt seem to work. I'm a bit stuck and the documentation is of > little help to me in this case. > If anybody would be kind to point me in the right direction I would be > grateful. > > Jan > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.