One obvious alternative is an SQL join, which you could run directly in a DBMS or from R via RMySQL, RSQLite, etc. Keep in mind that creating indexes on the join keys (user in pubbounds, userid in prof) before the join may save a lot of time.
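
As a rough sketch of what that could look like with RSQLite (not tested on data of your size): the toy data frames below are made-up stand-ins for pubbounds and prof, and the index names are arbitrary. Since all=TRUE asks for a full outer join, and I am assuming your SQLite build may not support FULL OUTER JOIN, the query emulates it with a LEFT JOIN plus the unmatched rows from the other table.

```r
library(DBI)
library(RSQLite)

# Hypothetical toy data standing in for the real pubbounds and prof.
pubbounds <- data.frame(user = c(1L, 1L, 2L, 4L),
                        pub  = c("a", "b", "c", "d"),
                        stringsAsFactors = FALSE)
prof <- data.frame(userid = c(1L, 2L, 3L),
                   dept   = c("x", "y", "z"),
                   stringsAsFactors = FALSE)

# In-memory database for the demo; for large data, pass a file name
# instead so the tables live on disk rather than in RAM.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "pubbounds", pubbounds)
dbWriteTable(con, "prof", prof)

# Index the join keys before joining.
dbExecute(con, 'CREATE INDEX idx_pubbounds_user ON pubbounds("user")')
dbExecute(con, 'CREATE INDEX idx_prof_userid ON prof(userid)')

# Emulated full outer join: all pubbounds rows with any matching prof,
# plus the prof rows that have no match in pubbounds.
merged <- dbGetQuery(con, '
  SELECT p.*, f.*
  FROM pubbounds AS p
  LEFT JOIN prof AS f ON p."user" = f.userid
  UNION ALL
  SELECT p.*, f.*
  FROM prof AS f
  LEFT JOIN pubbounds AS p ON p."user" = f.userid
  WHERE p."user" IS NULL
')

dbDisconnect(con)
```

Note that dbGetQuery still pulls the whole result into R, so if the merged table itself is too big for memory you would probably want to keep it in the database (CREATE TABLE ... AS SELECT ...) and fetch only the pieces you need.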

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Adam D. I. Kramer
> Sent: Thursday, September 07, 2006 2:46 PM
> To: Prof Brian Ripley
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Alternatives to merge for large data sets?
>
> On Thu, 7 Sep 2006, Prof Brian Ripley wrote:
>
> > Which version of R?
>
> Previously, 2.3.1.
>
> > Please try 2.4.0 alpha, as it has a different and more efficient
> > algorithm for the case of 1-1 matches.
>
> I downloaded and installed R-latest, but got the same error message:
>
> Error: cannot allocate vector of size 7301 Kb
>
> ...though at least the too-big size was larger this time.
>
> My data set is not exactly 1-1; every item in "prof" may have one or more
> matches in "pubbounds," though every item in "pubbounds" corresponds only to
> one "prof."
>
> --Adam
>
> > On Wed, 6 Sep 2006, Adam D. I. Kramer wrote:
> >
> >> Hello,
> >>
> >> I am trying to merge two very large data sets, via
> >>
> >> pubbounds.prof <-
> >> merge(x=pubbounds,y=prof,by.x="user",by.y="userid",all=TRUE,sort=FALSE)
> >>
> >> which gives me an error of
> >>
> >> Error: cannot allocate vector of size 2962 Kb
> >>
> >> I am reasonably sure that this is correct syntax.
> >>
> >> The trouble is that pubbounds and prof are large; they are data frames which
> >> take up 70M and 11M respectively when saved as .Rdata files.
> >>
> >> I understand from various archive searches that "merge can't handle that,"
> >> because merge takes n^2 memory, which I do not have.
> >
> > Not really true (it has been changed since those days). Of course, if you
> > have multiple matches it must do so.
> >
> >> My question is whether there is an alternative to merge which would carry
> >> out the process in a slower, iterative manner...or if I should just bite the
> >> bullet, write.table, and use a perl script to do the job.
> >>
> >> Thankful as always,
> >> Adam D. I. Kramer
> >
> > --
> > Brian D. Ripley, [EMAIL PROTECTED]
> > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford, Tel: +44 1865 272861 (self)
> > 1 South Parks Road, +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.