OK, I'm a conehead. There's no memory.limit() on my LINUX setup, nor is there a --max-mem-size option.
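For what it's worth, here is roughly what I can poke at on this build instead. This is only a sketch: gc(), mem.limits(), and object.size() are all base R, and the sizes in the comments are back-of-the-envelope numbers, not measurements from Robert's data.

  gc()                        # current memory use and the garbage-collection trigger levels
  mem.limits()                # cons-cell / vector-heap limits set at startup (NA if none were set)
  object.size(integer(1e7))   # ~40 MB for 10 MM integers, so ~400 MB for 100 MM as a bare vector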
Sorry for any false trails.
-jason

>From: "Jason Barnhart" <[EMAIL PROTECTED]>
>To: "Robert Citek" <[EMAIL PROTECTED]>, <r-help@stat.math.ethz.ch>
>Subject: Re: [R] large data set, error: cannot allocate vector
>Date: Tue, 9 May 2006 14:32:45 -0700
>
>Robert,
>
>Thanks, I stand corrected on the RAM issue re: 32- vs. 64-bit builds.
>
>As for the --max-mem-size option, I'll try to check my LINUX version at
>home tonight.
>
>-jason
>
>----- Original Message -----
>From: "Robert Citek" <[EMAIL PROTECTED]>
>To: <r-help@stat.math.ethz.ch>
>Cc: "Jason Barnhart" <[EMAIL PROTECTED]>
>Sent: Tuesday, May 09, 2006 1:27 PM
>Subject: Re: [R] large data set, error: cannot allocate vector
>
>
> > On May 9, 2006, at 1:32 PM, Jason Barnhart wrote:
> >
> >> 1) So the original problem remains unsolved?
> >
> > The question was answered but the problem remains unsolved. The question
> > was: why am I getting the error "cannot allocate vector" when reading in
> > a 100 MM integer list? The answer appears to be:
> >
> > 1) R loads the entire data set into RAM
> > 2) on a 32-bit system R maxes out at 3 GB
> > 3) loading 100 MM integer entries into a data.frame requires more than
> > 3 GB of RAM (5-10 GB, based on projections from 10 MM entries)
> >
> > So, the new question is: how does one work around such limits?
> >
> >> You can load data but lack memory to do more (or so it appears). It
> >> seems to me that your options are:
> >> a) ensure that the --max-mem-size option is allowing R to utilize all
> >> available RAM
> >
> > --max-mem-size doesn't exist in my version:
> >
> > $ R --max-mem-size
> > WARNING: unknown option '--max-mem-size'
> >
> > Do different versions of R on different OSes and different platforms
> > have different options?
> >
> > FWIW, here's the usage statement from ?mem.limits:
> >
> >   R --min-vsize=vl --max-vsize=vu --min-nsize=nl --max-nsize=nu --max-ppsize=N
> >
> >> b) sample if possible, i.e. are 20 MM necessary?
> >
> > Yes, or within a factor of 4 of that.
> >
> >> c) load in matrices or vectors, then "process" or analyze
> >
> > Yes, I just need to learn more of the R language to do what I want.
> >
> >> d) load the data into a database that R connects to, and use that
> >> engine for processing
> >
> > I have a gut feeling something like this is the way to go.
> >
> >> e) drop unnecessary columns from the data.frame
> >
> > Yes. Currently, one of the fields is an identifier field, which is a
> > long text field (30+ chars). That should probably be converted to an
> > integer to conserve on both time and space.
> >
> >> f) analyze subsets of the data (variable-wise: review fewer vars at a
> >> time)
> >
> > Possibly.
> >
> >> g) buy more RAM (32- vs. 64-bit architecture should not be the issue,
> >> since you use LINUX)
> >
> > 32-bit seems to be the limit. We've got 6 GB of RAM and 8 GB of swap.
> > Despite that, R chokes well before those limits are reached.
> >
> >> h) ???
> >
> > Yes, possibly some other solution we haven't considered.
> >
> >> 2) Not finding memory.limit() is very odd. You should consider
> >> reviewing the bug reporting process to determine if this should be
> >> reported. Here's an example of my output:
> >> > memory.limit()
> >> [1] 1782579200
> >
> > Do different versions of R on different OSes and different platforms
> > have different functions?
> >
> >> 3) This may not be the correct way to look at the timing differences
> >> you experienced. However, it seems R is holding up well.
> >>
> >>                    10MM    100MM   ratio-100MM/10MM
> >> cat                0.04     7.60             190.00
> >> scan               9.93    92.27               9.29
> >> ratio scan/cat   248.25    12.14
> >
> > I re-ran the timing test for the 100 MM file taking caching into
> > account. Linux with 6 GB has no problem caching the 100 MM file (600 MB):
> >
> >                    10MM    100MM   ratio-100MM/10MM
> > cat                0.04     0.38               9.50
> > scan               9.93    92.27               9.29
> > ratio scan/cat   248.25   242.82
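For reference, a rough sketch of how a cat/scan comparison like the ones above can be run from within R. The file name below is made up, the file is assumed to hold one integer per line, and the shell step assumes a Unix-alike box; system.time(), system(), and scan() are all base R:

  system.time(system("cat ints.dat > /dev/null"))            # raw read of the file; also warms the OS cache
  system.time(x <- scan("ints.dat", what = integer(0)))      # parse the same file into an integer vector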
> >> Please let me know how you resolve this. I'm curious about your solution.
> >> HTH,
> >
> > Indeed, very helpful. I'm learning more about R every day. Thanks for
> > your feedback.
> >
> > Regards,
> > - Robert
> > http://www.cwelug.org/downloads
> > Help others get OpenSource software. Distribute FLOSS
> > for Windows, Linux, *BSD, and MacOS X with BitTorrent
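Following up on options (c) and (e) above, a minimal sketch of what that might look like in base R. The file name and the two-column layout (a 30+ character text id followed by an integer value) are assumptions here, not necessarily Robert's actual format:

  ## option (e): skip the long text identifier column entirely while reading
  dat <- read.table("bigfile.dat", colClasses = c("NULL", "integer"),
                    col.names = c("id", "value"))

  ## option (c): read just the integer field into a bare vector, avoiding data.frame overhead
  vals <- scan("bigfile.dat", what = list(id = NULL, value = integer(0)))$value

  ## if the identifier is still needed, recode the long strings as integer codes
  both <- scan("bigfile.dat", what = list(id = character(0), value = integer(0)))
  both$id <- as.integer(factor(both$id))

A bare vector of 100 MM integers is roughly 400 MB, so reading only the integer column should fit well within a 3 GB address space where the full data.frame with text ids did not.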