If what you are reading in is numeric data, then it would require (807 *
118519 * 8 bytes) roughly 730MB just to store a single copy of the object -- more
memory than you have on your computer.  If you did manage to read it in, the
slowdown you saw was probably due to paging.

You should look at storing this in a database and working on a subset of
the data.  Do you really need all 807 variables in memory at the
same time?

If you use 'scan', you can specify that some of the variables should not be
read in, which might yield more reasonably sized objects.
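A minimal sketch of that idea (the file here is a small demo created on the
fly; with real data you would point scan() at your own file): columns given
as NULL in 'what' are skipped at read time, so they never occupy memory.

```r
## create a small demo CSV: 4 columns, of which we only want 2
f <- tempfile(fileext = ".csv")
writeLines(c("id,a,x,b",
             "1,9,2.5,9",
             "2,9,3.5,9"), f)

## keep column 1 ("id") and column 3 ("x"); NULL means skip the field
wanted <- scan(f, sep = ",", skip = 1,
               what = list(id = 0, NULL, x = 0, NULL))

## drop the NULL placeholders and build a data frame of just those columns
df <- data.frame(wanted[!vapply(wanted, is.null, logical(1))])
df
##   id   x
## 1  1 2.5
## 2  2 3.5
```

For a file with 807 columns you would build the 'what' list
programmatically, setting NULL for every column you do not need.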


On 1/5/06, François Pinard <[EMAIL PROTECTED]> wrote:
>
> [ronggui]
>
> >R is weak when handling large data files.  I have a data file: 807 vars,
> >118519 obs, in CSV format.  Stata can read it in about 2 minutes, but on
> >my PC R almost cannot handle it.  My PC: CPU 1.7GHz; RAM 512M.
>
> Just (another) thought.  I used to use SPSS, many, many years ago, on
> CDC machines, where the CPU had limited memory and no kind of paging
> architecture.  Files did not need to be very large to be too large.
>
> SPSS had a feature that was then useful, about the capability of
> sampling a big dataset directly at file read time, quite before
> processing starts.  Maybe something similar could help in R (that is,
> instead of reading the whole data in memory, _then_ sampling it.)
>
> One can read records from a file, up to a preset amount of them.  If the
> file happens to contain more records than that preset number (the number
> of records in the whole file is not known beforehand), already read
> records may be dropped at random and replaced by other records coming
> from the file being read.  If the random selection algorithm is properly
> chosen, it can be made so that all records in the original file have
> equal probability of being kept in the final subset.
>
> If such a sampling facility was built right within usual R reading
> routines (triggered by an extra argument, say), it could offer
> a compromise for processing large files, and also sometimes accelerate
> computations for big problems, even when memory is not at stake.
>
> --
> François Pinard   http://pinard.progiciels-bpi.ca
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
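The scheme François describes is reservoir sampling.  A minimal sketch in R
(the function name and the demo file are illustrative, not an existing R
facility): keep a uniform random sample of k lines from a file whose total
length is not known in advance.

```r
## reservoir_sample: read a connection line by line, keeping k lines
## such that every line in the file is equally likely to be retained.
reservoir_sample <- function(con, k) {
  reservoir <- character(0)
  n <- 0
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break      # end of file
    n <- n + 1
    if (n <= k) {
      reservoir[n] <- line            # fill the reservoir first
    } else {
      j <- sample.int(n, 1)           # keep this line with prob k/n,
      if (j <= k) reservoir[j] <- line  # evicting a random occupant
    }
  }
  reservoir
}

## demo: sample 10 of 1000 lines without ever holding all 1000
f <- tempfile()
writeLines(as.character(1:1000), f)
con <- file(f, "r")
s <- reservoir_sample(con, 10)
close(con)
length(s)   # 10
```

Only the k retained lines are ever in memory at once, which is what makes
this attractive for files too large to load whole.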



--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

