If you are iterating through the dataset 4000 numbers at a time, you can do this by opening the dataset as a connection and then reading and processing 4000 values per pass:

    myFile <- file('dataset', 'r')   # file() creates and opens the connection
    while (TRUE) {
        # what=0 tells scan() to expect numerics; n=4000 caps each read
        input <- scan(myFile, what=0, n=4000)
        if (length(input) == 0) break   # nothing left to read
        # .....process the data....
    }
    close(myFile)

But if you want to randomly select which lines to read, then some type of database is better.

On 3/12/07, Thaden, John J <[EMAIL PROTECTED]> wrote:
>
> Feng,
> I had the same question as you, how to read a subset of data, and the
> same reaction as Wensui when I discovered that read.table could not.
> Even if my computer's memory were up to it, I am troubled by the idea
> of reading in 1.8 GB of data (in my case) to get just 4,000 numbers,
> for instance, particularly if I'm then going to iterate through the
> entire dataset in 4,000-number chunks.
>
> I ended up defining a NetCDF format to hold my data using the RNetCDF
> package, since that package's var.get.nc() function is perfectly able
> to read subsets of a NetCDF variable. Furthermore, NetCDF files allow
> data to be matrices and even higher order arrays, from which you can
> then retrieve any chunk by including var.get.nc 'start' and 'count'
> arguments in the form of vectors of length equal to the number of
> array dimensions. Once a NetCDF format is defined, all else is
> painless. One limitation is that the RNetCDF package only supports
> version 3 of the NetCDF library, a version that puts a 2 GB limit on
> a variable's size. Version 4 removes this limitation; I'm hopeful
> some day that an R package will be an interface to the NetCDF version
> 4 library.
>
> John Thaden
>
> Message: 22
> Date: Sun, 11 Mar 2007 21:33:04 -0500
> From: "jim holtman" <[EMAIL PROTECTED]>
> Subject: Re: [R] read.table for a subset of data
> To: "Wensui Liu" <[EMAIL PROTECTED]>
> Cc: r-help <r-help@stat.math.ethz.ch>
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain
>
> If you know which 10 rows to read, then you can 'skip' to them, but
> the system still has to read each line one at a time.
>
> I have a 200,000 line csv file of numerics that takes me 4 seconds to
> read in with 'read.csv' using 'colClasses', so I would guess your
> 100K line file would take half of that. Is 2 seconds of time a waste
> of resources?
>
> On 3/11/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> >
> > Jim,
> >
> > Glad to see your reply.
> >
> > Referring to your email, what if I just want to read 10 rows from a
> > csv table with 100000 rows? Do you think it a waste of resources to
> > read the whole table in?
> > Any thoughts?
> >
> > wensui
> >
> > On 3/11/07, jim holtman <[EMAIL PROTECTED]> wrote:
> > > Why can't you read in the whole data set and then create the
> > > subsets? This is easily done with 'split'. If the data is too
> > > large, then consider a database.
> > >
> > > On 3/11/07, gnv shqp <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi R-experts,
> > > >
> > > > I have data from four conditions of an experiment. I tried to
> > > > create four subsets of the data with read.table, for example,
> > > > read.table("Experiment.csv",subset=(condition=="1"))
> > > > . I found a similar post in the archive, but the answer to that
> > > > post was no. Any new ideas about reading subsets of data with
> > > > read.table?
> > > >
> > > > Thanks!
> > > >
> > > > Feng

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
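
For what it's worth, here is a minimal sketch of the 'skip' approach mentioned in the quoted thread, for the case where you already know which block of rows you want. The temporary file, the column names (id, value) and the row range are all made up for illustration; they are not from the original poster's data. As noted above, R still has to scan past every skipped line, so this saves memory rather than I/O.

```r
# Hypothetical demo file: 100 data rows, two numeric columns.
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(id = 1:100, value = (1:100) * 2)
write.csv(demo, tmp, row.names = FALSE)

# Pick up the column names by reading the header plus one data row.
hdr <- names(read.csv(tmp, nrows = 1))

# Read data rows 41 to 50 only.
first <- 41   # first data row wanted (assumed known in advance)
n <- 10       # number of rows wanted

# Skip the header line plus the (first - 1) data rows before the block.
rows <- read.csv(tmp, header = FALSE, skip = 1 + (first - 1), nrows = n,
                 col.names = hdr, colClasses = "numeric")

unlink(tmp)   # remove the demo file
```

Supplying 'colClasses' spares read.csv from guessing the column types, which is the same trick behind the 4-second timing quoted above.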