If you are iterating through the dataset 4000 numbers (lines) at a time,
you can open the dataset as a connection and then read and process 4000
values per pass:

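myFile <- file('dataset', 'r')   # file() creates the connection; open() needs an existing one
while (TRUE){
    input <- scan(myFile, what=0, n=4000)   # read up to 4000 numbers
    if (length(input) == 0) break           # end of file reached
    # ... process the data ...
}
close(myFile)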
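
As a concrete, self-contained sketch of the loop above: the temporary file,
the 10,000 values, and the running sum used as the "processing" step are all
made up for illustration and are not part of the original question.

```r
# Hypothetical data: 10,000 numbers, one per line, in a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:10000), path)

con <- file(path, "r")      # open the file as a read-only text connection
total <- 0
chunks <- 0
while (TRUE) {
    # Each call continues where the previous one stopped.
    input <- scan(con, what = 0, n = 4000, quiet = TRUE)
    if (length(input) == 0) break    # nothing left to read
    total <- total + sum(input)      # stand-in for the real processing
    chunks <- chunks + 1
}
close(con)
unlink(path)
```

Only one chunk is held in memory at a time, so the size of the file does not
matter much as long as each 4000-value block fits comfortably.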

But if you want to select lines at random, then some type of database is
better.
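
One possible shape for the database route, as a rough sketch only. It assumes
the DBI and RSQLite packages are installed (neither is part of base R), and
the table name "dataset", the column name "value", and the in-memory database
are placeholders chosen for illustration; a real dataset would be loaded once
into a database file on disk.

```r
# Assumes the DBI and RSQLite packages are installed (not part of base R).
library(DBI)

# Hypothetical data standing in for the real dataset.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "dataset", data.frame(value = as.numeric(1:10000)))

# Pick 10 random row numbers and fetch only those rows.
set.seed(1)
ids <- sort(sample(10000, 10))
query <- paste0("SELECT value FROM dataset WHERE rowid IN (",
                paste(ids, collapse = ", "), ") ORDER BY rowid")
picked <- dbGetQuery(con, query)
dbDisconnect(con)
```

Here the database, not R, locates the requested rows, so only the selected
values are brought into memory.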


On 3/12/07, Thaden, John J <[EMAIL PROTECTED]> wrote:
>
> Feng,
>   I had the same question as you, how to read a subset of data, and the
> same reaction as Wensui when I discovered that read.table could not.  Even
> if my computer's memory were up to it, I am troubled by the idea of reading
> in 1.8 GB of data (in my case) to get just 4,000 numbers, for instance,
> particularly if I'm then going to iterate through the entire dataset in
> 4,000-number chunks.
>   I ended up defining a NetCDF format to hold my data using the RNetCDF
> package, since that package's var.get.nc() function is perfectly able to
> read subsets of a NetCDF variable.  Furthermore, NetCDF files allow data to
> be matrices and even higher order arrays, from which you can then retrieve
> any chunk by including var.get.nc 'start' and 'count' arguments in the form
> of vectors of length equal to the number of array dimensions.  Once a
> NetCDF format is defined, all else is painless.  One limitation is that the
> RNetCDF package only supports version 3 of the NetCDF library, a version
> that puts a 2 GB limit on a variable's size.  Version 4 removes this
> limitation; I'm hopeful that some day an R package will provide an
> interface to the NetCDF version 4 library.
> John Thaden
>
> Message: 22
> Date: Sun, 11 Mar 2007 21:33:04 -0500
> From: "jim holtman" <[EMAIL PROTECTED]>
> Subject: Re: [R] read.table for a subset of data
> To: "Wensui Liu" <[EMAIL PROTECTED]>
> Cc: r-help <r-help@stat.math.ethz.ch>
>
> If you know which 10 rows to read, then you can 'skip' to them, but the
> system still has to read one line at a time.
>
> I have a 200,000 line csv file of numerics that takes me 4 seconds to read
> in with 'read.csv' using 'colClasses', so I would guess your 100K line file
> would take half of that.  Is 2 seconds of time a waste of resources?
>
>
> On 3/11/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> >
> > Jim,
> >
> > Glad to see your reply.
> >
> > Referring to your email, what if I just want to read 10 rows from a csv
> > table with 100000 rows? Do you think it is a waste of resources to read
> > the whole table in?
> > Any thoughts?
> >
> > wensui
> >
> > On 3/11/07, jim holtman <[EMAIL PROTECTED]> wrote:
> > > Why can't you read in the whole data set and then create the subsets?
> > > This is easily done with 'split'.  If the data is too large, then
> > > consider a database.
> > >
> > > On 3/11/07, gnv shqp <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi R-experts,
> > > >
> > > > I have data from four conditions of an experiment.  I tried to create
> > > > four subsets of the data with read.table, for example,
> > > > read.table("Experiment.csv",subset=(condition=="1")).
> > > > I found a similar post in the archive, but the answer to that post
> > > > was no.  Any new ideas about reading subsets of data with read.table?
> > > >
> > > > Thanks!
> > > >
> > > > Feng
> > > >
> > > >
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


