Re: [R] How to read in this data format?

jim holtman Mon, 05 Mar 2007 08:30:05 -0800

If you want to process 'n' lines from the file at a time, just set up the
file as a connection and read the desired number of lines in a loop, as below:


f.1 <- file('/tempxx.txt', 'r')  # open the file as a read-only connection
nlines <- 0
# read 1000 lines at a time
while (TRUE){
    lines <- readLines(f.1, n=1000)
    if (length(lines) == 0) break  # quit when no lines are read
    # processing
    nlines <- nlines + length(lines)
}
close(f.1)  # release the connection when done
cat(nlines, "lines read\n")
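
As a minimal sketch of what the "# processing" step might look like for this
thread's data, the loop can keep only the lines matched by the regular
expression quoted further down, so the whole file never has to sit in memory
at once. The helper name filter_in_chunks, its arguments and the temporary
demo file are assumptions made for illustration, not part of the original
messages.

```r
# Hypothetical helper: read a file chunk by chunk and keep only matching lines.
filter_in_chunks <- function(path, pattern = "^[1-9][0-9. ]*$|Time",
                             chunk_size = 1000) {
    con <- file(path, "r")
    on.exit(close(con))            # close the connection even if an error occurs
    kept <- list()
    repeat {
        lines <- readLines(con, n = chunk_size)
        if (length(lines) == 0) break          # end of file reached
        # keep only data lines and the "Time" lines from this chunk
        kept[[length(kept) + 1]] <- grep(pattern, lines, value = TRUE)
    }
    unlist(kept)
}

# Small demonstration on a temporary file that mimics the posted format.
tmp <- tempfile()
writeLines(c("$$ Experiment Number:", "Scan            1",
             "Retention Time  0.017", "",
             "399.8112        184", "399.8742        0", "...."), tmp)
kept <- filter_in_chunks(tmp, chunk_size = 2)
print(kept)
unlink(tmp)
```

The filtered lines are usually far fewer than the raw input, so they could
then be passed on to the gsub/read.table steps quoted below.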


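On the question quoted below about a starting row number: readLines has no
such argument, but because an open connection remembers its position, one
possible approach is to read and discard the leading lines first. The sketch
below assumes a hypothetical helper named read_line_range and a temporary
demo file; it is one way to do it, not the only one.

```r
# Hypothetical helper: return lines first..last of a text file.
read_line_range <- function(path, first, last, chunk_size = 10000) {
    con <- file(path, "r")
    on.exit(close(con))            # always release the connection
    to_skip <- first - 1
    # discard the leading lines in chunks so memory use stays small
    while (to_skip > 0) {
        skipped <- readLines(con, n = min(to_skip, chunk_size))
        if (length(skipped) == 0) break        # file is shorter than 'first'
        to_skip <- to_skip - length(skipped)
    }
    # the connection is now positioned at line 'first'
    readLines(con, n = last - first + 1)
}

# Demonstration: pick lines 4 to 6 out of a ten-line temporary file.
tmp <- tempfile()
writeLines(paste("line", 1:10), tmp)
middle <- read_line_range(tmp, first = 4, last = 6, chunk_size = 2)
print(middle)
unlink(tmp)
```

Calling the helper repeatedly rereads the file from the top each time, so
for splitting one large file into consecutive pieces the single loop above
is likely the cheaper choice.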

On 3/5/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Although the solution worked, I've got some trouble with some data files.
> These data files are very large (600-700 MB), so my computer starts
> swapping.
>
> If I use the code written below, I get:
> Error in .Call("R_lazyLoadDBfetch", key, file, compressed, hook, PACKAGE =
> "base") :
>        recursive default argument reference
> This happens after about 15 minutes of loading the data with the
> Lines. <- readLines("myfile.dat") command.
>
> When I looked in the help for readLines, I saw that there is an argument n
> to set a maximum number of lines, but is there a way to set a starting row
> number? If I can split up my data files into 4-8 small datasets, that's OK
> for me. But I couldn't figure it out.
>
>
> Thanks
>
> Bart
>
>
>
>
> >From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> >To: "Bart Joosen" <[EMAIL PROTECTED]>
> >CC: [email protected]
> >Subject: Re: [R] How to read in this data format?
> >Date: Thu, 1 Mar 2007 16:46:21 -0500
> >
> >On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
> >>Dear All,
> >>
> >>thanks for the replies. Jim Holtman has given a solution which fits my
> >>needs. Gabor Grothendieck's solution does the same thing, but it looks
> >>like his code will allow faster processing (I should check this out
> >>tomorrow on a big data file).
> >>
> >>@gabor: I don't understand the use of the grep command:
> >>        grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> >>What is this expression  ("^[1-9][0-9. ]*$|Time") actually doing?
> >>I looked in the help page, but couldn't find a suitable answer.
> >
> >I briefly discussed it in the first paragraph of my response.  It
> >matches and returns only those lines that start (^ matches start of line)
> >with a digit from 1 to 9, i.e. [1-9], and contain only digits, dots and
> >spaces, i.e. [0-9. ]*, to end of line, i.e. $ matches end of line, or
> >(| means or) contain the word Time.
> >If you don't have lines like ... (which you did in your example) then
> >the regexp
> >could be simplified to "^[0-9. ]+$|Time".  You may need to match tabs too
> >if your input contains those.
> >
> >>
> >>
> >>Thanks to All
> >>
> >>
> >>Bart
> >>
> >>----- Original Message -----
> >>From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> >>To: "Bart Joosen" <[EMAIL PROTECTED]>
> >>Cc: <[email protected]>
> >>Sent: Thursday, March 01, 2007 6:35 PM
> >>Subject: Re: [R] How to read in this data format?
> >>
> >>
> >> > Read in the data using readLines, extract out
> >> > all desired lines (namely those containing only
> >> > numbers, dots and spaces or those with the
> >> > word Time) and remove Retention from all
> >> > lines so that all remaining lines have two
> >> > fields.  Now that we have desired lines
> >> > and all lines have two fields read them in
> >> > using read.table.
> >> >
> >> > Finally, split them into groups and restructure
> >> > them using "by" and in the last line we
> >> > convert the "by" output to a data frame.
> >> >
> >> > At the end we display an alternate function f
> >> > for use with by should we wish to generate long
> >> > rather than wide output (using the terminology
> >> > of the reshape command).
> >> >
> >> >
> >> > Lines <- "$$ Experiment Number:
> >> > $$ Associated Data:
> >> >
> >> > FUNCTION 1
> >> >
> >> > Scan            1
> >> > Retention Time  0.017
> >> >
> >> > 399.8112        184
> >> > 399.8742        0
> >> > 399.9372        152
> >> > ....
> >> >
> >> > Scan            2
> >> > Retention Time  0.021
> >> >
> >> > 399.8112        181
> >> > 399.8742        1
> >> > 399.9372        153
> >> > "
> >> >
> >> > # replace next line with: Lines. <- readLines("myfile.dat")
> >> > Lines. <- readLines(textConnection(Lines))
> >> > Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> >> > Lines. <- gsub("Retention", "", Lines.)
> >> >
> >> > DF <- read.table(textConnection(Lines.), as.is = TRUE)
> >> > closeAllConnections()
> >> >
> >> > f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
> >> > out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
> >> > as.data.frame(do.call("rbind", out.by))
> >> >
> >> >
> >> > We could alternately consider producing long
> >> > format by replacing the function f with:
> >> >
> >> > f <- function(x) data.frame(x[-1,], id = x[1,2])
> >> >
> >> >
> >> > On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
> >> >> Hi,
> >> >>
> >> >> I received an ASCII file containing the following information:
> >> >>
> >> >> $$ Experiment Number:
> >> >> $$ Associated Data:
> >> >>
> >> >> FUNCTION 1
> >> >>
> >> >> Scan            1
> >> >> Retention Time  0.017
> >> >>
> >> >> 399.8112        184
> >> >> 399.8742        0
> >> >> 399.9372        152
> >> >> ....
> >> >>
> >> >> Scan            2
> >> >> Retention Time  0.021
> >> >>
> >> >> 399.8112        181
> >> >> 399.8742        1
> >> >> 399.9372        153
> >> >> .....
> >> >>
> >> >>
> >> >> I would like to import this data in R into a dataframe, where there is a
> >> >> column time, the first numbers as column names, and the second numbers as
> >> >> data in the dataframe:
> >> >>
> >> >> Time    399.8112        399.8742        399.9372
> >> >> 0.017   184     0       152
> >> >> 0.021   181     1       153
> >> >>
> >> >> I did take a look at read.table, read.delim, scan, ... but I've no idea
> >> >> about how to solve this problem.
> >> >>
> >> >> Anyone?
> >> >>
> >> >>
> >> >> Thanks
> >> >>
> >> >> Bart
> >> >>
> >> >> ______________________________________________
> >> >> [email protected] mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >>
> >>
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


