Re: [R] count how many row i have in a txt file in a directory

Hans Ekbrand Sun, 26 Feb 2012 07:56:30 -0800

On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
> 
> I have a large TXT (X,Y,MyValue) file in a directory and I wish to import
> row by row the txt in a loop to save only the data they are inside a buffer
> (using inside.owin of spatstat) and delete the rest. The first step before
> to create a loop row-by-row is to know how many rows there are in the txt
> file without load in R to save memory problem.
> 
> some people know the specific function?


If there are so many rows that even three variables per row cause
memory problems, then looping through the file row by row will take
a very long time.

Instead of looping row by row, I would split the text file into
chunks, each small enough to be read into R and processed there
without memory problems.
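As for the original question of counting rows without loading the file,
here is a minimal sketch, assuming a plain text file with one record per
line. The helper name count.lines and the block size are illustrative
choices, not an existing R function. On a system with coreutils,
"wc -l testmem.csv" reports the same line count.

```r
# Count the lines of a text file while holding only one block in memory.
# count.lines and block.size are illustrative names, not part of base R.
count.lines <- function(path, block.size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    block <- readLines(con, n = block.size)
    if (length(block) == 0) break   # end of file reached
    n <- n + length(block)
  }
  n
}

# Small self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:25, y = 25:1), file = tmp, row.names = FALSE)
n.lines <- count.lines(tmp, block.size = 10L)
n.rows <- n.lines - 1   # subtract the header line
print(n.rows)
```

The count includes the header line, so the number of data rows is one less.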

First I create a test file of 10,000,000 rows:

# LETTERS has 26 elements, so sample the indices from 1:26
my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], collapse = ""))
my.df <- data.frame(x = rnorm(10000000), y = rnorm(10000000),
                    my.val = rep(my.words, 1000))
write.csv(my.df, file = "testmem.csv")

Split the file into smaller chunks of, say, 1,000,000 lines each. I use
the split command from GNU coreutils, which by default names the pieces
xaa, xab, and so on:

$ split -l 1000000 testmem.csv

Loop through the chunks.

for(file.name in c("xaa", "xab", ...)){   # ... stands for the remaining chunk names
  chunk <- read.csv(file = file.name)
  # [ match and add all the interesting rows to an object ]
}
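The "match and add" step could look roughly like the sketch below. It is
only an illustration: in.buffer is a hypothetical rectangle test standing
in for the real buffer check (with spatstat one would call
inside.owin(x, y, w) at that point), and the two small chunk files are
created just for the demonstration.

```r
# Hypothetical stand-in for the buffer test: the unit square.
# With spatstat, inside.owin(x, y, w) would be used here instead.
in.buffer <- function(x, y) x >= 0 & x <= 1 & y >= 0 & y <= 1

# Two small chunk files, created only for this demonstration.
set.seed(1)
chunk.files <- c(tempfile(), tempfile())
for (f in chunk.files) {
  demo <- data.frame(x = runif(100, -1, 2), y = runif(100, -1, 2), my.val = "a")
  write.csv(demo, file = f, row.names = FALSE)
}

# Keep only the rows inside the buffer, one chunk at a time.
kept <- vector("list", length(chunk.files))
for (i in seq_along(chunk.files)) {
  chunk <- read.csv(file = chunk.files[i])
  kept[[i]] <- chunk[in.buffer(chunk$x, chunk$y), ]
}

# Combine the kept rows once, after the loop.
result <- do.call(rbind, kept)
```

Collecting the pieces in a list and calling rbind once at the end avoids
growing a data frame inside the loop, which gets slow with many chunks.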

Here's an example that prints the third row of each chunk.

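col.names <- names(read.csv("xaa", nrows = 1))  # only the first chunk has the header line
for(file.name in c("xaa", "xab")){
  # header = TRUE only for xaa; later chunks start directly with data
  chunk <- read.csv(file = file.name, header = (file.name == "xaa"), col.names = col.names)
  print(chunk[3,])
}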

With chunks of 1,000,000 rows, R needed about 250 MB of RAM to process this loop.
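If the split command is not available, a similar effect can probably be
had inside R by reading a fixed number of lines at a time from one open
connection. The following is a sketch under that assumption, not the
method used above; process.in.chunks is an illustrative helper name and
the small temporary file exists only for the demonstration.

```r
# Apply FUN to successive chunks of a CSV file read from one connection.
# process.in.chunks is an illustrative helper, not a base R function.
process.in.chunks <- function(path, chunk.size, FUN) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # The header is read once; scan() splits it and strips the quotes.
  header <- readLines(con, n = 1)
  col.names <- scan(text = header, what = "", sep = ",", quiet = TRUE)
  results <- list()
  repeat {
    lines <- readLines(con, n = chunk.size)
    if (length(lines) == 0) break   # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = col.names)
    results[[length(results) + 1]] <- FUN(chunk)
  }
  results
}

# Demonstration: the third row of each chunk of a 25-row file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:25, y = 25:1), file = tmp, row.names = FALSE)
third.rows <- process.in.chunks(tmp, 10L, function(chunk) chunk[3, ])
print(third.rows)
```

Only one chunk is held in memory at a time, so chunk.size plays the same
role as the -l argument to split.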

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
