On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
>
> I have a large TXT (X,Y,MyValue) file in a directory and I wish to import
> row by row the txt in a loop to save only the data they are inside a buffer
> (using inside.owin of spatstat) and delete the rest. The first step before
> to create a loop row-by-row is to know how many rows there are in the txt
> file without load in R to save memory problem.
>
> some people know the specific function?
If there are so many rows that even three variables per row cause
memory problems, then looping through the file row by row will take
a very long time.
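As for the row count itself: here is a minimal sketch that counts the lines of a file in blocks, so that only one block is held in memory at a time. The function name count.lines and the block size are my own choices for illustration, and the demonstration file is made up. If GNU coreutils is at hand, `wc -l` on the file gives the same count from the shell.

```r
# Count the lines of a text file block by block, so that at most
# `block.size` lines are held in memory at any time.
# (count.lines is a hypothetical helper, not an existing R function.)
count.lines <- function(path, block.size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    block <- readLines(con, n = block.size)
    if (length(block) == 0L) break   # end of file reached
    n <- n + length(block)
  }
  n
}

# Small demonstration on a temporary file with made-up data
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%d,%d,%d", 1:2500, 1:2500, 1:2500), tmp)
n.rows <- count.lines(tmp, block.size = 1000L)
print(n.rows)
```

Note that the count includes a header line if the file has one.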
I would - instead of looping row-by-row - split the text file into
chunks small enough for a chunk to be read into R, and operated on
within R, without memory problems.
I create a test file of 10.000.000 rows
# LETTERS has 26 elements, so sample indices from 1:26
my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], sep = "",
                                   collapse = ""))
my.df <- data.frame(x = rnorm(10000000), y = rnorm(10000000),
                    my.val = rep(my.words, 1000))
write.csv(my.df, file = "testmem.csv")
Split the file into smaller chunks, say 1.000.000 rows. I use the
split command in GNU coreutils,
$ split -l 1000000 testmem.csv
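Loop through the chunks.
for (file.name in c("xaa", "xab", ...)) {
  # split leaves the header line only in the first chunk (xaa)
  chunk <- read.csv(file = file.name, header = FALSE,
                    skip = if (file.name == "xaa") 1 else 0,
                    col.names = c("row", "x", "y", "my.val"))
  # [ match and add all the interesting rows to an object ]
}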
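The "match and add" step could look roughly like the sketch below. It is only an illustration under assumptions: the buffer is a made-up circle of radius 1 around the origin, tested by a hypothetical helper in.buffer, and the two chunk files are small temporary files without a header line. With spatstat, inside.owin(x, y, w) on your own window should be able to take the place of in.buffer.

```r
# Hypothetical buffer test: a circle of radius 1 around the origin.
# With spatstat, inside.owin(x, y, w) could be used here instead.
in.buffer <- function(x, y, radius = 1) x^2 + y^2 <= radius^2

# Two small chunk files for the demonstration (made-up data, no header)
set.seed(1)
chunk.files <- c(tempfile(), tempfile())
for (f in chunk.files) {
  d <- data.frame(x = rnorm(1000), y = rnorm(1000), my.val = "a")
  write.table(d, f, sep = ",", row.names = FALSE, col.names = FALSE)
}

# Keep only the rows inside the buffer; each chunk is discarded
# as soon as the next one is read.
kept <- vector("list", length(chunk.files))
for (i in seq_along(chunk.files)) {
  chunk <- read.csv(chunk.files[i], header = FALSE,
                    col.names = c("x", "y", "my.val"))
  kept[[i]] <- chunk[in.buffer(chunk$x, chunk$y), ]
}

# Combine the kept rows once, after the loop
result <- do.call(rbind, kept)
print(nrow(result))
```

Collecting the pieces in a list and calling rbind once at the end avoids growing a data frame inside the loop, which gets slow when there are many chunks.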
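Here's an example that prints the third row of each chunk.
for (file.name in c("xaa", "xab")) {
  # split leaves the header line only in the first chunk (xaa)
  chunk <- read.csv(file = file.name, header = FALSE,
                    skip = if (file.name == "xaa") 1 else 0,
                    col.names = c("row", "x", "y", "my.val"))
  print(chunk[3, ])
}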
With a chunk of 1.000.000 rows, R needed about 250 MB RAM to process this loop.
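If the split command is not available, a similar effect can probably be had inside R by reading from one open connection, since each read.csv() call on an open connection continues where the previous one stopped. The sketch below is my own illustration, not something I have timed on the 10.000.000-row file; process.in.chunks and its arguments are hypothetical names, and it assumes a comma-separated file with one header line.

```r
# Read a CSV file in chunks from a single open connection and apply
# `fun` to each chunk. (process.in.chunks is a hypothetical helper.)
process.in.chunks <- function(path, chunk.size, fun) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once and reuse its names for every chunk
  header <- readLines(con, n = 1)
  col.names <- scan(text = header, what = "", sep = ",", quiet = TRUE)
  results <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk.size,
               col.names = col.names),
      error = function(e) NULL)  # read.csv() fails once no lines are left
    if (is.null(chunk) || nrow(chunk) == 0L) break
    results[[length(results) + 1L]] <- fun(chunk)
    if (nrow(chunk) < chunk.size) break  # short chunk: end of file
  }
  results
}

# Demonstration on a small temporary file with made-up data
tmp.csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:25, y = 25:1), tmp.csv, row.names = FALSE)
rows.per.chunk <- process.in.chunks(tmp.csv, chunk.size = 10, fun = nrow)
print(unlist(rows.per.chunk))
```

Here `fun` only counts rows; in practice it would be whatever filtering you want to apply to each chunk.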