The 'normal' way of doing that with ff is to first convert your csv file completely into an ffdf object (which stores its data on disk, so it shouldn't give any memory problems). You can then use the chunk routine (see ?chunk) to divide your data into the required chunks.

Untested so may contain errors:

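library(ff)

# fill in your file and read options; read.csv.ffdf is the csv variant
yourffdf <- read.table.ffdf(...)

# row ranges of at most 5 million rows each
chnks <- chunk(from=1, to=nrow(yourffdf), by=5E6, method='seq')

for (chnk in chnks) {
  # read one chunk into memory as an ordinary data.frame
  data <- yourffdf[chnk, ]
  # do your thing with the data
  # clean up
  rm(data)
  gc()
}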
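For the two files in the question, the chunk call above should give three row ranges for the 15-million-row file and two for the 8-million-row file. The base R helper below, chunk_ranges (a hypothetical name, not part of ff), only computes the same from/to boundaries so you can check the arithmetic; ff's chunk itself returns range-index objects rather than a data frame.

```r
# Hypothetical helper (base R, not part of ff): first and last row of each
# chunk when n rows are split into pieces of at most `by` rows, i.e. the
# boundaries chunk(from = 1, to = n, by = by, method = "seq") works with.
chunk_ranges <- function(n, by) {
  from <- seq(1, n, by = by)
  to <- pmin(from + by - 1, n)
  data.frame(from = from, to = to)
}

chunk_ranges(15e6, 5e6)  # three chunks of 5 million rows
chunk_ranges(8e6, 5e6)   # two chunks, the last one 3 million rows
```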


If you want to process your csv file directly in chunks, you could also have a look at the LaF package, in particular the process_blocks routine, which does exactly that. The manual vignette (http://cran.r-project.org/web/packages/LaF/vignettes/LaF-manual.pdf)
contains some examples of how to do that.
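For the workflow described in the quoted message (read 5 million rows, analyse them, write the result to a new csv, then move on), a base R alternative is to keep a connection open and call read.csv repeatedly with nrows. This is a sketch of that swapped-in technique, not LaF's process_blocks. It assumes a comma-separated file with a single header line; the function name process_csv_in_chunks and the "analysis" (row count plus column means per chunk) are hypothetical placeholders to replace with your own.

```r
# Process a csv file in chunks with base R only: each chunk is read,
# analysed, and its result appended to an output csv before the next
# chunk is read, so only one chunk is held in memory at a time.
# Assumes a comma separator and exactly one header line.
process_csv_in_chunks <- function(infile, outfile, chunk_size = 5e6) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  # read the header line once (scan strips quotes around the names)
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)
  chunk_no <- 0
  first_write <- TRUE
  repeat {
    # continue reading from the connection's current position
    dat <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
      error = function(e) NULL  # treat "no lines available" as end of file
    )
    if (is.null(dat) || nrow(dat) == 0) break
    chunk_no <- chunk_no + 1
    # placeholder analysis: row count and mean of each numeric column
    num <- vapply(dat, is.numeric, logical(1))
    result <- data.frame(chunk = chunk_no, rows = nrow(dat),
                         as.list(colMeans(dat[num])))
    # write the header only with the first chunk, then append
    write.table(result, outfile, sep = ",", row.names = FALSE,
                col.names = first_write, append = !first_write)
    first_write <- FALSE
    rm(dat)
    gc()
  }
  invisible(chunk_no)  # number of chunks processed
}
```

Each file is then handled with one call, e.g. `process_csv_in_chunks("file1.csv", "file1_results.csv")` (file names hypothetical). Column types are guessed separately for every chunk, so for real data passing colClasses to read.csv is safer.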

Jan



Quoting Mav <mastorvar...@gmail.com>:

Thank you Jan

My problem is the following:
For instance, I have 2 files with different numbers of rows (15 million and
8 million rows, respectively).
I would like to read the first one in chunks of 5 million rows each. However,
between the first and second chunk, I would like to analyze those first 5
million rows, write the analysis to a new csv and then proceed to read and
analyze the second chunk, and so on until the third chunk. With the second
file, I would like to do the same: read the first chunk, analyze it, and
continue to read the second and analyze it.

Basically, my problem is that I manage to read the files, but with so many
rows I cannot do any analyses (not even filtering the rows) because of the RAM
restrictions.

Sorry if it is still not clear.

Thank you

--
View this message in context: http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
