On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico <ajdam...@gmail.com> wrote:
> Hi Gabor,
>
> Thank you for the prompt reply. I definitely looked over all of the
> examples on the code.google.com sqldf page before sending, which is why I
> wrote the code
>
> read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")
>
> directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create
> table mytab as select * from file", dbname = "mydb")
>
> ...but I don't understand why this helps me around the memory problem, since
> I think it still all gets read into memory. Is there a way to do this line
> by line?
OK. Maybe it's something else.

Reading the file into the database should not be a resource problem provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so the data never goes through R at that stage and R limitations can't affect the read into the sqlite database. When you later read from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time.

The use of create table ... as select ... is to prevent sqldf from deleting the table. sqldf is normally used in a fashion where you don't want to know about the back-end database, so it tries to create and delete it behind the scenes; here you want to use the database explicitly, so you have to work around that.

Try this example. It should be reproducible, so you just have to copy it and paste it into your R session. Uncomment the indicated line if you want to be able to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)

# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")

# create new database
sqldf("attach 'mydb' as new")

# create a new file.  BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")

# read new file into database
read.csv.sql("tmp.csv",
    sql = "create table mytab as select * from file",
    dbname = "mydb")

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
    s <- sprintf("select * from mytab limit %d, %d", i, k)
    print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
  Time demand
1    5   15.6
2    7   19.8

showing that it read the 6 line BOD data frame in chunks of 4 as required.
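The chunking trick here is not specific to R or sqldf: it is SQLite's "limit offset, count" syntax doing the work. As a minimal sketch of the same idea in another language, assuming Python's standard sqlite3 module and using an in-memory table with the same six BOD rows:

```python
import sqlite3

# Build a small example table, analogous to loading tmp.csv into mydb.
con = sqlite3.connect(":memory:")
con.execute("create table mytab (Time integer, demand real)")
rows = [(1, 8.3), (2, 10.3), (3, 19.0), (4, 16.0), (5, 15.6), (7, 19.8)]
con.executemany("insert into mytab values (?, ?)", rows)

# How many records are in the table?
n = con.execute("select count(*) from mytab").fetchone()[0]

# Read in chunks of k rows using SQLite's "limit offset, count".
k = 4
chunks = []
for i in range(0, n, k):
    chunk = con.execute(
        "select * from mytab limit ?, ?", (i, k)
    ).fetchall()
    chunks.append(chunk)
    print(chunk)
```

Only a chunk of k rows is ever held in program memory at once, which is the same reason the R loop above stays within R's limits no matter how large the table is.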
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.