Ronggui, I'm not familiar with SQLite, but using MySQL would solve your problem.
MySQL has a "LOAD DATA INFILE" statement that loads text/CSV files rapidly. In R, assuming a table named test3 already exists in MySQL (an empty table is fine), something like this would load the data directly into MySQL:

library(DBI)
library(RMySQL)
# mycon is an existing MySQL connection, e.g. one created with dbConnect()
dbSendQuery(mycon, "LOAD DATA INFILE 'C:/textfile.csv' INTO TABLE test3
                    FIELDS TERMINATED BY ','")   # for csv files

Then a normal SQL query would allow you to work with a manageable amount of data.

>From: bogdan romocea <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>CC: r-help <[email protected]>
>Subject: Re: [R] Suggestion for big files [was: Re: A comment about R:]
>Date: Thu, 5 Jan 2006 15:26:51 -0500
>
>ronggui wrote:
> > If I am familiar with
> > database software, using database (and R) is the best choice, but
> > converting the file into database format is not an easy job for me.
>
>Good working knowledge of a DBMS is almost invaluable when it comes to
>working with very large data sets. In addition, learning SQL is a piece
>of cake compared to learning R. On top of that, knowledge of another
>(SQL) scripting language is not needed (except perhaps for special
>tasks): you can easily use R to generate the SQL syntax to import and
>work with arbitrarily wide tables. (I'm not familiar with SQLite, but
>MySQL comes with a command line tool that can run syntax files.)
>Better start learning SQL today.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of ronggui
> > Sent: Thursday, January 05, 2006 12:48 PM
> > To: jim holtman
> > Cc: [email protected]
> > Subject: Re: [R] Suggestion for big files [was: Re: A comment about R:]
> >
> > 2006/1/6, jim holtman <[EMAIL PROTECTED]>:
> > > If what you are reading in is numeric data, then it would require
> > > 807 * 118519 * 8 bytes (about 760MB) just to store a single copy
> > > of the object -- more memory than you have on your computer. If
> > > you were reading it in, then the problem is the paging that was
> > > occurring.
> > In fact, if I read it in 3 pieces, each is about 170M.
> >
> > > You have to look at storing this in a database and working on a
> > > subset of the data. Do you really need to have all 807 variables
> > > in memory at the same time?
> >
> > Yip, I don't need all the variables. But I don't know how to get
> > only the necessary variables into R.
> >
> > In the end I read the data in pieces and used the RSQLite package
> > to write it to a database, and then do the analysis. If I were
> > familiar with database software, using a database (and R) would be
> > the best choice, but converting the file into database format is
> > not an easy job for me. I asked for help on the SQLite list, but
> > the solution was not satisfying, as it required knowledge of a
> > third scripting language. After searching the internet, I got this
> > solution:
> >
> > #begin
> > rm(list = ls())
> > f <- file("D:/wvsevs_sb_v4.csv", "r")   # use / (or \\) in paths, not \
> > i <- 0
> > done <- FALSE
> > library(RSQLite)
> > con <- dbConnect("SQLite", "c:/sqlite/database.db3")
> > tim1 <- Sys.time()
> >
> > while (!done) {
> >   i <- i + 1
> >   tt <- readLines(f, 2500)
> >   if (length(tt) < 2500) done <- TRUE
> >   tt <- textConnection(tt)
> >   if (i == 1) {
> >     dat <- read.table(tt, head = TRUE, sep = ",", quote = "")
> >   } else {
> >     dat <- read.table(tt, head = FALSE, sep = ",", quote = "")
> >   }
> >   close(tt)
> >   ifelse(dbExistsTable(con, "wvs"),
> >          dbWriteTable(con, "wvs", dat, append = TRUE),
> >          dbWriteTable(con, "wvs", dat))
> > }
> > close(f)
> > #end
> > It's not the best solution, but it works.
> >
> > > If you use 'scan', you could specify that you do not want some of
> > > the variables read in, so it might make a more reasonably sized
> > > object.
> > >
> > > On 1/5/06, François Pinard <[EMAIL PROTECTED]> wrote:
> > > > [ronggui]
> > > >
> > > > >R is weak when handling large data files. I have a data file:
> > > > >807 vars, 118519 obs., and it is in CSV format. Stata can read
> > > > >it in in 2 minutes, but on my PC R almost cannot handle it. My
> > > > >PC: CPU 1.7G; RAM 512M.
> > > > Just (another) thought. I used to use SPSS, many, many years
> > > > ago, on CDC machines, where the CPU had limited memory and no
> > > > kind of paging architecture. Files did not need to be very large
> > > > to be too large.
> > > >
> > > > SPSS had a feature that was then useful: the capability of
> > > > sampling a big dataset directly at file read time, before
> > > > processing starts. Maybe something similar could help in R
> > > > (that is, instead of reading the whole data into memory, _then_
> > > > sampling it).
> > > >
> > > > One can read records from a file, up to a preset amount of them.
> > > > If the file happens to contain more records than that preset
> > > > number (the number of records in the whole file is not known
> > > > beforehand), already-read records may be dropped at random and
> > > > replaced by other records coming from the file being read. If
> > > > the random selection algorithm is properly chosen, it can be
> > > > made so that all records in the original file have equal
> > > > probability of being kept in the final subset.
> > > >
> > > > If such a sampling facility were built right into the usual R
> > > > reading routines (triggered by an extra argument, say), it could
> > > > offer a compromise for processing large files, and also
> > > > sometimes accelerate computations for big problems, even when
> > > > memory is not at stake.
> > > >
> > > > --
> > > > François Pinard   http://pinard.progiciels-bpi.ca
> > > >
> > > > ______________________________________________
> > > > [email protected] mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide!
> > > > http://www.R-project.org/posting-guide.html
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 247 0281
> > >
> > > What is the problem you are trying to solve?
> >
> > --
> > Ronggui Huang (黄荣贵)
> > Department of Sociology
> > Fudan University
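Jim Holtman's suggestion to read in only the variables you need can be done with read.table's (or scan's) colClasses argument: columns marked "NULL" are skipped entirely and never stored. A minimal sketch with a made-up four-column file:

```r
# Skip unwanted columns at read time: "NULL" in colClasses drops a column.
tc <- textConnection(c("a,b,c,d",
                       "1,x,2.5,keep",
                       "2,y,3.5,keep"))
dat <- read.table(tc, header = TRUE, sep = ",",
                  colClasses = c("integer", "NULL", "numeric", "character"))
close(tc)
# dat contains only columns a, c and d; column b was never stored
```

For 807 variables the colClasses vector can be built programmatically, e.g. start from rep("NULL", 807) and overwrite the wanted positions with their actual classes.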
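Once the data is in SQLite, pulling just the needed variables back into R is a single query. A sketch assuming the RSQLite package is installed; the table and column names are illustrative, and an in-memory database with a toy table stands in here for ronggui's database.db3:

```r
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")   # use "c:/sqlite/database.db3" for a file
dbWriteTable(con, "wvs", data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9))

# Read back only the wanted variables, not all 807 columns
dat <- dbGetQuery(con, "SELECT V1, V3 FROM wvs")
dbDisconnect(con)
```

Adding a WHERE or LIMIT clause to the query restricts the rows as well, so only the subset of interest ever occupies R's memory.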
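Pinard's read-time sampling idea is reservoir sampling. It is not a built-in option of R's reading routines; a minimal sketch (the function name is made up) that keeps a uniform random sample of k lines from a connection of unknown length:

```r
# Reservoir sampling: after n lines have been read, every line seen so
# far has probability k/n of being in the reservoir.
reservoir_lines <- function(con, k) {
  reservoir <- character(0)
  n <- 0
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break        # end of input
    n <- n + 1
    if (n <= k) {
      reservoir[n] <- line              # fill the reservoir first
    } else {
      j <- sample.int(n, 1)             # replace a slot with prob. k/n
      if (j <= k) reservoir[j] <- line
    }
  }
  reservoir
}

set.seed(1)
tc <- textConnection(sprintf("record %d", 1:100))
samp <- reservoir_lines(tc, 5)          # 5 lines, uniformly chosen from 100
close(tc)
```

The sampled lines could then be fed through read.table via a textConnection, as in ronggui's chunked loop above, so only the sample ever becomes a data frame.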
