Re: [R] RODBC sqlQuery insert slow
On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote: Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill # R code ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for (i in 1:ntry) { sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES( d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] ) sql=paste(sql,,,ti[i,10]) sql=paste(sql,,,ti[i,11]) sql=paste(sql,,,ti[i,12]) sql=paste(sql,) ) #print(sql) sqlQuery(channel, sql) } nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() I sure will try to help you out here. I've been working with RODBC. I think what slows you down here is your loop with multiple paste commands. Have you considered the sqlSave() function with the append=T argument? I think you could replace your loop with: dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12]) sqlSave(channel,dat,logger,append=T) Of course, I haven't tested this so you may need some minor adjustments, but I think this will greatly speed up your insert job. Regards, Jerome -- Jerome Asselin, M.Sc., Agent de recherche, RHCE CHUM -- Centre de recherche 3875 rue St-Urbain, 3e etage // Montreal QC H2W 1V1 Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC sqlQuery insert slow
Large for loops are slow. Try to avoid them using apply, sapply, etc. I've made the paste statements a lot shorter by using collapse. See ?paste for more info. Append.SQL - function(x, channel){ sql=INSERT INTO logger (time, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10) VALUES(d1=strptime(x[2],%d/%m/%y %H:%M:%S %p ', d1, ' ,, paste(x[3:12], collapse = , ), ) ) sqlQuery(channel, sql) } ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) apply(ti, 2, Append.SQL, channel = channel) nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Reseach Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 [EMAIL PROTECTED] www.inbo.be Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J.Moroney -Oorspronkelijk bericht- Van: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Namens Bill Szkotnicki Verzonden: vrijdag 13 oktober 2006 15:09 Aan: [EMAIL PROTECTED] Onderwerp: [R] RODBC sqlQuery insert slow Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill # R code ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for (i in 1:ntry) { sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES( d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] ) sql=paste(sql,,,ti[i,10]) sql=paste(sql,,,ti[i,11]) sql=paste(sql,,,ti[i,12]) sql=paste(sql,) ) #print(sql) sqlQuery(channel, sql) } nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC sqlQuery insert slow
Is there a reason why the data have to be inserted 1 row at a time? Is it possible to insert the entire table at once? sqlSave perhaps. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki Sent: Friday, October 13, 2006 9:09 AM To: [EMAIL PROTECTED] Subject: [R] RODBC sqlQuery insert slow Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill # R code ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for (i in 1:ntry) { sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES( d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] ) sql=paste(sql,,,ti[i,10]) sql=paste(sql,,,ti[i,11]) sql=paste(sql,,,ti[i,12]) sql=paste(sql,) ) #print(sql) sqlQuery(channel, sql) } nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This e-mail message is intended only for the named recipient(s) above. It may contain confidential information. If you are not the intended recipient you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender by replying to this e-mail and delete the message and any attachment(s) from your system. Thank you. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC sqlQuery insert slow
I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? A few weeks ago I had to solve a similar task. Inserting each row turned out to be horrible slow due to paste() and the data.frame-indexing. The estimated runtime would have been over 3 weeks, so I used MySQLs LOAD DATE INFILE syntax to speed things up. You must have FILE_PRIV = 'Y' set in the mysql.user-table to use this small hack, and I'm not that sure that this runs remotely. It is also assumed that your df has valid column-names. tmp_filename - tempfile() write.table(df, tmp_filename, na = \\N, row.names = FALSE, col.names = FALSE, quote = FALSE, sep = \t) query - paste( LOAD DATA LOCAL INFILE ', tmp_filename, ', INTO TABLE , your_table, (, toString(names(df)), );, sep = ) sqlQuery(channel, query) unlink(tmp_filename) The total runtime for the LOAD DATA INFILE querys was something below 5 minutes, inserting 3e+6 rows with 200 columns. Michel Lang __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC sqlQuery insert slow
Thanks for the help ... the sqlSave() function was the solution. The lesson, which has been stated many times before, is to avoid loops wherever possible! Bill # fast RODBC inserting dat - cbind(as.character(strptime(ti[,2],%d/%m/%y %H:%M:%S %p)),ti[,3:12]) # you need the as.character to make sure the time is stored correctly in mysql names(dat)=c(time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) sqlSave(channel,dat,logger,rownames=F,append=T) # very fast. # Jerome Asselin wrote: On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote: Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill # R code ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for (i in 1:ntry) { sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES( d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] ) sql=paste(sql,,,ti[i,10]) sql=paste(sql,,,ti[i,11]) sql=paste(sql,,,ti[i,12]) sql=paste(sql,) ) #print(sql) sqlQuery(channel, sql) } nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() I sure will try to help you out here. I've been working with RODBC. I think what slows you down here is your loop with multiple paste commands. Have you considered the sqlSave() function with the append=T argument? I think you could replace your loop with: dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12]) sqlSave(channel,dat,logger,append=T) Of course, I haven't tested this so you may need some minor adjustments, but I think this will greatly speed up your insert job. Regards, Jerome __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.