Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Jerome Asselin
On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
 Hello,
 I am trying to insert a lot of data into a table using windows R (2.3.1) 
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements 
 for each row and execute the insert query one row at a time. See the 
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?
 
 Thanks, Bill
 
 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 for (i in 1:ntry) {
 sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
 sql=paste(sql,',d1,' )
 sql=paste(sql,,,ti[i,3] )
 sql=paste(sql,,,ti[i,4] )
 sql=paste(sql,,,ti[i,5] )
 sql=paste(sql,,,ti[i,6] )
 sql=paste(sql,,,ti[i,7] )
 sql=paste(sql,,,ti[i,8] )
 sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 nadded=nafter-nbefore;nadded
 date()

I sure will try to help you out here. I've been working with RODBC. I
think what slows you down here is your loop with multiple paste
commands.

Have you considered the sqlSave() function with the append=T argument? I
think you could replace your loop with:

dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12])
sqlSave(channel,dat,logger,append=T)

Of course, I haven't tested this so you may need some minor adjustments,
but I think this will greatly speed up your insert job.

Regards,
Jerome
-- 
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread ONKELINX, Thierry
Large for loops are slow. Try to avoid them using apply, sapply, etc.
I've made the paste statements a lot shorter by using collapse. See
?paste for more info.

Append.SQL - function(x, channel){
  sql=INSERT INTO logger (time, v1, v2, v3, v4, v5, v6, v7, v8, v9,
v10) VALUES(d1=strptime(x[2],%d/%m/%y %H:%M:%S %p ', d1, ' ,,
paste(x[3:12], collapse = , ), ) ) 
  sqlQuery(channel, sql)
}

ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
apply(ti, 2, Append.SQL, channel = channel)
nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
nadded=nafter-nbefore;nadded
date()




ir. Thierry Onkelinx

Instituut voor natuur- en bosonderzoek / Reseach Institute for Nature
and Forest

Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance

Gaverstraat 4

9500 Geraardsbergen

Belgium

tel. + 32 54/436 185

[EMAIL PROTECTED]

www.inbo.be 

 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt

A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney


-Oorspronkelijk bericht-
Van: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Namens Bill Szkotnicki
Verzonden: vrijdag 13 oktober 2006 15:09
Aan: [EMAIL PROTECTED]
Onderwerp: [R] RODBC sqlQuery insert slow

Hello,
I am trying to insert a lot of data into a table using windows R (2.3.1)

and a mysql database via RODBC.
First I read a file with read.csv and then form sql insert statements 
for each row and execute the insert query one row at a time. See the 
loop below.
This turns out to be very slow.
Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
for (i in 1:ntry) {
sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
sql=paste(sql,',d1,' )
sql=paste(sql,,,ti[i,3] )
sql=paste(sql,,,ti[i,4] )
sql=paste(sql,,,ti[i,5] )
sql=paste(sql,,,ti[i,6] )
sql=paste(sql,,,ti[i,7] )
sql=paste(sql,,,ti[i,8] )
sql=paste(sql,,,ti[i,9] )
sql=paste(sql,,,ti[i,10])
sql=paste(sql,,,ti[i,11])
sql=paste(sql,,,ti[i,12])
sql=paste(sql,) )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
nadded=nafter-nbefore;nadded
date()

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Armstrong, Whit
Is there a reason why the data have to be inserted 1 row at a time?

Is it possible to insert the entire table at once?

sqlSave perhaps.



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
 Sent: Friday, October 13, 2006 9:09 AM
 To: [EMAIL PROTECTED]
 Subject: [R] RODBC sqlQuery insert slow
 
 Hello,
 I am trying to insert a lot of data into a table using 
 windows R (2.3.1) and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert 
 statements for each row and execute the insert query one row 
 at a time. See the loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?
 
 Thanks, Bill
 
 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for 
 (i in 1:ntry) { sql=INSERT INTO logger 
 (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) 
 sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) 
 sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) 
 sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) 
 sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) 
 nadded=nafter-nbefore;nadded
 date()
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 




This e-mail message is intended only for the named recipient(s) above. It may 
contain confidential information. If you are not the intended recipient you are 
hereby notified that any dissemination, distribution or copying of this e-mail 
and any attachment(s) is strictly prohibited. If you have received this e-mail 
in error, please immediately notify the sender by replying to this e-mail and 
delete the message and any attachment(s) from your system. Thank you.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Michel Lang
 I am trying to insert a lot of data into a table using windows R (2.3.1)
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements
 for each row and execute the insert query one row at a time. See the
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?

A few weeks ago I had to solve a similar task. Inserting each row turned out 
to be horrible slow due to paste() and the data.frame-indexing. The estimated 
runtime would have been over 3 weeks, so I used MySQLs LOAD DATE INFILE 
syntax to speed things up. You must have FILE_PRIV = 'Y' set in the 
mysql.user-table to use this small hack, and I'm not that sure that this runs 
remotely. It is also assumed that your df has valid column-names.

tmp_filename - tempfile()

write.table(df, tmp_filename, na = \\N, row.names = FALSE, col.names = 
FALSE, quote = FALSE, sep = \t)

query - paste(
LOAD DATA LOCAL INFILE ',  tmp_filename, ',
 INTO TABLE , your_table,  (, toString(names(df)), );, sep = )

sqlQuery(channel, query)
unlink(tmp_filename)

The total runtime for the LOAD DATA INFILE querys was something below 5 
minutes, inserting 3e+6 rows with  200 columns.

Michel Lang

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Bill Szkotnicki
Thanks for the help ... the  sqlSave() function was the solution.
The lesson, which has been stated many times before,  is to avoid loops 
wherever possible!
Bill

# fast RODBC inserting
dat - cbind(as.character(strptime(ti[,2],%d/%m/%y %H:%M:%S 
%p)),ti[,3:12])
# you need the as.character to make sure the time is stored correctly in 
mysql
names(dat)=c(time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10)
sqlSave(channel,dat,logger,rownames=F,append=T) # very fast.
#

Jerome Asselin wrote:
 On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
   
 Hello,
 I am trying to insert a lot of data into a table using windows R (2.3.1) 
 and a mysql database via RODBC.
 First I read a file with read.csv and then form sql insert statements 
 for each row and execute the insert query one row at a time. See the 
 loop below.
 This turns out to be very slow.
 Can anyone please suggest a way to speed it up?

 Thanks, Bill

 # R code
 ntry=dim(ti)[1]
 date()
 nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 for (i in 1:ntry) {
 sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES(
 d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p)
 sql=paste(sql,',d1,' )
 sql=paste(sql,,,ti[i,3] )
 sql=paste(sql,,,ti[i,4] )
 sql=paste(sql,,,ti[i,5] )
 sql=paste(sql,,,ti[i,6] )
 sql=paste(sql,,,ti[i,7] )
 sql=paste(sql,,,ti[i,8] )
 sql=paste(sql,,,ti[i,9] )
 sql=paste(sql,,,ti[i,10])
 sql=paste(sql,,,ti[i,11])
 sql=paste(sql,,,ti[i,12])
 sql=paste(sql,) )
 #print(sql)
 sqlQuery(channel, sql)
 }
 nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger)
 nadded=nafter-nbefore;nadded
 date()
 

 I sure will try to help you out here. I've been working with RODBC. I
 think what slows you down here is your loop with multiple paste
 commands.

 Have you considered the sqlSave() function with the append=T argument? I
 think you could replace your loop with:

 dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12])
 sqlSave(channel,dat,logger,append=T)

 Of course, I haven't tested this so you may need some minor adjustments,
 but I think this will greatly speed up your insert job.

 Regards,
 Jerome


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.