uhm, by "large update_or_insert" do you mean "multiple statements of
update_or_insert" or "update_or_insert with a huge number of fields" ?
it seems obvious to me that an "automatic" update_or_insert() on a table
with columns, e.g., a,b,c,d will have to do
select * from testtable where a=avalue and b=bvalue and c=cvalue and
d=dvalue
while a "raw mode", e.g, scanning only the 'a' column that you know is a
"primary key" would trigger a
select * from testtable where a=avalue
and the second can be faster.
Plus, if you know you have a lot more records to be updated instead of
being inserted, it's faster to do
update testtable set a=avalue, b=bvalue, c=cvalue, d=dvalue
and see the returned updated values than doing a select and then updating.
Web2py does not know in advance how many records are you going to insert
and how many are going to be updated, so it resorts to querying if there
are values BEFORE and then updating those values if no rows are returned.
However - always in the spirit of "let web2py users reproduce your issues"
I created a small script, that on my machine returned
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 16.4003179073 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 15.1824979782 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT
finished in 22.3497450352 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT
finished in 15.8405299187 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT
finished in 17.2484908104 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT
finished in 15.6090190411 sec
on sqlite and
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 26.6851980686 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 26.9120578766 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT
finished in 23.3219130039 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT
finished in 34.6620731354 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT
finished in 35.9468638897 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT
finished in 24.5943310261 sec
try it yourself (it's meant to create a table named "testtable" on the db,
place it into web2py's folder and start it with web2py.py -M -S yourapp -R
update_or_insert.py)
Seems to me that web2py's default update_or_insert() is not taking that
much, and chose a right balanced "algorithm".
BTW: I'll never end advising placing indexes on the right spots.... adding
an index on the d column lead to those results
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 18.4528090954 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT
finished in 15.58761096 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT
finished in 6.05162096024 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT
finished in 15.5920169353 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT
finished in 19.1516299248 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT
finished in 10.5115959644 sec
--
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
#prepare a large list of dicts
insertlist = []
for a in range(10000):
insertlist.append({
'a' : 'a%s' % a,
'b' : 'b%s' % a,
'c' : 'c%s' % a,
'd' : 'd%s' %a
})
db.define_table('testtable',
Field('a'),
Field('b'),
Field('c'),
Field('d')
)
db.commit()
def init_table(limit):
print 'truncating the table'
db.testtable.truncate()
db.commit()
start = time.time()
db.testtable.bulk_insert(insertlist[:limit])
db.commit()
total_lines = db(db.testtable).count()
print 'total lines in testtable %s' % total_lines
durations = {}
for limit in [10000, 5000]:
init_table(limit)
start = time.time()
stmt = 'started update_or_insert (full-auto-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
print stmt
for v in insertlist:
db.testtable.update_or_insert(**v)
db.commit()
rtn = 'finished in %s sec' % (time.time() - start)
durations[stmt] = rtn
print rtn
total_lines = db(db.testtable).count()
print 'total lines in testtable %s' % total_lines
init_table(limit)
stmt = 'started update_or_insert (semi-auto-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
print stmt
start = time.time()
for v in insertlist:
db.testtable.update_or_insert(db.testtable.d == v['d'], **v)
db.commit()
rtn = 'finished in %s sec' % (time.time() - start)
durations[stmt] = rtn
print rtn
total_lines = db(db.testtable).count()
print 'total lines in testtable %s' % total_lines
init_table(limit)
stmt = 'started update_or_insert (explicit-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
print stmt
start = time.time()
for v in insertlist:
count = db(db.testtable.d == v['d']).update(**v)
if not count:
db.testtable.insert(**v)
db.commit()
rtn = 'finished in %s sec' % (time.time() - start)
durations[stmt] = rtn
print rtn
total_lines = db(db.testtable).count()
print 'total lines in testtable %s' % total_lines
print 'Summary'
for k,v in durations.iteritems():
print '%s\t\t\t%s' % (k,v)