uhm, by "large update_or_insert" do you mean "multiple statements of 
update_or_insert" or "update_or_insert with a huge number of fields" ?
it seems obvious to me that an "automatic" update_or_insert() on a table 
with columns, e.g., a,b,c,d will have to do
select * from testtable where a=avalue and b=bvalue and c=cvalue and 
d=dvalue
while a "raw mode", e.g, scanning only the 'a' column that you know is a 
"primary key" would trigger a
select * from testtable where a=avalue
and the second can be faster.
Plus, if you know you have a lot more records to be updated instead of 
being inserted, it's faster to do
update testtable set a=avalue, b=bvalue, c=cvalue, d=dvalue
and see the returned updated values than doing a select and then updating.
Web2py does not know in advance how many records are you going to insert 
and how many are going to be updated, so it resorts to querying if there 
are values BEFORE and then updating those values if no rows are returned.

However - always in the spirit of "let web2py users reproduce your issues" 
I created a small script, that on my machine returned
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT         
   finished in 16.4003179073 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT         
   finished in 15.1824979782 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT           
    finished in 22.3497450352 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT           
   finished in 15.8405299187 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT           
   finished in 17.2484908104 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT         
    finished in 15.6090190411 sec
on sqlite and
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT        
    finished in 26.6851980686 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT        
    finished in 26.9120578766 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT        
       finished in 23.3219130039 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT        
      finished in 34.6620731354 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT        
      finished in 35.9468638897 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT        
     finished in 24.5943310261 sec

try it yourself (it's meant to create a table named "testtable" on the db, 
place it into web2py's folder and start it with web2py.py -M -S yourapp -R 
update_or_insert.py)

Seems to me that web2py's default update_or_insert() is not taking that 
much, and chose a right balanced "algorithm".

BTW: I'll never end advising placing indexes on the right spots.... adding 
an index on the d column lead to those results
Summary
started update_or_insert (full-auto-mode), 5000 UPDATE, 5000 INSERT         
   finished in 18.4528090954 sec
started update_or_insert (semi-auto-mode), 5000 UPDATE, 5000 INSERT         
   finished in 15.58761096 sec
started update_or_insert (explicit-mode), 10000 UPDATE, 0 INSERT           
    finished in 6.05162096024 sec
started update_or_insert (semi-auto-mode), 10000 UPDATE, 0 INSERT           
   finished in 15.5920169353 sec
started update_or_insert (full-auto-mode), 10000 UPDATE, 0 INSERT           
   finished in 19.1516299248 sec
started update_or_insert (explicit-mode), 5000 UPDATE, 5000 INSERT         
    finished in 10.5115959644 sec




-- 



#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
#prepare a large list of dicts
insertlist = []
for a in range(10000):
    insertlist.append({
        'a' : 'a%s' % a,
        'b' : 'b%s' % a,
        'c' : 'c%s' % a,
        'd' : 'd%s' %a        
    })

db.define_table('testtable', 
    Field('a'),
    Field('b'),
    Field('c'),
    Field('d')
)
db.commit()

def init_table(limit):
    print 'truncating the table'
    db.testtable.truncate()
    db.commit()
    start = time.time()
    db.testtable.bulk_insert(insertlist[:limit])
    db.commit()
    total_lines = db(db.testtable).count()
    print 'total lines in testtable %s' % total_lines

durations = {}

for limit in [10000, 5000]:
    init_table(limit)
    start = time.time()
    stmt = 'started update_or_insert (full-auto-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
    print stmt
    for v in insertlist:
        db.testtable.update_or_insert(**v)
    db.commit()
    rtn = 'finished in %s sec' % (time.time() - start)
    durations[stmt] = rtn
    print rtn
    total_lines = db(db.testtable).count()
    print 'total lines in testtable %s' % total_lines

    init_table(limit)
    stmt = 'started update_or_insert (semi-auto-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
    print stmt
    start = time.time()
    for v in insertlist:
        db.testtable.update_or_insert(db.testtable.d == v['d'], **v)
    db.commit()
    rtn = 'finished in %s sec' % (time.time() - start)
    durations[stmt] = rtn
    print rtn
    total_lines = db(db.testtable).count()
    print 'total lines in testtable %s' % total_lines

    init_table(limit)
    stmt = 'started update_or_insert (explicit-mode), %s UPDATE, %s INSERT' % (limit, 10000-limit)
    print stmt
    start = time.time()
    for v in insertlist:
        count = db(db.testtable.d == v['d']).update(**v)
        if not count:
            db.testtable.insert(**v)
    db.commit()
    rtn = 'finished in %s sec' % (time.time() - start)
    durations[stmt] = rtn
    print rtn
    total_lines = db(db.testtable).count()
    print 'total lines in testtable %s' % total_lines


print 'Summary'
for k,v in durations.iteritems():
    print '%s\t\t\t%s' % (k,v)

Reply via email to