[web2py] Re: Making the DAL orders-of-magnitude faster...

Arnon Marcus Wed, 20 Nov 2013 09:41:35 -0800

Everyone:

You may play with the numbers at the top in this version of the code - 
everything is now customizable for generating any number of chained-joined 
tables with any number of fields and inserting any number of records.


I find that the more joins you have, the more overhead you'll pay when 
parsing to a Rows object with regular .select().
It can be anywhere from a x15 speed-bump with just 2 tables, up-to x44 with 
an extreme-case of 50 tables.
Usually, it would hover around ~x20 or so - in real-world use-cases.
But this is using just simple string-based fields... It goes way-up from 
there when you introduce other data-types.
As evident with our real-world example of almost x100 penalty for regular 
.select() in a simple 2-table join of about 15 columns.

Caching gains maybe 5% improvement...

I also found the .select(processor=<>) approach to be anywhere from 10% 
slower to x6 slower than my .quickSelect() approach for the same queries. 
The gap get's *worse *for more *simple *the use-case:
i.e:
C:\Python26\python.exe 
F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 2 tables, of querying for 1 results, 
each with 3 values
started bench
speed_executesql() total=0.002431 avg=0.000024
speed_normal_rows() total=0.028979 avg=0.000290
speed_rows_with_cacheable() total=0.029703 avg=0.000297
speed_executesql_with_colnames() total=0.002551 avg=0.000026
speed_rows_with_simple_processor() total=0.015669 avg=0.000157

Process finished with exit code 0

speed_rows_with_simple_processor() is x6 times slower than 
speed_executesql_with_colnames()

This is when configured as such:
import timeit
from gluon.dal import DAL, Field


tableCount = 2
fieldCount = 1
recordCount = 1
times = 100


##prepare a large dict
...

Granted, this is hardly an issue, though...
But there is a gradient between the more simple and the more complex.

Try it out...

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

import timeit
from gluon.dal import DAL, Field

tableCount = 5
fieldCount = 10
recordCount = 1000
times = 100

##prepare a large dict
print 'Preparing....'

db = DAL('sqlite:memory')

tableNames = [('tb_'+str(i)) for i in range(tableCount)]
fieldNames = [('fl_'+str(i)) for i in range(fieldCount)]
#load 1k records into a test table

for t in range(tableCount):
    fields = [Field(fieldName) for fieldName in fieldNames]
    if t:
        fields.append(Field(tableNames[t-1], db[tableNames[t-1]]))
    db.define_table(tableNames[t], *fields)
db.commit()

for t in range(tableCount):
    db(db[tableNames[t]].id>0).delete()
db.commit()

for r in range(recordCount):
    ids = []
    for t in range(tableCount):
        tableName = tableNames[t]
        values = dict([(fieldName,('%s_%s'%(tableName,fieldName))) for fieldName in fieldNames])
        newId = db[tableName].insert(**values)
        ids.append(newId)
    for t in range(tableCount-1):
        db[tableNames[t+1]][ids[t+1]].update_record(**{tableNames[t]:ids[t]})

db.commit()

query = (db[tableNames[1]][tableNames[0]]==db[tableNames[0]].id)
for t in range(1, (tableCount-1)):
    query = query & (db[tableNames[t+1]][tableNames[t]]==db[tableNames[t]].id)

selection = [db[tableName].ALL for tableName in tableNames]
sql = db(query)._select(*selection)
sql_colnames = sql.split(' FROM')[0][7:].split(', ')

print("Times are for %i iterations, into %i tables, of querying for %i results, each with %i values" % (times,tableCount,recordCount,((fieldCount*tableCount)+(tableCount-1))))

def bench(cmd, imprt):
    t = timeit.Timer(cmd, imprt)
    s = t.timeit(number=times)
    print("%s total=%02f avg=%02f" % (cmd, s, (s/times)))
    return s

def myprocessor(rows, fields, colnames, blob_decode=True, cacheable = False):
    return [dict(zip(colnames,row)) for row in rows]

def speed_executesql():
    rtn = db.executesql(sql, as_dict=True)

def speed_executesql_with_colnames():
    rtn = db.executesql(sql, colnames=sql_colnames, as_dict=True)

def speed_normal_rows():
    rtn = db(query).select()

def speed_rows_with_cacheable():
    rtn = db(query).select(cacheable=True)

def speed_rows_with_simple_processor():
    rtn = db(query).select(processor=myprocessor)

def full_bench():
    print 'started bench'
    b1 = bench('speed_executesql()', 'from __main__ import speed_executesql')
    b2 = bench('speed_normal_rows()', 'from __main__ import speed_normal_rows')
    b3 = bench('speed_rows_with_cacheable()', 'from __main__ import speed_rows_with_cacheable')
    b4 = bench('speed_executesql_with_colnames()', 'from __main__ import speed_executesql_with_colnames')
    b5 = bench('speed_rows_with_simple_processor()', 'from __main__ import speed_rows_with_simple_processor, myprocessor')
    
if __name__ == '__main__':
    full_bench()

[web2py] Re: Making the DAL orders-of-magnitude faster...

Reply via email to