[web2py] Re: Making the DAL orders-of-magnitude faster...

Arnon Marcus Wed, 20 Nov 2013 09:13:12 -0800

I have extended the test script, this time with a configurable number of 
chained tables.
I went all the way, to test the hypothesis that the JOINs are the culprit.
Let's see how we do with a *single* record, joining 50 tables of *1 field* 
each + a *foreign-key field* (except the first table, which has no foreign 
key):


C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 50 tables, of querying for 1 results, each with 99 values
started bench
speed_executesql() total=0.019787 avg=0.000198
speed_normal_rows() total=0.731611 avg=0.007316
speed_rows_with_cacheable() total=0.669495 avg=0.006695
speed_executesql_with_colnames() total=0.016620 avg=0.000166
speed_rows_with_simple_processor() total=0.364101 avg=0.003641


Process finished with exit code 0



Again, a 44x increase in performance (speed_executesql_with_colnames vs. 
speed_normal_rows), which is quite consistent.

Now let's see a more modest and realistic example - 5 tables of 10 fields 
each - 1k records query:

C:\Python26\python.exe F:/Experimental/web2py_272/web2py/test_executesql_0.3.py
Preparing....
Times are for 100 iterations, into 5 tables, of querying for 1000 results, each with 54 values
started bench
speed_executesql() total=2.619404 avg=0.026194
speed_normal_rows() total=52.479113 avg=0.524791
speed_rows_with_cacheable() total=48.437764 avg=0.484378
speed_executesql_with_colnames() total=2.789842 avg=0.027898
speed_rows_with_simple_processor() total=2.865223 avg=0.028652


That is a 19x speed-up of speed_executesql_with_colnames over 
speed_normal_rows, and a 17x speed-up of speed_rows_with_simple_processor 
over speed_rows_with_cacheable.
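
The ratios quoted above can be re-derived from the reported totals. A minimal sketch, with the numbers copied from the two runs in this message; the dictionary names and the speedup() helper are only for illustration and are not part of the test script:

```python
# Totals in seconds for 100 iterations, copied from the two runs above.
run_50_tables = {
    "speed_executesql_with_colnames": 0.016620,
    "speed_normal_rows": 0.731611,
}
run_5_tables = {
    "speed_executesql_with_colnames": 2.789842,
    "speed_normal_rows": 52.479113,
    "speed_rows_with_simple_processor": 2.865223,
    "speed_rows_with_cacheable": 48.437764,
}


def speedup(slow, fast):
    """Return how many times faster the `fast` timing is than the `slow` one."""
    return slow / fast


# 50 tables, 1 record: parsed Rows vs. executesql with column names.
r50 = speedup(run_50_tables["speed_normal_rows"],
              run_50_tables["speed_executesql_with_colnames"])

# 5 tables, 1000 records: the two comparisons quoted in the text.
r5_raw = speedup(run_5_tables["speed_normal_rows"],
                 run_5_tables["speed_executesql_with_colnames"])
r5_proc = speedup(run_5_tables["speed_rows_with_cacheable"],
                  run_5_tables["speed_rows_with_simple_processor"])

print("50 tables: %.1fx" % r50)
print("5 tables, executesql: %.1fx" % r5_raw)
print("5 tables, simple processor: %.1fx" % r5_proc)
```

Rounded to whole numbers, these give the 44x, 19x and 17x figures mentioned in the text.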

Here is the file:

On Wednesday, November 20, 2013 1:47:47 PM UTC+2, Niphlod wrote:
>
>
>
> On Tuesday, November 19, 2013 11:12:52 PM UTC+1, Arnon Marcus wrote:
>>
>> 10x Niphlod, I'll check that tomorrow...
>>
>> I can provide any code you like - this is solid and consistent.
>>
>
> Exactly what I'm asking. a self-contained testbase to reproduce (including 
> data, models and such).
>  
>
>>
>> The interesting (and somewhat disappointing) factoid I got from your 
>> results, was that PyPy was just as "slow" in un-parsed mode... ;)
>>
>
> pypy isn't a silver bullet for everything. 
>
> BTW, lots of the time is spent actually interacting with SQLite, so the 
> fact that a pure python library on pypy catches up with the C library 
> shipped with python is quite itself the proof that pypy is a reeeeaally 
> good project.
>
> BTW2: look at the "with_simple_processor" code. It does what your patch 
> does, but in one line, and it works for any select().
>
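
For readers without a web2py checkout at hand: the one-line processor mentioned above just zips the column names with each raw row. A minimal sketch of the same idea using only the standard-library sqlite3 module; rows_as_dicts(), the two tiny tables and the column aliases are hypothetical and only mimic the shape of the benchmark, they are not web2py code:

```python
import sqlite3


def rows_as_dicts(cursor):
    """Turn raw cursor rows into dicts keyed by column name, with no further parsing."""
    colnames = [d[0] for d in cursor.description]
    return [dict(zip(colnames, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_0 (id INTEGER PRIMARY KEY, fl_0 TEXT)")
conn.execute(
    "CREATE TABLE tb_1 (id INTEGER PRIMARY KEY, fl_0 TEXT, "
    "tb_0 INTEGER REFERENCES tb_0(id))"
)
conn.execute("INSERT INTO tb_0 (fl_0) VALUES ('tb_0_fl_0')")
conn.execute("INSERT INTO tb_1 (fl_0, tb_0) VALUES ('tb_1_fl_0', 1)")

# Join the two chained tables, aliasing columns so the dict keys are unique.
cur = conn.execute(
    "SELECT tb_0.id AS tb_0_id, tb_0.fl_0 AS tb_0_fl_0, "
    "tb_1.id AS tb_1_id, tb_1.fl_0 AS tb_1_fl_0 "
    "FROM tb_0, tb_1 WHERE tb_1.tb_0 = tb_0.id"
)
result = rows_as_dicts(cur)
print(result)
conn.close()
```

Nothing here builds Row objects or converts field types, which is presumably where the parsed select() spends most of its time.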


import timeit
from gluon.dal import DAL, Field

tableCount = 50
fieldCount = 1
recordCount = 1
times = 100

# Define the chained tables and fill them with test data
print('Preparing....')

db = DAL('sqlite:memory')

tableNames = [('tb_'+str(i)) for i in range(tableCount)]
fieldNames = [('fl_'+str(i)) for i in range(fieldCount)]
# Define tableCount tables; every table except the first gets a
# foreign-key field that references the previous table.

for t in range(tableCount):
    fields = [Field(fieldName) for fieldName in fieldNames]
    if t:
        fields.append(Field(tableNames[t-1], db[tableNames[t-1]]))
    db.define_table(tableNames[t], *fields)
db.commit()

for t in range(tableCount):
    db(db[tableNames[t]].id>0).delete()
db.commit()

for r in range(recordCount):
    ids = []
    for t in range(tableCount):
        tableName = tableNames[t]
        values = dict([(fieldName,('%s_%s'%(tableName,fieldName))) for fieldName in fieldNames])
        newId = db[tableName].insert(**values)
        ids.append(newId)
    for t in range(tableCount-1):
        # Point each record at the record just inserted into the previous table.
        db[tableNames[t+1]][ids[t+1]].update_record(**{tableNames[t]:ids[t]})

db.commit()

query = (db[tableNames[1]][tableNames[0]]==db[tableNames[0]].id)
for t in range(1, (tableCount-1)):
    query = query & (db[tableNames[t+1]][tableNames[t]]==db[tableNames[t]].id)

selection = [db[tableName].ALL for tableName in tableNames]
sql = db(query)._select(*selection)
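# Column names are taken from the generated SQL: drop the leading "SELECT "
# (7 characters) and split the column list that precedes " FROM".
sql_colnames = sql.split(' FROM')[0][7:].split(', ')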

print("Times are for %i iterations, into %i tables, of querying for %i results, each with %i values" % (times,tableCount,recordCount,((fieldCount*tableCount)+(tableCount-1))))

def bench(cmd, imprt):
    t = timeit.Timer(cmd, imprt)
    s = t.timeit(number=times)
    # total and per-iteration average, in seconds
    print("%s total=%f avg=%f" % (cmd, s, (s/times)))
    return s

def myprocessor(rows, fields, colnames, blob_decode=True, cacheable = False):
    return [dict(zip(colnames,row)) for row in rows]

def speed_executesql():
    rtn = db.executesql(sql, as_dict=True)

def speed_executesql_with_colnames():
    rtn = db.executesql(sql, colnames=sql_colnames, as_dict=True)

def speed_normal_rows():
    rtn = db(query).select()

def speed_rows_with_cacheable():
    rtn = db(query).select(cacheable=True)

def speed_rows_with_simple_processor():
    rtn = db(query).select(processor=myprocessor)

def full_bench():
    print 'started bench'
    b1 = bench('speed_executesql()', 'from __main__ import speed_executesql')
    b2 = bench('speed_normal_rows()', 'from __main__ import speed_normal_rows')
    b3 = bench('speed_rows_with_cacheable()', 'from __main__ import speed_rows_with_cacheable')
    b4 = bench('speed_executesql_with_colnames()', 'from __main__ import speed_executesql_with_colnames')
    b5 = bench('speed_rows_with_simple_processor()', 'from __main__ import speed_rows_with_simple_processor, myprocessor')
    
if __name__ == '__main__':
    full_bench()
