About (ways of) query compilation:
For example, in dbcook I have query filters expressed as plain Python
functions, and that adds 3 or 4 more levels on top of the usual query
building. Here are the levels:
expr translating:
-a: py-func -> makeExpresion -> expr
-b: expr -> walk( Translator(context) ) -> SA column expression
    (the context needs the plain vars, and the var-classes/mappers)
-c: context binding: classes/mappers
-d: context binding: plain vars
-e: apply the column expression as a select over a query
-f: sql.bindparams
way0: everything every time: a, c, d, b, e, sql - no bindparams whatsoever
way1: a, c, store; then d, b, e, sql - no bindparams whatsoever
way2: a, b, c, store; then d/bindparams, e, sql
way3: a, b, c, e, store; then d/bindparams, sql
way4: a, b, c, e, sql, store; then d/bindparams
I guess a similar breakdown can be done on the SA query building
itself, the most important division being query-into-statement,
statement + bindparams + execution, and instantiation.
I admit it's a bit hard to build the model and app in a way that
allows using bindparams to their real power, which is: the whole
query statement is like a function, and the bindparams are the
arguments to that function. Note I'm not talking about the constants
that SA automatically turns into bindparams; I'm talking about real
parameters.
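The statement-as-a-function idea can be shown with the stdlib sqlite3 module rather than SA (the table and data here are made up): the statement with its placeholder is written once, and only the parameter changes per execution.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tblorder '
             '(order_id INTEGER PRIMARY KEY, order_user_id INTEGER)')
conn.executemany('INSERT INTO tblorder VALUES (?, ?)',
                 [(1, 10), (2, 10), (3, 20)])

# the statement is the "function"; the ? placeholder is its argument -
# a real parameter, not a constant baked into the SQL text
stmt = 'SELECT order_id FROM tblorder WHERE order_user_id = ?'
for uid in (10, 20):
    rows = conn.execute(stmt, (uid,)).fetchall()
    print(uid, [r[0] for r in rows])   # 10 [1, 2] then 20 [3]
```

The same SQL text is reused for every call; only the bound argument varies, which is exactly what a bindparam buys you once the statement is compiled.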
I must also admit that I'm not much of an SQL-er and try to avoid it
as much as possible; still, if speed means replacing 1000 hierarchical
per-node queries with one huge, awfully twisted query that gives those
1000 in one long shot, so be it (well, but it takes... weeks to
invent). I guess if SA were, say, 10 times faster I would not bother
doing it - until I ran it on some networked server and found that the
reality is about 10,000 nodes, and it didn't really matter whether SA
was faster or slower: it was just the wrong way.
Not that I like overhead either... but in the long run I think it's
correctness, consistency and re/usability that matter. What runs in 5
secs today will run in 3 secs next year on same-price equipment.
It would be quite interesting if you tried these, in various
independent combinations, and compared the results:
a) do the query-into-statement "compilation" once, then bind + execute
   that 1000 times
b) keep the instances cached
c) have just _one_ query returning the same thing the current
   1000 queries do, in one long shot - and run that once.
I guess combining a) with c) is meaningless, hence the useful
combinations seem to be: nothing, a, a+b, c, c+b.
It may show interesting sides of the query execution.
But give them more than 1000 iterations, maybe 3000 (~15 sec).
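A rough sketch of what such a comparison could look like, using the stdlib sqlite3 module and a made-up 3000-node table (not your schema): the "nothing" variant issues one query per node, variant c fetches everything in one long shot.

```python
import sqlite3
import time

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER)')
conn.executemany('INSERT INTO node VALUES (?, ?)',
                 [(i, i // 2) for i in range(3000)])

t0 = time.time()
# variant "nothing": 3000 separate per-node queries
per_node = [conn.execute('SELECT parent FROM node WHERE id = ?',
                         (i,)).fetchone()[0]
            for i in range(3000)]
t1 = time.time()
# variant c: one query returning the same thing, run once
one_shot = [row[0] for row in
            conn.execute('SELECT parent FROM node ORDER BY id')]
t2 = time.time()

assert per_node == one_shot   # same result either way
print('3000 queries: %.4fs   one query: %.4fs' % (t1 - t0, t2 - t1))
```

With SA in the picture the gap should widen further, since each of the 3000 queries also pays the per-query compilation and mapping overhead.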
ciao
svilen
On Friday 11 July 2008 12:19:26 Henk wrote:
> Hi
>
> I did some profiling of a simple select-type query, to check SA
> performance for such use cases.
>
> The test consists of a very simple Order record:
>
> order = Table('tblorder', metadata,
>     Column('order_id', Integer, primary_key=True),
>     Column('order_user_id', Integer),
>     Column('order_state', Integer),
>     Column('order_price', Integer),
>     Column('order_delivery_cost', Integer),
>     Column('order_creation_date', DateTime),
>     Column('order_modification_date', DateTime),
>     Column('order_last_payment_check', DateTime),
>     Column('order_delivery_address_id', Integer),
> )
>
> and some related log lines that keep track of what happens to the
> order:
>
> orderlog = Table('tblorderlog', metadata,
>     Column('orderlog_id', Integer, primary_key=True),
>     Column('orderlog_order_id', Integer,
>            ForeignKey('tblorder.order_id')),
>     Column('orderlog_creation_date', DateTime),
>     Column('orderlog_message', String),
>     Column('orderlog_priority', Integer),
> )
>
> On the ORM side of things these two tables have a simple 1-1 mapping
> onto classes Order and OrderLog, and the log lines are mapped onto
> the order using:
>
> 'orderLogs': relation(OrderLog, backref='order')
>
> The main loop of the test program is the following:
>
> times = 1000
> start = time.time()
> for i in range(times):
>     order = Session.query(Order).get(72244)
>     for log in order.orderLogs:
>         pass
>     Session.clear()
> end = time.time()
>
> Note that it is the same Order that I am fetching every time, because
> I am interested in SA overhead and not in the speed of the database
> (MySQL) itself; this way MySQL will fetch from memory and not be disk
> bound (I also made sure this test is not network bound). Also note
> that the session is cleared on each iteration through the loop, to
> force SA to perform the 2 queries and the corresponding mapping
> overhead (1 query to fetch the order, 1 query to fetch the 17
> corresponding orderlog lines) on every iteration.
>
> Profiling with cProfile (Python 2.5.1) gave the following results
> (sorted by total time spent in the function):
>
> 2923724 function calls (2880671 primitive calls) in 5.965 CPU seconds
>
> Ordered by: internal time
> List reduced from 498 to 20 due to restriction <20>
>
> ncalls        tottime percall cumtime percall filename:lineno(function)
> 2000          0.585   0.000   0.585   0.000   {method 'query' of '_mysql.connection' objects}
> 18000         0.192   0.000   1.715   0.000   mapper.py:1407(_instance)
> 18000         0.160   0.000   0.801   0.000   mapper.py:1381(populate_state)
> 2000          0.153   0.000   0.259   0.000   base.py:1448(_init_metadata)
> 20000         0.153   0.000   0.198   0.000   times.py:43(DateTime_or_None)
> 36000         0.147   0.000   0.147   0.000   identity.py:91(__contains__)
> 16000         0.139   0.000   0.270   0.000   compiler.py:239(visit_column)
> 112000        0.130   0.000   0.240   0.000   base.py:1363(__getitem__)
> 94000         0.127   0.000   0.318   0.000   strategies.py:71(new_execute)
> 18000         0.122   0.000   0.122   0.000   attributes.py:745(__init__)
> 42000/2000    0.114   0.000   1.233   0.001   compiler.py:176(process)
> 162036/162022 0.113   0.000   0.134   0.000   {getattr}
> 14000         0.106   0.000   0.541   0.000   compiler.py:228(visit_label)
> 2000          0.103   0.000   1.222   0.001   compiler.py:466(visit_select)
> 112000        0.103   0.000   0.110   0.000   base.py:1617(_get_col)
> 32000         0.089   0.000   0.158   0.000   compiler.py:401(_truncated_identifier)
> 72000         0.086   0.000   0.096   0.000   util.py:806(add)
> 135101        0.085   0.000   0.085   0.000   {isinstance}
> 18000         0.073   0.000   0.181   0.000   identity.py:104(add)
> 46000         0.068   0.000   0.080   0.000   {built-in method sub}
>
> What we see here is that of the total of 5.965 seconds, the test
> spent only about 0.585 seconds performing the actual SQL queries, or
> about 10% of the total time. The other 90% of the time we are
> spending on SA overhead (except for the times.py line, which is
> somewhere in the MySQLdb driver).
>
> If I look at the cumulative output of the profiler (omitted for
> brevity) I see, for instance, that SA spends 1.3 seconds compiling
> the SQL expressions (it apparently recompiles the same SQL
> expression on each iteration).
>
> My question to the SA community would be: how much ORM overhead do
> you find acceptable? And how does SA compare in this regard to, for
> instance, Java's Hibernate or other ORM solutions?
>
> I would love to help out trying to reduce this overhead, but I have
> no idea what the SA devs have in mind for the future of SA with
> regard to speed (e.g. speed vs. feature completeness). The tutorials
> and documentation currently place a strong emphasis on use cases
> highlighting the very dynamic nature of building complex queries
> with SA's ORM tools, and not so much on how to perform simple (and
> static) queries fast using the ORM.
>
> Maybe there is a way to memoize the results of the compile step, so
> that it does not need to be redone all the time while the Query
> remains the same?
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---