About (ways of) query compilation:
For example, in dbcook I have query filters expressed as plain Python
functions, and that adds 3 or 4 more levels on top of the usual query
building. Here are the levels:
expr translating:
-a: py-func -> makeExpresion -> expr
-b: expr -> walk( Translator(context) ) -> SA column expression
    (the context needs the plain vars, and the var-classes/mappers)
-c: context binding: classes/mappers
-d: context binding: plain vars
-e: apply the column expression as a select over a query
-f: sql.bindparams
way0: everything every time: a, c, d, b, e, sql - no bindparams whatsoever
way1: a, c, store; then d, b, e, sql - no bindparams whatsoever
way2: a, b, c, store; then d/bindparams, e, sql
way3: a, b, c, e, store; then d/bindparams, sql
way4: a, b, c, e, sql, store; then d/bindparams
I guess a similar breakdown can be done on the SA query building
itself, the most important division being query-into-statement,
statement + bindparams + execution, and instantiation.
I admit it's a bit hard to build the model and app in a way that
allows using bindparams to their real power, which is: the whole
query statement is like a function, and the bindparams are the
arguments to that function. Note I'm not talking about the constants
that SA automatically turns into bindparams; I'm talking about real
parameters.
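The statement-as-a-function idea can be shown with the stdlib sqlite3 module rather than SA (the table and data here are made up): the statement with its placeholder is written once, and only the parameter changes per execution.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tblorder '
             '(order_id INTEGER PRIMARY KEY, order_user_id INTEGER)')
conn.executemany('INSERT INTO tblorder VALUES (?, ?)',
                 [(1, 10), (2, 10), (3, 20)])

# the statement is the "function"; the ? placeholder is its argument -
# a real parameter, not a constant baked into the SQL text
stmt = 'SELECT order_id FROM tblorder WHERE order_user_id = ?'
for uid in (10, 20):
    rows = conn.execute(stmt, (uid,)).fetchall()
    print(uid, [r[0] for r in rows])   # 10 [1, 2] then 20 [3]
```

The same SQL text is reused for every call; only the bound argument varies, which is exactly what a bindparam buys you once the statement is compiled.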
I must also admit that I'm not much of an SQL-er and try to avoid it
as much as possible; still, if speed means replacing 1000 hierarchical
per-node queries with one huge, awfully twisted query that gives those
1000 in one long shot, so be it (well, but it takes... weeks to
invent). I guess if SA were, say, 10 times faster I would not bother
doing it - until I ran it on some networked server and found that the
reality is about 10,000 nodes, and it didn't really matter whether SA
was faster or slower: it was just the wrong way.
Not that I like overhead either... but in the long run I think it's
correctness, consistency and re/usability that matter. What runs in 5
secs today will run in 3 secs next year on same-price equipment.
It would be quite interesting if you tried these, in various
independent combinations, and compared the results:
a) do the query-into-statement "compilation" once, then bind + execute
   that 1000 times
b) keep the instances cached
c) have just _one_ query returning the same thing the current
   1000 queries do, in one long shot - and run that once.
I guess combining a) with c) is meaningless, hence the useful
combinations seem to be: nothing, a, a+b, c, c+b.
It may show interesting sides of the query execution.
But give them more than 1000 iterations, maybe 3000 (~15 sec).
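A rough sketch of what such a comparison could look like, using the stdlib sqlite3 module and a made-up 3000-node table (not your schema): the "nothing" variant issues one query per node, variant c fetches everything in one long shot.

```python
import sqlite3
import time

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER)')
conn.executemany('INSERT INTO node VALUES (?, ?)',
                 [(i, i // 2) for i in range(3000)])

t0 = time.time()
# variant "nothing": 3000 separate per-node queries
per_node = [conn.execute('SELECT parent FROM node WHERE id = ?',
                         (i,)).fetchone()[0]
            for i in range(3000)]
t1 = time.time()
# variant c: one query returning the same thing, run once
one_shot = [row[0] for row in
            conn.execute('SELECT parent FROM node ORDER BY id')]
t2 = time.time()

assert per_node == one_shot   # same result either way
print('3000 queries: %.4fs   one query: %.4fs' % (t1 - t0, t2 - t1))
```

With SA in the picture the gap should widen further, since each of the 3000 queries also pays the per-query compilation and mapping overhead.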
ciao
svilen
On Friday 11 July 2008 12:19:26 Henk wrote:
> Hi
>
> I did some profiling of a simple select-type query, to check SA
> performance for such use cases.
>
> The test consists of a very simple Order record:
>
> order = Table('tblorder', metadata,
>     Column('order_id', Integer, primary_key=True),
>     Column('order_user_id', Integer),
>     Column('order_state', Integer),
>     Column('order_price', Integer),
>     Column('order_delivery_cost', Integer),
>     Column('order_creation_date', DateTime),
>     Column('order_modification_date', DateTime),
>     Column('order_last_payment_check', DateTime),
>     Column('order_delivery_address_id', Integer),
> )
>
> and some related log lines that keep track of what happens to the
> order:
>
> orderlog = Table('tblorderlog', metadata,
>     Column('orderlog_id', Integer, primary_key=True),
>     Column('orderlog_order_id', Integer,
>            ForeignKey('tblorder.order_id')),
>     Column('orderlog_creation_date', DateTime),
>     Column('orderlog_message', String),
>     Column('orderlog_priority', Integer),
> )
>
> On the ORM side of things these two tables have a simple 1-1 mapping
> onto classes Order and OrderLog, and the log lines are mapped onto
> the order using:
>
> 'orderLogs': relation(OrderLog, backref='order')
>
> The main loop of the test program is the following:
>
> times = 1000
> start = time.time()
> for i in range(times):
>     order = Session.query(Order).get(72244)
>     for log in order.orderLogs:
>         pass
>     Session.clear()
> end = time.time()
>
> Note that it is the same Order that I am fetching every time, because
> I am interested in SA overhead and not in the speed of the database
> (MySQL) itself; this way MySQL will fetch from memory and not be disk
> bound (I also made sure this test is not network bound). Also note
> that the session is cleared on each iteration through the loop, to
> force SA to perform the 2 queries and the corresponding mapping
> overhead (1 query to fetch the order, 1 query to fetch the 17
> corresponding orderlog lines) on every iteration.
>
> Profiling with cProfile (Python 2.5.1) gave the following results
> (sorted by total time spent in the function):
>
> 2923724 function calls (2880671 primitive calls) in 5.965 CPU seconds
>
> Ordered by: internal time
> List reduced from 498 to 20 due to restriction <20>
>
> ncalls        tottime percall cumtime percall filename:lineno(function)
> 2000          0.585   0.000   0.585   0.000   {method 'query' of '_mysql.connection' objects}
> 18000         0.192   0.000   1.715   0.000   mapper.py:1407(_instance)
> 18000         0.160   0.000   0.801   0.000   mapper.py:1381(populate_state)
> 2000          0.153   0.000   0.259   0.000   base.py:1448(_init_metadata)
> 20000         0.153   0.000   0.198   0.000   times.py:43(DateTime_or_None)
> 36000         0.147   0.000   0.147   0.000   identity.py:91(__contains__)
> 16000         0.139   0.000   0.270   0.000   compiler.py:239(visit_column)
> 112000        0.130   0.000   0.240   0.000   base.py:1363(__getitem__)
> 94000         0.127   0.000   0.318   0.000   strategies.py:71(new_execute)
> 18000         0.122   0.000   0.122   0.000   attributes.py:745(__init__)
> 42000/2000    0.114   0.000   1.233   0.001   compiler.py:176(process)
> 162036/162022 0.113   0.000   0.134   0.000   {getattr}
> 14000         0.106   0.000   0.541   0.000   compiler.py:228(visit_label)
> 2000          0.103   0.000   1.222   0.001   compiler.py:466(visit_select)
> 112000        0.103   0.000   0.110   0.000   base.py:1617(_get_col)
> 32000         0.089   0.000   0.158   0.000   compiler.py:401(_truncated_identifier)
> 72000         0.086   0.000   0.096   0.000   util.py:806(add)
> 135101        0.085   0.000   0.085   0.000   {isinstance}
> 18000         0.073   0.000   0.181   0.000   identity.py:104(add)
> 46000         0.068   0.000   0.080   0.000   {built-in method sub}
>
> What we see here is that of the total of 5.965 seconds, the test
> spent only about 0.585 seconds performing the actual SQL queries, or
> about 10% of the total time. The other 90% of the time we are
> spending on SA overhead (except for the times.py line, which is
> somewhere in the MySQLdb driver).
>
> If I look at the cumulative output of the profiler (omitted for
> brevity) I see, for instance, that SA spends 1.3 seconds compiling
> the SQL expressions (it apparently recompiles the same SQL
> expression on each iteration).
>
> My question to the SA community would be: how much ORM overhead do
> you find acceptable? And how does SA compare in this regard to, for
> instance, Java's Hibernate or other ORM solutions?
>
> I would love to help out trying to reduce this overhead, but I have
> no idea what the SA devs have in mind for the future of SA with
> regard to speed (e.g. speed vs. feature completeness). The tutorials
> and documentation currently place a strong emphasis on use cases
> highlighting the very dynamic nature of building complex queries
> with SA's ORM tools, and not so much on how to perform simple (and
> static) queries fast using the ORM.
>
> Maybe there is a way to memoize the results of the compile step, so
> that it does not need to be redone all the time while the Query
> remains the same?
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---