That jump in time between __iter__() and all() seems kind of odd, and I wonder if 
some of that is in setting up a larger span of memory to store the entire 
result in a list; using yield_per() often gives much better results, though it 
isn't compatible with collection-based joined eager loading.   I'd normally break 
up the query in some other way.
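The difference can be sketched like this; a minimal, hypothetical example (the Customer model and the in-memory SQLite database are assumptions for illustration, not from the original code — the batching behavior is the same against MySQLdb):

```python
# Sketch: yield_per() vs. all(), with a hypothetical Customer model.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Customer(name="c%d" % i) for i in range(1000)])
session.commit()

# all() materializes every object into one list up front:
full = session.query(Customer).all()

# yield_per(100) hands objects back 100 at a time instead; note it is
# not compatible with collection-based joined eager loading.
streamed = sum(1 for obj in session.query(Customer).yield_per(100))
```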

Anyway, MySQLdb, which is an ultra-fast C library, is taking 0.2 sec just to read 
the rows; then setting up objects (note there's more than one object to handle 
per row with joined eager loading), populating them and their collections, and 
establishing their identity within the identity map is taking 0.8 sec, which is 
typical.

If it's any consolation, the instance()/_instance() pair is 25% faster in 1.0; 
see 
http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#orm-full-object-fetches-25-faster.

Short answer: yes, CPython is a very slow interpreter and creating new objects 
is expensive. Putting them in dictionaries and creating weakrefs to them is 
expensive (weakrefs are key to how the Session doesn't run out of memory even if 
you do zillions of queries with it, as long as you throw away the objects), and 
doing lookups to see if we've already seen this identity in the row before is 
expensive (that storage/lookup is how lots of things get accomplished, 
including eager loading). All of these processes deliver features 
that people really like in the ORM.
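That identity-map lookup is easy to see directly; a small sketch, assuming a hypothetical User model against in-memory SQLite: two separate queries for the same primary key hand back the *same* Python object, because the row's identity was found already present in the Session's map.

```python
# Sketch of the identity-map behavior: querying the same primary key
# twice returns one object, not two. Model and data are hypothetical.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(User(id=1, name="alice"))
session.commit()

a = session.query(User).filter_by(id=1).one()
b = session.query(User).filter_by(id=1).one()
same = a is b  # True: the second query hit the identity map
```

(The map holds these entries weakly, which is what lets clean objects be garbage-collected once your code drops its references to them.)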

If you don't want all those features, then you can load individual columns and 
you'll see that's 2-3 times faster (also sped up a little more in 1.0), and if 
you use Core directly to get the rows, there is minimal overhead versus what 
MySQLdb incurs directly.

More is coming in 1.0 to educate people about this. For the "load lots of rows" 
use case, take a look at the performance suite at 
https://bitbucket.org/zzzeek/sqlalchemy/src/b9d430af752b7cc955932a54a8f8db18f46d89a6/examples/performance/large_resultsets.py?at=ticket_3100
for many varieties of how to load 500K rows. Here are the performance numbers I 
get on that test with MySQLdb:

test_orm_full_objects_list : Load fully tracked objects using the ORM, all at once into a list (500000 iterations); total time 11.050664 sec
test_orm_full_objects_chunks : Load fully tracked objects using the ORM using yield per (500000 iterations); total time 7.297299 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 iterations); total time 2.086640 sec
test_orm_columns : Load individual columns into named tuples using the ORM. (500000 iterations); total time 1.534253 sec
test_core_fetchall : Load Core result rows using fetchall. (500000 iterations); total time 1.114845 sec
test_core_fetchmany_w_streaming : Load Core result rows using fetchmany/streaming. (500000 iterations); total time 0.894745 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 iterations); total time 0.968827 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), generate an object for each row. (500000 iterations); total time 1.017718 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make any objects. (500000 iterations); total time 0.767289 sec

I added a test above to differentiate between "list(query(Class))" and "for row 
in query(Class).yield_per(1000)"; the only difference is that instances() 
either yields out 1000 objects at a time or loads the whole thing into a single 
list at once.    You can see a dramatic difference in time spent, 7.29 sec vs. 
11 sec, just for CPython to set up memory big enough to hold all that data at 
once; that is probably where the 1.3 seconds becomes 2.2 seconds in your 
example: calling .all() and not batching the result into smaller chunks.   So 
when you deal with large fields of data, chunking can help a lot, but if you 
reduce the amount of memory per row by not using objects at all, then you do 
even better.
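The Core-level fetchmany pattern from the suite above can be sketched like this (hypothetical customer table, in-memory SQLite standing in for MySQL): process rows in fixed-size batches so the full result never sits in one list at once.

```python
# Sketch: chunked fetching with fetchmany() at the Core level.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

metadata = MetaData()
customer = Table(
    "customer", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(customer.insert(),
                 [{"name": "c%d" % i} for i in range(1000)])

seen = 0
with engine.connect() as conn:
    result = conn.execute(select(customer))
    while True:
        chunk = result.fetchmany(100)  # one batch of up to 100 rows
        if not chunk:
            break
        seen += len(chunk)
```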



On Sep 16, 2014, at 5:47 AM, tatütata Okay <[email protected]> wrote:

> The columns only contain address strings and not much more.
> 
> Okay I did some profiling for the customerorder plus two joins on itemsbought 
> and products:
>          1008030 function calls (968253 primitive calls) in 2.220 seconds
> 
>    Ordered by: cumulative time
> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    2.211    2.211 sqltest.py:151(test)
>         1    0.008    0.008    2.190    2.190 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2294(all)
>         1    0.000    0.000    1.319    1.319 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2407(__iter__)
>         1    0.000    0.000    1.312    1.312 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2421(_execute_and_instances)
>         4    0.000    0.000    1.303    0.326 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:661(execute)
>         1    0.000    0.000    1.300    1.300 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py:320(_execute_on_connection)
>         1    0.000    0.000    1.300    1.300 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:790(_execute_clauseelement)
>         4    0.000    0.000    1.298    0.324 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:877(_execute_context)
>         9    0.000    0.000    1.293    0.144 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:164(execute)
>         7    0.000    0.000    1.293    0.185 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py:435(do_execute)
>         9    0.000    0.000    1.293    0.144 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:353(_query)
>      1709    0.010    0.000    0.864    0.001 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:27(instances)
> 22084/5537    0.248    0.000    0.778    0.000 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:330(_instance)
> 22052/5537    0.122    0.000    0.694    0.000 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:284(populate_state)
>         9    0.000    0.000    0.685    0.076 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:358(_post_get_result)
>         9    0.000    0.000    0.684    0.076 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:324(_fetch_row)
>         9    0.206    0.023    0.684    0.076 {built-in method fetch_row}
>         9    0.000    0.000    0.608    0.068 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:315(_do_query)
>      3829    0.009    0.000    0.426    0.000 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py:1451(load_collection_from_joined_existing_row)
>    111604    0.073    0.000    0.409    0.000 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/connections.py:212(string_decoder)
> 10979/5505    0.016    0.000    0.344    0.000 
> /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py:1475(load_scalar_from_joined_new_row)
>    111604    0.124    0.000    0.336    0.000 {method 'decode' of 'str' 
> objects}
> 
> 
> As I can see, creating objects and the query itself(?) take a lot of CPU 
> power. But why? It is just a simple query. Is it really the object 
> creation? I just want to iterate over that data to display it in the order 
> menu for the admin.
> Or should I use plain execution of SQL for that? Is SQLAlchemy more for 
> specific queries to get specific data, and not for mass data?
> 
> 
> 
> On Fri, Sep 12, 2014 at 6:40 PM, Jonathan Vanasco <[email protected]> 
> wrote:
> I think you need to profile the application like Mike said. Just to be 
> thorough, I'd suggest you time the Query execution and the ORM population.
> 
> Do you have an example of the faster raw SQL that you're trying to generate ?
> 
> When you do a query on the commandline it is often faster because the 
> database is only displaying the initial data on the screen.  When you're 
> using something like SqlAlchemy, the app needs to get all the data and then 
> create objects.  This is usually lightweight, but if you have some columns 
> with a lot of data, there will be a hit.  When you profile, you'll have a 
> better idea if the query, the loader, or the connection is to blame.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
