That jump in time between __iter__() and all() seems kind of odd, and I wonder if some of it is in setting up a larger span of memory to store the entire result in a list; using yield_per() often gives much better results, though it isn't compatible with collection-based joined eager loading. I'd normally break up the query in some other way.
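To make the comparison concrete, here's a minimal runnable sketch of the two access patterns; the Customer model, the in-memory SQLite database, and the row count are hypothetical stand-ins for your schema:

```python
# Hypothetical model and dataset, just to contrast .all() vs. yield_per().
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Customer(name="c%d" % i) for i in range(1000)])
session.commit()

# .all() materializes every object up front into one big list:
everything = session.query(Customer).all()

# yield_per() hands out objects in batches of 100 instead, so the full
# result list is never built; note it's not compatible with
# collection-based (joined) eager loading:
count = 0
for customer in session.query(Customer).yield_per(100):
    count += 1
```

With yield_per(), peak memory stays roughly proportional to the batch size rather than to the full result set.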
Anyway, MySQLdb, which is an ultra-fast C library, is taking .2 sec just to read the rows; then setting up objects (note there's more than one object to handle per row with joined eager loading), populating them and their collections, and establishing their identity within the identity map is taking .8 sec, which is typical. If it's any consolation, the instances()/_instance() pair is 25% faster in 1.0; see http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#orm-full-object-fetches-25-faster. Short answer: yes, CPython is a very slow interpreter, and creating new objects is expensive; putting them in dictionaries and creating weakrefs to them is expensive (weakrefs are key to how the Session doesn't run out of memory even if you do zillions of queries with it, as long as you throw away the objects); doing lookups to see if we've already seen this identity in the row before is expensive (that storage/lookup is how lots of things get accomplished, including eager loading); and all of these processes deliver features that people really like in the ORM. If you don't want all those features, you can load individual columns, and you'll see that's 2-3 times faster (also sped up a little more in 1.0); and if you use Core directly to get the rows, there is minimal overhead vs. what MySQLdb incurs directly.
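As a rough sketch of the three loading styles just described (full ORM objects, individual columns, Core directly), again using a made-up Customer model against an in-memory SQLite database; timings will vary, the relative overhead is the point:

```python
# Hypothetical model; illustrates the three loading styles side by side.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Customer(name="c%d" % i) for i in range(500)])
session.commit()

# 1. Full ORM objects: identity-map bookkeeping, weakrefs, change tracking.
objects = session.query(Customer).all()

# 2. Individual columns: plain named tuples, no per-object bookkeeping;
#    typically a few times faster for large result sets.
rows_orm = session.query(Customer.id, Customer.name).all()

# 3. Core directly: minimal overhead beyond what the DBAPI itself incurs.
with engine.connect() as conn:
    rows_core = conn.execute(select(Customer.__table__)).fetchall()
```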
More is coming in 1.0 to educate people about this. For the "load lots of rows" use case, take a look at the performance suite at https://bitbucket.org/zzzeek/sqlalchemy/src/b9d430af752b7cc955932a54a8f8db18f46d89a6/examples/performance/large_resultsets.py?at=ticket_3100 for many varieties of how to load 500K rows. Here are the performance numbers I get on that test with MySQLdb:

test_orm_full_objects_list : Load fully tracked objects using the ORM, all at once into a list (500000 iterations); total time 11.050664 sec
test_orm_full_objects_chunks : Load fully tracked objects using the ORM using yield per (500000 iterations); total time 7.297299 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 iterations); total time 2.086640 sec
test_orm_columns : Load individual columns into named tuples using the ORM. (500000 iterations); total time 1.534253 sec
test_core_fetchall : Load Core result rows using fetchall. (500000 iterations); total time 1.114845 sec
test_core_fetchmany_w_streaming : Load Core result rows using fetchmany/streaming. (500000 iterations); total time 0.894745 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 iterations); total time 0.968827 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), generate an object for each row. (500000 iterations); total time 1.017718 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make any objects. (500000 iterations); total time 0.767289 sec

I added a test above to differentiate between "list(query(Class))" vs. "for row in query(Class).yield_per(1000)"; the only difference is that instances() yields out 1000 objects at a time, rather than loading the whole thing into a single list at once. You can see a dramatic difference in time spent, 7.29 sec vs.
11 seconds, just for CPython to set up memory big enough to hold all that data at once. That is probably where the 1.3 seconds becomes 2.2 seconds in your example: calling .all() and not batching the result into smaller chunks. So when you deal with large fields of data, chunking can help a lot, but if you reduce the amount of memory per row by not using objects at all, then you do even better.

On Sep 16, 2014, at 5:47 AM, tatütata Okay <[email protected]> wrote:

> The columns only contain address strings and not much more.
>
> Okay I did some profiling for the customerorder plus two joins on itemsbought and products:
>
> 1008030 function calls (968253 primitive calls) in 2.220 seconds
>
> Ordered by: cumulative time
>
> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
> 1  0.000  0.000  2.211  2.211  sqltest.py:151(test)
> 1  0.008  0.008  2.190  2.190  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2294(all)
> 1  0.000  0.000  1.319  1.319  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2407(__iter__)
> 1  0.000  0.000  1.312  1.312  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py:2421(_execute_and_instances)
> 4  0.000  0.000  1.303  0.326  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:661(execute)
> 1  0.000  0.000  1.300  1.300  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py:320(_execute_on_connection)
> 1  0.000  0.000  1.300  1.300  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:790(_execute_clauseelement)
> 4  0.000  0.000  1.298  0.324  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py:877(_execute_context)
> 9  0.000  0.000  1.293  0.144  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:164(execute)
> 7  0.000  0.000  1.293  0.185  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py:435(do_execute)
> 9  0.000  0.000  1.293  0.144  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:353(_query)
> 1709  0.010  0.000  0.864  0.001  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:27(instances)
> 22084/5537  0.248  0.000  0.778  0.000  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:330(_instance)
> 22052/5537  0.122  0.000  0.694  0.000  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/loading.py:284(populate_state)
> 9  0.000  0.000  0.685  0.076  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:358(_post_get_result)
> 9  0.000  0.000  0.684  0.076  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:324(_fetch_row)
> 9  0.206  0.023  0.684  0.076  {built-in method fetch_row}
> 9  0.000  0.000  0.608  0.068  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/cursors.py:315(_do_query)
> 3829  0.009  0.000  0.426  0.000  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py:1451(load_collection_from_joined_existing_row)
> 111604  0.073  0.000  0.409  0.000  /home/dumbass/sources/venv/lib/python2.7/site-packages/MySQLdb/connections.py:212(string_decoder)
> 10979/5505  0.016  0.000  0.344  0.000  /home/dumbass/sources/venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py:1475(load_scalar_from_joined_new_row)
> 111604  0.124  0.000  0.336  0.000  {method 'decode' of 'str' objects}
>
> as I can see creating objects and the query itself(?) takes a lot of cpupower? But why? It is just a simple query? Is it really the object creating? I just want to iterate over that data to display it in ordermenu for the admin.
> Or shall I use plain execution of sql for that, is sqlalchemy more for specific queries to get specific data and not the mass data?
>
> On Fri, Sep 12, 2014 at 6:40 PM, Jonathan Vanasco <[email protected]> wrote:
>
> I think you need to profile the application like Mike said. Just to be thorough, I'd suggest you time the Query execution and the ORM population.
>
> Do you have an example of the faster raw SQL that you're trying to generate?
>
> When you do a query on the commandline it is often faster because the database is only displaying the initial data on the screen. When you're using something like SqlAlchemy, the app needs to get all the data and then create objects. This is usually lightweight, but if you have some columns with a lot of data, there will be a hit. When you profile, you'll have a better idea if the query, the loader, or the connection is to blame.
>
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
