On 7/24/15 2:44 PM, Mike Bayer wrote:
On 7/24/15 2:19 PM, Jonathon Nelson wrote:
I should preface this by saying I'm a huge fan of SQLAlchemy. Huge!
However, when trying to extol the virtues of SQLAlchemy I inevitably
run into this issue:
"But it's slow!"
My usual response to that is that, yes, it is somewhat slower than
raw MySQL or PostgreSQL or whatever DB-API you are using, but that
performance hit (if it's present at all) is outweighed by <the
eleventy billion awesome things listed here>.
I wrote some code (which I can send if it's useful) that compares
fetching 10 million rows from a table with 2 integer columns. The
comparison is between raw psycopg2 and two variations using
SQLAlchemy: stream=False, stream=True.
raw psycopg2 is consistently in the 5.5 to 6.0 second range
SQLAlchemy is in the 20 second range.
The C extensions are in use, and this is with SQLAlchemy 1.0.6.
The run looks like this:
SQLAlchemy version: 1.0.6
psycopg2 version: 2.5.2 (dt dec pq3 ext)
PostgreSQL version: PostgreSQL 9.3.6 on x86_64-suse-linux-gnu,
compiled by gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision
212064], 64-bit
sqlalchemy.cresultproxy is available.
using psycopg2:
10000000 rows in 5.78s (1729246.93 row/s)
using sqlalchemy, with stream=False:
10000000 rows in 17.71s (564795.64 row/s)
using sqlalchemy, with stream=True:
10000000 rows in 20.65s (484226.33 row/s)
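The script itself wasn't attached; a minimal sketch of the kind of comparison described above, using an in-memory SQLite database so it is self-contained (the original ran 10 million rows against PostgreSQL, with psycopg2 as the raw DBAPI):

```python
import sqlite3
import time

from sqlalchemy import create_engine, text

N = 100_000  # the original test used 10 million rows
data = [(i, i * 2) for i in range(N)]

# --- raw DBAPI (sqlite3 here; the original used psycopg2) ---
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
raw.executemany("INSERT INTO t VALUES (?, ?)", data)
start = time.perf_counter()
raw_rows = raw.execute("SELECT a, b FROM t").fetchall()
raw_secs = time.perf_counter() - start

# --- SQLAlchemy Core against its own in-memory database ---
engine = create_engine("sqlite://")  # swap in a postgresql:// URL to match the original
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b INTEGER)"))
    conn.execute(
        text("INSERT INTO t VALUES (:a, :b)"),
        [{"a": a, "b": b} for a, b in data],
    )
start = time.perf_counter()
with engine.connect() as conn:
    core_rows = conn.execute(text("SELECT a, b FROM t")).fetchall()
core_secs = time.perf_counter() - start

print(f"raw DBAPI:       {len(raw_rows)} rows in {raw_secs:.3f}s")
print(f"SQLAlchemy Core: {len(core_rows)} rows in {core_secs:.3f}s")
```

Note the comparison only makes sense if both sides fully materialize the rows; timing a query that is never fetched measures nothing.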
I'm also seeing absolutely nothing like those times. Are you sure that
you are actually iterating the rows you get back from psycopg2? Here
are the results on OSX with 500K rows using the "large resultsets"
sample suite:
classics-MacBook-Pro:sqlalchemy classic$ python -m examples.performance
--dburl postgresql://scott:tiger@localhost/test large_resultsets --num
500000
Running setup once...
Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks,
test_orm_bundles, test_orm_columns, test_core_fetchall,
test_core_fetchmany_w_streaming, test_core_fetchmany,
test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object
test_orm_full_objects_list : Load fully tracked ORM objects into one big
list(). (500000 iterations); total time 12.009639 sec
test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at
a time using yield_per(). (500000 iterations); total time 9.584631 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM.
(500000 iterations); total time 3.525947 sec
test_orm_columns : Load individual columns into named tuples using the
ORM. (500000 iterations); total time 2.946797 sec
test_core_fetchall : Load Core result rows using fetchall. (500000
iterations); total time 2.176137 sec
test_core_fetchmany_w_streaming : Load Core result rows using
fetchmany/streaming. (500000 iterations); total time 2.268198 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany.
(500000 iterations); total time 1.860707 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI
fetchall(), generate an object for each row. (500000 iterations); total
time 2.121502 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't
make any objects. (500000 iterations); total time 1.779020 sec
The very fastest for psycopg2 is 1.7 seconds, and a plain old Core load
does it in 2.17 seconds. That is only 20% slower, rather than the
400% slower you are claiming.
Basically if you have folks claiming SQLA is slow, the answer is simple:
it's not.
I would wager that any low-hanging performance fruit has already been
picked, but what might a plan of attack look like here?
1. No idea what "stream" is, I guess you mean "stream_results"; that
won't help speed, only memory usage, and even then only theoretically
2. Read everything in
http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html
3. Read, analyze, and run everything in
http://docs.sqlalchemy.org/en/rel_1_0/orm/examples.html#examples-performance
4. Datatypes, datatypes, datatypes. They matter, a lot. The
profiling you'll do in #2 will reveal whether any of these are getting in
the way
5. Fetching 10000000 rows all at once is not a realistic use case. It
takes up a large amount of memory, which in turn adds a lot of
overhead to the interpreter, because the driver buffers the full result
set. And as you can see, turning that buffering off doesn't help much
either, since server-side cursors have their own overhead. You should
be batching.
6. To get perspective on pure C vs. pure Python, try comparing your
psycopg2 script to the identical script, but drop in pg8000 instead.
7. Try your script in SQLA 0.7, 0.8, and 0.9. You should be able to see
that I've not been entirely idle in this area of concern.
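On points 1 and 5, a sketch of what batched fetching looks like in Core. This demo uses an in-memory SQLite database so it runs standalone; SQLite silently ignores stream_results, but against PostgreSQL/psycopg2 the same option enables a server-side cursor so the driver stops buffering the entire result set client-side:

```python
from sqlalchemy import create_engine, text

# in-memory SQLite keeps the demo self-contained; use a postgresql://
# URL to see stream_results actually take effect
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b INTEGER)"))
    conn.execute(
        text("INSERT INTO t VALUES (:a, :b)"),
        [{"a": i, "b": i * 2} for i in range(10_000)],
    )

total = 0
with engine.connect() as conn:
    # stream_results=True requests a server-side cursor on backends
    # that support one (e.g. psycopg2); it is ignored otherwise
    result = conn.execution_options(stream_results=True).execute(
        text("SELECT a, b FROM t")
    )
    # consume in fixed-size batches rather than one giant fetchall()
    while True:
        batch = result.fetchmany(1000)
        if not batch:
            break
        total += len(batch)

print(total)
```

The batching keeps peak memory bounded by the batch size rather than the full row count, which is the real win here; the option alone won't make anything faster.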
--
Jon Nelson
Dyn / Senior Software Engineer
p. +1 (603) 263-8029
--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.