On 7/24/15 2:44 PM, Mike Bayer wrote:
On 7/24/15 2:19 PM, Jonathon Nelson wrote:
I should preface this by saying I'm a huge fan of SQLAlchemy. Huge!
However, when trying to extol the virtues of SQLAlchemy I inevitably
run into this issue:
"But it's slow!"
My usual response to that is that, yes, it is somewhat slower than
raw MySQL or PostgreSQL or whatever DB-API you are using, but that
performance hit (if it's present at all) is outweighed by <the
eleventy billion awesome things listed here>.
I wrote some code (which I can send if it's useful) that compares
fetching 10 million rows from a table with 2 integer columns. The
comparison is between raw psycopg2 and two variations using
SQLAlchemy: stream=False, stream=True.
raw psycopg2 is consistently in the 5.5 to 6.0 second range
SQLAlchemy is in the 20 second range.
The C extensions are in use, and this is with SQLAlchemy 1.0.6.
The run looks like this:
SQLAlchemy version: 1.0.6
psycopg2 version: 2.5.2 (dt dec pq3 ext)
PostgreSQL version: PostgreSQL 9.3.6 on x86_64-suse-linux-gnu,
compiled by gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision
212064], 64-bit
sqlalchemy.cresultproxy is available.
using psycopg2:
10000000 rows in 5.78s (1729246.93 row/s)
using sqlalchemy, with stream=False:
10000000 rows in 17.71s (564795.64 row/s)
using sqlalchemy, with stream=True:
10000000 rows in 20.65s (484226.33 row/s)
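The script itself wasn't attached; a minimal sketch of the kind of comparison described above, using an in-memory SQLite database so it is self-contained (the original ran 10 million rows against PostgreSQL, with psycopg2 as the raw DBAPI):

```python
import sqlite3
import time

from sqlalchemy import create_engine, text

N = 100_000  # the original test used 10 million rows
data = [(i, i * 2) for i in range(N)]

# --- raw DBAPI (sqlite3 here; the original used psycopg2) ---
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
raw.executemany("INSERT INTO t VALUES (?, ?)", data)
start = time.perf_counter()
raw_rows = raw.execute("SELECT a, b FROM t").fetchall()
raw_secs = time.perf_counter() - start

# --- SQLAlchemy Core against its own in-memory database ---
engine = create_engine("sqlite://")  # swap in a postgresql:// URL to match the original
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b INTEGER)"))
    conn.execute(
        text("INSERT INTO t VALUES (:a, :b)"),
        [{"a": a, "b": b} for a, b in data],
    )
start = time.perf_counter()
with engine.connect() as conn:
    core_rows = conn.execute(text("SELECT a, b FROM t")).fetchall()
core_secs = time.perf_counter() - start

print(f"raw DBAPI:       {len(raw_rows)} rows in {raw_secs:.3f}s")
print(f"SQLAlchemy Core: {len(core_rows)} rows in {core_secs:.3f}s")
```

Note the comparison only makes sense if both sides fully materialize the rows; timing a query that is never fetched measures nothing.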
I'm also seeing absolutely nothing like those times. Are you sure that
you are actually iterating the rows you get back from psycopg2? Here
are the results on OSX with 500K rows using the "large resultsets"
sample suite:
classics-MacBook-Pro:sqlalchemy classic$ python -m examples.performance
--dburl postgresql://scott:tiger@localhost/test large_resultsets --num
500000
Running setup once...
Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks,
test_orm_bundles, test_orm_columns, test_core_fetchall,
test_core_fetchmany_w_streaming, test_core_fetchmany,
test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object
test_orm_full_objects_list : Load fully tracked ORM objects into one big
list(). (500000 iterations); total time 12.009639 sec
test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at
a time using yield_per(). (500000 iterations); total time 9.584631 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM.
(500000 iterations); total time 3.525947 sec
test_orm_columns : Load individual columns into named tuples using the
ORM. (500000 iterations); total time 2.946797 sec
test_core_fetchall : Load Core result rows using fetchall. (500000
iterations); total time 2.176137 sec
test_core_fetchmany_w_streaming : Load Core result rows using
fetchmany/streaming. (500000 iterations); total time 2.268198 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany.
(500000 iterations); total time 1.860707 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI
fetchall(), generate an object for each row. (500000 iterations); total
time 2.121502 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't
make any objects. (500000 iterations); total time 1.779020 sec
The very fastest for psycopg2 is 1.7 seconds, and a plain old Core load
does it in 2.17 seconds. That is only 20% slower, rather than the
400% slower you are claiming.
Basically if you have folks claiming SQLA is slow, the answer is simple:
it's not.
I would wager that any low-hanging performance fruit has already been
picked, but what might a plan of attack look like here?
1. No idea what "stream" is, I guess you mean "stream_results"; that
won't help speed, only memory usage, and even then only theoretically
2. Read everything in
http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html
3. Read, analyze, and run everything in
http://docs.sqlalchemy.org/en/rel_1_0/orm/examples.html#examples-performance
4. Datatypes, datatypes, datatypes. They matter, a lot. The
profiling you'll do in #2 will reveal whether any of these are getting in
the way
5. Fetching 10000000 rows all at once is not a realistic use case. It
takes up a large amount of memory, which in turn adds a lot of
overhead to the interpreter, because the driver buffers the full result
set. And as you can see, turning that buffering off doesn't help much
either, since server-side cursors have their own overhead. You should
be batching.
6. To get perspective on pure C vs. pure Python, try comparing your
psycopg2 script to the identical script, but drop in pg8000 instead.
7. Try your script in SQLA 0.7, 0.8, and 0.9. You should be able to see
that I've not been entirely idle in this area of concern.
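On points 1 and 5, a sketch of what batched fetching looks like in Core. This demo uses an in-memory SQLite database so it runs standalone; SQLite silently ignores stream_results, but against PostgreSQL/psycopg2 the same option enables a server-side cursor so the driver stops buffering the entire result set client-side:

```python
from sqlalchemy import create_engine, text

# in-memory SQLite keeps the demo self-contained; use a postgresql://
# URL to see stream_results actually take effect
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b INTEGER)"))
    conn.execute(
        text("INSERT INTO t VALUES (:a, :b)"),
        [{"a": i, "b": i * 2} for i in range(10_000)],
    )

total = 0
with engine.connect() as conn:
    # stream_results=True requests a server-side cursor on backends
    # that support one (e.g. psycopg2); it is ignored otherwise
    result = conn.execution_options(stream_results=True).execute(
        text("SELECT a, b FROM t")
    )
    # consume in fixed-size batches rather than one giant fetchall()
    while True:
        batch = result.fetchmany(1000)
        if not batch:
            break
        total += len(batch)

print(total)
```

The batching keeps peak memory bounded by the batch size rather than the full row count, which is the real win here; the option alone won't make anything faster.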
--
Jon Nelson
Dyn / Senior Software Engineer
p. +1 (603) 263-8029
--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.