Re: [sqlalchemy] Is ORM the right choice for large scale data manipulation?

Michael Bayer Wed, 07 Jul 2010 11:59:49 -0700

On Jul 7, 2010, at 2:10 PM, Krishnakant Mane wrote:

> Hello all,
> I will be interested to know if using ORM for a large scale data operation is 
> the right approach when it comes to scalability.
> Has SQLAlchemy been put to test ever for a select query which involves 
> getting thousands of records and specially when joins are involved?
> I have a financial software which needs such a system.
> There will be lot of joins and will involve lot of complex queries.
> If not then should I use the layer directly below the ORM?
> Should I use some kind of query building tools which come with sqlalchemy?


The ORMs speed is only impacted by those operations that occur primarily among 
in-Python objects, that is, loading many objects into memory, persisting the 
state of a large set of in-memory objects back to the database.    Queries 
which are resource intensive on the database side are not impacted by the usage 
of the ORM.  So the performance impact of specific joins is not affected by 
whether or not you're using the ORM, since the speed impact of a join occurs 
within the database, not the manipulation of in-Python data.   

The ORM fetches rows more slowly than a raw ResultProxy or DBAPI cursor.  If 
you are pushing tens of thousands of rows of data in one grab out onto a socket 
or filehandle, bypassing the ORM can be used if the observed performance in a 
particular situation is too slow.  

The ORM can also become cumbersome is if you are trying to bulk insert tens of 
thousands, hundreds of thousands, or millions of rows.  The ORM fetches back 
state information about all changes persisted to the database, so that 
subsequent data operations in the flush can be populated with dependent state 
and so that post-flush, the in-Python datamodel represents the current state of 
the database (or will load that state as requested).   The ability of the ORM 
to do this means you can write complex data manipulation code in an extremely 
succinct and simple way, but you're offloading more of the effort to the Python 
interpreter to organize the operation, and also adding the requirement that 
many more SQL statements must be emitted as the "bulk" style of execution does 
not correspond well to individually tailored attribute lists nor does allow 
state information to be passed back.   This is a tradeoff that would need to be 
considered on a case by case basis.

Virtually everyone who wants to receive the huge benefits of using the ORM does 
so.   For those areas of the application which, after testing with the ORM, are 
observed to need more raw speed which can't be gained by tweaking the approach 
(tweaking here includes:  using eager loading, using caching, limiting how much 
data is flushed at a time, etc.),  they can be modified to bypass the ORM and 
instead use the more verbose "sql expression" style.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Is ORM the right choice for large scale data manipulation?

Reply via email to