Re: [sqlalchemy] Improving performance for ORM queries: skip IdentityMap / create detached instances

Michael Bayer Fri, 07 Dec 2012 12:45:48 -0800

On Dec 7, 2012, at 3:01 PM, Theo Nonsense wrote:

> I like using the ORM to query and have results translated to objects.  I'm 
> currently using declarative for mapping and I'm trying to figure out a good 
> way to ignore the overhead of the IdentityMap and other ORM niceties when 
> they're not needed.  Specifically, when dealing with a relatively low number 
> of results (~30k), the overhead of mapper._instance(), identity.add(), 
> mapper.populate_state(), and other functions that generally keep track of the 
> state / changes of an object is outweighing the benefit for some cases.  In 
> circumstances where the results are retrieved, but never need a database 
> connection after that point, I'd like to be able to avoid that overhead.  In 
> other words, I'd like to have simple objects with business logic that are 
> used throughout the codebase, but I also want to use the ORM wherever it is 
> not causing a performance bottleneck.  Something like:
> 
> Session.query(User).options(instrument_results=False, 
> create_detached=True).all()
> 
> This way whenever there is some location where the results do not need to be 
> tracked, I can just specify it in the query and not incur the cost of the 
> tracking.  This is just an example.  I'm looking for ways to achieve the same 
> goal.  They don't necessarily have to be args to .options().
> 
> - The first option is to use the ORM to build the queries, but issue them 
> directly through the Core / Session.execute().  The results would then be 
> translated to objects manually, which is the first dislike with this 
> approach.  The ORM already knows how to create the objects.  Also, I'd like 
> to use the same objects as the ORM ones so that the business logic can all 
> live in the same place.  However, creating the ORM objects means that they'll 
> be instrumented, which I'd like to avoid.
> 
> Also, I've seen a few other posts here and on StackExchange regarding the 
> notion of read-only or long-lived objects, but none seem to be what I'm 
> looking for.
> 
> - Custom Mapper / ClassManager / Instrumentation manager for immutable domain 
> models - https://github.com/andreypopp/saimmutable
> 
> This approach is interesting, but doesn't seem to allow toggling for queries 
> / loads that are performance bottlenecks.  I'd like to be able to only enable 
> the quicker / simpler path when needed.  I suppose I could have a table 
> mapped to two different classes through different mappers.  One mapper is the 
> default one and one that ignores instrumentation.  It might be possible to 
> make one a sub-class of the other or provide a mixin for business logic.  Or 
> it might be possible to have one class mapped through two different mappers?  
> Even if this all worked, it would mean multiple classes for each table and 
> doesn't avoid the overhead of the IdentityMap.
> 
> - Detaching the instances from the session - 
> https://groups.google.com/forum/?fromgroups=#!searchin/sqlalchemy/detached/sqlalchemy/8rFy5JGGfeo/IN28lfg-Je8J
> 
> This approach incurs the cost of the IdentityMap and Instrumentation when 
> translating the results.  By the time expunge() can be called, it is too late.


Well, there's a bit of a contradiction here: you're saying you don't want the 
identity map or mapper._instance() or any of that, but then you're saying "the 
ORM already knows how to create the objects".  I'd advise a deep dive into 
the mechanics to learn intimately how that all works.  In particular, the 
identity map is central to how relationship loading works.  Eager loading 
requires such a construct, and lazy loading gets a critical performance boost 
from it, because objects that are already present can be used without any SQL, 
or at least without being loaded redundantly.  Instrumentation is required for 
lazy loading; without lazy loading, you'd need to ensure that all queries occur 
up front for all attributes.
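
As a rough illustration of what the identity map buys you, here is a toy sketch 
in plain Python.  It is not SQLAlchemy's implementation; ToyIdentityMap, its 
loader callable and the Address class are made up for this example.  The point 
is only that a second request for the same primary key returns the object 
already in memory instead of loading it again.

```python
# Toy illustration only; this is NOT how SQLAlchemy implements its identity map.


class ToyIdentityMap:
    """Maps (class, primary key) to the single in-memory instance for that row."""

    def __init__(self, loader):
        self._objects = {}
        self._loader = loader  # hypothetical callable standing in for a SELECT
        self.loads = 0         # counts simulated database round trips

    def get(self, cls, pk):
        key = (cls, pk)
        if key not in self._objects:
            # Not seen yet: "hit the database" once and remember the result.
            self.loads += 1
            self._objects[key] = self._loader(cls, pk)
        return self._objects[key]


class Address:
    def __init__(self, id):
        self.id = id


identity_map = ToyIdentityMap(loader=lambda cls, pk: cls(pk))

first = identity_map.get(Address, 7)
second = identity_map.get(Address, 7)  # served from memory, no second load
```

Here first and second are the same object, and the loader runs only once.  A 
lazy-loaded many-to-one reference can take the same shortcut whenever the 
target's primary key is already present.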

If you don't like the performance hit of the identity map, a lower-effort route 
would be to contribute one for us written in C.  Or see if PyPy can help.

I would note that Query can load individual columns, where you do get to skip 
all the overhead of object loads, and you get back a named-tuple-like object.  
So if you don't care about relationship loading and just want tuple-like 
objects, that mechanism is there right now, and it wouldn't be much effort at 
all to add a helper that expands a given mapped object into its individual 
per-column attributes.
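
A minimal sketch of that column-only approach follows.  It assumes SQLAlchemy 
1.4 or later (for importing declarative_base from sqlalchemy.orm) and an 
in-memory SQLite database.  The User model and the column_entities() helper 
are hypothetical names for this example, not part of the library.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    """Hypothetical mapped class used only for this example."""

    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))


def column_entities(mapped_class):
    """Hypothetical helper: expand a mapped class into its table's columns."""
    return list(mapped_class.__table__.columns)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([User(name="alice"), User(name="bob")])
session.commit()

# Querying columns rather than the User entity returns tuple-like rows;
# no User instances are constructed for these results.
rows = session.query(*column_entities(User)).order_by(User.name).all()
names = [row.name for row in rows]

session.close()
```

Each row supports attribute access by column name (row.id, row.name) but is 
not a User, so no business-logic methods or relationships come along with it.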

It's really relationships that add a lot of the complexity to loading.  
Other ORMs take the approach where a relationship attribute basically 
lazy-loads the related collection every time it is accessed.  SQLAlchemy's 
approach saves on SQL, as an already-loaded object caches its related 
collections and object references.

> 
> ----
> 
> It seems that at the very least the Session would need a way to know to 
> ignore the IdentityMap and the mapper would need a way to know to ignore 
> instrumentation.  Any thoughts on how to elegantly solve the problem?  Is 
> there a way to tell the Session to create detached instances, possibly 
> through before_attach()?  Maybe it is possible to have a custom mapper that 
> knows how to ignore instrumentation and identity mapping for specific results 
> based on a flag?
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To view this discussion on the web visit 
> https://groups.google.com/d/msg/sqlalchemy/-/DL0FLhGQYKYJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.

