Re: [sqlalchemy] Re: Easy way to referesh .__dict__

Michael Bayer Sun, 05 Sep 2010 17:45:23 -0700

On Sep 5, 2010, at 7:23 PM, Thadeus Burgess wrote:

> Seems that things break when you use __dict__.... So don't use it.
> 
> Use a getattr and a setattr. If you really want, you can implement
> getitem and setitem that just wrap setattr and getattr on your model.
> 
> 
> Why doesn't sqlalchemy basemodel do this already? Everything in python
> is a dictionary, seems natural to provide dict access as an
> alternative by default.

First off, I don't know that it's accurate to say "everything in Python is a 
dictionary", if you mean a dict with foo['bar'] style access.  Everything in 
Python is certainly based on a type with an attribute namespace, though, and 
our usage of descriptors is how we instrument that namespace on a user defined 
class.

If __dict__ access is supported as an end-user system of working with 
transparently persisted objects, how would you go about tracking change 
events?  Not to mention allowing stale/unloaded attributes to fire load 
events.  We use Python descriptors for this purpose, as they are designed for 
just this use case, are the simplest option, and perform the best.  There is 
the concept of replacing obj.__dict__ with a custom dictionary that instruments 
__getitem__ and __setitem__, but then there's no way to affect the state of an 
object internally, such as when it's being loaded from the DB, which is an 
extremely performance-critical block, without also triggering those events 
when they're not appropriate, unless additional complex and 
performance-impacting schemes were devised to circumvent that (suffice to say 
it's been considered, long ago).  Subclasses of dict, even if they do nothing, 
already perform more poorly than a raw dict due to the way Python optimizes 
non-subclassed dictionaries.  So we prefer not to hardwire the extra latency 
and complexity of that approach.
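
To make the descriptor argument concrete, here is a minimal sketch in plain 
Python.  It is not SQLAlchemy's actual instrumentation; TrackedAttribute, 
load_row and the "_changed" key are hypothetical names made up for the 
illustration.  It shows a data descriptor recording "set" and "delete" events 
while an internal loader populates __dict__ directly, so that loading fires no 
events at all.

```python
class TrackedAttribute:
    """Data descriptor that records 'set' and 'delete' events per instance."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, obj, value):
        # Record the change event, then store the value in the raw __dict__.
        obj.__dict__.setdefault("_changed", set()).add(self.name)
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        obj.__dict__.setdefault("_changed", set()).add(self.name)
        obj.__dict__.pop(self.name, None)


class Person:
    name = TrackedAttribute()
    age = TrackedAttribute()


def load_row(cls, row):
    """Hypothetical loader: writes to __dict__ directly, firing no events."""
    obj = cls.__new__(cls)
    obj.__dict__.update(row)
    return obj


me = load_row(Person, {"name": "Thadeus", "age": 30})
changed_after_load = set(me.__dict__.get("_changed", ()))

me.name = "Thad"                      # goes through the descriptor
changed_after_set = set(me.__dict__["_changed"])

me.__dict__["age"] = 31               # bypasses the descriptor entirely
changed_after_bypass = set(me.__dict__["_changed"])
```

In this sketch the load leaves the change set empty, the attribute assignment 
adds "name" to it, and the direct __dict__ write goes unnoticed, which is the 
reason end-user writes to __dict__ cannot be tracked this way.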

Years ago, when there was still a hint of controversy over the __dict__ issue, 
we had an approach whereby upon load, every attribute would be populated as it 
is now into __dict__ directly, and additionally into a second, private 
dictionary.  At flush time, every single attribute on every object in the 
session would run a comparison of the "original" value versus the __dict__ 
value in order to detect changes, since we tried making the assumption that 
__dict__ might have been modified by the end user (basically, assuming the use 
case you're asking about here).    

If you've worked with Python for any amount of time you'd know that this 
approach is crushingly slow compared to detecting only actual "set" and 
"delete" operations as events, both on the populate side and on the "did 
anything change?" side.  Once autoflush was introduced, it became especially 
critical that we can detect that no changes have occurred within a session in 
O(1) time.  The previous approach was one of the worst examples of horrendous 
amounts of processing time being spent on an almost completely vapor use case, 
and the project suffered (the nature of which I leave as an exercise for the 
reader) until we rewrote all that.  In the end nobody really needed __dict__, 
and it was silly that we weren't using descriptors as they were designed.
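
The difference between the two approaches can be sketched roughly as follows. 
This is an illustration only, assuming hypothetical SnapshotSession and 
EventSession classes that are not part of SQLAlchemy: the first compares every 
attribute of every object against a private copy, the second only checks 
whether any "set" event was ever reported.

```python
class SnapshotSession:
    """Old style: keep a private copy of each object's state, compare later."""

    def __init__(self):
        self._originals = []  # list of (obj, snapshot of its __dict__)

    def add(self, obj):
        self._originals.append((obj, dict(vars(obj))))

    def is_modified(self):
        # Cost grows with (number of objects) x (number of attributes).
        return any(vars(obj) != snap for obj, snap in self._originals)


class EventSession:
    """Event style: descriptors report changes, so the check is O(1)."""

    def __init__(self):
        self.dirty = set()

    def is_modified(self):
        return bool(self.dirty)


class Tracked:
    """Descriptor that reports 'set' events to the owning session."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        obj.__dict__["_session"].dirty.add(obj)


class PlainPerson:
    def __init__(self, name):
        self.name = name


class TrackedPerson:
    name = Tracked()

    def __init__(self, name, session):
        # Populate without firing an event, as a loader would.
        self.__dict__["name"] = name
        self.__dict__["_session"] = session


snap = SnapshotSession()
plain = [PlainPerson(f"user{i}") for i in range(1000)]
for p in plain:
    snap.add(p)
snapshot_clean = snap.is_modified()   # compares all 1000 objects first
plain[-1].__dict__["name"] = "changed"
snapshot_dirty = snap.is_modified()

events = EventSession()
tracked = [TrackedPerson(f"user{i}", events) for i in range(1000)]
event_clean = events.is_modified()    # a single truth test on a set
tracked[-1].name = "changed"
event_dirty = events.is_modified()
```

Both sessions reach the same answers, but the snapshot version has to walk 
every object even when nothing changed, which is the cost that autoflush would 
pay on every query.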

So in exchange for not-bone-crushingly-slow performance (actually quite fast), 
flawless change tracking, and crisp, transparent refreshing of stale attributes 
from the database, the user has to give up being able to populate and access 
__dict__ directly for regular operations. 

Suffice to say any ORM in Python you use, not to mention any other state 
management system, uses descriptors, to a lesser or greater degree depending on 
the system's reliance on intelligent state management.   The __dict__ in turn 
is how the descriptors are usually bypassed.

You can certainly provide dict-like access to your objects using the usual 
__getitem__/__setitem__ approach.   It's also possible to entirely modify how 
SQLAlchemy persists state on the object, using not __dict__ but some other 
means, and in fact, you could plug in a change-tracking dict of your own and 
wire it all up, get your __dict__ that works and fires off the events 
SQLAlchemy looks for, and get all the requisite complexity and performance 
degradation (see examples/custom_attributes/custom_management.py) ... but 
there's no reason to do that unless you were integrating with some other object 
management system (which is why we even have that extension point...I'd much 
prefer it wasn't needed).    We did an integration with Trellis and one with 
Zope security policies, both of which required a very open ended approach 
where __dict__ is not the usual thing we'd see.
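
For the plain __getitem__/__setitem__ route, a minimal sketch might look like 
the following.  DictAccessMixin is a hypothetical name, and the Person class 
here uses an ordinary property as a stand-in for an instrumented attribute; 
the point is only that dict-style access delegates to getattr/setattr and so 
still passes through whatever descriptors the class defines.

```python
class DictAccessMixin:
    """Dict-style access that delegates to normal attribute access."""

    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        setattr(self, key, value)


class Person(DictAccessMixin):
    def __init__(self, name):
        self.events = []      # records what the descriptor saw
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Stand-in for the change tracking an ORM attribute would do.
        self.events.append(("set", "name", value))
        self._name = value


me = Person("Thadeus")
me["name"] = "Thad"           # routed through the property setter
current_name = me["name"]
```

Because the mixin never touches __dict__, change events keep firing no matter 
which access style the caller prefers.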

While you may see this as bad news that __dict__ access is never going to be 
built in as a feature, I'd see it as great news - you're coming to SQLAlchemy 
over five years into the project, long after we've made and resolved every dumb 
mistake imaginable, been through all kinds of serious API upheavals and drama, 
and today SQLAlchemy is a fierce, battle-tested library deployed in thousands 
of environments with very few issues.   Attribute instrumentation took us a 
really fricking long time to get right, and there's still work to be done.

Regarding commit, it expires all attributes so that new data, which becomes 
visible once the next transaction begins, is fetched.  This is mentioned at 
http://www.sqlalchemy.org/docs/orm/session.html#committing (note the navigation 
is new for the docs).  The expiration can be disabled (the Session accepts 
expire_on_commit=False) for situations where concurrent modifications to rows 
are not a concern.
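
The behavior described in the original question (__dict__ emptying on commit 
and reappearing on first attribute access) can be imitated with a small sketch. 
This is plain Python rather than SQLAlchemy; FAKE_DB, LazyAttribute, expire and 
the "_pk" key are all hypothetical stand-ins for the illustration.

```python
# Hypothetical stand-in for a database table, keyed by primary key.
FAKE_DB = {1: {"name": "Thadeus", "age": 30}}


class LazyAttribute:
    """Descriptor that reloads the row when its value is missing."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.name not in obj.__dict__:
            # Stale or unloaded: fire a "load" and refresh every column.
            obj.__dict__.update(FAKE_DB[obj.__dict__["_pk"]])
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class Person:
    name = LazyAttribute()
    age = LazyAttribute()

    def __init__(self, pk):
        self.__dict__["_pk"] = pk


def expire(obj):
    """Drop loaded values, keeping only bookkeeping keys, as a commit would."""
    for key in [k for k in obj.__dict__ if not k.startswith("_")]:
        del obj.__dict__[key]


me = Person(1)
first_name = me.name                      # first access loads the row
expire(me)                                # what happens at commit time
keys_after_commit = sorted(me.__dict__)   # only the bookkeeping key remains

FAKE_DB[1]["name"] = "Thad"               # another transaction changes the row
name_after_reload = me.name               # access reloads the fresh value
keys_after_access = sorted(me.__dict__)
```

Reading attributes through the descriptor always yields current data, whereas 
a copy of __dict__ taken before the commit would have held the stale name.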

> 
> --
> Thadeus
> 
> 
> 
> 
> 
> On Fri, Sep 3, 2010 at 9:08 PM, Thadeus Burgess <[email protected]> wrote:
>> If I have a record object.
>> 
>> me = Person.query.get(id)
>> 
>> and I access me.__dict__ everything looks good.
>> 
>> However when I execute a db.session.commit()
>> 
>> the me.__dict__ disappears and only contains _sa_state_instance
>> 
>> The second I access an attribute of the me instance, __dict__ comes back.
>> 
>> What is the best way to always make sure the __dict__ instance is
>> always populated with the object data without knowing any of the
>> column names ahead of time ?
>> 
>> --
>> Thadeus
>> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

