[sqlalchemy] Re: using for_update returns stale data if old row already exists in identity_map

Michael Bayer Wed, 15 Apr 2009 22:32:08 -0700


On Apr 15, 2009, at 8:57 PM, dykang wrote:


>
> ah, but since the ORM actually is forced to execute the query anyway,
> why not update the object in the identity map with the correct data,
> and raise an exception if the current object is dirty? It seems bad
> procedure to be loading an object for update when it's already been
> modified.

I tried an experiment with postgres using serializable isolation, and  
as it turns out if you've already read the row into the current  
transaction, then a concurrent transaction modifies the row, then you  
request select...for update of that same row a second time in the  
first transaction, it throws a concurrent modification error.    so  
with serializable, the primary isolation mode we have in mind, the use  
case never comes up.

if you're in read committed, it re-reads the latest data from the row  
from the outside regardless of the usage of FOR UPDATE or not.  the PG  
docs don't say anything about FOR UPDATE changing the isolation  
characteristics of the SELECT...only the locking (and as you can tell  
we're using PG, not MySQL, as the baseline for "best" behavior).   So  
to really work with non-serializable isolation and have the ORM return  
data similar to what the database does, data should always be  
refreshed with every SELECT, not just those with FOR UPDATE.    the  
autoflush feature, which is generally turned on, prevents the issue of  
any pending data being overwritten - it's always flushed out before the  
SELECT occurs.

This is a lot simpler to implement: it amounts to having  
"populate_existing" on at all times.  It could be a flag that folks  
use if they've decided they are explicitly using non-serializable  
isolation, would like to have the details of that behavior available  
to them (i.e. they really want the same row to change its value  
throughout the transaction), and they're willing to take the  
performance penalty of re-populating all attributes every time.
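
To make the difference concrete, here is a toy model in plain Python. It is not SQLAlchemy code: the names ToySession, Row and the dict standing in for a table are all made up for illustration. It only sketches the idea that the SELECT always runs, and that the flag decides whether an object already in the identity map gets overwritten with the fresh row.

```python
class Row:
    """Stand-in for a mapped object (hypothetical, not a SQLAlchemy class)."""

    def __init__(self, pk, value):
        self.pk = pk
        self.value = value


class ToySession:
    """Toy session with an identity map and a populate_existing switch."""

    def __init__(self, database, populate_existing=False):
        self.database = database          # dict of pk -> value, plays the table
        self.identity_map = {}            # pk -> Row already loaded
        self.populate_existing = populate_existing

    def get(self, pk):
        fresh_value = self.database[pk]   # the "SELECT" is executed either way
        obj = self.identity_map.get(pk)
        if obj is None:
            # first load: build the object from the row
            obj = Row(pk, fresh_value)
            self.identity_map[pk] = obj
        elif self.populate_existing:
            # object already present: overwrite its state with the fresh row
            obj.value = fresh_value
        # otherwise the fresh row is discarded and the old state is kept
        return obj
```

With the flag off, a second get() of the same primary key returns the same object with its old value even though the "table" has changed; with the flag on, the same object comes back carrying the new value.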

my reasons for not enabling this by default are that it's a performance  
hit and would be disastrous to use without autoflush.    If it were to  
work in theory without autoflush, it would have to verify attributes  
as having no pending changes before populating, else raise an error.    
The performance and complexity overhead of that would be infeasible,  
not to mention that it's solving a problem that is better solved by  
choosing a stricter isolation level.  Keeping the feature specific to  
just FOR UPDATE doesn't seem to address the full need of "i want to  
work in non-serializable isolation", since any SELECT returns fresh  
data.  Maybe the FOR UPDATE case more strongly suggests the feature  
than the non FOR UPDATE case, but I can't make that decision across  
the board for all users without a broader discussion - in particular  
we support many different databases, and who knows what each one does  
with each kind of isolation they offer.   MSSQL can always be counted  
on to blow up any assumptions you've made.

there is an ancient flag on mapper() called "always_refresh" which is  
the equivalent of "populate_existing" always turned on, but in modern  
SQLA this would be better suited as a Session flag.   The behavior can  
be achieved in any current SQLA release by using a Query subclass that  
turns on its populate_existing() flag at construction time.
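
A minimal sketch of the same effect follows, with one swap stated plainly: instead of a Query subclass that sets the flag at construction time, it uses a Session subclass whose query() method calls the public Query.populate_existing() on every query it hands out. That avoids touching Query internals. The names RefreshingSession and RefreshingSessionFactory are made up for this example, and it assumes the application goes through session.query() for its loads.

```python
from sqlalchemy.orm import Session, sessionmaker


class RefreshingSession(Session):
    """Session whose queries always overwrite already-loaded objects.

    Hypothetical name; the only SQLAlchemy API relied on is
    Session.query() and Query.populate_existing().
    """

    def query(self, *entities, **kwargs):
        # Build the query as usual, then switch on populate_existing so
        # rows refresh objects that are already in the identity map.
        return super().query(*entities, **kwargs).populate_existing()


# Factory producing RefreshingSession instances; bind an engine when
# calling it, e.g. RefreshingSessionFactory(bind=engine).
RefreshingSessionFactory = sessionmaker(class_=RefreshingSession)
```

As noted above, this is only safe with autoflush left on, since pending changes on an object would otherwise be overwritten by the incoming row.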
