[sqlalchemy] Re: using for_update returns stale data if old row already exists in identity_map

dykang Wed, 15 Apr 2009 17:57:22 -0700

On Apr 15, 5:37 pm, Michael Bayer <[email protected]> wrote:
> On Apr 15, 2009, at 8:13 PM, dykang wrote:
>
>
>
>
>
> > The following code is a simple reproduction of the issue.
>
> > # create your metadata and session
> > import sqlalchemy
>
> > table = sqlalchemy.Table (
> >        'test_table', meta,
> >        sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True),
> >        sqlalchemy.Column('foo', sqlalchemy.Integer),
> >        mysql_engine='InnoDB'
> > )
>
> > class Bar(object):
> >        pass
>
> > mapper =sqlalchemy.orm.mapper(Bar, table)
> > meta.create_all()
>
> > row1 = Bar()
> > row1.foo = 1
> > session.begin()
> > session.save(row1)
> > session.flush()
> > s = table.update(table.c.foo == 1, values={table.c.foo: 2})
> > session.connection(Bar).execute(s)
> > session.commit()
>
> > s = table.select(id == 1, for_update=True)
> > row2 = session.query(Bar).from_statement(s).all()[0]
> > print row2.foo
>
> I cant reproduce that.  The session expires all data after commit().    
> How are you creating your session ?
>
>
>
> >    I am using sqlalchemy 0.4.7 and have tried it in 0.4.8 as well. I
> > have not tried updating to 0.5.x yet, as I have some code that I'm not
> > ready to migrate yet. Am I doing something wrong, or do I have a bad
> > expectation? when using for_update, I do not expect to ever get stale
> > data.
>
> that explains that.  0.4 doesn't include expire-on-commit, so you  
> should upgrade.  Otherwise, call session.expire_all() if you're stuck  
> with 0.4 to see the results of your non-ORM UPDATE statements within  
> your current session.  the usage of for_update has no effect on this  
> behavior, it only determines if SQLA sends out a FOR UPDATE within the  
> SELECT, thereby placing a read/write lock on the row.

since I am using isolation level "read committed" this actually
happens even if the update
is done in  a different process. it's obviously less likely to happen,
but it can.
you can reproduce it by

changing bottom part to this and following the instructions in the
comments
row1 = Bar()
row1.foo = 1
session.begin()
session.save(row1)
session.flush()
#s = table.update(table.c.foo == 1, values={table.c.foo: 2})
#session.connection(Bar).execute(s)
session.commit()

import pdb;pdb.set_trace()
# in another window, enter mysql and run "update test_table set
foo=2"; commit;
# then in this window, type "continue"
s = table.select(id == 1, for_update=True)
row2 = session.query(Bar).from_statement(s).all()[0]
print row2.foo



>
> >    In fact, since I am using transaction isolation level "read
> > committed" even without the for_update, I would want to get the
> > correct, updated value. Since the identity_map is NOT a cache, and the
> > select statement is always executed, I would expect the updated values
> > to always be returned.
>
> the ORM is always going to use the values in the identity map within  
> the session, and it is a cache in some ways.   Its a cache of data as  
> it exists within the current transaction, since in the typical  
> isolated transaction, there's no need to reload data until the  
> transaction is complete.

in a situation where you load for update, you should always get up to
date data.
i believe that even in a repeatable read scenario, if I executed the
query with a locking read,
it would return me the most up to date data, and not a repeated read.

>
> In this sense, your usage of a non-isolated transaction mode (and the  
> expectation of receiving rows that way) is somewhat incompatible with  
> SQLAlchemy's notion of a unit of work.    However, there's a "reload  
> fresh data every time" mode if you use the populate_existing() method  
> on Query, which is provided for those few folks who want to work that  
> way (not many folks want to work this way...I can't see what  
> advantages there are to working non-isolated other than evading  
> deadlocks).    you can add a method to Session which just returns a  
> Query object with populate_existing() already set, if you like.

I was, indeed, avoiding deadlocks. I tend to try to keep the time that
I hold a
lock as short as possible. This usually means not locking a row until
I know,
for sure, that I am going to update it.

>
> The reason you wouldn't want to use populate_existing every time is  
> because it will blow away any changes pending on your attributes,  
> which is why its never a default behavior.       but if you're trying  
> to do a total read-only thing, it might fit the bill.

ah, but since the ORM actually is forced to execute the query anyway,
why not update the object in the identity map with the correct data,
and raise an exception if the current object is dirty? It seems bad
procedure
to be loading an object for update when it's already been modified.

The situation where I get stale data with for_update=True is a real
issue though.
As far as I can tell, this should never happen.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---
[sqlalchemy] Re: using for_update returns stale data if old row already exists in identity_map

Reply via email to