I ran into a `StaleDataError` from PostgreSQL the other day when a client
app sent us
two update requests back-to-back without waiting for the first request
to send a response.
Given
class Mood(Base):
id = Column(BigInteger, primary_key=True, autoincrement=False,
nullable=False)
name = Column(VARCHAR(length=255), nullable=False, index=True,
unique=True)
# etc
class Avatar(Base):
id = Column(BigInteger, primary_key=True, autoincrement=False,
nullable=False)
version = Column(Integer, nullable=False, default=0)
moods = relationship(Mood, secondary=lambda: avatar_moods_table)
# etc
# The association link between avatars and moods.
avatar_moods_table = Table(
'avatar_moods',
Base.metadata,
Column('avatar_id', BigInteger, ForeignKey(Avatar.id,
ondelete='CASCADE'),
nullable=False, primary_key=True, index=True),
Column('mood_id', BigInteger, ForeignKey(Mood.id,
ondelete='CASCADE'),
nullable=False, primary_key=True),
)
The state of the Avatar before either client request was that there were
three moods,
`['Grumpy', 'Tired', 'Disgruntled']`. Both client requests' payloads
included
`moods = ['Elated']`, as well as other data that (probably) differed
between the two.
The code to update the moods looks like
orm_model.moods = [Mood.get_by_name(self.session, mood) for mood in
logic_model.moods]
where `logic_model` is the deserialized request body and `orm_model` is the
SQLAlchemy object
retrieved from the database.
The aggregated centralized log looks like
first interleaved request, 254c1046-0da701ca, at 19:09:03, which
switches moods from `['Grumpy', 'Tired', 'Disgruntled']` to `['Elated']`
Apr 15 19:09:03 [3843:MainThread] INFO [254c1046-0da701ca] REST
request: PUT http://<hostname>/avatar/<avatar_id> received...
Second interleaved request, 711bc322-df5afba6, at 19:09:04, also with
moods=`['Elated']`
Apr 15 19:09:04 [4572:MainThread] INFO [711bc322-df5afba6] REST
request: PUT http://<hostname>/avatar/<avatar_id> received...
Second request fails
Apr 15 19:09:05 [4572:MainThread] WARNI [711bc322-df5afba6] REST
request: PUT http://<hostname>/avatar/<avatar_id> failed: 400 Bad Request
(GENERAL): 'DELETE statement on table 'avatar_moods' expected to delete 3
row(s); Only 0 were matched.' (DELETE statement on table 'avatar_moods'
expected to delete 3 row(s); Only 0 were matched.)
First request succeeds
Apr 15 19:09:05 [3843:MainThread] INFO [254c1046-0da701ca] REST
response: [1570ms] PUT http://<hostname>/avatar/<avatar_id> responded: 200
OK
Here's what I think happened:
Request 1 arrived
* The AvatarRestResource started a session and a transaction
* The AvatarLogicResource read the current state of the avatar from
PostgreSQL,
which included 3 moods (presumably)
* LR checked the Version in the request body (3) against the one in the ORM
model (also version 3).
* The AvatarLogicResource set the moods on the ORM model to 1 mood,
`['Elated']`.
* When the transaction committed, just before returning from the Rest
Resource,
SQLAlchemy deleted the 3 rows in the avatar_moods association table
and inserted 1 new row, along with any other updates.
Request 2 arrived, while Request 1 was in progress.
It arrived in a different worker process or even a different server box.
* The AvatarRestResource started another session and transaction.
ACID transactions provide Isolation, so it should only have seen
the committed state of the database. Presumably, Request 1 hadn't
completed,
so none of its changes should have been visible in Request 2
* LR#2 read the ORM model and got 3 moods
* LR#2 also set the moods to `['Elated']`.
* RR#2 committed the transaction and it blew up with the error,
`DELETE statement on table 'avatar_moods' expected to delete 3 row(s);
Only 0 were matched.`
Based on my reading of some of the below links about MVCC, #1's transaction
completed first.
Transaction #2 in PostgreSQL now had visibility to those updated rows in
the DB,
so it returned "0 rows deleted" to SQLAlchemy.
This confounded SA's expectations and it blew up.
I think.
Postgres is using Multiversion Concurrency Control:
* https://devcenter.heroku.com/articles/postgresql-concurrency
* http://www.slideshare.net/robertsosinski/mvcc-robert-sosinski
* http://www.postgresql.org/docs/9.4/static/mvcc.html
* https://wiki.postgresql.org/wiki/MVCC
Finally, my question: What, if anything, could we have done to mitigate
this on the server?
Looking at dependency.py for 0.9.9, it looks like SQLAlchemy considers this
to be an unrecoverable error.
Note: I think the client app must enforce serialization of requests
if it wants a consistent worldview from its own actions.
We do have a `version` column in the Avatar table, but that's used to
handle concurrent/stale updates from two different devices.
Some requests are long running and a case could be made for accepting more
requests
while the earlier ones are still being handled—assuming no conflict—to
make the client app more responsive.
--
/George V. Reilly
PS Congratulations on releasing SQLAlchemy 1.0! I've been a happy user
since 2008.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.