Hi neutron folks,

There is an ongoing effort to refactor some neutron DB logic to be
compatible with galera/mysql, which doesn't support locking
(with_lockmode('update')).

Some code paths that used locking in the past were rewritten to retry the
operation if they detect that an object was modified concurrently.
The problem here is that all DB operations (CRUD) are performed in the
scope of some transaction, so that complex operations are executed
atomically.
For mysql the default transaction isolation level is 'REPEATABLE READ',
which means that once the code issues a query within a transaction, the
same query will return the same result for the rest of that transaction
(i.e. the DB takes a snapshot during the first query and then reuses it
for the same query).
In other words, retry logic like the following will not work:

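def allocate_obj():
    with session.begin(subtransactions=True):
        for i in xrange(n_retries):
            # Under REPEATABLE READ this SELECT returns the same row
            # on every iteration.
            obj = session.query(Model).filter_by(**filters).first()
            # The UPDATE matches zero rows if a concurrent transaction
            # has already allocated the object.
            count = (session.query(Model).
                     filter_by(id=obj.id, allocated=False).
                     update({'allocated': True}))
            if count:
                return obj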

Since methods like allocate_obj() are usually called from within another
transaction, we can't simply put the transaction inside the 'for' loop to
fix the issue.

The particular issue here is https://bugs.launchpad.net/neutron/+bug/1382064
with the proposed fix:
https://review.openstack.org/#/c/129288

So far the solution proven by tests is to change the transaction isolation
level for mysql to 'READ COMMITTED'.
The patch suggests changing the level only for the particular transaction
where the issue occurs (per sqlalchemy, it is reverted to the engine default
once the transaction is committed).
This isolation level allows the code above to see a different result in each
iteration.
At the same time, any code that relies on a repeated query within the same
transaction giving the same result may break.
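
To make the difference between the two levels concrete, here is a rough
toy model in plain Python. It is not MySQL or sqlalchemy: ToyDB,
ToyTransaction, select_free(), update_allocate() and run() are all names
made up for this illustration. It assumes that plain reads under
REPEATABLE READ come from a snapshot taken at the first read, while an
UPDATE always acts on the latest committed row. It ignores rollback and
locking entirely.

```python
# Toy model only: it mimics snapshot reads, it is not MySQL or sqlalchemy.
class ToyDB(object):
    """Committed state shared by all transactions: id -> allocated flag."""

    def __init__(self, rows):
        self.rows = dict(rows)


class ToyTransaction(object):
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation
        self.snapshot = None

    def select_free(self):
        """Plain SELECT of the ids whose allocated flag is False."""
        if self.isolation == 'REPEATABLE READ':
            # The snapshot is taken on the first read and reused afterwards.
            if self.snapshot is None:
                self.snapshot = dict(self.db.rows)
            source = self.snapshot
        else:
            # READ COMMITTED: every statement sees the latest committed data.
            source = self.db.rows
        return sorted(i for i, allocated in source.items() if not allocated)

    def update_allocate(self, obj_id):
        """UPDATE ... WHERE id=obj_id AND allocated=False; returns row count.

        Assumption: the write acts on the latest committed row at both levels.
        """
        if self.db.rows.get(obj_id) is False:
            self.db.rows[obj_id] = True
            return 1
        return 0


def allocate_obj(txn, n_retries, before_update=None):
    """Retry loop with the same shape as allocate_obj() above."""
    for attempt in range(n_retries):
        free = txn.select_free()
        if not free:
            return None
        obj_id = free[0]
        if before_update is not None and attempt == 0:
            # Simulate a competing transaction committing between our
            # SELECT and our UPDATE on the first attempt.
            before_update(obj_id)
        if txn.update_allocate(obj_id):
            return obj_id
    return None


def run(isolation):
    db = ToyDB({1: False, 2: False})
    txn = ToyTransaction(db, isolation)

    def rival(obj_id):
        # The competing transaction allocates the same row and commits.
        db.rows[obj_id] = True

    return allocate_obj(txn, 3, before_update=rival)


print(run('REPEATABLE READ'))  # None: every retry re-reads the stale row 1
print(run('READ COMMITTED'))   # 2: the second attempt sees row 1 is taken
```

In this model the REPEATABLE READ run burns all its retries on the same
already-taken row, while the READ COMMITTED run moves on to the next free
row on its second attempt.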

So the question is: what do you think about changing the default isolation
level to READ COMMITTED for mysql project-wide?
It is already so for postgres; however, we don't have much concurrent test
coverage to guarantee that it's safe to move to a weaker isolation level.

Your feedback is appreciated.

Thanks,
Eugene.
