On Mon, May 19, 2014 at 10:30 AM, Jay Pipes <jaypi...@gmail.com> wrote:
> Stackers, > > On Friday in Atlanta, I had the pleasure of moderating the database > session at the Ops Meetup track. We had lots of good discussions and heard > important feedback from operators on DB topics. > > For the record, I would not bring this point up so publicly unless I > believed it was a serious problem affecting a large segment of users. When > doing an informal survey of the users/operators in the room at the start of > the session, out of approximately 200 people in the room, only a single > person was using PostgreSQL, about a dozen were using standard MySQL > master/slave replication, and the rest were using MySQL Galera clustering. > So, this is a real issue for a large segment of the operators -- or at > least the ones at the session. :) > > Peter Boros, from Percona, was able to provide some insight on MySQL > Galera topics, and one issue came up that is likely the cause of a lot of > heartache for operators who use MySQL Galera (or Percona XtraDB Cluster). > > We were discussing whether people had seen deadlock issues [1] when using > MySQL Galera in their deployment, and were brainstorming on why deadlocks > might be seen. I had suggested that perhaps Nova's use of autoincrementing > primary keys may have been the cause. Peter pretty quickly dispatched that > notion, saying that Galera automatically handles autoincrementing keys > using managed innodb_autoincrement_increment and > innodb_autoincrement_offset config options. > > I think at that point I mentioned that there were a number of places that > were using the SELECT ... FOR UPDATE construct in Nova (in SQLAlchemy, it's > the with_lockmode('update') modification of the query object). Peter > promptly said that was a problem. MySQL Galera does not support SELECT ... > FOR UPDATE, since it has no concept of cross-node locking of records and > results are non-deterministic. > > So... what to do? > > For starters, some information on the use of with_lockmode() in Nova and > Neutron... > > Within Nova, there are actually only a few places where > with_lockmode('update') is used. Unfortunately, the use of > with_lockmode('update') is in the quota code, which tends to wrap largish > blocks of code within the Nova compute execution code. > > Within Neutron, however, the use of with_lockmode('update') is all over > the place. There are 44 separate uses of it in 11 different files. > > We have a number of options: > > 1) Stop using MySQL Galera for databases of projects that contain > with_lockmode('update') > > 2) Put a big old warning in the docs somewhere about the problem of > potential deadlocks or odd behaviour with Galera in these projects > I think we should do this for Icehouse regardless of what other paths we take. > > 3) For Nova and Neutron, remove the use of with_lockmode('update') and > instead use a coarse-grained file lock or a distributed lock manager for > those areas where we need deterministic reads or quiescence. > > 4) For the Nova db quota driver, refactor the driver to either use a > non-locking method for reservation and quota queries or move the driver out > into its own projects (or use something like Climate and make sure that > Climate uses a non-blocking algorithm for those queries...) > > Thoughts? > > -jay > > [1] http://lists.openstack.org/pipermail/openstack/2014-May/007202.html > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev