Some comments inline.

Salvatore
On 19 May 2014 20:32, sridhar basam <sridhar.ba...@gmail.com> wrote:
>
> On Mon, May 19, 2014 at 1:30 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> Stackers,
>>
>> On Friday in Atlanta, I had the pleasure of moderating the database
>> session at the Ops Meetup track. We had lots of good discussions and
>> heard important feedback from operators on DB topics.
>>
>> For the record, I would not bring this point up so publicly unless I
>> believed it was a serious problem affecting a large segment of users.
>> When doing an informal survey of the users/operators in the room at the
>> start of the session, out of approximately 200 people in the room, only
>> a single person was using PostgreSQL, about a dozen were using standard
>> MySQL master/slave replication, and the rest were using MySQL Galera
>> clustering. So, this is a real issue for a large segment of the
>> operators -- or at least the ones at the session. :)
>
> We are one of those operators that use Galera for replicating our MySQL
> databases. We used to see issues with deadlocks when having multiple
> MySQL writers in our cluster. As a workaround we have our haproxy
> configuration in an active-standby configuration for our MySQL VIP.
>
> I seem to recall we had a lot of the deadlocks happen through Neutron.
> When we go through our Icehouse testing, we will redo our multi-master
> MySQL setup and provide feedback on the issues we see.

The SELECT ... FOR UPDATE issue is going to be a non-trivial one for
Neutron as well. Some components, like IPAM, rely heavily on it. However,
Neutron is a lot more susceptible to deadlock problems than Nova, because
at the moment it does not implement a retry mechanism. This is something
that should be added during the Juno release cycle regardless of all the
other enhancements currently being planned, such as task-oriented
operations.
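For anyone curious what such a retry mechanism amounts to, here is a
minimal sketch of a retry-on-deadlock wrapper. All names here are
illustrative (DBDeadlock stands in for the driver's deadlock error, e.g.
MySQL error 1213), not Neutron's actual API:

```python
import functools
import time


class DBDeadlock(Exception):
    """Stand-in for the DB driver's deadlock error (e.g. MySQL error 1213)."""


def retry_on_deadlock(max_retries=3, delay=0.01):
    """Re-run the decorated DB operation when the server reports a deadlock."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise
                    time.sleep(delay * (2 ** attempt))  # simple backoff
        return wrapper
    return decorator


# Demo: the operation hits a (simulated) deadlock twice, then succeeds.
calls = {"n": 0}

@retry_on_deadlock(max_retries=3, delay=0)
def allocate_ip():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBDeadlock()
    return "10.0.0.5"

print(allocate_ip(), calls["n"])  # succeeds on the third attempt
```

The key design point is that the retried function must be the whole
transaction, so a failed attempt leaves nothing half-committed.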
> thanks,
> Sridhar
>
>> Peter Boros, from Percona, was able to provide some insight on MySQL
>> Galera topics, and one issue came up that is likely the cause of a lot
>> of heartache for operators who use MySQL Galera (or Percona XtraDB
>> Cluster).
>>
>> We were discussing whether people had seen deadlock issues [1] when
>> using MySQL Galera in their deployment, and were brainstorming on why
>> deadlocks might be seen. I had suggested that perhaps Nova's use of
>> auto-incrementing primary keys may have been the cause. Peter pretty
>> quickly dispatched that notion, saying that Galera automatically handles
>> auto-incrementing keys using the managed auto_increment_increment and
>> auto_increment_offset config options.
>>
>> I think at that point I mentioned that there were a number of places
>> that were using the SELECT ... FOR UPDATE construct in Nova (in
>> SQLAlchemy, it's the with_lockmode('update') modification of the query
>> object). Peter promptly said that was a problem. MySQL Galera does not
>> support SELECT ... FOR UPDATE, since it has no concept of cross-node
>> locking of records, and results are non-deterministic.
>>
>> So... what to do?
>>
>> For starters, some information on the use of with_lockmode() in Nova
>> and Neutron...
>>
>> Within Nova, there are actually only a few places where
>> with_lockmode('update') is used. Unfortunately, the use of
>> with_lockmode('update') is in the quota code, which tends to wrap
>> largish blocks of code within the Nova compute execution code.
>>
>> Within Neutron, however, the use of with_lockmode('update') is all over
>> the place. There are 44 separate uses of it in 11 different files.

I will report on this in a separate thread, so that we can have an
assessment of where locking statements are used and why.

>> We have a number of options:

I think option 0 should be to rework/redesign the code, where possible, to
avoid DB-level locking at all.
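For reference, the Galera behaviour Peter describes is driven by
wsrep_auto_increment_control (ON by default): each node's session
variables are adjusted so auto-increment keys interleave across nodes
instead of colliding. On a 3-node cluster the effect is roughly equivalent
to a per-node my.cnf fragment like this (values illustrative):

```ini
# Node 1 of a 3-node Galera cluster. These are managed automatically
# when wsrep_auto_increment_control = ON (the default); shown here
# only to make the interleaving explicit.
auto_increment_increment = 3   # step by cluster size
auto_increment_offset    = 1   # this node's position in the cluster
```

With this scheme node 1 generates keys 1, 4, 7, ..., node 2 generates
2, 5, 8, ..., and so on, which is why auto-increment keys are not the
source of the deadlocks.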
>> 1) Stop using MySQL Galera for databases of projects that contain
>> with_lockmode('update')

This looks hideous, but I am afraid it is what all people wishing to
deploy Icehouse should consider doing.

>> 2) Put a big old warning in the docs somewhere about the problem of
>> potential deadlocks or odd behaviour with Galera in these projects
>>
>> 3) For Nova and Neutron, remove the use of with_lockmode('update') and
>> instead use a coarse-grained file lock or a distributed lock manager
>> for those areas where we need deterministic reads or quiescence.

We had an attempt at implementing a sort of distributed lock for Neutron:
https://review.openstack.org/#/c/34695/
Beyond the implementation reservations on this patch, one thing that
should be noted, probably needless to say, is that distributed
coordination is something that should never be taken lightly. Only once
all the non-locking solutions have been ruled out should distributed
coordination among processes be considered. In that case I think it might
be better to use some off-the-shelf software rather than working out a
home-grown solution (I surely do not see space for a new project here).
On a side note, I'm rather ignorant of Python frameworks for distributed
coordination... ConCoord? Is ZooKeeper something that should be ruled out
because of language restrictions?

>> 4) For the Nova db quota driver, refactor the driver to either use a
>> non-locking method for reservation and quota queries or move the driver
>> out into its own project (or use something like Climate and make sure
>> that Climate uses a non-blocking algorithm for those queries...)
>>
>> Thoughts?
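To make option 4 concrete, a non-locking reservation can be done
compare-and-swap style: re-check the limit inside a guarded UPDATE instead
of holding a row lock across a read-modify-write. The sketch below uses
sqlite3 as a stand-in for MySQL, and the table/column names are invented
for the example, not Nova's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quota_usages "
             "(project_id TEXT PRIMARY KEY, in_use INTEGER, hard_limit INTEGER)")
conn.execute("INSERT INTO quota_usages VALUES ('demo', 8, 10)")


def reserve(conn, project_id, delta):
    """Atomically claim `delta` units without SELECT ... FOR UPDATE.

    The WHERE clause re-validates the limit inside the UPDATE itself, so
    no lock is held between reading the usage and writing it back. On
    Galera, a conflicting write from another node would surface as an
    error at commit time, to be handled by a retry.
    """
    cur = conn.execute(
        "UPDATE quota_usages SET in_use = in_use + ? "
        "WHERE project_id = ? AND in_use + ? <= hard_limit",
        (delta, project_id, delta))
    conn.commit()
    return cur.rowcount == 1  # True only if the guarded update applied


print(reserve(conn, 'demo', 2))   # fits within the limit: True
print(reserve(conn, 'demo', 1))   # would exceed the limit: False
```

The caller learns success or failure from the affected row count rather
than from a lock, which is exactly the property Galera's optimistic
replication wants.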
>>
>> -jay
>>
>> [1] http://lists.openstack.org/pipermail/openstack/2014-May/007202.html
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev