Excerpts from Salvatore Orlando's message of 2015-02-23 04:07:38 -0800:
> Lazy-Stacker summary:
> I am doing some work on Neutron IPAM code for IP allocation, and I need
> to find out whether it's better to use db locking queries (SELECT ... FOR
> UPDATE) or some sort of non-blocking algorithm.
> Some measurements suggest that for this specific problem db-level locking
> is more efficient even when using multi-master DB clusters, which somewhat
> counters recent findings by other contributors [2]... but also backs those
> from others [7].
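For readers following along, here is a minimal sketch of the two strategies
being compared. The table and column names (availability_ranges, first_ip)
are hypothetical stand-ins for Neutron's IPAM schema, addresses are treated
as plain integers, and this is illustrative SQLAlchemy code, not the actual
patch under review:

    # Minimal sketch of the two allocation strategies under discussion.
    # Table/column names are invented, and IP addresses are modelled as
    # plain integers for brevity.
    import sqlalchemy as sa

    engine = sa.create_engine("mysql+pymysql://user:secret@db-host/ipam")

    def allocate_with_db_lock(subnet_id):
        """A-1 style: SELECT ... FOR UPDATE serializes allocators."""
        with engine.begin() as conn:
            # The row lock is held until commit, so only one worker at a
            # time can hand out an address from this subnet.
            row = conn.execute(
                sa.text("SELECT first_ip FROM availability_ranges "
                        "WHERE subnet_id = :subnet FOR UPDATE"),
                {"subnet": subnet_id},
            ).first()
            ip = row.first_ip
            conn.execute(
                sa.text("UPDATE availability_ranges "
                        "SET first_ip = first_ip + 1 "
                        "WHERE subnet_id = :subnet"),
                {"subnet": subnet_id},
            )
            return ip

    def allocate_compare_and_swap(subnet_id):
        """A-3 style: no locks; re-check the value read before writing."""
        while True:
            with engine.begin() as conn:
                row = conn.execute(
                    sa.text("SELECT first_ip FROM availability_ranges "
                            "WHERE subnet_id = :subnet"),
                    {"subnet": subnet_id},
                ).first()
                ip = row.first_ip
                # The WHERE clause re-checks what we read: if another
                # worker already took this address, no row matches and
                # we simply loop and try again.
                result = conn.execute(
                    sa.text("UPDATE availability_ranges "
                            "SET first_ip = first_ip + 1 "
                            "WHERE subnet_id = :subnet "
                            "AND first_ip = :seen"),
                    {"subnet": subnet_id, "seen": ip},
                )
                if result.rowcount == 1:
                    return ip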
Thanks Salvatore, the story and data you produced are quite interesting.

> With the test on the Galera cluster I was expecting a terrible slowdown
> in A-1 because of deadlocks caused by certification failures. I was,
> however, extremely disappointed to find that the slowdown I measured does
> not make any of the other algorithms a viable alternative.
> On the Galera cluster I did not run extensive collections for A-2.
> Indeed, primary key violations seem to trigger db deadlocks because of
> failed write-set certification too (but I have not yet tested this).
> I ran tests with 10 threads on each node, for a total of 30 workers. Some
> results are available at [15]. There was indeed a slowdown in A-1 (about
> 20%), whereas A-3's performance stayed pretty much constant. Regardless,
> A-1 was still at least 3 times faster than A-3.
> As A-3's queries are mostly SELECTs (about 75% of them), use of caches
> might make it a lot faster; the algorithm is also probably inefficient
> and can be optimised in several areas. Still, I suspect it can be made
> faster than A-1. At this stage I am leaning towards adopting db-level
> locks with retries for Neutron's IPAM. However, since I never trust
> myself, I wonder if there is something important that I'm neglecting that
> will hit me down the road.

The thing is, nobody should actually be running blindly with writes being
sprayed out to all nodes in a Galera cluster. So A-1 won't slow down _at
all_ if you just use Galera as an ACTIVE/PASSIVE write master. It won't
scale any worse for writes, since all writes go to all nodes anyway. For
reads, we can very easily start to identify hot-spot reads that can be
sent to all nodes and are tolerant of a few seconds of latency.

> In the medium term, there are a few things we might consider for
> Neutron's "built-in IPAM".
> 1) Move the allocation logic out of the driver, thus making IPAM an
> independent service. The API workers will then communicate with the IPAM
> service through a message bus, where IP allocation requests will be
> "naturally serialized".

This would rely on said message bus guaranteeing ordered delivery. That is
going to scale far worse, and be more complicated to maintain, than Galera
with a few retries on failover.

> 2) Use third-party software such as dogpile, ZooKeeper, or even memcached
> to implement distributed coordination. I have nothing against it, and I
> reckon Neutron can only benefit from it (in case you're considering
> arguing that "it does not scale", please also provide solid arguments to
> support your claim!). Nevertheless, I do believe API request processing
> should proceed undisturbed as much as possible. If processing an API
> request requires distributed coordination among several components, then
> it probably means that an asynchronous paradigm is more suitable for that
> API request.

If we all decide that having a load balancer send all writes and reads to
one Galera node is not acceptable for some reason, then we should consider
a distributed locking method that might scale better, like ZK/etcd or the
like. But I think just figuring out why we want to send all writes and
reads to all nodes is a better short/medium-term goal.
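Since "db-level locks with retries" is where this is heading, here is a
rough sketch of the retry side. The error matching is simplified (MySQL
reports deadlocks as error 1213, and Galera surfaces a failed write-set
certification to the client the same way); a real deployment would lean on
oslo.db's retry machinery rather than hand-rolling this:

    # Illustrative retry wrapper: re-run the whole transaction when the
    # database reports a deadlock, with a small backoff between attempts.
    import functools
    import time

    from sqlalchemy.exc import OperationalError

    MAX_RETRIES = 5

    def retry_on_deadlock(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(MAX_RETRIES):
                try:
                    return func(*args, **kwargs)
                except OperationalError as exc:
                    # MySQL error 1213 is "deadlock found, try restarting
                    # transaction"; Galera certification failures show up
                    # under the same code.
                    if ("1213" not in str(exc.orig)
                            or attempt == MAX_RETRIES - 1):
                        raise
                    time.sleep(0.05 * (attempt + 1))  # brief backoff
        return wrapper

    # e.g. allocate = retry_on_deadlock(allocate_with_db_lock)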
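And if the ZK/etcd route were taken instead, the locking side looks
roughly like this, using the kazoo ZooKeeper client (the znode path layout
and the worker identifier are invented for illustration):

    # Sketch of a per-subnet distributed lock via ZooKeeper.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    def allocate_with_zk_lock(subnet_id, do_allocate):
        # One lock per subnet. Contenders queue on ephemeral sequential
        # znodes, so a crashed worker releases its lock automatically.
        lock = zk.Lock("/neutron/ipam/%s" % subnet_id, "api-worker")
        with lock:
            # Inside the lock the DB transaction needs no FOR UPDATE;
            # ZooKeeper already serializes allocators for this subnet.
            return do_allocate(subnet_id)

Note the trade-off, though: this adds a network round-trip per allocation
and one more service to operate, which is exactly why sorting out the
single-write-master question first looks like the cheaper move.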