Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey

Alex Glikson Wed, 30 Oct 2013 11:15:07 -0700

There is a ZK-backed driver for the Nova service heartbeat mechanism
(https://blueprints.launchpad.net/nova/+spec/zk-service-heartbeat). It would
be interesting to know whether it is widely used (it might be worth asking on
the general ML or the user groups). There have also been discussions about
using ZooKeeper for other purposes (some are listed towards the bottom of
https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat). While I am not
aware of any particular progress on implementing any of them, I think
they still make sense and could be useful.

Regards,
Alex

From:   Clint Byrum <cl...@fewbar.com>
To:     openstack-dev <openstack-dev@lists.openstack.org>, 
Date:   30/10/2013 07:45 PM
Subject:        [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey

So, recently we've had quite a long thread in Gerrit regarding locking
in Heat:

https://review.openstack.org/#/c/49440/

In the patch, there are two distributed lock drivers. One uses SQL
and suffers from all the problems you might expect of a SQL-based locking
system. It is extremely hard to detect dead lock holders, so we
end up with really long timeouts. The other uses ZooKeeper.
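To make the timeout problem concrete, here is a minimal sketch of a SQL-based lease lock, assuming a hypothetical `stack_lock` table with one row per locked stack that records the holding engine and when it took the lock. This is not the driver from the review; it only illustrates why the database cannot distinguish a crashed holder from a slow one, so a lock can only be reclaimed after a timeout longer than the slowest legitimate stack action.

```python
import sqlite3
import time

# Hypothetical schema: one row per locked stack, recording who holds it and when.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stack_lock (
    stack_id TEXT PRIMARY KEY,
    engine_id TEXT NOT NULL,
    acquired_at REAL NOT NULL
)
"""


def try_acquire(conn, stack_id, engine_id, timeout, now=None):
    """Try to take the lock for stack_id; return True on success.

    A lock older than `timeout` seconds is presumed dead and may be stolen.
    Because a crashed holder looks exactly like a slow one, `timeout` must
    exceed the longest legitimate stack action.
    """
    now = time.time() if now is None else now
    with conn:  # commits on normal exit, rolls back on an exception
        # Reclaim the lock only if the current holder's lease has expired.
        conn.execute(
            "DELETE FROM stack_lock WHERE stack_id = ? AND acquired_at < ?",
            (stack_id, now - timeout),
        )
        try:
            conn.execute(
                "INSERT INTO stack_lock (stack_id, engine_id, acquired_at) "
                "VALUES (?, ?, ?)",
                (stack_id, engine_id, now),
            )
        except sqlite3.IntegrityError:
            return False  # another engine still holds an unexpired lease
    return True


def release(conn, stack_id, engine_id):
    """Release the lock, but only if this engine is the one holding it."""
    with conn:
        conn.execute(
            "DELETE FROM stack_lock WHERE stack_id = ? AND engine_id = ?",
            (stack_id, engine_id),
        )


# engine-a takes the lock and then "crashes" without releasing it.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
assert try_acquire(conn, "stack-1", "engine-a", timeout=3600, now=0)
# engine-b is locked out for the whole lease, even though engine-a is dead.
assert not try_acquire(conn, "stack-1", "engine-b", timeout=3600, now=60)
assert try_acquire(conn, "stack-1", "engine-b", timeout=3600, now=3601)
```

Shortening `timeout` speeds up recovery from crashes but risks stealing the lock from an engine that is merely slow, which is the trade-off described above.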

I'm on record as saying we're not using ZooKeeper. It is a little
embarrassing to have taken such a position without really thinking things
through. The main reason I feel this way, though, is not that ZooKeeper
wouldn't work for locking, but that I think locking is a mistake.

The current multi-engine paradigm has a race condition. If a stack action
is in progress, its state is held in the engine itself rather than in the
database, so if another engine starts working on another action on the same
stack, the two will conflict.
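A toy illustration of that conflict, assuming each engine copies the stack into private memory when an action starts and blindly writes it back when the action finishes (the `Engine` class and the dict standing in for the database are hypothetical): the second write silently discards the first engine's work, a classic lost update.

```python
# Shared "database": the only state both engines can see.
database = {"stack-1": {"resources": ["server"]}}


class Engine:
    """Hypothetical engine that keeps in-progress stack state in memory."""

    def __init__(self, name):
        self.name = name
        self.in_memory = None

    def start_action(self, stack_id):
        # Copy the stack into private memory; other engines cannot see this copy.
        self.in_memory = {"resources": list(database[stack_id]["resources"])}

    def add_resource(self, resource):
        self.in_memory["resources"].append(resource)

    def finish_action(self, stack_id):
        # Blind write-back: overwrites whatever another engine stored meanwhile.
        database[stack_id] = self.in_memory


a, b = Engine("a"), Engine("b")
a.start_action("stack-1")
b.start_action("stack-1")  # b starts before a has finished
a.add_resource("volume")
b.add_resource("port")
a.finish_action("stack-1")
b.finish_action("stack-1")
print(database["stack-1"]["resources"])  # ['server', 'port']; a's volume is lost
```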

The locking paradigm is meant to prevent this. But I think this is a
huge mistake.

The engine should store _all_ of its state in a distributed data store
of some kind. From that state, any engine should be able to see what is
already happening with the stack and act accordingly, including the
engine currently working on actions. Viewed through this lens, locking
seems to me a poor excuse for serializing the state of the engine
scheduler.
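One way to picture engines that act on shared state instead of holding locks, offered only as an illustration and not as the design proposed in the review, is optimistic concurrency: every engine reads and writes stack state through the shared store, and a write succeeds only if nobody else has written since the engine read. The sketch assumes a hypothetical `stack_state` table with a `version` column and uses SQLite purely so it is self-contained.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stack_state (stack_id TEXT PRIMARY KEY, "
    "version INTEGER NOT NULL, state TEXT NOT NULL)"
)
with conn:
    conn.execute(
        "INSERT INTO stack_state VALUES (?, ?, ?)",
        ("stack-1", 0, json.dumps({"resources": ["server"]})),
    )


def read_state(stack_id):
    """Return (version, state) as currently stored; any engine may call this."""
    version, state = conn.execute(
        "SELECT version, state FROM stack_state WHERE stack_id = ?", (stack_id,)
    ).fetchone()
    return version, json.loads(state)


def write_state(stack_id, expected_version, new_state):
    """Compare-and-swap: store new_state only if nobody wrote since we read.

    Returns False when another engine got there first; the caller should
    re-read the state and decide again instead of overwriting it.
    """
    with conn:
        cur = conn.execute(
            "UPDATE stack_state SET version = version + 1, state = ? "
            "WHERE stack_id = ? AND version = ?",
            (json.dumps(new_state), stack_id, expected_version),
        )
    return cur.rowcount == 1


def add_resource(stack_id, resource):
    # Retry loop: each attempt starts from the latest shared state.
    while True:
        version, state = read_state(stack_id)
        state["resources"].append(resource)
        if write_state(stack_id, version, state):
            return


# Two engines read the same version; only the first write is accepted.
v_a, s_a = read_state("stack-1")
v_b, s_b = read_state("stack-1")
s_a["resources"].append("volume")
s_b["resources"].append("port")
assert write_state("stack-1", v_a, s_a)
assert not write_state("stack-1", v_b, s_b)  # conflict detected, not lost
add_resource("stack-1", "port")  # engine b retries against fresh state
print(read_state("stack-1")[1]["resources"])  # ['server', 'volume', 'port']
```

Unlike the in-memory case, the conflicting write is rejected rather than silently applied, and the losing engine learns about it from the shared state itself.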

It feels like TaskFlow is the answer, with an eye toward making sure
TaskFlow can be made to work with distributed state. I am not well
versed in TaskFlow's details, though, so I may be wrong. It worries me
that TaskFlow has existed for a while and doesn't seem to be solving real
problems, but maybe I'm wrong and it is actually in use already.

Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper
has some real advantages over using the database. But there is hesitance
because it is not widely supported in OpenStack. What say you, OpenStack
community? Should we keep ZooKeeper out of our... zoo?
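One advantage ZooKeeper brings here, and a direct answer to the dead-lock-holder problem described for the SQL driver, is that ephemeral znodes are deleted automatically when the client session that created them ends, so a crashed holder gives up its lock when its session expires instead of after a long lock timeout. The sketch below simulates the standard ZooKeeper lock recipe (each contender creates an ephemeral sequential child under a lock node; the lowest sequence number holds the lock) in plain Python so it runs without a server. `FakeZooKeeper`, `request_lock`, and `holds_lock` are hypothetical stand-ins, not the ZooKeeper or kazoo API.

```python
import itertools


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not a real client).

    It models only what the lock recipe needs: sequential node names and
    ephemeral nodes that vanish when their owning session ends.
    """

    def __init__(self):
        self.nodes = {}  # path -> owning session id
        # Simplification: one counter for all parents (real ZooKeeper keeps one per parent).
        self.counter = itertools.count()

    def create_ephemeral_sequential(self, prefix, session):
        # Zero-padded suffix, so lexical order matches creation order.
        path = f"{prefix}{next(self.counter):010d}"
        self.nodes[path] = session
        return path

    def children(self, parent):
        return sorted(p for p in self.nodes if p.startswith(parent + "/"))

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire_session(self, session):
        # What ZooKeeper does when a client stops heartbeating: drop its ephemerals.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]


def request_lock(zk, lock_path, session):
    """Join the queue for lock_path and return our node's path."""
    return zk.create_ephemeral_sequential(lock_path + "/lock-", session)


def holds_lock(zk, lock_path, my_node):
    # The recipe: whoever owns the lowest-numbered child holds the lock.
    # (A real client would set a watch on its predecessor rather than poll.)
    children = zk.children(lock_path)
    return bool(children) and children[0] == my_node


zk = FakeZooKeeper()
a = request_lock(zk, "/heat/stack-1", session="engine-a")
b = request_lock(zk, "/heat/stack-1", session="engine-b")
print(holds_lock(zk, "/heat/stack-1", a), holds_lock(zk, "/heat/stack-1", b))  # True False
zk.expire_session("engine-a")  # engine-a crashes; its session expiry frees the lock
print(holds_lock(zk, "/heat/stack-1", b))  # True
```

The timeout has not disappeared; it has moved into ZooKeeper's session handling, where liveness is tracked by client heartbeats rather than inferred from how old a database row is.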

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev