On Tue, Jun 17, 2014 at 4:36 AM, Matthew Booth <mbo...@redhat.com> wrote:
> On 17/06/14 00:28, Joshua Harlow wrote:
>> So this is a reader/writer lock then?
>>
>> I have seen https://github.com/python-zk/kazoo/pull/141 come up in the
>> kazoo (zookeeper python library) but there was a lack of a maintainer
>> for that 'recipe'. Perhaps if we really find this needed we can help
>> get that pull request 'sponsored' so that it can be used for this
>> purpose?
>>
>> As far as resiliency, the thing I was thinking about was how correct
>> you want this lock to be.
>>
>> If you go with memcached and a locking mechanism built on it, this
>> will not be correct, but it might work well enough under normal usage.
>> That's why I was wondering what level of correctness you want, and
>> what you want to happen if the server that is maintaining the lock
>> record dies. In memcached's case this will literally be one server,
>> even if sharding is being used, since a key hashes to exactly one
>> server. So if that one server goes down (or a network split happens)
>> then it is possible for two entities to believe they own the same lock
>> (and if the network split recovers this gets even weirder). That's
>> what I was wondering about when mentioning resiliency and how much
>> incorrectness you are willing to tolerate.
>
> From my POV, the most important things are:
>
> * 2 nodes must never believe they hold the same lock
> * A node must eventually get the lock
>
> I was expecting to implement locking on all three backends as long as
> they support it. I haven't looked closely at memcached, but if it can
> detect a split it should be able to have a fencing race with the
> possible lock holder before continuing. This is obviously undesirable,
> as you will probably be fencing an otherwise correctly functioning
> node, but it will be correct.
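To make the memcached caveat concrete: a best-effort memcached lock
typically leans on the atomic add() with a TTL, roughly as in this
sketch (python-memcached; the key naming and TTL are illustrative):

    import time

    import memcache  # python-memcached

    mc = memcache.Client(['127.0.0.1:11211'])

    def lock(name, ttl=30):
        # add() is atomic: it succeeds only if the key doesn't already
        # exist, so exactly one caller wins each round.
        while not mc.add('lock/' + name, 'held', time=ttl):
            time.sleep(1)

    def unlock(name):
        # If our TTL expired, this may delete a lock now held by
        # someone else.
        mc.delete('lock/' + name)

The TTL is exactly what makes this "not correct": if the holder stalls
past the TTL, or the memcached server owning the key dies or is
partitioned away, a second node can acquire the lock while the first
still believes it holds it.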
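For contrast, zookeeper ties lock ownership to the client session
rather than a TTL: the lock znode is ephemeral, so a dead holder's lock
is released when its session expires, and waiters are notified. A
sketch using kazoo's existing exclusive Lock recipe (the read/write
variant is what the pull request above would add); the path, identifier
and do_protected_work() are illustrative:

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # The lock znode is ephemeral: if this process dies or its session
    # expires, zookeeper removes it and the next waiter proceeds.
    lock = zk.Lock('/nova/locks/image-1234', identifier='node-1')
    with lock:  # blocks until acquired
        do_protected_work()

Note this still doesn't remove the need for fencing: a holder whose
session has expired may not yet know it has lost the lock, so it must
be fenced before others can safely proceed.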
There's a team working on a pluggable library for distributed
coordination:

http://git.openstack.org/cgit/stackforge/tooz

Doug

>
> Matt
>
>> -----Original Message-----
>> From: Matthew Booth <mbo...@redhat.com>
>> Organization: Red Hat
>> Date: Friday, June 13, 2014 at 1:40 AM
>> To: Joshua Harlow <harlo...@yahoo-inc.com>, "OpenStack Development
>> Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
>> Subject: Re: [openstack-dev] [nova] Distributed locking
>>
>>> On 12/06/14 21:38, Joshua Harlow wrote:
>>>> So just a few thoughts before going too far down this path.
>>>>
>>>> Can we make sure we really, really understand the use-case where we
>>>> think this is needed? I think it's fine that this use-case exists,
>>>> but I just want to make it very clear to others why it's needed and
>>>> why distributed locking is the only *correct* way.
>>>
>>> An example use of this would be side-loading an image from another
>>> node's image cache rather than fetching it from glance, which would
>>> have very significant performance benefits in the VMware driver, and
>>> possibly other places. The copier must take a read lock on the image
>>> to prevent the owner from ageing it during the copy. Holding a read
>>> lock would also assure the copier that the image it is copying is
>>> complete.
>>>
>>>> This helps set a good precedent for others that may follow down this
>>>> path: that they also clearly explain the situation, how distributed
>>>> locking fixes it, and all the corner cases that now pop up with
>>>> distributed locking.
>>>>
>>>> Some of the questions that I can think of at the current moment:
>>>>
>>>> * What happens when a node that owns the lock goes down; how does
>>>> the software react to this?
>>>
>>> This can be well defined according to the behaviour of the backend.
>>> For example, it is well defined in zookeeper when a node's session
>>> expires. If the lock holder is no longer a valid node, it would be
>>> fenced before deleting its lock, allowing other nodes to continue.
>>>
>>> Without fencing it would not be possible to safely continue in this
>>> case.
>>>
>>>> * What resources are being locked; what is the lock target, what is
>>>> its lifetime?
>>>
>>> These are not questions for a locking implementation. A lock would be
>>> held on a name, and it would be up to the api user to ensure that the
>>> protected resource is only used while correctly locked, and that the
>>> lock is not held longer than necessary.
>>>
>>>> * What resiliency do you want this lock to provide (this becomes a
>>>> critical question when considering memcached, since memcached is not
>>>> really the best choice for a resilient distributed locking backend)?
>>>
>>> What does resiliency mean in this context? We really just need the
>>> lock to be correct.
>>>
>>>> * What do entities that try to acquire a lock do when they can't
>>>> acquire it?
>>>
>>> Typically block, but if a use case emerged for trylock() it would be
>>> simple to implement. For example, in the image side-loading case we
>>> may decide that if it isn't possible to immediately acquire the lock
>>> it isn't worth waiting, and we just fetch it from glance anyway.
>>>
>>>> A useful thing I wrote up a while ago, might still be useful:
>>>>
>>>> https://wiki.openstack.org/wiki/StructuredWorkflowLocks
>>>>
>>>> Feel free to move that wiki if you find it useful (it's sort of a
>>>> high-level doc on the different strategies and such).
>>>
>>> Nice list of implementation pros/cons.
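Doug's pointer is easy to make concrete. Assuming tooz's coordinator
api roughly as published (the backend is selected by URL, and all names
here are illustrative), the image side-loading case, including the
trylock() behaviour discussed above, might look like:

    from tooz import coordination

    # Correctness depends on the backend chosen, per the discussion
    # above: e.g. 'zookeeper://127.0.0.1:2181' vs
    # 'memcached://127.0.0.1:11211'.
    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'nova-node-1')
    coordinator.start()

    lock = coordinator.get_lock(b'image-1234')

    # trylock(): if the lock isn't immediately free, don't wait --
    # just fetch the image from glance instead.
    if lock.acquire(blocking=False):
        try:
            copy_image_from_peer_cache()  # placeholder
        finally:
            lock.release()
    else:
        fetch_image_from_glance()  # placeholder

    coordinator.stop()

Note tooz provides the lock itself but not the fencing discussed in
this thread; that would still have to be layered on top.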
>>>
>>> Matt
>>>
>>>>
>>>> -Josh
>>>>
>>>> -----Original Message-----
>>>> From: Matthew Booth <mbo...@redhat.com>
>>>> Organization: Red Hat
>>>> Reply-To: "OpenStack Development Mailing List (not for usage
>>>> questions)" <openstack-dev@lists.openstack.org>
>>>> Date: Thursday, June 12, 2014 at 7:30 AM
>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>> <openstack-dev@lists.openstack.org>
>>>> Subject: [openstack-dev] [nova] Distributed locking
>>>>
>>>>> We have a need for a distributed lock in the VMware driver, which I
>>>>> suspect isn't unique. Specifically, it is possible for a VMware
>>>>> datastore to be accessed via multiple nova nodes if it is shared
>>>>> between clusters[1]. Unfortunately the vSphere API doesn't provide
>>>>> us with the primitives to implement robust locking using the
>>>>> storage layer itself, so we're looking elsewhere.
>>>>>
>>>>> The closest we seem to have in Nova currently are service groups,
>>>>> which currently have 3 implementations: DB, Zookeeper and
>>>>> Memcached. The service group api currently provides simple
>>>>> membership, but for locking we'd be looking for something more.
>>>>>
>>>>> I think the api we'd be looking for would be something along the
>>>>> lines of:
>>>>>
>>>>> Foo.lock(name, fence_info)
>>>>> Foo.unlock(name)
>>>>>
>>>>> Bar.fence(fence_info)
>>>>>
>>>>> Note that fencing would be required in this case. We believe we can
>>>>> fence by terminating the other Nova's vSphere session, but other
>>>>> options might include killing a Nova process, or STONITH. These
>>>>> would be implemented as fencing drivers.
>>>>>
>>>>> Although I haven't worked through the detail, I believe lock and
>>>>> unlock would be implementable in all 3 of the current service group
>>>>> drivers. Fencing would be implemented separately.
>>>>>
>>>>> My questions:
>>>>>
>>>>> * Does this already exist, or does anybody have patches pending to
>>>>> do something like this?
>>>>> * Are there other users for this?
>>>>> * Would service groups be an appropriate place, or a new
>>>>> distributed locking class?
>>>>> * How about if we just used zookeeper directly in the driver?
>>>>>
>>>>> Matt
>>>>>
>>>>> [1] Cluster ~= hypervisor
>>>>> --
>>>>> Matthew Booth
>>>>> Red Hat Engineering, Virtualisation Team
>>>>>
>>>>> Phone: +442070094448 (UK)
>>>>> GPG ID: D33C3490
>>>>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
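To make the proposed api concrete, one possible shape for the driver
interface, as a sketch only: lock(), unlock() and fence() come from
Matt's proposal above, while the class names, the fence_info contents
and the terminate_session() helper are illustrative, not existing Nova
code:

    class DistributedLockDriver(object):
        """One implementation per backend (DB, Zookeeper, Memcached)."""

        def lock(self, name, fence_info):
            # Block until the named lock is held. fence_info is stored
            # with the lock so a later claimant can fence a dead holder
            # before breaking its stale lock.
            raise NotImplementedError()

        def unlock(self, name):
            raise NotImplementedError()


    class FencingDriver(object):
        def fence(self, fence_info):
            # Guarantee the previous holder can no longer touch the
            # protected resource; only then may its lock be broken.
            raise NotImplementedError()


    class VSphereSessionFencer(FencingDriver):
        """Fence by terminating the other node's vSphere session."""

        def fence(self, fence_info):
            terminate_session(fence_info['vsphere_session_id'])  # placeholder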