> -----Original Message-----
> From: Clint Byrum [mailto:cl...@fewbar.com]
> Sent: Thursday, November 13, 2014 8:00 PM
> To: openstack-dev
> Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
>
> Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
> > On 13/11/14 09:58, Clint Byrum wrote:
> > > Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
> > >> On 13/11/14 03:29, Murugan, Visnusaran wrote:
> > >>> Hi all,
> > >>>
> > >>> Convergence-POC distributes stack operations by sending resource
> > >>> actions over RPC for any heat-engine to execute. Entire stack
> > >>> lifecycle will be controlled by worker/observer notifications.
> > >>> This distributed model has its own advantages and disadvantages.
> > >>>
> > >>> Any stack operation has a timeout and a single engine will be
> > >>> responsible for it. If that engine goes down, timeout is lost
> > >>> along with it. So a traditional way is for other engines to
> > >>> recreate timeout from scratch. Also a missed resource action
> > >>> notification will be detected only when stack operation timeout happens.
> > >>>
> > >>> To overcome this, we will need the following capability:
> > >>>
> > >>> 1.Resource timeout (can be used for retry)
> > >>
> > >> I don't believe this is strictly needed for phase 1 (essentially we
> > >> don't have it now, so nothing gets worse).
> > >>
> > >
> > > We do have a stack timeout, and it stands to reason that we won't
> > > have a single box with a timeout greenthread after this, so a
> > > strategy is needed.
> >
> > Right, that was 2, but I was talking specifically about the resource
> > retry. I think we agree on both points.
> >
> > >> For phase 2, yes, we'll want it. One thing we haven't discussed
> > >> much is that if we used Zaqar for this then the observer could
> > >> claim a message but not acknowledge it until it had processed it,
> > >> so we could have guaranteed delivery.
> > >>
> > >
> > > Frankly, if oslo.messaging doesn't support reliable delivery then we
> > > need to add it.
> >
> > That is straight-up impossible with AMQP. Either you ack the message
> > and risk losing it if the worker dies before processing is complete,
> > or you don't ack the message until it's processed and you become a
> > blocker for every other worker trying to pull jobs off the queue. It
> > works fine when you have only one worker; otherwise not so much. This
> > is the crux of the whole "why isn't Zaqar just Rabbit" debate.
> >
>
> I'm not sure we have the same understanding of AMQP, so hopefully we can
> clarify here. This stackoverflow answer echoes my understanding:
>
> http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
>
> Not ack'ing just means they might get retransmitted if we never ack. It
> doesn't block other consumers. And as the link above quotes from the
> AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
> Other consumers get other messages.
>
> So just add the ability for a consumer to read, work, ack to oslo.messaging,
> and this is mostly handled via AMQP. Of course that also likely means no
> zeromq for Heat without accepting that messages may be lost if workers die.
>
> Basically we need to add something that is not "RPC" but instead "jobqueue"
> that mimics this:
>
> http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
>
> I've always been suspicious of this bit of code, as it basically means that if
> anything fails between that call, and the one below it, we have lost contact,
> but as long as clients are written to re-send when there is a lack of reply,
> there shouldn't be a problem.
> But, for a job queue, there is no reply, and so
> the worker would dispatch, and then acknowledge after the dispatched call
> had returned (including having completed the step where new messages are
> added to the queue for any newly-possible children).
>
> Just to be clear, I believe what Zaqar adds is the ability to peek at a specific
> message ID and not affect it in the queue, which is entirely different than
> ACK'ing the ones you've already received in your session.
>
> > Most stuff in OpenStack gets around this by doing synchronous calls
> > across oslo.messaging, where there is an end-to-end ack. We don't want
> > that here though. We'll probably have to make do with having ways to
> > recover after a failure (kick off another update with the same data is
> > always an option). The hard part is that if something dies we don't
> > really want to wait until the stack timeout to start recovering.
> >
>
> I fully agree. Josh's point about using a coordination service like Zookeeper to
> maintain liveness is an interesting one here. If we just make sure that all the
> workers that have claimed work off the queue are alive, that should be
> sufficient to prevent a hanging stack situation like you describe above.
>
> > > Zaqar should have nothing to do with this and is, IMO, a poor choice
> > > at this stage, though I like the idea of using it in the future so
> > > that we can make Heat more of an outside-the-cloud app.
> >
> > I'm inclined to agree that it would be hard to force operators to
> > deploy Zaqar in order to be able to deploy Heat, and that we should
> > probably be cautious for that reason.
> >
> > That said, from a purely technical point of view it's not a poor
> > choice at all - it has *exactly* the semantics we want (unlike AMQP),
> > and at least to the extent that the operator wants to offer Zaqar to
> > users anyway it completely eliminates a whole backend that they would
> > otherwise have to deploy.
> > It's a tragedy that all of OpenStack has not
> > been designed to build upon itself in this way and it causes me
> > physical pain to know that we're about to perpetuate it.
> >
> > >>> 2.Recover from engine failure (loss of stack timeout, resource
> > >>> action notification)
> > >>>
> > >>> Suggestion:
> > >>>
> > >>> 1.Use task queue like celery to host timeouts for both stack and resource.
> > >>
> > >> I believe Celery is more or less a non-starter as an OpenStack
> > >> dependency because it uses Kombu directly to talk to the queue, vs.
> > >> oslo.messaging which is an abstraction layer over Kombu, Qpid,
> > >> ZeroMQ and maybe others in the future. i.e. requiring Celery means
> > >> that some users would be forced to install Rabbit for the first time.
> > >>
> > >> One option would be to fork Celery and replace Kombu with
> > >> oslo.messaging as its abstraction layer. Good luck getting that
> > >> maintained though, since Celery _invented_ Kombu to be its abstraction layer.
> > >>
> > >
> > > A slight side point here: Kombu supports Qpid and ZeroMQ.
> > > Oslo.messaging
> >
> > You're right about Kombu supporting Qpid, it appears they added it. I
> > don't see ZeroMQ on the list though:
> >
> > http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
> >
>
> They, confusingly, call it zmq, and it may not be in a recent release:
>
> https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
>
> > > is more about having a unified API than a set of magic backends. It
> > > actually boggles my mind why we didn't just use kombu (cue 20
> > > reactions with people saying it wasn't EXACTLY right), but I think
> > > we're committed
> >
> > Well, we also have to take into account the fact that Qpid support was
> > added only during the last 9 months, whereas oslo.messaging was
> > implemented 3 years ago and time travel hasn't been invented yet (for
> > any definition of 'yet').
> >
>
> Go back in time 3 years ago, and perhaps we could have done all the work
> we've done in kombu. Hindsight though.
>
> > > to oslo.messaging now. Anyway, celery would need no such refactor,
> > > as kombu would be able to access the same bus as everything else just fine.
> >
> > Interesting, so that would make it easier to get Celery added to the
> > global requirements, although we'd likely still have headaches to deal
> > with around configuration.
> >
>
> Yeah, I'm not advocating for celery, just pointing out that it has become more
> like what we already deploy. :)
>
> > >>> 2.Poll database for engine failures and restart timers/ retrigger
> > >>> resource retry (IMHO: This would be the traditional approach and
> > >>> weighs heavy)
> > >>>
> > >>> 3.Migrate heat to use TaskFlow. (Too many code changes)
> > >>
> > >> If it's just handling timed triggers (maybe this is closer to #2)
> > >> and not migrating the whole code base, then I don't see why it
> > >> would be a big change (or even a change at all - it's basically new
> > >> functionality).
> > >> I'm not sure if TaskFlow has something like this already. If not we
> > >> could also look at what Mistral is doing with timed tasks and see
> > >> if we could spin some of it out into an Oslo library.
> > >>
> > >
> > > I feel like it boils down to something running periodically checking
> > > for scheduled tasks that are due to run but have not run yet. I
> > > wonder if we can actually look at Ironic for how they do this,
> > > because Ironic polls power state of machines constantly, and uses a
> > > hash ring to make sure only one conductor is polling any one machine
> > > at a time. If we broke stacks up into a hash ring like that for the
> > > purpose of singleton tasks like timeout checking, that might work out
> > > nicely.
> >
> > +1 for something like this, and +2 if we can get it from a library we
> > don't have to write ourselves (whether it be TaskFlow or something
> > spun out of Mistral or Ironic into Oslo).
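To make the hash-ring suggestion quoted above concrete, here is a minimal sketch of how stacks could be partitioned across engines so that exactly one engine owns the timeout check for any given stack. It is not Ironic's implementation: the class name HashRing, the owner() method, the replicas parameter and the engine/stack identifiers are all hypothetical, and the choice of SHA-256 is arbitrary.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to an integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring assigning each stack to exactly one engine.

    Hypothetical sketch: each engine is placed on the ring at several
    points ("replicas") so that stacks spread roughly evenly.
    """

    def __init__(self, engine_ids, replicas=64):
        if not engine_ids:
            raise ValueError("at least one engine is required")
        self._ring = sorted(
            (_position("%s-%d" % (engine, i)), engine)
            for engine in engine_ids
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, stack_id):
        """Return the engine responsible for singleton tasks on this stack."""
        # First ring point clockwise from the stack's position, wrapping
        # around to the start of the ring when we fall off the end.
        idx = bisect.bisect(self._positions, _position(stack_id))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["engine-a", "engine-b", "engine-c"])
stacks = ["stack-%d" % i for i in range(200)]
before = {stack: ring.owner(stack) for stack in stacks}

# If engine-c dies, only the stacks it owned need a new owner.
smaller = HashRing(["engine-a", "engine-b"])
moved = [s for s in stacks if smaller.owner(s) != before[s]]
```

The property that matters for timeout checking is that removing an engine reassigns only the stacks that engine owned; every other stack keeps its owner, so the surviving engines do not have to rebuild their timers.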
> >
>
> Right, those things are fairly generic and would definitely fit nicely in a
> library.
>
> So, the simplest possible solution, I think, is to lock resource id + graph
> version. Since we are scared of Zookeeper, we'll need a periodic job in the
> engines that looks for stale locks, or we have to wait for another stack
> operation to check for them.
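As a rough illustration of the quoted proposal (a lock keyed on resource id + graph version, plus a periodic job that looks for stale locks), here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the LockTable and ResourceLock names, the heartbeat field and the 30-second staleness threshold are hypothetical, and a real engine would keep the locks in the database with atomic updates rather than in a dict.

```python
import time
from dataclasses import dataclass


@dataclass
class ResourceLock:
    resource_id: str
    graph_version: int
    engine_id: str
    heartbeat: float  # last time the owning engine refreshed the lock


class LockTable:
    """In-memory stand-in for a database table of resource locks.

    Hypothetical sketch only: not thread-safe and not persistent.
    """

    def __init__(self, stale_after=30.0):
        self._locks = {}
        self._stale_after = stale_after

    def acquire(self, resource_id, graph_version, engine_id, now=None):
        """Take the lock for (resource id, graph version) if it is free."""
        now = time.time() if now is None else now
        key = (resource_id, graph_version)
        if key in self._locks:
            return False
        self._locks[key] = ResourceLock(
            resource_id, graph_version, engine_id, now)
        return True

    def refresh(self, engine_id, now=None):
        """Heartbeat: a live engine refreshes every lock it holds."""
        now = time.time() if now is None else now
        for lock in self._locks.values():
            if lock.engine_id == engine_id:
                lock.heartbeat = now

    def reap_stale(self, now=None):
        """Periodic job: release locks whose owner stopped heartbeating.

        Returns the released locks so the caller can retrigger the work.
        """
        now = time.time() if now is None else now
        stale = [lock for lock in self._locks.values()
                 if now - lock.heartbeat > self._stale_after]
        for lock in stale:
            del self._locks[(lock.resource_id, lock.graph_version)]
        return stale


table = LockTable(stale_after=30.0)
table.acquire("res-1", 1, "engine-a", now=0.0)   # engine-a then dies
table.acquire("res-2", 1, "engine-b", now=0.0)
table.refresh("engine-b", now=25.0)              # engine-b is still alive
stale = table.reap_stale(now=40.0)               # only res-1 is released
```

Any engine can run reap_stale(), so the sweep does not depend on a particular engine staying up; the cost is that a dead engine's work is only picked up after the staleness threshold has passed.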
There is a spec with something similar:
https://review.openstack.org/#/c/122597/

However, I'd rather do convergence in a way where we won't have to monitor
that. I mean a design in which we don't care how many engines are running,
as long as at least one of them works.

> _______________________________________________
> OpenStack-dev mailing list
> OpenStackfirstname.lastname@example.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev