Re: [openstack-dev] [oslo][mistral] Saga of process than ack and where can we go from here...

Joshua Harlow Fri, 03 Jun 2016 09:17:41 -0700

Deja, Dawid wrote:

On Thu, 2016-05-05 at 11:08 +0700, Renat Akhmerov wrote:

On 05 May 2016, at 01:49, Mehdi Abaakouk <sil...@sileht.net> wrote:


On 2016-05-04 10:04, Renat Akhmerov wrote:

No problem. Let’s not call it RPC (btw, I completely agree with that).
But it’s one of the messaging patterns and hence should be under
oslo.messaging I guess, no?


Yes and no, we currently have two APIs (rpc and notification). And
personally I regret having the notification part in oslo.messaging.

RPC and Notification are different beasts, and both are today limited
in terms of features because they share the same driver implementation.

Our RPC error handling is really poor, for example Nova just puts the
instance in ERROR when something bad occurs in the oslo.messaging layer.
This forces the deployer/user to fix the issue manually.

Our Notification system doesn't allow fine-grained routing of messages,
everything goes into one configured topic/queue.

And now we want to add a new one... I'm not against this idea,
but I'm not a huge fan.

Thoughts from folks (mistral and oslo)?

Also, I was not at the Summit, should I conclude that the Tooz+taskflow
approach (that ensures the idempotency of the application within the
library API) has not been accepted by the Mistral folks?

Speaking about idempotency, IMO it’s not a central question that we
should be discussing here. Mistral users should have a choice: if they
manage to make their actions idempotent it’s excellent, and in many cases
idempotency is certainly possible, btw. If not, then they know about
the potential consequences.


You shouldn't mix up the idempotency of the user task and the idempotency
of a Mistral action (that will in the end run the user task).
You can make your Mistral task runner implementation idempotent and just
make the workflow to follow configurable for the case where the user task
is interrupted or finishes badly, whether the user task is idempotent or not.
This makes things very predictable. You will know, for example:
* if the user task has started or not,
* if the error is due to a node power cut while the user task was running,
* if you can safely retry a non-idempotent user task on another node,
* you will not be impacted by a rabbitmq restart or TCP connection issues,
* ...
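
A minimal sketch of the kind of bookkeeping described above, in plain
Python. It is not the taskflow or Mistral API; `Journal`, `run_task` and
`can_retry` are hypothetical names, and the in-memory dict stands in for
a durable store that would survive a node power cut.

```python
import enum


class State(enum.Enum):
    PENDING = "pending"
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class Journal:
    """Hypothetical stand-in for a durable task-state store."""

    def __init__(self):
        self._states = {}

    def record(self, task_id, state):
        self._states[task_id] = state

    def state(self, task_id):
        # A task that was never recorded has not started yet.
        return self._states.get(task_id, State.PENDING)


def run_task(journal, task_id, user_task):
    """Run the user task, recording each state transition."""
    journal.record(task_id, State.STARTED)
    try:
        result = user_task()
    except Exception:
        journal.record(task_id, State.FAILED)
        raise
    journal.record(task_id, State.SUCCEEDED)
    return result


def can_retry(journal, task_id, idempotent):
    """Decide whether the task may be retried, e.g. on another node."""
    state = journal.state(task_id)
    if state is State.SUCCEEDED:
        return False  # already done, nothing to retry
    if state is State.PENDING:
        return True   # never started, so a retry cannot double-run it
    # STARTED (e.g. power cut mid-run) or FAILED: the task may have had
    # partial effects, so only retry when it is idempotent.
    return idempotent
```

Because the runner records STARTED before calling the user task, a task
left in STARTED after a crash is known to have begun, which is the
distinction a generic timeout error cannot give.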

With the oslo.messaging approach, everything will just end up in a
generic MessageTimeout error.

The RPC API already has this kind of issue. Applications have
unfortunately dealt with that (and I think they want something better now).
I'm just not convinced we should add a new "working queue" API in
oslo.messaging for task scheduling that has the same issue we already
have with RPC.

Anyway, that's your choice, if you want to rely on this poor structure,
I will not be against it, I'm not involved in Mistral. I just want
everybody to be aware of this.

And even in this case there’s usually a number
of measures that can be taken to mitigate those consequences (rerunning
workflows from certain points after manually fixing problems, rollback
scenarios etc.).


taskflow makes it really easy to describe and automate this kind of
workflow.

What I’m saying is: let’s not make that crucial decision now about
what a messaging framework should support or not, let’s make it more
flexible to account for variety of different usage scenarios.


I think the confusion is in the "messaging" keyword, currently
oslo.messaging is an "RPC" framework and a "Notification" framework on
top of 'messaging' frameworks.

The messaging frameworks we use are 'kombu', 'pika', 'zmq' and 'pyngus'.

It’s normal for frameworks to give more rather than less.


I disagree, here we mix different concepts into one library, and all
concepts have to be implemented by the different 'messaging frameworks'.
So we fortunately give less, to make things just work in the same way
with all drivers for all APIs.

One more thing, at the summit we were discussing the possibility to
define at-most-once/at-least-once individually for Mistral tasks. This
is in demand because there are cases where we need it: advanced users
may choose one or the other depending on a task/action's semantics.
However, it won’t be possible to implement without changes in the
underlying messaging framework.


If we go that way, oslo.messaging users and Mistral users have to
be aware that their job/task/action/whatever will perhaps not be called
(at-most-once) or perhaps be called twice (at-least-once).
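
To make the two outcomes concrete, here is a toy simulation in plain
Python. It assumes a broker that redelivers a message until it is acked;
`ToyQueue` and `consume` are hypothetical names for illustration and do
not correspond to any oslo.messaging API.

```python
from collections import deque


class ToyQueue:
    """Toy broker: a message stays queued until it is acked."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def get(self):
        # Deliver (or redeliver) the oldest unacked message, if any.
        return self._pending[0] if self._pending else None

    def ack(self, message):
        self._pending.remove(message)


def consume(queue, ack_first, crash_once):
    """Drain the queue and return how many times each task ran.

    crash_once simulates a single consumer crash that happens between
    the first step and the second step of handling a message.
    """
    runs = {}
    crash_pending = crash_once
    while (msg := queue.get()) is not None:
        if ack_first:
            queue.ack(msg)  # at-most-once: ack, then process
            if crash_pending:
                crash_pending = False
                continue    # crashed after the ack: the task is lost
            runs[msg] = runs.get(msg, 0) + 1
        else:
            runs[msg] = runs.get(msg, 0) + 1  # at-least-once: process, then ack
            if crash_pending:
                crash_pending = False
                continue    # crashed before the ack: message is redelivered
            queue.ack(msg)
    return runs
```

With a single message and one simulated crash, acking first leaves the
task with zero runs, while acking last runs it twice.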

The oslo.messaging/Mistral API and docs must be clear about this
behavior, to avoid having bugs opened against oslo.messaging because a
script written via the Mistral API is not executed as expected "sometimes".
"sometimes" == when deployers have trouble with their rabbitmq (or
whatever) broker, and even just when a deployer restarts a broker node or
when a TCP issue occurs. In the end, the backtrace in these cases always
shows only an oslo.messaging trace (the well known MessageTimeout...).


Also oslo.messaging is already a fragile brick used by everybody that
a very small subset of people maintain (thanks to them).

I'm afraid that adding such a new API will increase the maintenance
needed for this lib while currently not many people care about it
(the whole lib, not the new API).

I also wonder if other projects have the same needs (that always helps
to design a new API).


Mehdi,

What are you proposing? Can you confirm that we should be just dealing
with this problem on our own in Mistral? If so, that works well for
us. Initially we didn’t want to switch to oslo.messaging from direct
access to RabbitMQ for this and also other reasons. But we got
strong feedback from the community that said “you guys need to reuse
technologies from the community and hence switch to oslo.messaging”.
So we did, assuming that we would fix all needed issues in
oslo.messaging relatively soon. Now it’s been ~2 years since then and
we keep struggling with all that stuff.

When I see these discussions again and again where people try to
convince others that at-least-once delivery is a bad thing, I can’t
participate in them anymore. We spent a lot of time thinking about it and
experimenting with it and know all the pros and cons.

Renat Akhmerov
@Nokia


Maybe this could be resolved in oslo.messaging by following one of
Python's slogans, /we are all responsible users here/ [1].

What I'm proposing is to let the consumer of the message decide when to
send the ACK, because it knows best when to do so. I can think of
scenarios where the ACK has to be sent in the middle of message
processing, e.g. after receiving a message I want to store it in the DB
first and send the ACK only once the message is safely stored. With that,
we could implement whatever delivery model we want in Mistral on top of
oslo.messaging.
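
The store-then-ACK scenario can be sketched as follows, using only the
standard library. The `Message` class and its `ack()` method are
hypothetical stand-ins for a message whose acknowledgement is under
consumer control; they are not an existing oslo.messaging interface, and
the `inbox` table name is an assumption for illustration.

```python
import sqlite3


class Message:
    """Hypothetical incoming message with a consumer-controlled ack."""

    def __init__(self, msg_id, body):
        self.msg_id = msg_id
        self.body = body
        self.acked = False

    def ack(self):
        self.acked = True


def open_inbox():
    """Create an in-memory inbox; a real consumer would use a durable DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY, body TEXT)")
    return db


def store_then_ack(db, message):
    """Persist the message first, and ack only after the commit succeeds."""
    # The connection context manager commits on success and rolls back
    # if an exception is raised, in which case ack() is never reached.
    with db:
        # INSERT OR IGNORE turns a redelivered message into a no-op.
        db.execute(
            "INSERT OR IGNORE INTO inbox (msg_id, body) VALUES (?, ?)",
            (message.msg_id, message.body),
        )
    message.ack()
```

If the consumer dies before the commit, the message is never acked and
can be redelivered; if it dies after the commit but before the ack, the
redelivery is absorbed by the primary key on `msg_id`.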

From my understanding (and some of the oslo.messaging folks can correct
me if I am wrong), they (the oslo.messaging maintainers) don't feel
comfortable allowing such an option because doing such a thing alters
the principles of oslo.messaging and increases the complexity of the
code-base (and the subsequent testing, bug reports, and feature support
that come along with enabling such a thing).

That is why I think the preference was to have this model (which isn't
really the `rpc` kind of model that oslo.messaging had been targeting at
that point, but is more like a work-queue) be in another library with a
clear API that is explicitly targeted at this kind of model. Thus,
instead of having a multi-personality codebase with hidden features like
this (say in oslo.messaging), it gets its own codebase and API that is
'just right' (or closer to being 'right') for its concept (vs trying to
stuff it into oslo.messaging).


[1] https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Objects

Thanks,
Dawid Deja

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

