On 07/07/2015 05:48 PM, Clint Byrum wrote:
all of the call sites I checked _do not appear to resend_, they simply explode on timeout waiting for reply. This is how calling code should work and I'm ok with code in nova, cinder, et al. being written this way, because I'd expect my messaging layer to be at least somewhat reliable.
In my opinion, the calling code has better context for deciding whether or not to retry. Handling reliability end-to-end is often much more efficient as well.
[...]
I think you'll find that once you try to make oslo.messaging handle the retrying, that with the broker simply being ack'd all the time, you risk duplicating RPC calls if you retry in a loop.
Resending the request will always risk duplicating the call (unless the caller can verify, in some call-specific way, that the previous request was not executed). Whether or not you acknowledge the request (and whether you do it before or after processing the request), the response can still get lost (neither requests nor responses are currently confirmed by the broker).
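To illustrate the point about lost responses, here is a minimal sketch in plain Python. It does not use oslo.messaging; `Server`, `call_with_retry` and the `lose_first_reply` flag are hypothetical names made up for the example. The request is executed, the reply is lost, and the caller (who only sees a timeout) resends:

```python
class Server:
    """Toy RPC server whose handler has a visible side effect."""

    def __init__(self):
        self.executions = 0

    def handle(self, request):
        self.executions += 1  # the side effect we do not want repeated
        return "done"


def call_with_retry(server, request, lose_first_reply, max_attempts=2):
    """Resend on timeout; a lost reply looks the same as a lost request."""
    for attempt in range(max_attempts):
        reply = server.handle(request)
        if lose_first_reply and attempt == 0:
            reply = None  # reply lost in transit, caller only sees a timeout
        if reply is not None:
            return reply
    raise TimeoutError("no reply")


server = Server()
call_with_retry(server, "create_volume", lose_first_reply=True)
print(server.executions)  # 2: the call ran twice
```

Acknowledging the request earlier or later would not change this outcome, since the duplicate comes from the missing reply.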
There is a message id 'cache' used to try to detect (and then ignore) duplicates. It's not clear to me how effective that is in practice, as it only tracks the last 16 ids for a given listener. In any case, if the listener process is restarted, or if the call is redelivered to a different server in a group, then the id cache would be of no use.
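A rough sketch of why a bounded id cache has these limits, in plain Python. This is not the actual oslo.messaging code; `DuplicateFilter` is a hypothetical name, and only the size of 16 comes from the description above:

```python
from collections import deque


class DuplicateFilter:
    """Remembers only the most recent `size` message ids."""

    def __init__(self, size=16):
        self._seen = deque(maxlen=size)  # oldest id is evicted when full

    def is_duplicate(self, msg_id):
        if msg_id in self._seen:
            return True
        self._seen.append(msg_id)
        return False


listener = DuplicateFilter(size=16)
for i in range(16):
    listener.is_duplicate(f"msg-{i}")       # msg-0 .. msg-15 fill the cache

caught = listener.is_duplicate("msg-0")     # True: still among the last 16
listener.is_duplicate("msg-16")             # new id pushes msg-0 out
missed = listener.is_duplicate("msg-0")     # False: duplicate goes undetected

restarted = DuplicateFilter(size=16)        # fresh process or another server
after_restart = restarted.is_duplicate("msg-1")  # False: no shared history
```

So a resend is only caught if it reaches the same listener process before 16 other messages have arrived.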
The pattern is well established in RabbitMQ that acks should happen _AFTER_ the message has been consumed and thus should not be duplicated, not before.
That is the pattern for at-least-once delivery, where either processing is able to detect that a resent message was already processed or where reprocessing it is preferable to not processing it at all.
I *believe* oslo.messaging (or impl_rabbit at least) was aiming for an at-most-once guarantee (i.e. avoiding duplication at the expense of dropped messages). That may be why the acknowledgement is done before processing, though since the acknowledgement is asynchronous, that only narrows the window for duplication; it doesn't eliminate it.
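The difference between the two orderings can be sketched with a toy in-memory broker. This is plain Python, not RabbitMQ or impl_rabbit; `ToyBroker` and `run` are hypothetical names, and the ack here is synchronous, which is a simplification of the asynchronous case described above. The first consumer crashes between its two steps and a second consumer then picks up whatever the broker still holds:

```python
class ToyBroker:
    """Keeps a message until it is acked, so unacked messages get redelivered."""

    def __init__(self, message):
        self.pending = [message]

    def deliver(self):
        return self.pending[0] if self.pending else None

    def ack(self, message):
        self.pending.remove(message)


def run(ack_first):
    """Return how many times the message was processed across two consumers."""
    broker = ToyBroker("resize_instance")
    processed = []
    # First consumer crashes between its two steps; the second runs cleanly.
    for crash_midway in (True, False):
        msg = broker.deliver()
        if msg is None:
            continue  # nothing left to redeliver
        if ack_first:
            broker.ack(msg)
            if crash_midway:
                continue  # crashed after ack, before processing
            processed.append(msg)
        else:
            processed.append(msg)
            if crash_midway:
                continue  # crashed after processing, before ack
            broker.ack(msg)
    return len(processed)


print(run(ack_first=True))   # 0: message dropped (at-most-once)
print(run(ack_first=False))  # 2: message processed twice (at-least-once)
```

Neither ordering gives exactly-once on its own; the choice is between losing the message and possibly repeating it.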
I may of course be wrong. It would be great to have someone more qualified to comment on the intentions of the design provide some clarity.
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev