Re: [openstack-dev] [oslo] [messaging] 'retry' option

2014-06-30 Thread Gordon Sim

On 06/28/2014 10:49 PM, Mark McLoughlin wrote:

On Fri, 2014-06-27 at 17:02 +0100, Gordon Sim wrote:

A question about the new 'retry' option. The doc says:

  By default, cast() and call() will block until the
  message is successfully sent.

What does 'successfully sent' mean here?


Unclear, ambiguous, probably driver dependent etc.

The 'blocking' we're talking about here is establishing a connection
with the broker. If the connection has been lost, then cast() will block
until the connection has been re-established and the message 'sent'.


Understood, but to my mind, that is really an implementation detail.


  Does it mean 'written to the wire' or 'accepted by the broker'?

For the impl_qpid.py driver, each send is synchronous, so it means
accepted by the broker[1].

What does the impl_rabbit.py driver do? Does it just mean 'written to
the wire', or is it using RabbitMQ confirmations to get notified when
the broker accepts it (standard 0-9-1 has no way of doing this)?


I don't know, but it would be nice if someone did take the time to
figure it out and document it :)


Having googled around a bit, it appears that kombu v3.* has a
'confirm_publish' transport option when using the 'pyamqp' transport.
That isn't available in the 2.* versions, which appear to be what
oslo.messaging uses, and I can't find that option specified anywhere
in the oslo.messaging codebase either.


Running a series of casts using the latest impl_rabbit.py driver and 
examining the data on the wire also shows no confirms being sent.


So for impl_rabbit, the send is not acknowledged, but the delivery to
consumers is. For impl_qpid it's the other way round; the send is
acknowledged but the delivery to consumers is not (though a prefetch of
1 is used, limiting the loss to one message).



Seriously, some docs around the subtle ways that the drivers differ from
one another would be helpful ... particularly if it exposed incorrect
assumptions API users are currently making.


I'm happy to try and contribute to that.


If the intention is to block until accepted by the broker that has
obvious performance implications. On the other hand if it means block
until written to the wire, what is the advantage of that? Was that a
deliberate feature or perhaps just an accident of implementation?

The use case for the new parameter, as described in the git commit,
seems to be motivated by wanting to avoid the blocking when sending
notifications. I can certainly understand that desire.

However, notifications and casts feel like inherently asynchronous
things to me, and perhaps having/needing the synchronous behaviour is
the real issue?


It's not so much about sync vs async, but a failure mode. By default, if
we lose our connection with the broker, we wait until we can
re-establish it rather than throwing exceptions (requiring the API
caller to have its own retry logic) or quietly dropping the message.
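
As a rough illustration of the three failure modes described above (wait and reconnect, raise to the caller, or give up after a bounded number of attempts), here is a minimal sketch. It is not the oslo.messaging implementation: send_with_retry, BrokerUnavailable and the delay argument are hypothetical names, and the mapping of retry values (None or -1 for "forever", 0 for "no retry", N for "at most N retries") is an assumption about the intended semantics.

```python
import time


class BrokerUnavailable(Exception):
    """Hypothetical error raised when the broker connection is down."""


def send_with_retry(send, message, retry=None, delay=0.0):
    """Call send(message), retrying while the broker is unavailable.

    Assumed semantics (not taken from any driver's code):
      retry=None or -1: keep trying forever, i.e. block the caller
      retry=0:          no retry, the first failure propagates
      retry=N:          at most N further attempts after the first failure
    """
    attempts = 0
    while True:
        try:
            return send(message)
        except BrokerUnavailable:
            bounded = retry is not None and retry >= 0
            if bounded and attempts >= retry:
                raise  # hand the failure to the API caller
            attempts += 1
            time.sleep(delay)  # stand-in for reconnecting to the broker
```

Note that in every case the caller still waits for whatever send() itself treats as success, which is the point under discussion in this thread.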


Even when you have no failure, your calling thread has to wait until the
point the send is deemed successful before returning. So it is
synchronous with respect to whatever that success criterion is.


In the case where success is deemed to be acceptance by the broker 
(which is the case for the impl_qpid.py driver at present, whether 
intentional or not), the call is fully synchronous.


If on the other hand success is merely writing the message to the wire, 
then any failure may well cause message loss regardless of the retry 
option. The reconnect and retry in this case is only of limited value. 
It can avoid certain losses, but not others.
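
To make the difference between the two success criteria concrete, here is a toy simulation. Everything in it is hypothetical (FakeBroker, send_unconfirmed, send_confirmed); it models no real driver or protocol, only the idea that a write can be lost in flight without the sender noticing unless it waits for an acknowledgement.

```python
class FakeBroker:
    """Toy broker: writes whose sequence number is in `drop` are lost in flight."""

    def __init__(self, drop):
        self.drop = set(drop)
        self.writes = 0
        self.stored = []

    def write(self, message):
        """Simulate a socket write; return True only if the broker acks it."""
        self.writes += 1
        if self.writes in self.drop:
            return False  # lost on the wire, no ack ever arrives
        self.stored.append(message)
        return True


def send_unconfirmed(broker, message):
    # Success means "written to the wire": the ack is ignored.
    broker.write(message)


def send_confirmed(broker, message, max_attempts=3):
    # Success means "accepted by the broker": resend until acked.
    for _ in range(max_attempts):
        if broker.write(message):
            return
    raise RuntimeError("no confirmation from broker")
```

With the second write dropped, sending 'a', 'b', 'c' unconfirmed leaves the broker holding only 'a' and 'c', while the confirmed variant delivers all three at the cost of one extra write and a round trip per message. A real resend-on-missing-ack scheme can also produce duplicates, which this sketch does not model.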



The use case for ceilometer is to allow its RPCPublisher to have a
publishing policy - block until the samples have been sent, queue (in an
in-memory, fixed-length queue) if we don't have a connection to the
broker, or drop it if we don't have a connection to the broker.

   https://review.openstack.org/77845
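
The three publishing policies described above (block, queue in a fixed-length in-memory queue, or drop) could be sketched roughly as follows. This is not ceilometer's RPCPublisher: PolicyPublisher, the policy names and max_queue_length are hypothetical, and the send callable is assumed to raise BrokerUnavailable immediately when there is no connection.

```python
from collections import deque


class BrokerUnavailable(Exception):
    """Hypothetical error raised when there is no broker connection."""


class PolicyPublisher:
    """Illustrative publisher with 'block', 'queue' and 'drop' policies."""

    def __init__(self, send, policy="block", max_queue_length=1024):
        self.send = send
        self.policy = policy
        # Fixed-length queue: once full, the oldest sample is discarded.
        self.pending = deque(maxlen=max_queue_length)

    def publish(self, sample):
        if self.policy == "block":
            # The supplied send() is expected to wait until it succeeds.
            self.send(sample)
            return
        self.pending.append(sample)
        try:
            # Flush in order; a sample leaves the queue only once sent.
            while self.pending:
                self.send(self.pending[0])
                self.pending.popleft()
        except BrokerUnavailable:
            if self.policy == "drop":
                self.pending.clear()
            # For 'queue', keep the samples for the next publish().
```

Note that neither 'queue' nor 'drop' says anything about what a successful send() means; they only control what happens when the connection is known to be down.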

I do understand that the ambiguity around what message delivery
guarantees are implicit in cast() isn't ideal, but that's not what
adding this 'retry' parameter was about.


Sure, I understand that. The retry option is necessitated by an
(existing) implicit behaviour. However, in my view that behaviour is
implementation-specific and of limited value in terms of the semantic
contract of the call.


--Gordon.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][messaging] 'retry' option

2014-06-27 Thread Gordon Sim

I do apologise for omitting the subject qualifiers in my previous mail!

On 06/27/2014 05:02 PM, Gordon Sim wrote:

A question about the new 'retry' option. The doc says:

 By default, cast() and call() will block until the
 message is successfully sent.

What does 'successfully sent' mean here? Does it mean 'written to the
wire' or 'accepted by the broker'?

For the impl_qpid.py driver, each send is synchronous, so it means
accepted by the broker[1].

What does the impl_rabbit.py driver do? Does it just mean 'written to
the wire', or is it using RabbitMQ confirmations to get notified when
the broker accepts it (standard 0-9-1 has no way of doing this)?

If the intention is to block until accepted by the broker that has
obvious performance implications. On the other hand if it means block
until written to the wire, what is the advantage of that? Was that a
deliberate feature or perhaps just an accident of implementation?

The use case for the new parameter, as described in the git commit,
seems to be motivated by wanting to avoid the blocking when sending
notifications. I can certainly understand that desire.

However, notifications and casts feel like inherently asynchronous
things to me, and perhaps having/needing the synchronous behaviour is
the real issue? Calls, by contrast, are inherently synchronous, but at
present 'retry' controls only the sending of the request. If the
server fails, the call may time out regardless of the value of 'retry'.

Just in passing, I'd suggest that renaming the new parameter to
max_reconnects would make its current behaviour and values clearer.
The name 'retry' sounds like a yes/no type value, and retry=0 v. retry=1
is the reverse of what I would intuitively expect.

--Gordon.

[1] I've personally considered that somewhat unnecessary.
