On 28/06/14 22:49 +0100, Mark McLoughlin wrote:
On Fri, 2014-06-27 at 17:02 +0100, Gordon Sim wrote:
A question about the new 'retry' option. The doc says:

     By default, cast() and call() will block until the
     message is successfully sent.

What does 'successfully sent' mean here?

Unclear, ambiguous, probably driver dependent etc.

The 'blocking' we're talking about here is establishing a connection
with the broker. If the connection has been lost, then cast() will block
until the connection has been re-established and the message 'sent'.

 Does it mean 'written to the wire' or 'accepted by the broker'?

For the impl_qpid.py driver, each send is synchronous, so it means
accepted by the broker[1].

What does the impl_rabbit.py driver do? Does it just mean 'written to
the wire', or is it using RabbitMQ confirmations to get notified when
the broker accepts it? (Standard AMQP 0-9-1 has no way of doing this.)

I don't know, but it would be nice if someone did take the time to
figure it out and document it :)

Seriously, some docs around the subtle ways that the drivers differ from
one another would be helpful ... particularly if it exposed incorrect
assumptions API users are currently making.

+1

We should also keep this in mind for the amqp 1.0 driver. Detailed
documentation of it is a must. Specifically, I'm interested in having
documented how its architecture differs from the other drivers. The
more we can write down about it, the better.

If the intention is to block until accepted by the broker that has
obvious performance implications. On the other hand if it means block
until written to the wire, what is the advantage of that? Was that a
deliberate feature or perhaps just an accident of implementation?

The use case for the new parameter, as described in the git commit,
seems to be motivated by wanting to avoid the blocking when sending
notifications. I can certainly understand that desire.

However, notifications and casts feel like inherently asynchronous
things to me, and perhaps having/needing the synchronous behaviour is
the real issue?

It's not so much about sync vs async, but a failure mode. By default, if
we lose our connection with the broker, we wait until we can
re-establish it rather than throwing exceptions (requiring the API
caller to have its own retry logic) or quietly dropping the message.
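
To make that failure mode concrete, here is a minimal sketch of the
three behaviours. It is not the oslo.messaging implementation: the
function name, the two exception classes and the `send`/`reconnect`
callables are hypothetical, and the mapping of `retry` values (None or
negative = retry forever, 0 = no retry, N = at most N reconnect
attempts) is an assumption made for illustration.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error: the connection to the broker is down."""


class MessageDeliveryFailure(Exception):
    """Hypothetical error: the retry budget was exhausted."""


def send_with_retry(send, reconnect, message, retry=None, delay=0.0):
    """Send `message`, reconnecting when the connection is lost.

    Assumed semantics (illustrative only):
      retry=None or negative -> reconnect forever (block until sent)
      retry=0                -> no reconnect; fail on the first error
      retry=N                -> at most N reconnect attempts
    """
    unlimited = retry is None or retry < 0
    attempts = 0
    while True:
        try:
            return send(message)
        except ConnectionLost:
            if not unlimited and attempts >= retry:
                # Give up and let the caller decide what to do.
                raise MessageDeliveryFailure(
                    "gave up after %d reconnect attempt(s)" % attempts)
            attempts += 1
            time.sleep(delay)
            reconnect()
```

With the default, the caller simply blocks until the broker is back;
with a finite value the caller gets an exception and has to decide
whether to queue, drop or re-raise.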

The use case for ceilometer is to allow its RPCPublisher to have a
publishing policy - block until the samples have been sent, queue (in an
in-memory, fixed-length queue) if we don't have a connection to the
broker, or drop it if we don't have a connection to the broker.

 https://review.openstack.org/77845
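
A rough sketch of how such a publishing policy could sit on top of a
per-call retry setting. This is not the code from the review above:
the class name, the policy strings, the `send(message, retry)` callable
and the use of the built-in ConnectionError are all hypothetical
stand-ins.

```python
from collections import deque


class PolicyPublisher:
    """Illustrative publisher with a block / queue / drop policy."""

    POLICIES = ("block", "queue", "drop")

    def __init__(self, send, policy="block", max_queue_length=1024):
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: %r" % (policy,))
        # `send(message, retry)` is assumed to raise ConnectionError
        # when retry=0 and there is no connection to the broker.
        self._send = send
        self.policy = policy
        # Fixed-length, in-memory queue: when full, appending a new
        # message silently discards the oldest one.
        self._queue = deque(maxlen=max_queue_length)

    def publish(self, message):
        if self.policy == "block":
            # Wait for the connection to come back, however long it takes.
            self._send(message, retry=None)
            return
        self._queue.append(message)
        while self._queue:
            try:
                self._send(self._queue[0], retry=0)
            except ConnectionError:
                if self.policy == "drop":
                    self._queue.clear()
                # "queue": keep the backlog for the next publish().
                return
            self._queue.popleft()
```

The point is that "queue" and "drop" only work if the send can fail
fast instead of blocking, which is what the parameter makes possible.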

I do understand that the ambiguity around what message delivery
guarantees are implicit in cast() isn't ideal, but that's not what
adding this 'retry' parameter was about.

Besides documenting what each driver does, it might also be useful to
document what the expected guarantees are. The way you just described
them, in terms of connections, sounds quite reasonable to me.

 Calls, by contrast, are inherently synchronous, but at
present the retry controls only the sending of the request. If the
server fails, the call may time out regardless of the value of 'retry'.

Just in passing, I'd suggest that renaming the new parameter to
max_reconnects would make its current behaviour and values clearer.
The name 'retry' sounds like a yes/no type value, and retry=0 v. retry=1
is the reverse of what I would intuitively expect.

Sounds reasonable. Would you like to submit a patch? Quick turnaround is
important, because if Ceilometer starts using this retry parameter
before we rename it, I'm not sure it'll be worth the hassle.

+1

Flavio

--
@flaper87
Flavio Percoco
