On 07/08/2014 02:00 AM, Noel Burton-Krahn wrote:
The thing is, that produces errors exactly like what I'm seeing in nova if rabbit dies and we reconnect to a new rabbit instance.
A call timing out while waiting for a response is a fairly general problem for which there could be different causes.
I'm tracing through the nova calls in the rabbit reconnect case to confirm that acknowledge is always being called when it should be.
Even if it is, the acknowledgement could be lost if the connection to RabbitMQ fails. However, I don't think that is likely to be the cause of the timeout. Unlike in the example, in a real oslo.messaging-based service the fact that a request is redelivered shouldn't be a problem. The reply issued to the redelivered copy may be ignored or dropped, but subsequent requests will still be processed.
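To make the redelivery argument concrete, here is a toy model of at-least-once delivery. It is a sketch under stated assumptions, not the oslo.messaging or RabbitMQ API: `ToyBroker`, `ToyCaller` and `serve_one` are hypothetical names, and replies are handed straight back to the caller rather than travelling through a reply queue. A request whose ack is lost gets requeued, is processed a second time, and its second reply is dropped because nobody is waiting for it any more, while the next request is handled normally.

```python
import collections
import itertools


class ToyBroker:
    """Hypothetical queue with at-least-once delivery (not a real broker API)."""

    def __init__(self):
        self.ready = collections.deque()  # messages awaiting delivery
        self.unacked = {}                 # delivery tag -> message
        self._tags = itertools.count(1)

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self):
        # The message stays "unacked" until the consumer calls ack().
        msg = self.ready.popleft()
        tag = next(self._tags)
        self.unacked[tag] = msg
        return tag, msg

    def ack(self, tag):
        del self.unacked[tag]

    def connection_lost(self):
        # Requeue every unacked message at the front, in original order.
        msgs = [self.unacked[t] for t in sorted(self.unacked)]
        self.ready.extendleft(reversed(msgs))
        self.unacked.clear()


class ToyCaller:
    """Hypothetical RPC caller that matches replies to outstanding calls."""

    def __init__(self):
        self.waiting = set()  # msg_ids still awaiting a reply
        self.results = {}
        self.dropped = []     # replies that arrived with no waiter

    def call(self, broker, msg_id, payload):
        self.waiting.add(msg_id)
        broker.publish({"msg_id": msg_id, "payload": payload})

    def on_reply(self, msg_id, result):
        if msg_id in self.waiting:
            self.waiting.discard(msg_id)
            self.results[msg_id] = result
        else:
            # Duplicate reply to a redelivered request: nobody wants it.
            self.dropped.append(msg_id)


def serve_one(broker, caller, ack=True):
    """Process one request; ack=False models the ack being lost."""
    tag, msg = broker.deliver()
    caller.on_reply(msg["msg_id"], msg["payload"] * 2)
    if ack:
        broker.ack(tag)


broker, caller = ToyBroker(), ToyCaller()
caller.call(broker, "a", 1)
caller.call(broker, "b", 2)
serve_one(broker, caller, ack=False)  # reply sent, but the ack never lands
broker.connection_lost()              # "a" is requeued for redelivery
serve_one(broker, caller)             # "a" processed again; reply dropped
serve_one(broker, caller)             # "b" processed normally
```

After the run, both calls have results, the only side effect of the lost ack is one dropped duplicate reply for `"a"`, and nothing is left outstanding. A handler with non-idempotent side effects would of course still see the work done twice.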
I'm not completely clear on the timing in your original problem. You say the timeout happens after a restart. Is it immediately after (i.e., could some connections still be detecting the failure)? Or long enough afterwards that you are confident everything has failed over correctly?
(Obviously a failure or restart *during* a call may well result in a timeout; that is the expected semantics at present.)
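As a rough illustration of why a timeout on its own says little about the cause, this sketch (hypothetical names throughout; `CallTimeout` is a stand-in, not the exception any real RPC library raises) blocks on a reply with a deadline. Whether the request was lost, the reply was lost, or the server died mid-call, the caller only ever observes the same timeout.

```python
import queue
import threading


class CallTimeout(Exception):
    """Hypothetical stand-in for an RPC client's timeout error."""


def toy_call(request, server_alive, timeout=0.2):
    """Send a request to a toy server thread and wait for its reply.

    If the server 'fails' during the call (server_alive=False), no reply
    is ever produced, and all the caller can observe is a timeout.
    """
    replies = queue.Queue()

    def server():
        if server_alive:
            replies.put(request.upper())
        # else: simulate the broker or server dying before replying

    threading.Thread(target=server, daemon=True).start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise CallTimeout(f"no reply to {request!r} within {timeout}s") from None
```

Distinguishing the causes therefore needs information from outside the call itself, such as when the restart happened relative to when the request was sent.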
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
