> On 21 Dec 2016, at 16:04, Henrik Johansen <[email protected]> 
> wrote:
> 
>> 
>> On 21 Dec 2016, at 14:57 , Sven Van Caekenberghe <[email protected]> wrote:
>> 
>> Henrik,
>> 
>> Thank you for this detailed feedback.
>> 
>>> On 21 Dec 2016, at 12:44, Henrik Johansen <[email protected]> 
>>> wrote:
>>> 
>>> Hi Sven!
>>> One thing I noticed when testing the RabbitMQ client with keepalive > 0 
>>> was the connection being closed and all subscriptions lost when receiving 
>>> large payloads, because a timeout expired before the keepalive packet 
>>> could be sent while waiting to receive the next object.
>> 
>> Indeed, I used my Stamp (STOMP) RabbitMQ client as a model for the MQTT one 
>> - there is a lot of similarity, especially in concept.
>> 
>> Keep-alive processing is not that easy. I tried to implement it by using 
>> read timeouts as regular opportunities to check whether keep-alive logic 
>> needs to run. But of course, if there is no outstanding read (in a loop), 
>> that won't work.
>> 
>> The fact that receiving a large payload would trigger an actual keep-alive 
>> timeout is not something that I have seen myself. It seems weird that the 
>> reading/transferring of incoming data would not count as activity against 
>> keep-alive, no?
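The read-timeout approach described above can be sketched roughly as follows. This is a Python illustration, not the actual Pharo client code; the transport interface (`read(timeout)` returning `None` on timeout, `send_ping()`) is a hypothetical stand-in. Each read timeout becomes an opportunity to run keep-alive logic, and completed reads count as activity:

```python
import time

class KeepAliveReader:
    """Single-threaded read loop that uses read timeouts as regular
    opportunities to run keep-alive logic (the approach described above)."""

    def __init__(self, transport, ping_interval):
        self.transport = transport        # hypothetical: read(timeout), send_ping()
        self.ping_interval = ping_interval
        self.last_activity = time.monotonic()
        self.pings_sent = 0

    def maybe_ping(self):
        # If nothing happened for a full interval, send a keep-alive ping.
        if time.monotonic() - self.last_activity >= self.ping_interval:
            self.transport.send_ping()
            self.pings_sent += 1
            self.last_activity = time.monotonic()

    def read_one(self):
        # A short timeout ensures keep-alive checks run even while idle.
        while True:
            data = self.transport.read(timeout=self.ping_interval / 4)
            if data is None:              # read timed out: chance to keep alive
                self.maybe_ping()
                continue
            self.last_activity = time.monotonic()  # reads count as activity
            return data
```

As noted, this only works while the loop actually has an outstanding read; a long blocking transfer defeats it.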
> 
> I never had a chance to investigate fully, but I distinctly remember having 
> the same reaction!
> It's quite a while ago now, so my memory might be hazy; take the following 
> with a grain of salt.
> 
> The first few times I encountered it, it seemed quite random, occurring after 
> extended periods of client inactivity and after receiving only small payloads...
> Setting a much shorter keepalive timeout than the default was/is very useful 
> for reproducing and verifying the issue.
> The timeouts then occurred relatively shortly after I'd received a single 
> payload with no other activity, and disappeared once I removed the resetting 
> of the lastActivity timestamp on reads, indicating that for RabbitMQ at least 
> (3.5 was the version at the time, I believe), receiving data was *not* 
> counted as keep-alive activity.
> 
> The issue of large payloads blocking timely writes remained unresolved: the 
> connection was consistently cut off in the middle of (its own!) multi-MB 
> payload transfers because the keepalive packet was not sent in time.
> I couldn't see a solution other than abandoning the elegant single-threaded 
> approach and doing keepalive in a separate high-priority process, but the 
> architecture of the app changed at this point to not include an MQ in the 
> first delivery, so the more involved rewrite needed before deploying in 
> production kinda got stranded :/
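The "separate high-priority process" idea can be sketched like this, with a Python daemon thread standing in for a Pharo high-priority Process, and `send_ping` as a placeholder for whatever actually writes the keep-alive packet. The point is that pings go out on schedule even while the main thread is blocked receiving a large payload:

```python
import threading

class KeepAlivePinger:
    """Background keep-alive sender: a sketch of the separate
    high-priority-process approach mentioned above."""

    def __init__(self, send_ping, interval):
        self.send_ping = send_ping        # placeholder for the real ping write
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait doubles as a cancellable sleep: it returns True
        # (and we exit) as soon as stop() is called.
        while not self._stop.wait(self.interval):
            self.send_ping()
```

One caveat this sketch glosses over: the socket write itself must be safe to interleave with the reader, which is exactly the extra complexity that makes this rewrite "more involved" than the single-threaded design.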

Instinctively it feels like messaging and multi-MB payloads would not be a good 
fit; at least I would suspect that they are an edge case. But maybe I am wrong.

I think that the code in StampMedium>>#readBodyBytes: and 
StampMedium>>#readBodyString: could be refactored to use a loop with chunk 
buffers, checking the elapsed time at the same time to fire keep-alive pings 
if necessary. One ping too many would not hurt; better safe than sorry. But it 
is hard to catch any/all network slowdowns, since any IO operation could 
hang/timeout.
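The suggested refactoring might look something like this, sketched in Python rather than Pharo and against a hypothetical `stream`/`send_ping` interface (not the actual StampMedium code): read the body in chunks and interleave a keep-alive ping whenever a full interval elapses mid-transfer.

```python
import time

CHUNK_SIZE = 16 * 1024  # hypothetical chunk size

def read_body_bytes(stream, count, send_ping, ping_interval):
    """Read `count` body bytes in chunks, firing a keep-alive ping
    whenever an interval elapses during the transfer (one ping too
    many does no harm)."""
    buffer = bytearray()
    last_ping = time.monotonic()
    while len(buffer) < count:
        chunk = stream.read(min(CHUNK_SIZE, count - len(buffer)))
        if not chunk:
            raise EOFError('connection closed mid-payload')
        buffer.extend(chunk)
        if time.monotonic() - last_ping >= ping_interval:
            send_ping()                   # better safe than sorry
            last_ping = time.monotonic()
    return bytes(buffer)
```

As noted above, this still cannot cover every slowdown: if a single chunk read itself hangs, no ping goes out until it returns or times out.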

But all this would assume that the problematic situation can be recreated.

> Cheers,
> Henry

