> On 21 Dec 2016, at 16:04, Henrik Johansen <[email protected]> wrote:
>
>> On 21 Dec 2016, at 14:57, Sven Van Caekenberghe <[email protected]> wrote:
>>
>> Henrik,
>>
>> Thank you for this detailed feedback.
>>
>>> On 21 Dec 2016, at 12:44, Henrik Johansen <[email protected]> wrote:
>>>
>>> Hi Sven!
>>>
>>> One thing I noticed when testing the RabbitMQ client with keepalive > 0
>>> was the connection being closed and all subscriptions lost when receiving
>>> large payloads, because the keepalive deadline passed before the keepalive
>>> packet could be sent while waiting to receive the next object.
>>
>> Indeed, I used my Stamp (STOMP) RabbitMQ client as a model for the MQTT one;
>> there is a lot of similarity, especially in concept.
>>
>> Keep alive processing is not that easy. I tried to do it by using read
>> timeouts as a source of regular opportunities to check for the need to
>> process keep alive logic. But of course, if you have no outstanding read (in
>> a loop), that won't work.
>>
>> The fact that receiving a large payload would trigger an actual keep alive
>> timeout is not something that I have seen myself. It seems weird that the
>> reading/transferring of incoming data would not count as activity against
>> keep alive, no?
>
> I never had a chance to investigate fully, but I distinctly remember having
> the same reaction!
> It was quite a while ago now, so my memory might be hazy; take the following
> with an appropriate number of grains of salt.
>
> The first times I encountered it, it seemed quite random, occurring after
> extended periods of client inactivity after receiving only small payloads.
> Setting a much shorter keepalive timeout than the default was/is very useful
> in reproducing/verifying whether it is an issue.
> The timeouts then occurred relatively shortly after I'd received a single
> payload with no other activity, and disappeared once I removed the resetting
> of the lastActivity timestamp on reads, indicating that for RabbitMQ at least
> (3.5 was the version at the time, I believe), receiving data was *not* being
> counted as keep-alive activity.
>
> The issue of receiving large payloads blocking timely writes was still
> unresolved; the connection was consistently cut off in the middle of (its
> own!) multi-MB payload transfers because the keepalive packet was not being
> sent.
> I couldn't see a solution other than abandoning the elegant single-threaded
> approach and doing keepalive in a separate high-priority process. But the
> architectural choice for the app changed at this point to not include an MQ
> in the first delivery, so the more involved rewrite needed before deploying
> in production kind of got stranded :/
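The separate high-priority keepalive process described above could be sketched roughly as follows. This is Python rather than Pharo, purely for illustration; `KeepaliveSender`, the `send_ping` callable, and the locking scheme are all assumptions of the sketch, not Stamp code. The key point is that the ping timer runs in its own thread, so a read blocked on a multi-MB payload cannot starve it. Only writes reset the timer, since (per the observation above) RabbitMQ did not appear to count received data as keep-alive activity.

```python
import threading
import time

class KeepaliveSender:
    """Background thread that sends a ping whenever no packet has been
    written for `interval` seconds, independent of any blocked reads."""

    def __init__(self, send_ping, interval):
        self._send_ping = send_ping      # callable that writes a ping packet
        self._interval = interval
        self._last_write = time.monotonic()
        self._lock = threading.Lock()    # protects _last_write
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def note_write(self):
        # Call after every outgoing packet; only writes reset the timer.
        with self._lock:
            self._last_write = time.monotonic()

    def _run(self):
        while not self._stop.is_set():
            with self._lock:
                idle = time.monotonic() - self._last_write
            if idle >= self._interval:
                self._send_ping()
                self.note_write()
            # Wake up well before the deadline; one ping too many is harmless.
            self._stop.wait(timeout=self._interval / 4)

    def stop(self):
        self._stop.set()
        self._thread.join()
```

In Pharo the equivalent would be a forked `Process` at a priority above the reader's, with the timestamp protected by a `Mutex`; the structure would be the same.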
Instinctively it feels like messaging and multi-MB payloads would not be a good fit; at the least I would suspect that they are an edge case. But maybe I am wrong.

I think that the code in StampMedium>>#readBodyBytes: and StampMedium>>#readBodyString: could be refactored to use a loop with chunk buffers and at the same time check the elapsed time, firing keep alive pings if necessary. One ping too many would not hurt, better safe than sorry. But it is hard to catch any/all network slowdowns; any IO operation could hang/timeout.

All this would assume, though, that the problematic situation can be recreated.

> Cheers,
> Henry
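The chunked-read refactoring proposed for StampMedium>>#readBodyBytes: could be sketched along these lines. Again Python for illustration only; `read_chunk` and `send_ping` are hypothetical callables standing in for the medium's stream operations, and the chunk size is an arbitrary choice of the sketch.

```python
import time

CHUNK_SIZE = 4096

def read_body_bytes(read_chunk, count, send_ping, keepalive):
    """Read `count` body bytes in CHUNK_SIZE pieces, firing a keepalive
    ping whenever more than `keepalive` seconds have elapsed since the
    last one, so a multi-MB transfer cannot starve the keepalive timer.

    `read_chunk(n)` returns up to n bytes; `send_ping()` writes a ping.
    """
    last_ping = time.monotonic()
    buffer = bytearray()
    while len(buffer) < count:
        chunk = read_chunk(min(CHUNK_SIZE, count - len(buffer)))
        if not chunk:
            raise ConnectionError('stream closed mid-body')
        buffer.extend(chunk)
        if time.monotonic() - last_ping >= keepalive:
            send_ping()                  # one ping too many is harmless
            last_ping = time.monotonic()
    return bytes(buffer)
```

As noted above, this only protects the body-reading path; any other IO operation could still hang past the keepalive deadline, which is the argument for the separate high-priority process instead.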
