2015-02-02 12:09 GMT+01:00 David Lang <[email protected]>:
> Thanks for taking the time to look this over. Just eliminating my bad
> guesses helps.

my pleasure --- I just lost that in the big email spike ;)
Rainer

> David Lang
>
> On Mon, 2 Feb 2015, Rainer Gerhards wrote:
>
>> Date: Mon, 2 Feb 2015 11:21:29 +0100
>> From: Rainer Gerhards <[email protected]>
>> To: David Lang <[email protected]>
>> Cc: rsyslog-users <[email protected]>
>> Subject: Re: [rsyslog] RELP performance was: Re: [RFC: Ingestion Relay] End-to-end reliable 'at-least-once' message delivery at large scale
>>
>> Answering this on request of David. I don't want to dig down into
>> optimizing RELP, though, as there is so much else going on that I can't
>> do any work on it in any case.
>>
>> Facts inline below.
>>
>> 2015-01-25 3:43 GMT+01:00 David Lang <[email protected]>:
>>
>>> On Sat, 24 Jan 2015, David Lang wrote:
>>>
>>> - RELP or not
>>>
>>>> * My benchmarks on a 1G 20-core box achieved a maximum of 27 MB/s
>>>> (uncompressed) with RELP with multiple producers. Compare it with
>>>> ptcp: 98 MB/s uncompressed, 220 MB/s with stream compression. So
>>>> efficiency is one of the points of contention.
>>>
>>> I think I know what's going on here. If I'm correct, it will require
>>> some "interesting" changes to how the internal queue is used (although
>>> similar changes may happen as part of the pull work that Rainer is
>>> working on).
>>>
>>> Rainer, please check my logic.
>>>
>>> The worker thread locks the queue and marks a batch of messages as
>>> being worked on.
>>>
>>> The worker thread gets to the RELP action and starts sending these
>>> messages, and handling acks as they arrive.
>>>
>>> The worker thread cannot consider the RELP action a success and go on
>>> to the next part of the config, or the next batch of messages, until
>>> all the messages in this batch have been acked (or some fail).
>>
>> That's not the case. It's RELP that will ensure that all messages are
>> processed. Once they are handed over to librelp, and the call returns,
>> the worker can be (sufficiently) sure that librelp will deliver them.
>> So the queue processing will be pushed back only when librelp blocks.
>> It all depends on the relp window size.
>>
>> Note that we identified that it is possible to lose up to relp window
>> size messages if rsyslog goes down in a failure situation. We already
>> talked about cures (callbacks to save the messages in such cases), but
>> nobody so far had time to implement this (nor did anyone really
>> complain hard about it).
>>
>>> This means that the window of unacked messages cannot be larger than
>>> the size of the batch of messages being processed. Even if there are
>>> no other actions that the worker is doing on this batch of messages,
>>> this causes the output of messages to 'throb', with a bunch of
>>> messages being sent, and then a pause when all the messages of this
>>> batch have been sent but not all acks have been received; then the
>>> next batch is processed and the cycle repeats. If there are other
>>> actions that take place from the same queue, the throbbing is even
>>> worse, because all those other actions take place between the times
>>> that the RELP module is sending messages.
>>
>> No, batch begin is just sent as a hint to librelp (which does some
>> optimization based on that which I don't remember off the top of my
>> head -- I think it is related to filling up buffers).
>>
>>> If there are multiple RELP actions, it gets even worse, because we
>>> can't start sending on the second RELP action until the first RELP
>>> action says the message is delivered (just in case there is logic in
>>> place to do something different if the delivery fails on the first
>>> action).
>>
>> There is no relationship between the two.
>>
>> I guess the performance degradation in contrast to TCP stems back to
>> the fact that TCP has a much larger window (in terms of messages). Try
>> setting the relp window to 100,000 messages and I guess the performance
>> is much more equal -- but that of course also means you may lose as
>> many messages.
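[Editor's note: Rainer's window-size argument can be illustrated with a toy throughput model. This is illustrative Python only, not rsyslog code; the per-message cost and round-trip numbers are invented. With stop-and-wait batching the sender stalls roughly one round trip per batch, while a window large enough to cover a round trip keeps the pipe full.]

```python
def batch_throughput(batch_size, rtt, per_msg_cost):
    """Stop-and-wait per batch: send a whole batch, then stall until
    the last ack arrives (roughly one round trip) before continuing."""
    return batch_size / (batch_size * per_msg_cost + rtt)

def windowed_throughput(window, rtt, per_msg_cost):
    """Sliding window: keep up to `window` unacked messages in flight.
    If the window covers a full round trip, the sender never stalls."""
    if window * per_msg_cost >= rtt:
        return 1.0 / per_msg_cost   # limited only by send cost
    return window / rtt             # limited by the window

# Invented numbers: 10 ms to send a message, 1 s ack round trip.
small = batch_throughput(16, 1.0, 0.01)      # ~13.8 msg/s
large = windowed_throughput(200, 1.0, 0.01)  # 100 msg/s
print(f"batch of 16: {small:.1f} msg/s, window of 200: {large:.1f} msg/s")
```

In practice the window would be raised on the omrelp side; the exact parameter name and supported range depend on the rsyslog/librelp version, so check the documentation for your release. And as noted above, a 100,000-message window also means up to 100,000 messages can be lost on an unclean shutdown.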
>> Rainer
>>
>>> First question: is my logic on what's going on plausible?
>>>
>>> If so, how can this be fixed?
>>>
>>> One way I see to fix this: a worker thread would have to be able to
>>> 'complete' a batch and start working on the next batch without marking
>>> all the messages in the first batch as being delivered (because we are
>>> waiting for acks that may never come, they aren't delivered yet). And
>>> if there are multiple RELP actions, it can't mark the message as being
>>> delivered and able to be removed from the queue until all of the
>>> actions have acked the message.
>>>
>>> The other possible way of handling this is to have the RELP output
>>> module maintain its own internal queue, and let the worker thread
>>> delete the message from the action queue as soon as the RELP action
>>> starts working on it; if the RELP action can't deliver it, the message
>>> would have to be re-added to the action queue from the internal queue.
>>> But at that point, I don't see how to sanely handle the "do this if
>>> the prior action failed" type of logic, and the log message would end
>>> up getting re-delivered to all other outputs.
>>>
>>> Whatever is being done for the pull output module may be able to help
>>> here, because the pull output has a similar problem: you don't want to
>>> hold up delivery of other messages just because the clients haven't
>>> requested the messages yet. But for the pull output, rsyslog can
>>> consider the message delivered as soon as it's handed off to the pull
>>> module.
>>>
>>> David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control.
PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
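[Editor's note: the "internal queue in the output module" idea David describes above could be sketched roughly as follows. This is an illustrative Python toy, not rsyslog's actual implementation; all names are made up.]

```python
from collections import OrderedDict

class RelpLikeOutput:
    """Toy output that acks out of band: the action queue may forget a
    message once send() accepts it, but the message stays in `pending`
    until acked, and is handed back for redelivery on failure."""

    def __init__(self):
        self.pending = OrderedDict()    # msg_id -> message, awaiting ack
        self.next_id = 0

    def send(self, message):
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = message  # remember until the peer acks
        return msg_id

    def on_ack(self, msg_id):
        self.pending.pop(msg_id, None)  # truly delivered, safe to forget

    def on_failure(self):
        """Connection lost: return all unacked messages so the caller
        can re-queue them (at-least-once: duplicates are possible)."""
        unacked = list(self.pending.values())
        self.pending.clear()
        return unacked

out = RelpLikeOutput()
ids = [out.send(m) for m in ("a", "b", "c")]
out.on_ack(ids[0])
print(out.on_failure())  # ['b', 'c'] survive for redelivery
```

As David notes in the thread, this gives at-least-once semantics but complicates per-message "previous action failed" logic, since the action queue no longer tracks the final delivery outcome.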

