Re: [rsyslog] RELP performance was: Re: [RFC: Ingestion Relay] End-to-end reliable 'at-least-once' message delivery at large scale

David Lang Sat, 24 Jan 2015 18:43:48 -0800

On Sat, 24 Jan 2015, David Lang wrote:

> - RELP or not
>  * My benchmarks on a 1G 20-core box achieved maximum of 27 MB/s
> (uncompressed) with RELP with multiple producers. Compare it with ptcp: 98
> MB/s uncompressed, 220 MB/s with stream-compression. So efficiency is one
> of the points of contention.

I think I know what's going on here. If I'm correct, it will require some "interesting" changes to how the internal queue is used (although similar changes may happen as part of the pull work that Rainer is working on).


Rainer, please check my logic.

The worker thread locks the queue and marks a batch of messages as being worked on.

The worker thread gets to the RELP action and starts sending these messages, and handling acks as they arrive.

The worker thread cannot consider the RELP action a success and go on to the next part of the config, or the next batch of messages, until all the messages in this batch have been acked (or some fail).

This means that the window of unacked messages cannot be larger than the size of the batch of messages being processed. Even if there are no other actions that the worker is doing on this batch of messages, this causes the output of messages to 'throb': a bunch of messages is sent, then there is a pause when all the messages of this batch have been sent but not all acks have been received, then the next batch is processed and the cycle repeats. If there are other actions that take place from the same queue, the throbbing is even worse, because all those other actions take place between the times that the RELP module is sending messages.
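
To put rough numbers on the 'throb', here is a toy timing model. It is not rsyslog or librelp code; the function names, the fixed per-message send cost, and the fixed ack round-trip time are all assumptions made for illustration. It compares a sender that must drain every ack at each batch boundary with one that only limits the number of unacked messages.

```python
from collections import deque


def batch_bounded_time(n_msgs, batch_size, send_cost, rtt):
    """Total time when the next batch may not start until every
    message of the current batch has been acked (hypothetical model)."""
    now = 0
    sent = 0
    while sent < n_msgs:
        size = min(batch_size, n_msgs - sent)
        now += size * send_cost  # messages of this batch go out back to back
        now += rtt               # idle gap: wait for the ack of the last message
        sent += size
    return now


def sliding_window_time(n_msgs, window, send_cost, rtt):
    """Total time when up to `window` unacked messages may be in flight,
    with no forced drain at batch boundaries (hypothetical model)."""
    ack_times = deque()  # arrival times of acks for messages counted as in flight
    now = 0
    for _ in range(n_msgs):
        if len(ack_times) == window:
            # Window full: wait (if needed) for the oldest ack.
            now = max(now, ack_times.popleft())
        now += send_cost
        ack_times.append(now + rtt)
    return ack_times[-1]  # done when the last ack arrives


if __name__ == "__main__":
    # 1000 messages, 1 time unit per send, 50 time units ack round trip.
    print(batch_bounded_time(1000, 100, 1, 50))   # 1500
    print(sliding_window_time(1000, 100, 1, 50))  # 1050
```

With the same limit of 100 unacked messages, the batch-bounded sender pays the round trip once per batch, while the sliding-window sender pays it only once at the end. Real numbers depend on network latency, batch size, and whatever other actions run between batches.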

If there are multiple RELP actions, it gets even worse, because we can't start sending on the second RELP action until the first RELP action says the message is delivered (just in case there is logic in place to do something different if the delivery fails on the first action).


First question: is my logic on what's going on plausible?

If so, how can this be fixed?

One way I see to fix this: a worker thread would have to be able to 'complete' a batch and start working on the next batch without marking all the messages in the first batch as being delivered (because we are still waiting for acks that may never come, so they aren't delivered yet). And if there are multiple RELP actions, it can't mark the message as being delivered and able to be removed from the queue until all of the actions have acked the message.
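
The bookkeeping this would need can be sketched roughly as follows. This is only an illustration, not rsyslog's queue API: the `AckTracker` class, its method names, and the rule that messages are released from the head of the queue in order are all assumptions.

```python
class AckTracker:
    """Tracks, per message, which actions still owe an ack (hypothetical)."""

    def __init__(self, actions):
        self.actions = frozenset(actions)
        # msg_id -> set of actions that have not acked yet.
        # dict keeps insertion order, which stands in for queue order here.
        self.pending = {}

    def dispatched(self, msg_id):
        """The worker handed msg_id to every action and moved on."""
        self.pending[msg_id] = set(self.actions)

    def acked(self, msg_id, action):
        """Record one ack; return the ids that may now leave the queue."""
        self.pending[msg_id].discard(action)
        removable = []
        # Assumption: only a contiguous run of fully acked messages at the
        # head of the queue is released.
        for mid in list(self.pending):
            if self.pending[mid]:
                break
            removable.append(mid)
            del self.pending[mid]
        return removable


if __name__ == "__main__":
    tracker = AckTracker(["relp_a", "relp_b"])
    tracker.dispatched(1)
    tracker.dispatched(2)
    print(tracker.acked(2, "relp_a"))  # []
    print(tracker.acked(2, "relp_b"))  # [] (message 1 is still unacked)
    print(tracker.acked(1, "relp_a"))  # []
    print(tracker.acked(1, "relp_b"))  # [1, 2]
```

The sketch leaves out what happens when an ack never arrives or an action reports failure, which is where the "do something different if delivery fails" logic would have to hook in.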

The other possible way of handling this is to have the RELP output module maintain its own internal queue, and let the worker thread delete the message from the action queue as soon as the RELP action starts working on it; if the RELP action can't deliver it, the message would have to be re-added to the action queue from the internal queue. But at that point, I don't see how to sanely handle the "do this if the prior action failed" type of logic, and the log message would end up getting re-delivered to all other outputs.
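
A toy model of that second approach shows the re-delivery problem. Everything here is hypothetical (the `run` function, the list standing in for "all other outputs", and a RELP action that fails once per listed message); it only mirrors the flow described above.

```python
from collections import deque


def run(messages, relp_fails_once):
    """Simulate an action queue feeding a file-like output and a RELP
    output that keeps its own internal queue (hypothetical model)."""
    action_queue = deque(messages)
    relp_internal = deque()
    other_output = []    # stands in for every other action on the queue
    relp_delivered = []
    failed_already = set()

    while action_queue or relp_internal:
        if action_queue:
            # Removed from the action queue as soon as it is handed off.
            msg = action_queue.popleft()
            other_output.append(msg)
            relp_internal.append(msg)
        msg = relp_internal.popleft()
        if msg in relp_fails_once and msg not in failed_already:
            failed_already.add(msg)
            # Delivery failed: put it back, so every action sees it again.
            action_queue.append(msg)
        else:
            relp_delivered.append(msg)
    return other_output, relp_delivered


if __name__ == "__main__":
    other, relp = run(["a", "b", "c"], relp_fails_once={"b"})
    print(other)  # ['a', 'b', 'c', 'b']
    print(relp)   # ['a', 'c', 'b']
```

In this model the other output receives "b" twice and RELP receives it out of order, which matches the concern about re-adding messages to the action queue.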

Whatever is being done for the pull output module may be able to help here, because the pull output has a similar problem: you don't want to hold up delivery of other messages just because the clients haven't requested the messages yet. But for the pull output, rsyslog can consider the message delivered as soon as it's handed off to the pull module.


David Lang