On Sat, 24 Jan 2015, David Lang wrote:

- RELP or not
 * My benchmarks on a 1G 20-core box achieved maximum of 27 MB/s
(uncompressed) with RELP with multiple producers. Compare it with ptcp: 98
MB/s uncompressed, 220 MB/s with stream-compression. So efficiency is one
of the points of contention.

I think I know what's going on here. If I'm correct, fixing it will require some "interesting" changes to how the internal queue is used (although similar changes may happen as part of the pull work that Rainer is working on).

Rainer, please check my logic.

The worker thread locks the queue and marks a batch of messages as being worked on.

The worker thread gets to the RELP action and starts sending these messages, and handling acks as they arrive.

The worker thread cannot consider the RELP action a success and go on to the next part of the config, or to the next batch of messages, until all the messages in this batch have been acked (or some fail).

This means that the window of unacked messages cannot be larger than the size of the batch of messages being processed. Even if the worker is doing no other actions on this batch, this causes the output of messages to 'throb': a bunch of messages is sent, then there is a pause once all the messages of this batch have been sent but not all acks have been received, then the next batch is processed and the cycle repeats. If other actions run from the same queue, the throbbing is even worse, because all those other actions take place in between the periods when the RELP module is sending messages.
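To put rough numbers on the throbbing, here is a back-of-the-envelope model. It is only a sketch under my assumptions: the function names and the timing values are made up for illustration, and it assumes each batch costs "send every message, then wait one round trip for the last ack".

```python
def batch_limited_throughput(batch_size, send_time, rtt):
    """Messages/second when a batch must be fully acked before the next starts.

    Assumed model: sending takes batch_size * send_time seconds, then the
    worker idles for one round trip (rtt) waiting for the final ack.
    """
    cycle = batch_size * send_time + rtt
    return batch_size / cycle


def unlimited_window_throughput(send_time):
    """Messages/second if the sender never has to pause for acks."""
    return 1.0 / send_time


if __name__ == "__main__":
    # Hypothetical values: 10 microseconds per message, 1 ms round trip.
    for batch_size in (10, 100, 1000, 10000):
        rate = batch_limited_throughput(batch_size, 0.00001, 0.001)
        print(batch_size, round(rate))
    print("no pauses:", round(unlimited_window_throughput(0.00001)))
```

With those made-up values, a batch of 100 spends as long waiting as sending, so it gets only about half of the no-pause rate. Larger batches shrink the pause relative to the send time, which is why batch size acts as the effective window here.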

If there are multiple RELP actions, it gets even worse, because we can't start sending on the second RELP action until the first RELP action says the message is delivered (just in case there is logic in place to do something different if the delivery fails on the first action).
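A rough way to see the cost of running the RELP actions one after another is to compare the time one batch takes in each case. This is a sketch with invented names and timings; the "overlapped" case is hypothetical and ignores the conditional-logic problem entirely.

```python
def serial_cycle(batch_size, send_time, rtts):
    """Seconds per batch when each action must finish before the next starts.

    rtts holds one assumed round-trip time per RELP action. Every action
    sends the whole batch and then waits for its own last ack.
    """
    return sum(batch_size * send_time + rtt for rtt in rtts)


def overlapped_cycle(batch_size, send_time, rtts):
    """Seconds per batch if all actions could send at the same time.

    Hypothetical: the batch is done when the slowest action is done.
    """
    return max(batch_size * send_time + rtt for rtt in rtts)


if __name__ == "__main__":
    rtts = [0.001, 0.005]  # made-up round trips for two RELP receivers
    print("serial:    ", serial_cycle(100, 0.00001, rtts))
    print("overlapped:", overlapped_cycle(100, 0.00001, rtts))
```

In the serial case every additional RELP action adds its full send time plus its ack wait to each batch, so the slowest receiver is no longer the only thing that matters.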

First question: is my logic on what's going on plausible?

If so, how can this be fixed?


One way I see to fix this: a worker thread would have to be able to 'complete' a batch and start working on the next batch without marking all the messages in the first batch as delivered (they aren't delivered yet, because we are still waiting for acks that may never come). And if there are multiple RELP actions, a message can't be marked as delivered and removed from the queue until all of the actions have acked it.
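The bookkeeping this would need could look something like the following. It is purely a sketch: `PendingTracker` and its methods are names I made up, not anything in rsyslog, and it ignores timeouts and failed deliveries.

```python
class PendingTracker:
    """Tracks which actions still owe an ack for each dispatched message."""

    def __init__(self, actions):
        self.actions = frozenset(actions)
        self.pending = {}  # msg_id -> set of action names not yet acked

    def dispatch(self, msg_id):
        # The worker hands the message to every action and moves on;
        # the message stays in the queue until all acks are in.
        self.pending[msg_id] = set(self.actions)

    def ack(self, msg_id, action):
        """Record an ack. Returns True once the message may leave the queue."""
        waiting = self.pending.get(msg_id)
        if waiting is None:
            return False  # unknown or already completed message
        waiting.discard(action)
        if not waiting:
            del self.pending[msg_id]
            return True
        return False


if __name__ == "__main__":
    tracker = PendingTracker(["relp_a", "relp_b"])  # hypothetical action names
    tracker.dispatch(1)
    tracker.dispatch(2)  # the next message starts before message 1 is acked
    print(tracker.ack(1, "relp_a"))
    print(tracker.ack(1, "relp_b"))
```

The point is that the unacked window is then bounded by how many entries the tracker is allowed to hold, not by the batch size.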


The other possible way of handling this is to have the RELP output module maintain its own internal queue, and let the worker thread delete the message from the action queue as soon as the RELP action starts working on it. If the RELP action can't deliver it, the message would have to be re-added to the action queue from the internal queue. But at that point, I don't see how to sanely handle the "do this if the prior action failed" type of logic, and the log message would end up getting re-delivered to all other outputs.
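In toy form, that second approach looks roughly like this. Again a sketch only: `RelpOutputSketch` and its methods are invented for illustration, and a plain `deque` stands in for the action queue.

```python
from collections import deque


class RelpOutputSketch:
    """Toy output module that keeps its own store of unacked messages."""

    def __init__(self, action_queue):
        self.action_queue = action_queue  # deque of (msg_id, message) pairs
        self.in_flight = {}               # msg_id -> message awaiting an ack

    def take(self):
        # The message leaves the action queue as soon as we start on it.
        msg_id, message = self.action_queue.popleft()
        self.in_flight[msg_id] = message
        return msg_id

    def on_ack(self, msg_id):
        # Delivered: drop our private copy.
        self.in_flight.pop(msg_id, None)

    def on_failure(self, msg_id):
        # Not delivered: put the message back on the action queue.
        message = self.in_flight.pop(msg_id)
        self.action_queue.append((msg_id, message))


if __name__ == "__main__":
    queue = deque([(1, "first"), (2, "second")])
    out = RelpOutputSketch(queue)
    out.take()
    out.take()
    out.on_ack(1)
    out.on_failure(2)
    print(list(queue))
```

The re-added message comes back through the action queue like any new message, which is exactly where the re-delivery to the other outputs and the loss of "prior action failed" context come from.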


Whatever is being done for the pull output module may be able to help here, because the pull output has a similar problem: you don't want to hold up delivery of other messages just because the clients haven't requested the messages yet. But for the pull output, rsyslog can consider the message delivered as soon as it's handed off to the pull module.

David Lang
