On Sat, 24 Jan 2015, David Lang wrote:
- RELP or not
* My benchmarks on a 1G 20-core box achieved a maximum of 27 MB/s
(uncompressed) with RELP with multiple producers. Compare it with ptcp: 98
MB/s uncompressed, 220 MB/s with stream-compression. So efficiency is one
of the points of contention.
I think I know what's going on here. If I'm correct, it will require some
"interesting" changes to how the internal queue is used (although similar
changes may happen as part of the pull work that Rainer is working on).
Rainer, please check my logic.
The worker thread locks the queue and marks a batch of messages as being worked
on.
The worker thread gets to the RELP action and starts sending these messages, and
handling acks as they arrive.
The worker thread cannot consider the RELP action a success and go on to the
next part of the config, or to the next batch of messages, until all the messages
in this batch have been acked (or some have failed).
This means that the window of unacked messages cannot be larger than the size of
the batch of messages being processed. Even if the worker is doing no other
actions on this batch, this causes the output of messages to 'throb': a bunch of
messages is sent, then there is a pause once all the messages of this batch have
been sent but not all acks have been received, then the next batch is processed
and the cycle repeats. If other actions take place from the same queue, the
throbbing is even worse, because all those other actions run in between the
periods when the RELP module is sending messages.
If there are multiple RELP actions, it gets even worse, because we can't start
sending on the second RELP action until the first RELP action says the message
is delivered (just in case there is logic in place to do something different if
the delivery fails on the first action).
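To put rough numbers on the throbbing, here is a back-of-the-envelope model in
Python. It is only a sketch of the reasoning above, not a measurement of
rsyslog: the function names, the per-message send cost, and the round-trip time
are all made-up assumptions, and it ignores other actions on the same queue.

```python
def batch_gated_time(n_msgs, batch_size, send_cost_us, rtt_us):
    """Total time (microseconds) when the next batch cannot start until
    the last ack of the current batch has arrived.

    All parameters are hypothetical; this models the suspected behaviour,
    it does not measure it.
    """
    total = 0
    remaining = n_msgs
    while remaining > 0:
        in_batch = min(batch_size, remaining)
        # Send the whole batch, then sit idle for one round trip
        # waiting for the ack of the last message in the batch.
        total += in_batch * send_cost_us + rtt_us
        remaining -= in_batch
    return total


def pipelined_time(n_msgs, send_cost_us, rtt_us):
    """Total time when the unacked window is large enough that sending
    never pauses; only the final ack adds one round trip."""
    if n_msgs == 0:
        return 0
    return n_msgs * send_cost_us + rtt_us


if __name__ == "__main__":
    gated = batch_gated_time(10_000, 100, 10, 1_000)
    piped = pipelined_time(10_000, 10, 1_000)
    print(f"batch-gated: {gated} us, pipelined: {piped} us")
```

With these invented numbers the batch-gated sender spends about half its time
waiting for acks. The smaller the batch relative to the round-trip time, the
worse the ratio gets.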
First question: is my logic about what's going on plausible?
If so, how can this be fixed?
One way I see to fix this: a worker thread would have to be able to 'complete' a
batch and start working on the next batch without marking all the messages in
the first batch as delivered (they aren't delivered yet, because we are still
waiting for acks that may never come). And if there are multiple RELP actions,
it can't mark a message as delivered, and so able to be removed from the
queue, until all of the actions have acked that message.
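The bookkeeping this would need can be sketched in a few lines of Python. This
is purely illustrative and not how rsyslog's queue is implemented; the class
name `AckTracker` and its methods are hypothetical. The idea is that each
message stays in the queue until every action that must ack it has done so.

```python
class AckTracker:
    """Tracks, per message, which actions have not acked it yet.

    Hypothetical helper for illustration only; none of these names
    exist in rsyslog.
    """

    def __init__(self, action_names):
        self._actions = frozenset(action_names)
        self._pending = {}  # msg_id -> set of actions still owing an ack

    def submit(self, msg_id):
        # The worker hands the message to every action and moves on
        # to the next batch without waiting.
        self._pending[msg_id] = set(self._actions)

    def ack(self, msg_id, action):
        """Record an ack. Returns True only when this was the last
        outstanding ack, i.e. the message may now leave the queue."""
        waiting = self._pending.get(msg_id)
        if waiting is None:
            return False  # unknown or already completed message
        waiting.discard(action)
        if waiting:
            return False
        del self._pending[msg_id]
        return True

    def in_flight(self):
        """Number of messages still waiting for at least one ack."""
        return len(self._pending)


if __name__ == "__main__":
    tracker = AckTracker(["relp1", "relp2"])
    tracker.submit(1)
    tracker.submit(2)
    print(tracker.ack(1, "relp1"))  # one action acked, message must stay
    print(tracker.ack(1, "relp2"))  # both acked, message can be removed
    print(tracker.in_flight())
```

Note that this sketch says nothing about what to do with a message whose ack
never arrives; some timeout or failure path would still be needed.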
The other possible way of handling this is to have the RELP output module
maintain its own internal queue, and let the worker thread delete the message
from the action queue as soon as the RELP action starts working on it. If
the RELP action can't deliver it, the message would have to be re-added to the
action queue from the internal queue. But at that point, I don't see how to
sanely handle the "do this if the prior action failed" type of logic, and the
log message would end up getting re-delivered to all other outputs.
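The re-delivery problem is easy to see in a toy Python model. This is a
deliberately simplified sketch under my own assumptions: one queue, one other
output standing in for "all other outputs", and a RELP failure that is detected
immediately rather than asynchronously. The function name `run` and its
parameters are hypothetical.

```python
from collections import deque


def run(messages, relp_fails_once):
    """Process messages through a second output and a RELP-like output.

    A message is removed from the queue at hand-off. If RELP fails on it
    (once, for every message in relp_fails_once), it is re-added to the
    queue and goes through all outputs again.
    """
    queue = deque(messages)
    other_output = []    # stands in for any other action on the queue
    relp_delivered = []
    failed_already = set()
    while queue:
        msg = queue.popleft()        # deleted from the queue at hand-off
        other_output.append(msg)     # the other action runs on every pass
        if msg in relp_fails_once and msg not in failed_already:
            failed_already.add(msg)
            queue.append(msg)        # re-added after the RELP failure
        else:
            relp_delivered.append(msg)
    return other_output, relp_delivered


if __name__ == "__main__":
    other, relp = run(["a", "b", "c"], {"b"})
    print("other output saw:", other)
    print("RELP delivered:  ", relp)
```

In this model the other output sees message "b" twice, and RELP delivers it
after "c", so the retry costs both a duplicate and the original ordering.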
Whatever is being done for the pull output module may be able to help here,
because the pull output has a similar problem: you don't want to hold up
delivery of other messages just because the clients haven't requested the
messages yet. But for the pull output, rsyslog can consider the message
delivered as soon as it's handed off to the pull module.
David Lang