On Tue, 22 Oct 2013, Boylan, James wrote:

The way global variables are currently implemented (processing against each message but only preserving the last one through) is fairly consistent with what one would expect. The only concern I would have is whether there is a chance of a race condition, where the variable might be set by multiple worker threads at the same time, preventing someone from checking the value that was stored for some of the messages.

Global variables are racy between different threads.

From a use perspective, I would say they would be the following off the top of 
my head:

1. Counting messages
2. Load balancing
3. Triggering an action if the previous message had a specific field and the 
new message had a specific field

Global variables won't work well if you are thinking in terms of adjacent messages. There are just too many ways that the messages won't end up being adjacent, and they may even be processed in the wrong order.

If two messages arrive in order (which is not guaranteed), two worker threads can pick them up: the first message may be the last in the batch processed by one thread, while the second message is the first in the batch processed by another thread. In this case, the second message will most likely be processed before the first one.

This is one of the many ways that messages can get out of order.
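A minimal Python sketch of the reordering described above, assuming (for illustration only) that two worker threads start at the same moment and each handle one message per tick. The message names and the `simulate_two_workers` helper are hypothetical; this models the scheduling argument, not rsyslog's actual queue code.

```python
def simulate_two_workers(batch_a, batch_b):
    """Hypothetical model: two workers start together and each finishes one
    message per tick. Returns the messages in completion order."""
    completed = []
    for tick in range(max(len(batch_a), len(batch_b))):
        # Both workers make progress during every tick.
        if tick < len(batch_a):
            completed.append(batch_a[tick])
        if tick < len(batch_b):
            completed.append(batch_b[tick])
    return completed


# "first" arrived just before "second", but it sits at the end of worker A's
# batch, while "second" is at the head of worker B's batch.
order = simulate_two_workers(["a1", "a2", "a3", "first"],
                             ["second", "b2", "b3", "b4"])
print(order.index("second") < order.index("first"))  # True
```

The tie-breaking inside a tick is arbitrary; what matters is that batch position, not arrival order, decides when a message is handled.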

But when you are thinking in terms of tens of thousands of messages per second, having a global variable not get set for a few hundred messages is still pretty close to 'instantly'.

And it's currently the only way of changing rsyslog's behavior other than changing the config file and doing a complete restart, which will almost certainly lose several seconds' worth of logs.
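The 'close to instantly' argument is simple arithmetic. The rate, the lag and the restart gap below are assumed example figures, not measurements from any real deployment.

```python
def delay_seconds(messages_missed: int, rate_per_sec: float) -> float:
    """Length of time covered by a given number of messages at a steady rate."""
    return messages_missed / rate_per_sec


def messages_in_window(seconds: float, rate_per_sec: float) -> int:
    """Number of messages arriving during a window of the given length."""
    return int(seconds * rate_per_sec)


RATE = 50_000  # assumed: 50,000 messages per second
print(delay_seconds(500, RATE))       # 0.01, i.e. a 10 ms lag for the variable
print(messages_in_window(3.0, RATE))  # 150000 messages at risk in a 3 s restart
```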

There may be more, but those are the ones I can think of right now.

At the end of the day, I agree with Rainer. The risk of a race condition raises concerns about its ability to be used effectively for load balancing, as well as about the accuracy of the other use cases. Removing it for now, and waiting to see whether it is even missed before planning a major engine rewrite, seems prudent.

I disagree that this race condition is significant for load balancing.

Remember that a worker thread grabs a batch of messages and then starts looking at the tests. It may be that every message in the batch must be processed by every test and action, or it may be that the first test results in a stop for every message in the batch. As a result, the amount of work that needs to be done is highly variable. Over time this will average out, and all workers will end up doing about the same amount of work.

If you load balance by batch, then it will still average out over time.

If the counter updates are racy, once in a while you will end up not incrementing the counter, and the load will not shift as soon as expected. But the chance of this is the same no matter what the current value of the counter is, so again, over time, the load will continue to be distributed approximately evenly.
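A rough simulation of the racy-counter argument. It assumes that a lost race simply means the increment is skipped (so the next batch goes to the same destination again) and that per-batch work is uniformly random. `balance_batches` and all parameters are hypothetical; the point is only that a loss probability independent of the counter value does not skew the long-run distribution.

```python
import random


def balance_batches(num_batches, num_workers, loss_prob, seed=0):
    """Route each batch to counter % num_workers, then bump the counter.
    Assumption: a lost race skips the increment with probability loss_prob,
    so the following batch is sent to the same destination again."""
    rng = random.Random(seed)
    work = [0] * num_workers
    counter = 0
    for _ in range(num_batches):
        cost = rng.randint(1, 100)       # work per batch is highly variable
        work[counter % num_workers] += cost
        if rng.random() >= loss_prob:    # the increment survived the race
            counter += 1
    return work


work = balance_batches(100_000, num_workers=4, loss_prob=0.05)
mean = sum(work) / len(work)
print([round(w / mean, 3) for w in work])  # every share should be close to 1.0
```

Raising `loss_prob` makes the load shift later, but every destination is equally likely to be "stuck", so the shares stay close to even.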

I've run into exactly the same discussion when talking about load balancing traffic across a farm of different servers.

Remember that log messages vary drastically in length, and that some classes of log messages get thrown away.

However, even with these issues, a simple approach of 'every N messages, randomly shift which server is receiving the messages' distributes the load so evenly that after 5 years of running, all the systems receiving the messages have the same amount of traffic (at least to within 1% or so).
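A sketch of the 'every N messages, shift randomly' scheme under assumed conditions: message sizes vary widely, and roughly 10% of messages are discarded (modeled as zero bytes). `random_shift_balance` is a hypothetical name, and the data is synthetic, not taken from the five-year deployment described above.

```python
import random


def random_shift_balance(sizes, num_servers, shift_every, seed=0):
    """Send messages to one server at a time; every `shift_every` messages,
    pick a server uniformly at random (possibly the same one again)."""
    rng = random.Random(seed)
    traffic = [0] * num_servers
    current = rng.randrange(num_servers)
    for i, size in enumerate(sizes):
        if i and i % shift_every == 0:
            current = rng.randrange(num_servers)
        traffic[current] += size
    return traffic


# Synthetic sizes: lengths vary widely, ~10% of messages are dropped (0 bytes).
gen = random.Random(42)
sizes = [0 if gen.random() < 0.1 else gen.randint(50, 2000)
         for _ in range(500_000)]
traffic = random_shift_balance(sizes, num_servers=4, shift_every=10)
mean = sum(traffic) / len(traffic)
print([round(t / mean, 3) for t in traffic])  # shares near 1.0
```

No attempt is made to measure or predict message size; the evenness comes purely from the number of random shifts.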

David Lang


-- James

-- Sent from my mobile --

----- Reply message -----
From: "Rainer Gerhards" <[email protected]>
To: "rsyslog-users" <[email protected]>
Subject: [rsyslog] global variable use cases
Date: Tue, Oct 22, 2013 4:54 am



On Tue, Oct 22, 2013 at 11:37 AM, David Lang <[email protected]> wrote:

On Tue, 22 Oct 2013, Rainer Gerhards wrote:

 On Tue, Oct 22, 2013 at 11:09 AM, David Lang <[email protected]> wrote:

 On Tue, 22 Oct 2013, Rainer Gerhards wrote:

 On Tue, Oct 22, 2013 at 10:23 AM, David Lang <[email protected]> wrote:



Most folks would consider this to be an error ;) I agree with Pavel that
this is so far from what one expects that even documenting how
things work will not really get rid of that error-like feeling.
Add the fact that the result is non-deterministic, as it depends on the
current batch size, and I would also conclude that this is an erroneous *design*.
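To make the batch-size dependence concrete, here is a hypothetical model of the removed semantics, inferred from James's description in this thread ("processing against each message but only preserving the last one through"): every message in a batch computes `counter + 1` from the value seen at batch start, and the last write wins. This is an assumption about the behavior, not rsyslog source code; the batch sizes are the defaults mentioned in this thread.

```python
def run_counter(num_messages, batch_size):
    """Hypothetical model: each message in a batch computes counter + 1 from
    the value visible at batch start, and only the last write survives."""
    counter = 0
    for start in range(0, num_messages, batch_size):
        batch = range(start, min(start + batch_size, num_messages))
        seen = counter                      # value visible to the whole batch
        writes = [seen + 1 for _ in batch]  # every message "increments"
        counter = writes[-1]                # last write wins
    return counter


for size in (1, 8, 256, 512):
    print(size, run_counter(1000, size))
# 1 1000
# 8 125
# 256 4
# 512 2
```

Under this model the same 1000 messages yield a counter between 2 and 1000 depending only on the batch size.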

I probably was not clear enough in my initial question: as I said
yesterday evening, I have now removed global variable support. It won't be
available in 7.5.6. My question thus was about the use cases that could guide
us in deciding whether working toward a correct implementation makes any sense. This would
require a big design change.


The use case wasn't trying to be something as precise as you are talking
about, but rather a much coarser ability: after you see message X show
up, start doing something different. This could be writing files to a
different location because a failover has happened and you are now the
active box, logging debug messages once a particular type of log message
shows up, or something else.

Given that logs can be re-ordered by the network, by relays in the middle,
or by the results of batch processing, it was never going to be an exact
thing, so if a change only takes effect for the next batch, that's not
unreasonable.
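A sketch of the "takes effect for the next batch" behavior, assuming each batch reads the flag once at its start. The trigger message, the log paths and `route` are hypothetical illustrations of the failover example, not rsyslog configuration.

```python
def route(batches, trigger, normal_dest, failover_dest):
    """Hypothetical model: each batch reads the global flag once at batch
    start; seeing `trigger` sets the flag, so it only affects later batches."""
    active = False              # stands in for a global variable
    routed = []
    for batch in batches:
        snapshot = active       # value visible to every message in this batch
        for msg in batch:
            routed.append((msg, failover_dest if snapshot else normal_dest))
            if msg == trigger:
                active = True
    return routed


batches = [["m1", "FAILOVER", "m3"], ["m4", "m5"]]
for msg, dest in route(batches, "FAILOVER",
                       "/var/log/standby.log", "/var/log/active.log"):
    print(msg, dest)
```

Here "m3" still goes to the standby file even though it follows the trigger; only the next batch switches, which is the imprecision being accepted.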

Yes, this is edging towards log correlation territory, but it's an
extremely handy way to turn things on and off, or to cause logs
to be handled differently, without requiring on-the-fly editing of config
files and a complete restart.

The log losses from a complete restart are massive compared to the
imprecision of a global variable only being set for the next batch.



Global variables are not that useful as something that you would set for
each message that you process, but if they are only occasionally changed,
they can be very useful and not a significant performance issue.


It's not the performance issue -- it's the "strong smell of
incorrectness".
Remember that I must support that whole thing in the future. People will
continuously ask why it "does not work", and some of them will
(even/hopefully/probably) have support contracts, so I am actually
responsible for delivering a solution. If I then just say "it is what it is",
that will create very dissatisfied customers. So it is better to do it right or
not at all (which is what I currently opt for, and my peers will kill me if I
knowingly introduce a feature that potentially creates great
user dissatisfaction).




By default the batch size is still 1, correct? It's only people who are
scaling up who will be setting it higher.


No, not for a very long time. I think when we introduced batching, the
default was 8, and today it is either 256 or 512 (all for the main queue). A lot of
optimizations (think transaction support) depend on relatively large
batches.


I think that if you just document that setting global variables is racy,
and as such is not suitable for accurate counting, only for changing
rsyslog behavior without a restart, you should be good on the expectations
front. The current documentation leans heavily on the counting aspect of
things, but that can be changed. Remove any suggestion of future atomic
operations and emphasize that atomic operations are not possible.
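Why a racy global cannot give accurate counts: an unsynchronized read-modify-write can lose updates. The first function replays one unlucky interleaving deterministically; the second adds a `threading.Lock` purely for contrast. This is a generic Python illustration, not a description of how rsyslog implements (or could implement) global variables.

```python
import threading


def interleaved_increments():
    """Replay one unlucky schedule of two unsynchronized increments."""
    counter = 0
    t1_read = counter      # worker 1 reads 0
    t2_read = counter      # worker 2 reads 0 before worker 1 writes
    counter = t1_read + 1  # worker 1 writes 1
    counter = t2_read + 1  # worker 2 writes 1: worker 1's update is lost
    return counter


def locked_increments(num_threads, per_thread):
    """The same kind of increments, with a lock around read-modify-write."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(per_thread):
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(interleaved_increments())      # 1, not 2
print(locked_increments(4, 10_000))  # 40000
```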


OK, so let's go back to use cases: you mean

"change behaviour after an approximate number of messages has been
processed"

right?


Being able to enable or disable e-mail messages, change the
destination address, or change the filename or path when the box becomes
active are all very useful items that I would hate to lose.


All done based on counts?

In any case, let's see what happens when 7.5.6 no longer supports
global vars. Maybe some folks will show up and tell us about their actual pain, and then
we can see if there is a better cure...

Rainer


Even the load balancing hack works 'well enough' once you accept that you
are balancing per batch rather than per message (even if you did balance
per message, you really have no idea how expensive a particular message is
going to be, so you are not really balancing the work precisely, you are
only doing so statistically, and balancing per batch rather than per
message is just as valid statistically).
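A sketch comparing per-message and per-batch round-robin when the cost of each message is not known in advance (modeled here, as an assumption, by exponentially distributed costs). `round_robin` and `spread` are hypothetical helpers; the batch size of 256 is one of the defaults mentioned in this thread.

```python
import random


def round_robin(costs, num_workers, unit):
    """Give consecutive groups of `unit` messages to workers in turn and
    return each worker's total cost. unit=1 is per-message balancing; a
    larger unit is per-batch balancing."""
    totals = [0.0] * num_workers
    for i, cost in enumerate(costs):
        totals[(i // unit) % num_workers] += cost
    return totals


def spread(totals):
    """Largest relative deviation of any worker from the mean load."""
    mean = sum(totals) / len(totals)
    return max(abs(t - mean) for t in totals) / mean


rng = random.Random(1)
# Assumed cost model: per-message cost is variable and unknown in advance.
costs = [rng.expovariate(1.0) for _ in range(400_000)]
print(f"per message: {spread(round_robin(costs, 4, 1)):.2%}")
print(f"per batch:   {spread(round_robin(costs, 4, 256)):.2%}")
```

Both spreads come out small and of similar size: when costs are unknown, grouping by batch is as good a statistical sample as assigning individual messages.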


David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
