On Tue, 22 Oct 2013, Rainer Gerhards wrote:
Date: Tue, 22 Oct 2013 15:48:39 +0200
From: Rainer Gerhards <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] mmglobal was Re: global variable use cases
On Tue, Oct 22, 2013 at 3:40 PM, Rainer Gerhards <[email protected]> wrote:
Just one important point:
On Tue, Oct 22, 2013 at 3:31 PM, David Lang <[email protected]> wrote:
NO! Because you once again use global space between statements, the problem
reappears. The operation MUST happen and the result MUST be returned
immediately. This would work:
set $.myhostcounts = globalinc(var='$/hostcounts' subvar='$hostname')
because it is a single statement. Note the difference (two statements vs.
one).
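A minimal Python sketch of that difference, assuming the batch engine runs each statement over every message in the batch before moving to the next statement. `globalinc` follows the function proposed above; the dict-based messages, `run_batch`, the `web1` hostname, and the two-statement variant are hypothetical stand-ins, not rsyslog code.

```python
from collections import defaultdict

def run_batch(statements, messages):
    """Run each statement for every message in the batch before
    advancing to the next statement (the order described above)."""
    for stmt in statements:
        for msg in messages:
            stmt(msg)

hostcounts = defaultdict(int)  # hypothetical stand-in for '$/hostcounts'

def globalinc(key):
    """Hypothetical increment-and-return: the new value comes back in the
    same step, so no other message can run in between."""
    hostcounts[key] += 1
    return hostcounts[key]

def one_statement(messages):
    hostcounts.clear()
    def set_count(m):  # models: set $.myhostcounts = globalinc(...)
        m["myhostcounts"] = globalinc(m["hostname"])
    run_batch([set_count], messages)
    return [m["myhostcounts"] for m in messages]

def two_statements(messages):
    hostcounts.clear()
    def inc(m):   # statement 1: modify the global
        hostcounts[m["hostname"]] += 1
    def read(m):  # statement 2: read it back later
        m["myhostcounts"] = hostcounts[m["hostname"]]
    run_batch([inc, read], messages)
    return [m["myhostcounts"] for m in messages]

print(one_statement([{"hostname": "web1"} for _ in range(4)]))   # [1, 2, 3, 4]
print(two_statements([{"hostname": "web1"} for _ in range(4)]))  # [4, 4, 4, 4]
```

With four messages from the same host, the single-statement version hands each message its own count, while the two-statement version lets every message see the final total.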
You aren't always going to be able to combine things into a single
statement; sometimes you will be dealing with multiple counters you want to
manage (say, hostname-based and application-based).
Then it does NOT work! Keep in mind that each statement is executed
for *all* messages, and *then* the rule engine advances to the next
statement. That's the SIMD mode: even though it's of course not true
SIMD, the analogy is very strong (doing real SIMD would be much too
fine-grained and result in gigantic overhead).
Or what if you want to evaluate an if condition based on a global variable
and then modify that variable based on the result? (Think of resetting a
counter, for example.)
My idea is that any function that modifies a global variable must do so
instantly and atomically, and any read of the global variable must also be
instant.
But that DOES NOT help you with the real problem. Maybe it helps a bit if
you play with the current implementation in various cases. I am trying to
get this message across, but I am failing; we are not on the same page. The
"problem" is that the SIMD engine and global state do not play well
together as soon as multiple statements are involved.
I just thought of another way to explain this. Again, take a config with
two statements (it doesn't matter what they are):
stmt1
stmt2
Then we have 4 messages: a, b, c, d.
I think you expect this execution order:
stmt1 a
stmt2 a
stmt1 b
stmt2 b
stmt1 c
stmt2 c
stmt1 d
stmt2 d
But in reality, it is this one:
stmt1 a
stmt1 b
stmt1 c
stmt1 d
stmt2 a
stmt2 b
stmt2 c
stmt2 d
And now think about the implications if stmt1 modifies some global state
and stmt2 uses it. Then do the same for some property that is specific to
the message (aka "contained inside" a,b,c,d). It may help to get a piece of
paper and a pencil ;) You'll see a big difference...
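The "paper and pencil" exercise can also be done in a few lines of Python. This is a hypothetical model, not the rule engine itself: the two functions generate the two orders for any statement and message lists, and `run` assumes stmt1 writes the current message name into a global while stmt2 copies that global into a per-message result.

```python
def per_message_order(statements, messages):
    """Order that was expected: all statements for one message,
    then the next message."""
    return [(s, m) for m in messages for s in statements]

def batch_order(statements, messages):
    """Order the rule engine actually uses: one statement for every
    message in the batch, then the next statement."""
    return [(s, m) for s in statements for m in messages]

def run(order):
    """stmt1 stores the current message name in a global; stmt2 copies
    the global into a per-message result."""
    global_last = None
    result = {}
    for stmt, msg in order:
        if stmt == "stmt1":
            global_last = msg
        else:
            result[msg] = global_last
    return result

stmts = ["stmt1", "stmt2"]
msgs = ["a", "b", "c", "d"]
print(run(per_message_order(stmts, msgs)))  # {'a': 'a', 'b': 'b', 'c': 'c', 'd': 'd'}
print(run(batch_order(stmts, msgs)))        # {'a': 'd', 'b': 'd', 'c': 'd', 'd': 'd'}
```

If stmt2 instead read a field stored inside each message, both orders would give the same result; only the shared global is sensitive to the order.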
OK, I had only partially understood this. I understood that all statements
were evaluated and then all actions taken, not that this happened for each
individual statement regardless of its type.
What happens if you have
if function1 then function2
for a couple of messages?
I think this is going to be a source of ongoing confusion for users. I don't
think it is reasonable to create a global() function that can do everything that
anyone ever wants to do with global variables in one shot. At least not until
you start allowing a series of statements to be passed to the function,
effectively turning it into
atomic(
statement1
statement2
statement3
)
that combines these statements into one as far as the SIMD engine is concerned.
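A rough Python model of such an atomic( ... ) block, assuming the engine keeps its statement-by-statement batch order but runs everything inside a group for one message before moving to the next. `run_batch`, the list-as-group convention, and the counter statements are all hypothetical; the example is the check-and-reset counter case raised earlier in the thread.

```python
from typing import Callable, Dict, List, Union

Message = Dict[str, int]
Stmt = Callable[[Message], None]
Item = Union[Stmt, List[Stmt]]  # a list models a hypothetical atomic( ... ) group

def run_batch(program: List[Item], messages: List[Message]) -> None:
    """Run each program item across the whole batch before the next item."""
    for item in program:
        for msg in messages:
            if isinstance(item, list):
                # atomic group: every statement in it runs for this message
                # before the engine moves on to the next message
                for stmt in item:
                    stmt(msg)
            else:
                item(msg)

def counter_statements(state: Dict[str, int]) -> List[Stmt]:
    """Increment a global counter, record it, and reset it once it hits 3."""
    def inc(m: Message) -> None:
        state["counter"] += 1

    def record(m: Message) -> None:
        m["count"] = state["counter"]

    def reset_if_full(m: Message) -> None:
        if state["counter"] >= 3:
            state["counter"] = 0

    return [inc, record, reset_if_full]

def demo(atomic: bool) -> List[int]:
    state = {"counter": 0}
    stmts = counter_statements(state)
    program: List[Item] = [stmts] if atomic else list(stmts)
    msgs: List[Message] = [{} for _ in range(5)]
    run_batch(program, msgs)
    return [m["count"] for m in msgs]

print(demo(atomic=False))  # [5, 5, 5, 5, 5]
print(demo(atomic=True))   # [1, 2, 3, 1, 2]
```

This only fixes the ordering within one worker; if several workers processed batches concurrently, such a group would presumably also need a lock around the shared state.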
Going off on a slight tangent: after reading up on SIMD (I didn't recognize the
abbreviation and had been misunderstanding your prior explanation), this
approach seems odd to take on modern CPUs. By executing the same statement for
each message in turn and then going on to the next statement, it seems like you
are thrashing the CPU cache, as I would expect the cache footprint of the
data being manipulated in a message to be overwhelmingly larger than the cache
footprint of the statement being executed. It would seem that you would be in
much better shape running all the commands against a single message, keeping that
message in the cache as much as possible, and then going on to the next message.
Not to mention the increased memory use if processing each message sets a bunch
of variables, because the current approach requires that all the variables for
all the messages be generated at the same time, rather than creating all the
variables for one message, then being able to throw them all away when you go to
work on the next message.
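A back-of-the-envelope Python model of both costs, under simplifying assumptions that are not taken from rsyslog: each statement sets one variable on its message, a message's variables are freed once every statement has run on it, and the number of other messages touched between two visits to the same message stands in as a crude proxy for cache pressure. It ignores the statement code's own footprint and real hardware behavior, so it only illustrates the argument rather than measuring it.

```python
from typing import List, Tuple

Order = List[Tuple[int, int]]  # (statement index, message index)

def per_message_order(n_stmts: int, n_msgs: int) -> Order:
    return [(s, m) for m in range(n_msgs) for s in range(n_stmts)]

def batch_order(n_stmts: int, n_msgs: int) -> Order:
    return [(s, m) for s in range(n_stmts) for m in range(n_msgs)]

def peak_live_vars(order: Order, n_stmts: int) -> int:
    """Peak number of per-message variables alive at once, assuming each
    statement sets one variable and a message's variables are freed as
    soon as all n_stmts statements have run on it."""
    runs_done = {}
    live = peak = 0
    for _, msg in order:
        live += 1
        peak = max(peak, live)
        runs_done[msg] = runs_done.get(msg, 0) + 1
        if runs_done[msg] == n_stmts:
            live -= n_stmts  # message finished: drop its variables
    return peak

def max_reuse_distance(order: Order) -> int:
    """Largest number of distinct other messages touched between two
    consecutive visits to the same message."""
    last_seen = {}
    worst = 0
    for i, (_, msg) in enumerate(order):
        if msg in last_seen:
            between = {m for _, m in order[last_seen[msg] + 1 : i]}
            worst = max(worst, len(between - {msg}))
        last_seen[msg] = i
    return worst

for name, make in (("per-message", per_message_order), ("batch", batch_order)):
    order = make(2, 4)
    print(name, peak_live_vars(order, 2), max_reuse_distance(order))
# per-message 2 0
# batch 5 3
```

With 10 statements and a batch of 1,000 messages, the same model gives 10 vs. 9,001 live variables, and a reuse distance of 0 vs. 999.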
Is this approach also used for batches of messages being parsed on the input
side? If so, the cache thrashing may explain some of the slowdown on the input
side.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.