Re: [rsyslog] mmglobal was Re: global variable use cases

David Lang Tue, 22 Oct 2013 07:24:58 -0700

On Tue, 22 Oct 2013, Rainer Gerhards wrote:

Date: Tue, 22 Oct 2013 15:48:39 +0200
From: Rainer Gerhards <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] mmglobal was Re: global variable use cases


On Tue, Oct 22, 2013 at 3:40 PM, Rainer Gerhards
<[email protected]> wrote:

just an important one:


On Tue, Oct 22, 2013 at 3:31 PM, David Lang <[email protected]> wrote:

NO! Because you once again use global space between statements. Problem
reappears. The operation MUST happen and result returned immediately. This
would work:

set $.myhostcounts = globalinc(var='$/hostcounts' subvar='$hostname')

because it is in a single statement. Note the difference (two vs. one
stmts).


you aren't always going to be able to combine things into a single
statement, sometimes you will be dealing with multiple counters you want to
manage (say hostname based and application based).


Then it does NOT work! Keep in mind that each statement is executed
for *all* messages, and *then*, the rule engine advances to the next
statement. That's the SIMD mode -- even though it's of course not true
SIMD, the analogy is very strong (doing real SIMD would be much too
fine-grained and result in gigantic overhead).

or what if you want to do an if condition based on a global variable, and
then modify it based on the result? (think reset a counter for example)

my idea is that any function that modifies a global variable must do so
instantly and atomically, and any read of the global variable must also be
instant.


But that DOES NOT help you with the real problem. Maybe it helps a bit if
you play with the current implementation in various cases. I am trying to
get this message across, but I am failing; we are not on the same page. The
"problem" is that the SIMD engine and global state do not play well
together as soon as multiple statements are involved.

Let me try another approach to explaining this. Again, take a conf with
two statements (doesn't matter what they are):

stmt1
stmt2

Then we have 4 messages a,b,c,d.

I think you expect this execution order:

stmt1 a
stmt2 a
stmt1 b
stmt2 b
stmt1 c
stmt2 c
stmt1 d
stmt2 d

but in reality, it is this one:

stmt1 a
stmt1 b
stmt1 c
stmt1 d
stmt2 a
stmt2 b
stmt2 c
stmt2 d

And now think about the implications if stmt1 modifies some global state
and stmt2 uses it. Then do the same for some property that is specific to
the message (aka "contained inside" a,b,c,d). It may help to get a piece of
paper and a pencil ;) You'll see a big difference...
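
As a stand-in for the paper and pencil, here is a minimal Python sketch of the
two execution orders. It is not rsyslog code: stmt1, stmt2, the "web1" hostname
and both run_* helpers are made up for illustration. stmt1 bumps a global
per-host counter and stmt2 copies the counter into a message-local variable.

```python
def stmt1(msg, global_state):
    # Modifies global state: count messages seen per hostname.
    host = msg["hostname"]
    global_state[host] = global_state.get(host, 0) + 1


def stmt2(msg, global_state):
    # Uses the global state: copy the counter into a message-local variable.
    msg["count_seen"] = global_state[msg["hostname"]]


def run_per_message(statements, batch):
    # The order a user might expect: all statements for a, then all for b, ...
    global_state = {}
    for msg in batch:
        for stmt in statements:
            stmt(msg, global_state)
    return [msg["count_seen"] for msg in batch]


def run_per_statement(statements, batch):
    # The order described above: stmt1 for a, b, c, d, then stmt2 for a, b, c, d.
    global_state = {}
    for stmt in statements:
        for msg in batch:
            stmt(msg, global_state)
    return [msg["count_seen"] for msg in batch]


def make_batch():
    # Four messages a, b, c, d from the same (hypothetical) host.
    return [{"name": name, "hostname": "web1"} for name in "abcd"]


per_message = run_per_message([stmt1, stmt2], make_batch())
per_statement = run_per_statement([stmt1, stmt2], make_batch())
print(per_message)    # [1, 2, 3, 4]
print(per_statement)  # [4, 4, 4, 4]
```

In the per-statement order every message sees the final counter value, because
stmt1 has already run for the whole batch before stmt2 runs at all. The
message-local variable count_seen behaves the same way in both orders; only the
global one differs.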

Ok, I was only partially understanding this. I was understanding that all
statements were evaluated, then all actions taken, not that this was happening
for individual statements no matter what the type.


what happens if you have

if function1 then function2

for a couple messages?

I think this is going to be a source of ongoing confusion for users. I don't see
it being reasonable to create a global() function that can do everything that
anyone ever wants to do with global variables in one shot. At least not until
you start allowing a series of statements to be passed to the function,
effectively turning it into


atomic(
  statement1
  statement2
  statement3
)

that combines these statements into one as far as the SIMD engine is concerned.
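
To make the idea concrete, here is a rough Python sketch, not rsyslog code. It
assumes a statement-at-a-time batch engine like the one described earlier, and
atomic(), stmt_inc, stmt_check_reset and the threshold of 2 are all hypothetical
names and values chosen for illustration. It uses the "check a counter, then
reset it" case mentioned above.

```python
def run_per_statement(statements, batch, global_state):
    # Statement-at-a-time engine: each statement runs for the whole batch
    # before the engine advances to the next statement.
    for stmt in statements:
        for msg in batch:
            stmt(msg, global_state)


def atomic(*statements):
    # Hypothetical wrapper: the engine sees one statement, so the wrapped
    # statements run back to back for each message.
    def combined(msg, global_state):
        for stmt in statements:
            stmt(msg, global_state)
    return combined


def stmt_inc(msg, global_state):
    # Increment a global counter.
    global_state["n"] = global_state.get("n", 0) + 1


def stmt_check_reset(msg, global_state):
    # If the counter reached the threshold, flag the message and reset it.
    if global_state["n"] >= 2:
        msg["rollover"] = True
        global_state["n"] = 0
    else:
        msg["rollover"] = False


def make_batch():
    return [{"name": name} for name in "abcd"]


separate = make_batch()
run_per_statement([stmt_inc, stmt_check_reset], separate, {})
separate_flags = [msg["rollover"] for msg in separate]

grouped = make_batch()
run_per_statement([atomic(stmt_inc, stmt_check_reset)], grouped, {})
grouped_flags = [msg["rollover"] for msg in grouped]

print(separate_flags)  # [True, False, False, False]
print(grouped_flags)   # [False, True, False, True]
```

With separate statements the counter runs up to 4 before anything checks it, so
only the first message triggers the reset. With the statements grouped, every
second message triggers it, which is what the person writing the config would
presumably expect.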

going off on a slight tangent, reading up on SIMD (since I didn't recognize the
abbreviation and had been misunderstanding your prior explanation), this
approach seems odd to take on modern CPUs. by executing the same statement for
each message in turn, and then going on to the next statement, it seems like you
are thrashing the CPU cache, as I would expect the cache footprint of the
data being manipulated in a message to be overwhelmingly larger than the cache
footprint of the statement being executed. It would seem that you would be in
much better shape to run all the commands against a single message, keeping that
message in the cache as much as possible, and then going on to the next message.
Not to mention the increased memory use if processing each message sets a bunch
of variables, because the current approach requires that all the variables for
all the messages be generated at the same time, rather than creating all the
variables for one message, then being able to throw them all away when you go to
work on the next message.
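
The memory half of that argument can be put into a toy counting model. This is
only a sketch under strong assumptions: every statement sets exactly one
variable on the message, variables are discarded once a message is finished,
and peak_live_variables() is a made-up helper. It says nothing about what
rsyslog actually allocates, nor about real cache behavior.

```python
def peak_live_variables(num_messages, num_statements, per_message):
    # Count how many message-local variables exist at the same time,
    # assuming each statement creates one variable per message.
    live = 0
    peak = 0
    if per_message:
        # All statements for one message, then discard its variables.
        for _ in range(num_messages):
            for _ in range(num_statements):
                live += 1
                peak = max(peak, live)
            live -= num_statements
    else:
        # One statement for the whole batch, then the next statement;
        # nothing can be discarded until the batch is finished.
        for _ in range(num_statements):
            for _ in range(num_messages):
                live += 1
                peak = max(peak, live)
        live -= num_messages * num_statements
    return peak


peak_one_message_at_a_time = peak_live_variables(4, 3, per_message=True)
peak_one_statement_at_a_time = peak_live_variables(4, 3, per_message=False)
print(peak_one_message_at_a_time)    # 3
print(peak_one_statement_at_a_time)  # 12
```

In this model the peak grows with the number of statements only in the
per-message order, but with statements times batch size in the per-statement
order.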

Is this approach also happening on batches of messages being parsed on the input
side? If so, the cache thrashing may explain some of the slowdown on the input
side.


David Lang
