On Tue, 22 Oct 2013, Rainer Gerhards wrote:

Date: Tue, 22 Oct 2013 15:48:39 +0200
From: Rainer Gerhards <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] mmglobal was Re: global variable use cases

On Tue, Oct 22, 2013 at 3:40 PM, Rainer Gerhards
<[email protected]> wrote:

just an important one:


On Tue, Oct 22, 2013 at 3:31 PM, David Lang <[email protected]> wrote:

NO! Because you once again use global space between statements, the problem
reappears. The operation MUST happen and its result MUST be returned
immediately. This would work:

set $.myhostcounts = globalinc(var='$/hostcounts' subvar='$hostname')

because it is in a single statement. Note the difference (two vs. one
stmts).
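A minimal Python model of why the one-statement form works, assuming `globalinc()` (proposed in this thread, not an existing rsyslog function) increments a global per-key counter and returns the new value in a single call. The statement-major loop and the dict-based "global space" are illustrative assumptions, not rsyslog's C implementation. Because the increment and the read happen inside the same call, each message captures its own value even though the engine runs the statement across the whole batch.

```python
import threading

# Hypothetical global space: '$/hostcounts' modeled as a dict of per-host counters.
global_vars = {"hostcounts": {}}
_lock = threading.Lock()  # would matter if several worker threads shared global_vars

def globalinc(var, subvar):
    """Increment global_vars[var][subvar] and return the new value in one step."""
    with _lock:
        counts = global_vars[var]
        counts[subvar] = counts.get(subvar, 0) + 1
        return counts[subvar]

def set_myhostcounts(msg):
    # Models: set $.myhostcounts = globalinc(var='$/hostcounts' subvar='$hostname')
    msg["myhostcounts"] = globalinc("hostcounts", msg["hostname"])

def run_statement(stmt, batch):
    # Statement-major engine: one statement is applied to every message in the batch.
    for msg in batch:
        stmt(msg)

batch = [{"hostname": h} for h in ["a", "b", "a", "a"]]
run_statement(set_myhostcounts, batch)
print([m["myhostcounts"] for m in batch])  # [1, 1, 2, 3]
```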


you aren't always going to be able to combine things into a single
statement; sometimes you will be dealing with multiple counters you want to
manage (say, hostname-based and application-based).


Then it does NOT work! Keep in mind that each statement is executed
for *all* messages, and *then* the rule engine advances to the next
statement. That's the SIMD mode: even though it's of course not true
SIMD, the analogy is very strong (doing real SIMD would be much too
fine-grained and result in gigantic overhead).
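To make the execution model concrete, here is a minimal sketch of the two loop orders. The function names are hypothetical; the only difference between them is which loop is the outer one. "Statement-major" is the mode described above; "message-major" runs every statement for one message before starting the next.

```python
def statement_major(statements, messages):
    """Run each statement over the whole batch, then advance ("SIMD" mode)."""
    trace = []
    for stmt in statements:
        for msg in messages:
            trace.append(f"{stmt} {msg}")
    return trace

def message_major(statements, messages):
    """Run all statements for one message before starting the next message."""
    trace = []
    for msg in messages:
        for stmt in statements:
            trace.append(f"{stmt} {msg}")
    return trace

print(statement_major(["stmt1", "stmt2"], ["a", "b", "c", "d"]))
print(message_major(["stmt1", "stmt2"], ["a", "b", "c", "d"]))
```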


or what if you want to write an if condition based on a global variable, and
then modify the variable based on the result? (think of resetting a counter, for example)

my idea is that any function that modifies a global variable must do so
instantly and atomically, and any read of the global variable must also be
instant.


But that DOES NOT help you with the real problem. Maybe it helps a bit if
you play with the current implementation in various cases. I am trying to
get this message across, but I fail; we are not on the same page. The
"problem" is that the SIMD engine and global state do not play well
together as soon as multiple statements are involved.


I just thought of another way to explain this. Again, take a conf with
two statements (doesn't matter what they are):

stmt1
stmt2

Then we have 4 messages: a, b, c, d.

I think you expect this execution order:

stmt1 a
stmt2 a
stmt1 b
stmt2 b
stmt1 c
stmt2 c
stmt1 d
stmt2 d

but in reality, it is this one:

stmt1 a
stmt1 b
stmt1 c
stmt1 d
stmt2 a
stmt2 b
stmt2 c
stmt2 d

And now think about the implications if stmt1 modifies some global state
and stmt2 uses it. Then do the same for some property that is specific to
the message (aka "contained inside" a,b,c,d). It may help to get a piece of
paper and a pencil ;) You'll see a big difference...
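The pencil-and-paper exercise can also be run as a small Python model. Here `stmt1` increments a global counter and stores a copy inside the message; `stmt2` then reads both the global and the message-local value. The statements and the dict standing in for global space are hypothetical illustrations. Message-local data comes out the same under both orders, while the global read differs.

```python
def stmt1(msg, g):
    g["counter"] += 1             # modifies global state shared by all messages
    msg["local"] = g["counter"]   # message-local property ("contained inside" msg)

def stmt2(msg, g):
    msg["from_global"] = g["counter"]  # reads the shared state later
    msg["from_local"] = msg["local"]   # reads the message's own property

def run(order, statements, n_messages=4):
    g = {"counter": 0}
    batch = [{} for _ in range(n_messages)]
    if order == "message-major":
        for msg in batch:
            for stmt in statements:
                stmt(msg, g)
    else:  # statement-major, the mode described above
        for stmt in statements:
            for msg in batch:
                stmt(msg, g)
    return [(m["from_global"], m["from_local"]) for m in batch]

print(run("message-major", [stmt1, stmt2]))    # [(1, 1), (2, 2), (3, 3), (4, 4)]
print(run("statement-major", [stmt1, stmt2]))  # [(4, 1), (4, 2), (4, 3), (4, 4)]
```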

OK, I had only partially understood this. I understood that all statements were evaluated and then all actions taken, not that this was happening for each individual statement regardless of its type.

what happens if you have

if function1 then function2

for a couple messages?
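One way to picture the question, assuming (not confirmed in this thread) that the engine evaluates the `if` condition for the whole batch first and only then runs the `then` part for the messages whose condition was true. `function1` and `function2` are hypothetical stand-ins: the condition asks "is this the first message since the counter was reset?" and the action marks the message and bumps a shared counter.

```python
def function1(msg, g):
    # Condition: "is this the first message since the counter was reset?"
    return g["counter"] == 0

def function2(msg, g):
    # Action: mark the message and bump the shared counter.
    msg["first"] = True
    g["counter"] += 1

def if_then_message_major(batch, g):
    for msg in batch:
        if function1(msg, g):
            function2(msg, g)

def if_then_statement_major(batch, g):
    # Assumed model: evaluate the condition for every message first ...
    active = [function1(msg, g) for msg in batch]
    # ... then run the action only for messages whose condition was true.
    for msg, is_active in zip(batch, active):
        if is_active:
            function2(msg, g)

def run(runner, n_messages=4):
    g = {"counter": 0}
    batch = [{} for _ in range(n_messages)]
    runner(batch, g)
    return [m.get("first", False) for m in batch], g["counter"]

print(run(if_then_message_major))    # ([True, False, False, False], 1)
print(run(if_then_statement_major))  # ([True, True, True, True], 4)
```

Under this model every message in the batch sees the counter before any action has run, which is exactly the kind of surprise a user would hit.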

I think this is going to be a source of ongoing confusion for users. I don't see it being reasonable to create a global() function that can do everything anyone ever wants to do with global variables in one shot, at least not until you allow a series of statements to be passed to the function, effectively turning it into

atomic(
  statement1
  statement2
  statement3
)

that combines these statements into one as far as the SIMD engine is concerned.
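A minimal sketch of what such a grouping could mean, assuming a hypothetical `Atomic` composite statement (no such construct is known to exist in rsyslog). The statement-major engine still treats it as a single statement, but inside it the sub-statements run back to back for each message, under a lock so other worker threads cannot interleave.

```python
import threading

class Atomic:
    """Hypothetical composite statement: runs its sub-statements back to back
    for one message, so the statement-major engine sees a single statement."""
    _lock = threading.RLock()  # serializes against other worker threads; reentrant for nesting

    def __init__(self, *statements):
        self.statements = statements

    def __call__(self, msg, g):
        with Atomic._lock:
            for stmt in self.statements:
                stmt(msg, g)

def run_statement_major(statements, batch, g):
    for stmt in statements:
        for msg in batch:
            stmt(msg, g)

def inc(msg, g):
    g["counter"] += 1

def read(msg, g):
    msg["snapshot"] = g["counter"]

def snapshots(statements, n_messages=4):
    g = {"counter": 0}
    batch = [{} for _ in range(n_messages)]
    run_statement_major(statements, batch, g)
    return [m["snapshot"] for m in batch]

print(snapshots([inc, read]))          # [4, 4, 4, 4]
print(snapshots([Atomic(inc, read)]))  # [1, 2, 3, 4]
```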

going off on a slight tangent: after reading up on SIMD (I didn't recognize the abbreviation and had been misunderstanding your prior explanation), this approach seems odd to take on modern CPUs. By executing the same statement for each message in turn and then going on to the next statement, it seems like you are thrashing the CPU cache, since I would expect the cache footprint of the data being manipulated in a message to be overwhelmingly larger than the cache footprint of the statement being executed. It would seem that you would be in much better shape running all the commands against a single message, keeping that message in the cache as much as possible, and then going on to the next message.

Not to mention the increased memory use if processing each message sets a bunch of variables: the current approach requires that the variables for all the messages exist at the same time, rather than creating all the variables for one message and then being able to throw them all away when you go to work on the next message.
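The memory point can be put in numbers with a toy model. It assumes a message's variables are allocated when its first statement runs and can be freed right after its last statement runs. It counts how many messages' variable sets must be alive at once. It says nothing about actual cache behavior, which would need profiling.

```python
def peak_live(order, n_statements, batch_size):
    """Peak number of messages whose local variables are alive at the same time."""
    if order == "statement-major":
        events = [(s, m) for s in range(n_statements) for m in range(batch_size)]
    else:  # message-major
        events = [(s, m) for m in range(batch_size) for s in range(n_statements)]
    live, peak = set(), 0
    for s, m in events:
        live.add(m)                  # message m now holds allocated variables
        peak = max(peak, len(live))
        if s == n_statements - 1:    # last statement done: m's variables can go
            live.discard(m)
    return peak

print(peak_live("statement-major", 3, 128))  # 128
print(peak_live("message-major", 3, 128))    # 1
```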

Is this approach also used for batches of messages being parsed on the input side? If so, the cache thrashing may explain some of the slowdown on the input side.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
