On Tue, Oct 22, 2013 at 4:17 PM, David Lang <[email protected]> wrote:

> On Tue, 22 Oct 2013, Rainer Gerhards wrote:
>
>> Date: Tue, 22 Oct 2013 15:48:39 +0200
>> From: Rainer Gerhards <[email protected]>
>>
>> Reply-To: rsyslog-users <[email protected]>
>> To: rsyslog-users <[email protected]>
>> Subject: Re: [rsyslog] mmglobal was Re: global variable use cases
>>
>>
>> On Tue, Oct 22, 2013 at 3:40 PM, Rainer Gerhards
>> <[email protected]> wrote:
>>
>>> Just one important point:
>>>
>>> On Tue, Oct 22, 2013 at 3:31 PM, David Lang <[email protected]> wrote:
>>>
>>>>> NO! Because you once again use global space between statements, the
>>>>> problem reappears. The operation MUST happen and the result must be
>>>>> returned immediately. This would work:
>>>>>
>>>>> set $.myhostcounts = globalinc(var='$/hostcounts' subvar='$hostname')
>>>>>
>>>>> because it is in a single statement. Note the difference (two vs. one
>>>>> statements).
>>>>>
>>>>>
>>>> you aren't always going to be able to combine things into a single
>>>> statement; sometimes you will be dealing with multiple counters you
>>>> want to manage (say, hostname-based and application-based).
>>>>
>>>>
>>> then it does NOT work! Keep in mind that each statement is executed
>>> for *all* messages, and *then* the rule engine advances to the next
>>> statement. That's the SIMD mode: even though it's of course not true
>>> SIMD, the analogy is very strong (doing real SIMD would be much too
>>> fine-grained and result in gigantic overhead).
>>>
>>>
>>>> or what if you want to do an if condition based on a global variable,
>>>> and then modify it based on the result? (think resetting a counter, for
>>>> example)
>>>>
>>>> my idea is that any function that modifies a global variable must do so
>>>> instantly and atomically, and any read of the global variable must also
>>>> be instant.
>>>>
>>>>
>>> But that DOES NOT help you with the real problem. Maybe it helps a bit if
>>> you play with the current implementation in various cases. I am trying to
>>> get this message across, but I am failing; we are not on the same page.
>>> The "problem" is that the SIMD engine and global state do not play well
>>> together as soon as multiple statements are involved.
>>>
>>>
>> I just thought of another way to explain this. Again, take a config
>> with two statements (it doesn't matter what they are):
>>
>> stmt1
>> stmt2
>>
>> Then we have 4 messages a,b,c,d.
>>
>> I think you expect this execution order:
>>
>> stmt1 a
>> stmt2 a
>> stmt1 b
>> stmt2 b
>> stmt1 c
>> stmt2 c
>> stmt1 d
>> stmt2 d
>>
>> but in reality, it is this one:
>>
>> stmt1 a
>> stmt1 b
>> stmt1 c
>> stmt1 d
>> stmt2 a
>> stmt2 b
>> stmt2 c
>> stmt2 d
>>
>> And now think about the implications if stmt1 modifies some global state
>> and stmt2 uses it. Then do the same for some property that is specific to
>> the message (aka "contained inside" a, b, c, d). It may help to get a
>> piece of paper and a pencil ;) You'll see a big difference...
>>
>
> Ok, I only partially understood this. I understood that all statements
> were evaluated and then all actions taken, not that this was happening
> for each individual statement, no matter what its type.
>
> what happens if you have
>
> if function1 then function2
>
> for a couple of messages?
>

well, first function1 is evaluated for all messages, then function2 for
all those messages where the result was "true".
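The evaluation order for "if function1 then function2" can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not rsyslog's actual engine. The function name run_if_then_batch, the per-message "active" list (a stand-in for the engine's internal filter flags), and the sample filter and action are all hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


def run_if_then_batch(batch: List[Message],
                      function1: Callable[[Message], bool],
                      function2: Callable[[Message], None],
                      trace: List[str]) -> None:
    """Toy model of `if function1 then function2` under per-statement batching."""
    # Pass 1: evaluate the condition for every message in the batch and keep
    # the result in a per-message flag.
    active: List[bool] = []
    for msg in batch:
        trace.append(f"function1 {msg['id']}")
        active.append(function1(msg))
    # Pass 2: run the action only for messages whose flag is set.
    for msg, flag in zip(batch, active):
        if flag:
            trace.append(f"function2 {msg['id']}")
            function2(msg)


if __name__ == "__main__":
    batch = [{"id": c, "sev": s} for c, s in zip("abcd", [3, 6, 2, 7])]
    trace: List[str] = []
    run_if_then_batch(batch,
                      lambda m: m["sev"] <= 3,        # hypothetical filter
                      lambda m: m.update(sent=True),  # hypothetical action
                      trace)
    print(trace)
    # prints ['function1 a', 'function1 b', 'function1 c', 'function1 d',
    #         'function2 a', 'function2 c']
```

Every condition is evaluated before any action runs. That is why state written by one statement is only visible to the next statement after the whole batch has passed through the first.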


>
> I think this is going to be a source of ongoing confusion for users. I
> don't see it being reasonable to create a global() function that can do
> everything that anyone ever wants to do with global variables in one shot.
> At least not until you start allowing a series of statements to be passed
> to the function, effectively turning it into
>
> atomic(
>   statement1
>   statement2
>   statement3
> )
>
> that combines these statements into one as far as the SIMD engine is
> concerned.
>
>
so you are suggesting we go back to the beginning and drop global state
altogether; that's the ultimate outcome of that statement, at least for the
time being ;)

My approach was less drastic. This is why I asked for use cases. I think it
is possible to support a number of use cases that, as many said here, are
important, and to do so with reasonable effort, which means we could
actually do it. Other uses would simply not be supported.

> going off on a slight tangent, reading up on SIMD (since I didn't recognize
> the abbreviation and had been misunderstanding your prior explanation),
> this approach seems odd to take on modern CPUs. By executing the same
> statement for each message in turn, and then going on to the next statement,
> it seems like you are thrashing the CPU cache, as I would expect the
> cache footprint of the data being manipulated in a message to be
> overwhelmingly larger than the cache footprint of the statement being
> executed. It would seem that you would be in much better shape to run all
> the commands against a single message, keeping that message in the cache as
> much as possible, and then going on to the next message. Not to mention the
> increased memory use if processing each message sets a bunch of variables,
> because the current approach requires that all the variables for all the
> messages be generated at the same time, rather than creating all the
> variables for one message, then being able to throw them all away when you
> go to work on the next message.
>

Please pardon me for not discussing this at length. I know this
would lead to a couple of hundred mails, taking up a week or two, with the
conclusion that some nice things would be possible but nobody has time to
do them ;)

But let me scratch the surface and provide some quick reasons (again,
please accept my apologies that I cannot describe everything in detail, so
some questions will definitely be left open).

The bottom line is that the SIMD idea makes much of rsyslog's current
performance possible. While we have some cache thrashing, it gives us the
ability to have relatively large chunks of independent processing, and so
we get away from the fine-grained parallel processing that we otherwise
would have. For example, if we ran each message individually through the
actions, we would need to lock and unlock the action lock for *each*
message, and if we had multiple workers, we would have heavy lock
contention. Along the same lines, we could no longer support transactions
in the way we currently do, as messages from different batches (threads)
would be intermixed. Also, what we lose from cache thrashing of the
message content, we gain from all templates being spatially and temporally
close together, and we make much better use of trace caches as the loops
are relatively tight.
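Counting lock acquisitions makes the locking argument concrete. The following is a minimal Python sketch built around a hypothetical transactional action whose "transaction" is just a list of messages. The class names are illustrative only, not rsyslog's output module interface. Per-message processing takes the action lock once per message and produces one-message transactions. Batch processing takes the lock once and keeps the whole batch in a single transaction.

```python
import threading
from typing import List


class CountingLock:
    """Wraps threading.Lock and counts how often it is acquired."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self) -> "CountingLock":
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc: object) -> None:
        self._lock.release()


class Action:
    """Hypothetical transactional output action; each transaction is a list."""

    def __init__(self) -> None:
        self.lock = CountingLock()
        self.transactions: List[List[str]] = []

    def process_per_message(self, msgs: List[str]) -> None:
        # One lock/unlock and one single-message transaction per message.
        for msg in msgs:
            with self.lock:
                self.transactions.append([msg])

    def process_batch(self, msgs: List[str]) -> None:
        # One lock/unlock for the whole batch, which becomes one transaction.
        with self.lock:
            self.transactions.append(list(msgs))


if __name__ == "__main__":
    msgs = ["m1", "m2", "m3", "m4"]
    for mode in ("process_per_message", "process_batch"):
        action = Action()
        getattr(action, mode)(msgs)
        print(mode, action.lock.acquisitions, action.transactions)
```

With several workers each calling process_batch on its own batch, a batch's messages still land in one transaction. With per-message locking, messages from different threads could interleave between transactions.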

That doesn't mean the engine is perfect. In fact, the engine has evolved
over the past 10 years, and the v7 enhancements sometimes required a lot of
hard thinking to fit into the "unchangeable" rest of the framework. What I
hate most is some ugly flag handling that we need for the conditionals,
and that requires more cache writes than one would hope for. The SIMD idea
was perfect for v6 and below, where we effectively had a list of actions,
each one with an attached filter. This is also where the design originates.

This is no longer the case, and I would really like to *rewrite* the
engine. I have to admit I am not sure if there would really be a performance
gain; looking at theory, there are some facts that say "yes" and some that
say "no". HOWEVER, a rewrite is a very major undertaking. To keep
transaction support and big buffers for output actions, not only does the
engine need to be modified, but all action modules need to know that new
mode of operation, and need to become fully aware of the threading model
(e.g. they must re-initialize instances when a new thread is created and
drop them when it is destroyed). Some outputs may need to do different
serialization, and there is a whole can of worms of things that will most
probably be overlooked in the initial shot. Most actions would probably need
buffer management and background writers, and we would need to find out how
to convey state information back (synchronously) where the caller needs it
(e.g. for backup actions). And, of course, the rule engine itself must be
modified to support some totally different processing. Bottom line: a
loooooooooooooot of work to do (I think I said this in one sentence in the
initial posting on these topics last week).

So my approach is as usual: gradually evolve the engine within the current
constraints and move the limits slightly forward. No matter how much I
would like to rewrite all these things, I simply don't have the time to do
so.

Back to the context of this discussion: I am trying to find something that
is useful for many cases and that I can actually implement. I think we can
do 80% of the desired things (probably more, approaching 98%) with
relatively unintrusive changes.


> Is this approach also happening on batches of messages being parsed on the
> input side?


There are no actions on the input side. As I said, all the input does is
read the source and stuff messages into the main queue. Of course, if we use
input batches, we first need to stuff the messages into an interim
structure, which is then passed on to the queue, as this only requires a
single locking operation (again, looking for coarse-grained parallelism).
[And, yes, if we used the input batch directly inside the queue instead of
"unpacking" it, we would gain efficiency. Again, I'd love to do it (and
have tried several times to find time for it), but it's not possible with
the current resources. Frankly, re-writing rsyslog from scratch with 10+
years of experience would definitely bring improvement, but...]
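The input-side batching described above can be sketched the same way. This is a minimal Python model built on a hypothetical MainQueue with a single mutex, plus an input reader that fills an interim batch before handing it over. It is not rsyslog's actual queue code. In this model, ten messages submitted in batches of four cost three lock operations instead of ten.

```python
import threading
from collections import deque
from typing import Deque, List


class MainQueue:
    """Toy stand-in for the main message queue, protected by one mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Deque[str] = deque()
        self.lock_ops = 0

    def enqueue(self, msg: str) -> None:
        # Per-message submission: one lock operation per message.
        with self._lock:
            self.lock_ops += 1
            self._items.append(msg)

    def enqueue_batch(self, batch: List[str]) -> None:
        # Batch submission: the interim batch is "unpacked" into the queue
        # under a single lock operation.
        with self._lock:
            self.lock_ops += 1
            self._items.extend(batch)

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def read_input(lines: List[str], q: MainQueue, batch_size: int) -> None:
    """Hypothetical input module: collect messages into an interim batch,
    then hand the whole batch to the queue."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            q.enqueue_batch(batch)
            batch = []
    if batch:  # flush the partial last batch
        q.enqueue_batch(batch)


if __name__ == "__main__":
    q = MainQueue()
    read_input([f"msg{i}" for i in range(10)], q, batch_size=4)
    print(q.lock_ops, len(q))
```

The sketch still copies the interim batch into the queue. Avoiding that copy would be the further efficiency gain mentioned in the bracketed remark.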


> if so, the cache thrashing may explain some of the slowdown on the input
> side.
>
>
I don't think I shared those numbers. We did the UDP testing: v5 was
consistently slower than v7, so there is no slowdown. And v7 with 2 to 3
threads on a recvmmsg()-supporting kernel works pretty slick ;)

Again, sorry, but I do not want to go into a discussion of rule engine
details and potential improvements at this time. I know that if the core
engine were rewritten, the problem of global variables would disappear.
But, again, this is too large a task to work on now (or do we have a
sponsor for 6 months of work? ;)). So I am trying to do the best I can
under the given constraints.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
