Re: [rsyslog] feedback requested: NEW rsyslog.conf format -- XML?

Rainer Gerhards Thu, 24 Jun 2010 23:25:10 -0700

Hi all,

I am replying to this post as it has all the buzzwords I need for my reply.
I've read and thought about all the others, and have done quite a bit of web
research. So, everyone, please check which parts of this reply also answer
other questions that were raised ;)

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Thursday, June 24, 2010 6:34 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] feedback requested: NEW rsyslog.conf format --
> XML?
> 
> On Thu, 24 Jun 2010, Mr. Demeanour wrote:
> 
> > [email protected] wrote:
> >> On Thu, 24 Jun 2010, Michael Biebl wrote:
> >>
> >>> Looking at the XML config example I have to admit that I don't
> >>> really like how it looks and feels.
> >>>
> >>> Even david's example looks rather verbose and I also share the
> >>> concern that such an XML file would be hard to edit using e.g. vi.
> >>>
> >>> I just stumbled upon
> >>> http://stackoverflow.com/questions/1925305/best-config-file-format.
> >>>
> >>>
> >>> One alternative already discussed here, is YAML. Then the site
> >>> above mentions INI style format, which basically everyone knows but
> >>> question is, if it's flexible enough.
> >>
> >> INI is good for one level of item->subvalue data, but with rsyslog I
> >> think we need the ability to go arbitrarily deep
> >>
> >>> Has anyone experience with JSON or LUA?
> >>
> >> both of these are really full languages (or subsets of languages). I
> >> don't think they are appropriate for config files (and I really
> >> question them as being appropriate for data transfers, even though
> >> JSON is used for this purpose extensively)
> >
> > Hmmm. It struck me a few-dozen posts back in this thread that the
> > configfile language perhaps *needs* to be a full language.
> >
> > Many perl and PHP apps have config files that are really data
> > declarations in perl or PHP; I thought of suggesting a configfile
> > written in perl. Then I reconsidered.
> >
> > Could you elaborate your objection to using JSON? you say you
> > "question" it, but we haven't seen your question.
> 
> part of it is that it offends me to send data as a code snippet to be
> interpreted. This has already caused security issues that people are
> working around, but it just seems like a fundamentally wrong thing to
> do.

I found one comment in the link that Michael provided that mentions a big
problem that I had not really focused on:

"Writing a perl script to hack on an INI file is trivial. Writing a perl
script to hack on a Lua config file is not really possible in the general
case. Even if you do know Lua, you can't generally write a program to load a
Lua config file, examine the values, change a value, and write it back. It's
not possible. INI files can be indexed and searched. But if the config file
is a program, the key may be generated in an arbitrarily complicated way, so
you may not find it. And on and on."

Jason Orendorff Dec 18 '09 at 6:56 in
http://stackoverflow.com/questions/1925305/best-config-file-format

This will probably add a requirement: config files must be auto-generatable,
and third-party tools must be able to read them.
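
To illustrate the point of the quoted comment, here is a minimal sketch of a
third-party tool that loads a purely declarative config, changes one value and
writes it back. It uses Python's standard configparser module on an INI-style
text; the section and key names are made up for the example and are not a
proposed rsyslog format.

```python
import configparser
import io

# Hypothetical INI-style config; section and key names are invented.
SAMPLE = """\
[input.udp]
port = 514

[action.file]
path = /var/log/messages
"""


def set_value(text, section, key, value):
    """Load INI text, change one value, and return the re-serialized text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    parser[section][key] = value
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_value(SAMPLE, "input.udp", "port", "10514")
print(updated)
```

Note that configparser does not preserve comments when writing, so a real tool
would need more care. The point is only that the keys can be found and changed
mechanically, which is not possible in general when the config is a program.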

> If speed or security are not major issues, having a config language be a
> snippet of code is definitely convenient and lets the person do a huge
> number of things that the program author never thought of (see simple
> event correlator for an example of this), but in rsyslog speed is a
> significant issue (processing multiple hundreds of thousands of logs per
> second doesn't leave much time) and I don't think that an interpreter is
> up to the task. Interpreted languages also usually don't support
> multi-threaded operation well.

This is a *very* important point. And it is the single reason why I
reconsidered RainerScript and now tend not to use it. While (in design) it
can do anything I will ever need, the interpretation is too slow -- at least
as far as the current implementation is concerned. I have read up on Lua, and
there seem to be large similarities between how Lua works and how
RainerScript actually (in filters!) works. Let me assume that Lua is far
more optimized than RainerScript. Even then, it is a generic engine and
running that engine to actually process syslog data is simply too slow.

That is too slow to reach the high data rates we have. Using my test lab as
an example, we are currently at ~250,000 messages per second (mps). The goal
for my next performance tuning step will be to double that value (I don't
know yet when I will start with that work). Overall, the design shall be that
rsyslog scales almost linearly with the number of CPUs (and network cards)
added. I've made a couple of design errors with that in the past, but now I am
through with that, have done a lot of research and think that I can achieve
this nearly linear speedup in the future. That means there will no longer be
an actual upper limit on the number of messages per second rsyslog can
process. Of course, even on a single processor, we need *excellent*
performance.

For the single-processor case, this means we need highly optimized,
up-to-the-task algorithms that don't do many things generically.

For the multi-processor case, that means we need to run as many of these
tasks as possible truly concurrently.

For example, in the last performance tuning step, I radically changed the way
rules are processed. Rather than thinking in terms of messages and steps to
be done on these, I now have an implementation that works, semi-parallel, on
the batch as a whole and (logically) computes sub-steps of message processing
concurrently with each other (to truly elaborate on this would take a day or
more to write, thus I spare the details).

I don't think any general language can provide the functionality I need to do
these sorts of things. This was also an important reason that led to
RainerScript, a language where I could define the level of granularity
myself. The idea is still not dead, but the implementation was done the wrong
way. And I have become skeptical whether a language is the right approach at
all.

Also note the difference between config and runtime engine. Whatever
library/script/format/language we use for the config will, for the reasons
given above, NOT be used during execution. It can only be used as a
meta-language to specify what the actual engine will do.

So if we go for Lua (for example), we could use Lua to build the rsyslog
config objects. But during actual execution, we will definitely not use Lua.
So we would need a way to express rsyslog control flow in Lua, which would
probably stretch the spec too far. Note that a Lua "if then" would not be
something that the engine uses, but would rather be used to build a config
object. So we still have the issue of how to specify an "rsyslog engine if
then" inside a Lua script. Except, of course, if you think that Lua can do
regular processing, which I ruled out with the argument above.
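
The separation between config description and runtime engine can be sketched
roughly as follows. This is only an illustration, not how rsyslog is
implemented: the rule layout (the "if"/"then"/"else" keys), the action names
and the message fields are all hypothetical, and Python is used purely for
readability. The declarative rule is translated once, at load time, into an
engine object; at run time only that object is executed and the config
description is never consulted again.

```python
from typing import Callable, Dict

Message = Dict[str, str]
Action = Callable[[Message], None]


def compile_rule(rule: dict, actions: Dict[str, Action]) -> Action:
    """Turn a declarative rule into an executable object, once, at load time."""
    field = rule["if"]["field"]
    wanted = rule["if"]["equals"]
    # Resolve action names to callables now, not per message.
    then_actions = [actions[name] for name in rule["then"]]
    else_actions = [actions[name] for name in rule.get("else", [])]

    def run(msg: Message) -> None:
        # The "engine if-then": only a comparison and a sequence of actions.
        chosen = then_actions if msg.get(field) == wanted else else_actions
        for act in chosen:
            act(msg)

    return run


# Hypothetical actions that just record what they would write.
written = []
actions = {
    "to_auth_log": lambda m: written.append(("auth.log", m["msg"])),
    "to_messages": lambda m: written.append(("messages", m["msg"])),
}

# Hypothetical declarative rule, as a config loader might hand it over.
rule = {
    "if": {"field": "facility", "equals": "auth"},
    "then": ["to_auth_log"],
    "else": ["to_messages"],
}

engine_rule = compile_rule(rule, actions)
for msg in [{"facility": "auth", "msg": "login ok"},
            {"facility": "mail", "msg": "queued"}]:
    engine_rule(msg)
print(written)
```

Whatever language builds the rule description only has to produce the data
handed to compile_rule; its own "if" statements never run per message.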

> It may be possible to compile the interpreted language and then run that,
> but that starts to seem a bit complicated for a config language.
>
> Taking this approach would be an interesting thing to do, but I think it
> would end up being a pretty complete re-write of rsyslog.

That's the reason I was so hesitant to touch the config format for years. It
affects much more than just the config format. If we go and create a new
config format, then I think we should reconsider use cases and see where the
current engine does not provide things that are desirable. Some of them have
not been implemented because I could not stretch the config file format that
far, some others have not been implemented because I did not know about them,
and some were left out because it was too complicated to add them to the
current design (like config reload on HUP, probably the most complex of all).

Some time in the future, all these things will be needed. So I don't like the
idea of introducing a new, but interim, config format that can only specify
what the current engine does, and then replacing it again in a year or two
with yet another format that provides the capabilities necessary for the
enhancements.

At a minimum, I'd like to have a config format that has all the required
flexibility, even though some features may not yet be used. However, even
that means considerable change to rsyslog's core. This is a multi-month
undertaking. 

I am sorry I did not convey this earlier -- it was so crystal-clear to me
that I simply forgot to say it... :(

> the approach I would take would be to start by making everything that
> rsyslog does into a function and then have the 'config file' define the
> entire event processing loop.

Well, in a sense it is. But this statement has a connotation of a one-to-one
mapping between these functions and the way processing is actually carried
out. That would mean a serialized execution.

We have long departed from that point of view. Let me put it in a somewhat
abstract way: the order of execution of the tasks inside a given configuration
can be viewed as a partially ordered set. Some of the tasks need to be
preceded by others, but a (large) number of tasks have no relationship. To
gain speed and scalability, the rsyslog engine tries to identify this partial
order and tries to run in parallel those tasks that have no dependency on each
other. Also, one must note that a config file is written with the assumption
of a single message traversing the engine, which is a gross simplification.
In practice, we now have batches (multiple messages at once) traversing
the engine, where a lot of things are done concurrently and far
differently from what one would expect when looking at the config file (but in
a functionally equivalent way). It is this transformation from an in-sequence,
single-message view to a partially ordered, parallel view that provides
the necessary speedup to be able to serve demanding environments.
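
The partial-order idea can be sketched with a small example. This is not the
rsyslog engine, just an illustration using Python's standard graphlib module
(Python 3.9 or later). The task names and their dependencies are hypothetical.
Tasks whose predecessors have all finished are grouped into one stage, and
everything within a stage has no dependency on anything else in that stage,
so it could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each maps to the set of tasks that must finish first.
deps = {
    "parse": set(),
    "filter_auth": {"parse"},
    "filter_mail": {"parse"},
    "forward_all": {"parse"},
    "write_auth_log": {"filter_auth"},
    "write_mail_log": {"filter_mail"},
}


def parallel_stages(deps):
    """Group tasks into stages; tasks in one stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks runnable right now
        stages.append(ready)
        sorter.done(*ready)                 # mark them finished, unlocking successors
    return stages


stages = parallel_stages(deps)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

A config file that lists these six steps one after another suggests six
serial steps; the partial order shows that three stages are enough.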

Back to scripting languages/config formats:

I doubt the rsyslog rule engine will ever need loop constructs. It just needs
"if" and sequence.

To David's XML examples: I find none of them sufficiently readable. For each
it took me (knowing well what was intended) some time to grasp the idea. My
initial script-like example, I assume, was easy to grasp for everyone.

From what I know so far, YAML actually seems to be the best fit (but I have
yet to see an rsyslog config with it). I *think* we could use the if-then-else
constructs in a Python way inside YAML, something like

if expr then
   Action 1:
     Param1: value1
     ...
   Action 2:
     ...
   ...
else
   Action n+1:
   ....
end if

Thinking about Michael's argument to use something people already know, I
don't think this type of "uncommon" if-then-else should surprise anyone, so
I think it is OK in that sense as well.

There remains the problem with libyaml. Let me assume it works, but is
abandoned. Assuming (yet to prove!) I need to write a parser for some format
in any case, I could actually start with libyaml and maintain it iff there is
need to do so. Depending on the code quality (not checked yet), this may save
me some or, hopefully, considerable time.

This is the situation as I currently see it. This post is *not* meant to stop
the discussion. The discussion is extremely helpful and I would love to hear
more comments. This thread has brought us very far and even though we have
started a new design iteration, that is useful. Better to notice now that
something does not fit than when half the work is done ;)

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
