Hi all, I am replying to this post because it has all the buzzwords I need for my reply. I have read and thought about all the other posts, and I have done quite a bit of web research. So please, everyone, check which parts of this reply actually answer questions raised elsewhere in the thread ;)

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Thursday, June 24, 2010 6:34 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] feedback requested: NEW rsyslog.conf format --
> XML?
>
> On Thu, 24 Jun 2010, Mr. Demeanour wrote:
>
> > [email protected] wrote:
> >> On Thu, 24 Jun 2010, Michael Biebl wrote:
> >>
> >>> Looking at the XML config example I have to admit that I don't
> >>> really like how it looks and feels.
> >>>
> >>> Even david's example looks rather verbose and I also share the
> >>> concern that such an XML file would be hard to edit using e.g. vi.
> >>>
> >>> I just stumbled upon
> >>> http://stackoverflow.com/questions/1925305/best-config-file-format.
> >>>
> >>>
> >>> One alternative already discussed here, is YAML. Then the site
> >>> above mentions INI style format, which basically everyone knows but
> >>> question is, if it's flexible enough.
> >>
> >> INI is good for one level of item->subvalue data, but with rsyslog I
> >> think we need the ability to go arbitrarily deep
> >>
> >>> Has anyone experience with JSON or LUA?
> >>
> >> both of these are really full languages (or subsets of languages). I
> >> don't think they are appropriate for config files (and I really
> >> question them as being appropriate for data transfers, even though
> >> JSON is used for this purpose extensively)
> >
> > Hmmm. It struck me a few-dozen posts back in this thread that the
> > configfile language perhaps *needs* to be a full language.
> >
> > Many perl and PHP apps have config files that are really data
> > declarations in perl or PHP; I thought of suggesting a configfile
> > written in perl. Then I reconsidered.
> >
> > Could you elaborate your objection to using JSON? you say you "question"
> > it, but we haven't seen your question.
>
> part of it is that it offends me to send data as a code snippet to be
> interpreted.
> This has already caused security issues that people are
> working around, but it just seems like a fundamentally wrong thing to
> do.

I found one comment in the link that Michael provided that mentions a big problem that I had not really focused on:

"Writing a perl script to hack on an INI file is trivial. Writing a perl script to hack on a Lua config file is not really possible in the general case. Even if you do know Lua, you can't generally write a program to load a Lua config file, examine the values, change a value, and write it back. It's not possible. INI files can be indexed and searched. But if the config file is a program, the key may be generated in an arbitrarily complicated way, so you may not find it. And on and on."
Jason Orendorff, Dec 18 '09 at 6:56, in
http://stackoverflow.com/questions/1925305/best-config-file-format

This will probably add the requirement that config files can be auto-generated, and that third-party tools can read them.

> If speed or security are not major issues, having a config language be
> a snippet of code is definitely convenient and lets the person do a huge
> number of things that the program author never thought of (see simple
> event correlator for an example of this), but in rsyslog speed is a
> significant issue (processing multiple hundreds of thousands of logs
> per second doesn't leave much time) and I don't think that an interpreter
> is up to the task. Interpreted languages also usually don't support
> multi-threaded operation well.

This is a *very* important point. And it is the single reason why I reconsidered RainerScript and tend not to use it. While (in design) it can do anything I will ever need, the interpretation is too slow -- at least as far as the current implementation is concerned. I have read up on Lua, and there seem to be large similarities between how Lua works and how RainerScript actually (in filters!) works.
Let me assume that Lua is far more optimized than RainerScript. Even then, it is a generic engine, and running that engine to actually process syslog data is simply too slow to reach the high data rates we have.

Using my test lab as an example, we are currently at ~250,000 messages per second (mps). The goal for my next performance tuning step will be to double that value (I don't know yet when I will start with that work). Overall, the design shall be that rsyslog scales almost linearly with the number of CPUs (and network cards) added. I have made a couple of design errors with that in the past, but now I am through with that, have done a lot of research and think that I can achieve this nearly-linear speedup in the future. That means there will no longer be an actual upper limit on the number of messages per second rsyslog can process.

Of course, even on a single processor, we need *excellent* performance. For the single-processor case, this means we need highly optimized algorithms tailored to the task that don't do many things generically. For the multi-processor case, it means we need to run as many of these tasks as possible truly concurrently. For example, in the last performance tuning step, I radically changed the way rules are processed. Rather than thinking in terms of messages and the steps to be done on them, I now have an implementation that works, semi-parallel, on the batch as a whole and (logically) computes sub-steps of message processing concurrently with each other (to truly elaborate on this would take a day or more of writing, so I will spare you the details).

I don't think any general language can provide the functionality I need to do these sorts of things. This was also an important reason that led to RainerScript, a language where I could define the level of granularity myself. The idea is still not dead, but the implementation effort was done wrongly. But I have become skeptical whether a language is the right approach at all.

Also note the difference between config and runtime engine.
Whatever library/script/format/language we use for the config will, for the reasons given above, NOT be used during execution. It can only be used as a meta-language to specify what the actual engine will do. So if we go for Lua (for example), we could use Lua to build the rsyslog config objects. But during actual execution, we will definitely not use Lua. So we would need a way to express rsyslog control flow in Lua, which would probably stretch the spec too far. Note that a Lua "if then" would not be something that the engine uses, but would rather be used to build a config object. So we still have the issue of how to specify an "rsyslog engine if then" inside a Lua script. Except, of course, if you think that Lua can do regular processing, which I ruled out with the argument above.

> It may be possible to compile the interpreted language and then run
> that, but that starts to seem a bit complicated for a config language.
>
> Taking this approach would be an interesting thing to do, but I think
> it would end up being a pretty complete re-write of rsyslog.

That's the reason I was so hesitant to touch the config format for years. It affects much more than just the config format. If we go and create a new config format, then I think we should reconsider use cases and see where the current engine does not provide things that are desirable. Some of them have not been implemented because I could not stretch the config file format that far, some others have not been implemented because I did not know about them, and some were left out because it was too complicated to add them to the current design (like config reload on HUP, probably the most complex of all). Some time in the future, all these things will be needed. So I don't like the idea of introducing a new, but interim, config format that only permits specifying what the current engine does, and then replacing it again in a year or two with yet another format that provides the capabilities necessary for the enhancements.
At a minimum, I'd like to have a config format that has all the required flexibility, even though some features may not yet be used. However, even that means considerable change to rsyslog's core. This is a multi-month undertaking. I am sorry I did not convey this earlier -- it was so crystal-clear to me that I simply forgot to mention it... :(

> the approach I would take would be to start by making everything that
> rsyslog does into a function and then have the 'config file' define the
> entire event processing loop.

Well, in a sense it is. But this statement has a connotation of a one-to-one mapping between these functions and the way processing is actually carried out. That would mean serialized execution. We have long departed from that point of view. Let me put it in a somewhat abstract way: the order of execution of the tasks inside a given configuration can be viewed as a partially ordered set. Some of the tasks need to be preceded by others, but a (large) number of tasks have no relationship to each other. To gain speed and scalability, the rsyslog engine tries to identify this partial order and to run in parallel those tasks that have no dependency on each other.

Also, one must note that a config file is written with the assumption of a single message traversing the engine, which is a gross simplification. In practice, we now have batches (multiple messages at once) traversing the engine, where a lot of things are done concurrently and far differently from what one would expect when looking at the config file (but in a functionally equivalent way). It is this transformation from the in-sequence, single-message view to the partially ordered, parallel view that provides the speedup necessary to serve demanding environments.

Back to scripting languages/config formats: I doubt the rsyslog rule engine will ever need loop constructs. It just needs "if" and sequence.

To David's XML examples: I find none of them sufficiently readable.
For each, it took me (knowing well what was intended) some time to grasp the idea. My initial script-like example, I assume, was easy to grasp for everyone.

From what I know so far, YAML actually seems to be the best fit (but I have yet to see an rsyslog config written in it). I *think* we could use the if-then-else constructs in a Python way inside YAML, something like

    if expr then
        Action 1:
            Param1: value1
            ...
        Action 2:
            ...
        ...
    else
        Action n+1:
            ....
    end if

Thinking about Michael's argument to use something people already know, I don't think this type of "uncommon" if-then-else should surprise anyone, so I think it is OK in that sense as well.

That leaves the problem with libyaml. Let me assume it works, but is abandoned. Assuming (yet to be proven!) that I need to write a parser for some format in any case, I could actually start with libyaml and maintain it iff there is need to do so. Depending on the code quality (not checked yet), this may save me some or, hopefully, considerable time.

This is the situation as I currently see it. This post is *not* meant to stop the discussion. The discussion is extremely helpful and I would love to hear more comments. This thread has brought us very far, and even though we have started a new design iteration, that is useful. Better to notice now that something does not fit than when half the work is done ;)

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
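
To make the split between config and runtime engine discussed above a bit more concrete, here is a minimal sketch in Python. It is not rsyslog code and not a proposal for the actual format: the names `Action`, `IfThenElse`, `compile_rules`, `process` and the nested `CONFIG` structure (standing in for what a YAML parser might hand back) are all invented for illustration. The point it tries to show is only this: the config is turned into plain objects once, and message processing afterwards needs nothing but "if" and sequence, with no config-language interpreter in the loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Message = Dict[str, str]


@dataclass
class Action:
    """Hypothetical leaf node: records its name instead of doing real I/O."""
    name: str
    params: Dict[str, str]

    def run(self, msg: Message, fired: List[str]) -> None:
        fired.append(self.name)


@dataclass
class IfThenElse:
    """Hypothetical branch node: the test is a ready-made callable."""
    test: Callable[[Message], bool]
    then: List["Node"]
    otherwise: List["Node"]

    def run(self, msg: Message, fired: List[str]) -> None:
        branch = self.then if self.test(msg) else self.otherwise
        for node in branch:  # plain sequence, no loops in the config itself
            node.run(msg, fired)


Node = Union[Action, IfThenElse]


def compile_test(spec: Dict[str, str]) -> Callable[[Message], bool]:
    # Assumed test form: {"field": ..., "equals": ...}; resolved once here.
    field, expected = spec["field"], spec["equals"]
    return lambda msg: msg.get(field) == expected


def compile_rules(items: List[dict]) -> List[Node]:
    """Build engine objects from the parsed config (done once at startup)."""
    compiled: List[Node] = []
    for item in items:
        if "if" in item:
            compiled.append(IfThenElse(
                test=compile_test(item["if"]),
                then=compile_rules(item.get("then", [])),
                otherwise=compile_rules(item.get("else", [])),
            ))
        else:
            compiled.append(Action(item["action"], item.get("params", {})))
    return compiled


def process(rules: List[Node], msg: Message) -> List[str]:
    """Run one message through the compiled rules; return fired action names."""
    fired: List[str] = []
    for node in rules:
        node.run(msg, fired)
    return fired


# Stand-in for a parsed config file; keys and action names are assumptions.
CONFIG = [
    {"if": {"field": "severity", "equals": "err"},
     "then": [{"action": "write_file", "params": {"path": "/var/log/errors.log"}},
              {"action": "forward", "params": {"target": "central.example.com"}}],
     "else": [{"action": "write_file", "params": {"path": "/var/log/messages"}}]},
    {"action": "count"},
]

RULES = compile_rules(CONFIG)
```

Calling `process(RULES, {"severity": "err"})` walks the "then" branch and then the trailing action, while any other severity takes the "else" branch. A real engine would of course work on batches and reorder independent steps, which this sketch deliberately leaves out.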

