> -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of [email protected] > Sent: Monday, May 31, 2010 12:30 PM > To: rsyslog-users > Subject: Re: [rsyslog] Where is the output module for the udp > transportationtoremote syslog server > > On Mon, 31 May 2010, [email protected] wrote: > > > On Mon, 31 May 2010, Rainer Gerhards wrote: > > > >>> I agree that doing it in the output would be far better in many > ways, > >>> but > >>> since there isn't a way to do a plugin there (at least not as far > as I > >>> know, it would be good to get confirmation or a better idea) > >> > >> David, can you tell me what you have on your mind for this > functionality? I > >> have thought a bit about it, and I probably have one approach > myself. But I > >> would prefer to hear your idea before I push you into a direction. > > > > > > two options > > > > 1. something that would work similar to the existing format > > string, but would call a C subroutine that could read the existing > > properties and would create the output string in a buffer > > > > 2. something that could also modify the exisitng properties (more > > powerful, but also more dangerous and could involve locking to > prevent > > other things from trying to read properties at the same time) > > > > we haven't gone too far down the road of researching the output > > performance (since the input and queue locking has dominated so far), > but > > it is clear that the output currently takes significantly more CPU > time > > than input, it may be that being able to use C to define the output > format > > instead of interpreting the format string may be a noticable > improvement. > > Is there a relativly easy way to test this? (say, hard-code a format > or > > two and test writes to file and network with the hard-coded format vs > a > > format string that produces the same output?) > > for the traditional output formats the difference may not be that much, > but if there is extensive parsing involved (as the initial poster is > doing, or what I would expect is common for specific log types into a > database) the difference can be much more significant since it can > replace > multiple regex statements with a much faster single pass that looks for > word breaks and inserts standard filler in those spots. > > With the new syslog format where the data is 'supposed to be' in a > series of name=value tuples, something like this would be a pretty > efficiant way of extracting particular portions of the data to be > output > (although the properties could be extended to do this sort of thing by > providing something similar to a perl hash)
You are looking in the same direction I am, and I think this is good news ;) The current engine supports functions coded in C, but not yet as real plugins nor in an easy to see way. It is done via a crude function interface library module, and only within the script engine. My original plan (over a year, or even two, ago) was to generalize these library plugins, so that it is easy to add new code and load them as plugins. Actually, making them available as plugins should not be too much work given the already existing infrastructure. There already exist a handful of "function modules", the control structure is just statically created during compile time, much as some of the output plugins are statically linked. Then the original plan was to enable templates to call scripts and enable scripts to define templates (kind of). Unfortunately, I got distracted by more important things before I could complete all of this. HOWEVER, at this time performance was not a major concern. With what has evolved in the mean time, I do not like the original approach that much any longer. At least the script engine must become much faster before I can take a real look at that capability. Right now, scripts generate a interim code that then is interpreted by a (kind of) virtual machine. A script invocation inside a template would mean that a VM must be instantiated, the script interpreted and the resulting string be used as template contents. Clearly, this is not for high-performance use. Still, however, it may be useful to have that capability for those cases, where performance is not the #1 consideration. But given that everything would need to be implemented, it does make limited sense to look into something known to be too slow in the long run. BTW, this is one reason that I have not yet continued to work on the script engine, knowing that some larger redesign is due to fit it into the now much tighter runtime constraints. On the performance of the output system: I think the system in general is quite fast and efficient, with only ONE important exception: that is, if multiple replacements need to happen. Still, the algorithm is quite efficient, but it is generic and needs to run though a number of steps. Of course, it is definitely faster to permit a C plugin to look at the message and then format, in an "atomic" way the resulting custom string. Thus, you need to write multiple C codes instead of using a generic engine, but can do so in a much higher performance way. I would assume, however, that this approach cannot beat the simple templates we usually use (maybe by less than 5% and, of course, there may be cases where this matters). As you know, my current focus is speed, together with some functional enhancements. I was looking at queue operations improvements, but the potential output speed improvements may be more interesting than the queue mode improvements (and apply to more use cases). So it may make sense to look into these, first. My challenge here is to find something that is a) generic enough to be useful in various (usual) cases b) specific enough to be rather fast and it should also be able to implement within a few weeks at most, because I can probably not spend much more time on a single feature/refactoring. One solution may be to create "template modules". I could envision a template module to be something that generates the template string *as a whole* from the input message. That is, we would have $template current-style,"%msg%\n" but also (**) $modload tplcustom $template custom,tplcustom where tplcustom generates the template string. While this sounds promising, we have some issues. One immediately pops up my mind: we will probably be able to use the same template for file writing or forwarding, but for file writing we need a LF at the end, while for forwarding we do not need it. So the most natural way would be to have the ability to embed a "custom template" into a regular template, like suggested by this syntax: $template both,"%=tplcustom%\n" however, this brings us down to the slippery slope of the original design. As a next thing to be requested, I could ask for using not the msg object (with its fixed unmodified properties), but rather of a transformation of the message object. So we would end up with something like this: $template cmplx,"%=tplcustom(syslogtag & msg)%" Which would require a much more complex logic working behind the scenes. Of course, depending on the format used, the engine could select different processing algorithms. Doing this on the fly seems possible, but requires more work than I can commit in one sequence. Also, it would be useful to have the ability to persist already-generated properties with the message while it is continued to be processed in the rule engine. So far, we do not have this ability, and the reason is processing time (plus, as usual, implementation effort): for that, we would need to maintain a list (or hash, ...) of name/value pairs, store them to disk for disk queues and shuffle them through the rule engine as processing is carried out. As I said, quite doable, but another big addition. So I am somewhat stuck with things that sound interesting, but are a bit interdependent. Doing them all together is too big to be useful, and it will probably fail because I can probably not keep focus on all of the for the next, say, 9 to 12 month that it would require to complete everything. So I am again down to picking what is most useful. Out of this discussion, it looks like the idea I marked with (**), the plain C template generator could be a useful route to take. I am saying this under the assumption that it would be relatively easy to implement and cause at least some speedup in standard cases (contrary to what I expect, I have to admit...). But that approach is highly specialized, requiring a C module for each custom format. So does it really serve the rsyslog community well - or just some very isolated use cases? Thinking more about it, it would probably be useful if it is both a) relatively easy to implement and b) causes some speedup in standard cases But b) cannot be proven without actually implementing the interface. So, in practice, the questions boils down to what we *expect* about the usefulness of this utility. Having said that, I'd appreciate feedback, both on the concrete question of the usefulness of this feature as well as any and all comments on the situation at large. I am trying to put my development resources, which thankfully have been somewhat increased nowadays :) to the area where they provide greatest benefit. Rainer _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

