I just blogged the description, and added this hopefully useful note:

Note that the consistency of the action instance data is guarded by the rsyslog core, which runs the output module processing on a single thread *for that action*. But the output module code may be called concurrently if more than one action uses the same output module. That is a typical case. If so, each of the concurrently running instances receives its private instance data pointer but shares everything else.
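To make the locking granularity concrete, here is a minimal sketch in Python. It is not rsyslog code: the names `OutputModule`, `Action`, `do_action`, `build_actions` and `run_workers` are hypothetical, and the module names "file", "udp" and "tcp" simply mirror the six-line sample config quoted further down. It assumes one lock per action, with module-global data protected only by the module itself.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class OutputModule:
    """Hypothetical stand-in for an output module; not rsyslog's real API."""
    name: str
    calls: int = 0  # module-global data, shared by every action using this module
    lock: threading.Lock = field(default_factory=threading.Lock)

    def do_action(self, instance: dict, msg: str) -> None:
        # Instance data needs no extra locking here: the caller holds the
        # action lock, so only one thread at a time touches this instance.
        instance["sent"].append(msg)
        # Global data is not guarded by the core, so the module locks it itself.
        with self.lock:
            self.calls += 1


@dataclass
class Action:
    module: OutputModule
    target: str
    instance: dict = field(default_factory=lambda: {"sent": []})
    lock: threading.Lock = field(default_factory=threading.Lock)

    def process(self, msg: str) -> None:
        with self.lock:  # serializes processing for this action only
            self.module.do_action(self.instance, msg)


def build_actions():
    """Six actions sharing three modules, as in the sample config."""
    modules = {n: OutputModule(n) for n in ("file", "udp", "tcp")}
    config = [("file", "file1"), ("file", "file2"),
              ("udp", "ip1"), ("udp", "ip2"),
              ("tcp", "ip3"), ("tcp", "ip4")]
    actions = [Action(modules[m], target) for m, target in config]
    return actions, modules


def run_workers(actions, workers_per_action=2, msgs_per_worker=100):
    """Push messages into every action from several threads at once."""
    def worker(action):
        for i in range(msgs_per_worker):
            action.process(f"msg {i}")

    threads = [threading.Thread(target=worker, args=(a,))
               for a in actions for _ in range(workers_per_action)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    actions, modules = build_actions()
    run_workers(actions)
    for a in actions:
        print(a.module.name, a.target, len(a.instance["sent"]))
    for m in modules.values():
        print(m.name, m.calls)
```

Two actions that share a module can run `do_action` at the same time, which is why the shared counter takes the module's own lock, while the per-action list is covered by the action lock alone.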
In context:
http://blog.gerhards.net/2010/06/what-are-actions-and-action-instance.html

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-[email protected]] On Behalf Of Rainer Gerhards
> Sent: Wednesday, June 09, 2010 10:53 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] discussion request: performance enhancement for imtcp
>
> > > Also, strings are not generated before the actions, but while we are
> > > processing them. Doing it up front would require even more memory and
> > > processing time, because we would need to run over all actions twice
> > > (once to create the string, once to call the actions, storing all
> > > strings created). This does not even make sense from a lock contention
> > > POV because each action has a separate lock, so there can be no lock
> > > contention between different actions. The question of whether to
> > > generate all strings for ONE action upfront was the initial question,
> > > and I think we have reached some consensus on it (meaning that it is at
> > > least worth trying out the performance effects and then deciding).
> >
> > I'm not quite clear on the granularity here.
> >
> > if I have the config
> >
> > *.* file1
> > *.* file2
> > *.* @ip1
> > *.* @ip2
> > *.* @@ip3
> > *.* @@ip4
> >
> > for purposes of the locking, how many separate things are there?
>
> Sorry, I think I should have defined some terms first.
>
> An *action* is a specific instance of some desired output. The actual
> processing carried out is NOT termed "action", even though one could easily
> do so. I have to admit I have not defined any term for that. So let's call
> this processing. That actual processing is carried out by the output module
> (and the really bad thing is that the entry point is named "doAction", which
> somewhat implies that the output module is called the action, which is not
> the case).
>
> Each action can use the service of exactly one output module.
> Each output module can provide services to many actions. So we have an N:1
> relationship between actions and output modules.
>
> > depending on how I read your explanation, sometimes it sounds like 6 (one
> > for each line) and sometimes it sounds like 3 (one for file output, one
> > for UDP send, one for TCP send)
>
> In the above samples, 3 output modules are involved, where each output
> module is used by two actions. We have 6 actions, and so we have 6 action
> locks.
>
> So the output module interface does not serialize access to the output
> module, but rather to the action instance. All action-specific data is kept
> in a separate, per-action data structure and passed into the output module
> at the time the doAction call is made. The output module can modify all of
> this instance data as if it were running on a single thread. HOWEVER, any
> global data items (in short: everything not inside the action instance
> data) are *not* synchronized by the rsyslog core. The output module must
> itself take care of synchronization if it desires to have concurrent access
> to such data items. All current output modules do NOT access global data
> other than for config parsing (which is serial and single-threaded by
> nature).
>
> I hope this clarifies. If not, please keep asking. It is important to get
> this right, and maybe I will finally end up expressing myself precisely
> enough ;)
>
> Rainer
>
> > >> note that the output lock is only needed when the two threads really
> > >> are accessing the same thing (probably only for files, as you can have
> > >> two network connections to the same destination at the same time, in
> > >> which case you can use the path name as the lock id). For things like
> > >> databases, network relays (including relp) it would probably be better
> > >> if each worker thread opened its own connection.
> > >> In these cases the destination is designed to accept messages in
> > >> parallel on multiple connections anyway. The good news is that the
> > >> more complex (and slower) sending methods also tend to be the ones
> > >> that can have multiple outbound connections.
> > >
> > > I agree, but that's another quite large effort. None of the current
> > > outputs are designed in that way, and it introduces quite some
> > > complexity in error and recovery cases. Right now, I'd consider this
> > > the last thing that I'd address.
> >
> > Ok, we'll discuss this when dealing with thread-safe output modules
> >
> > >> I seem to remember reading in the module explanation that you do some
> > >> trickery to take fairly normal code written in the module and make it
> > >> thread-safe (by doing something with the variable access IIRC).
> > >
> > > That trick simply is the action lock -- so there is no concurrency at
> > > that level. But I agree (and have begun to work on that idea) that it
> > > would be useful to provide that capability, at least if the output
> > > supports it. As it turned out today, there is still some other ground
> > > to explore before going down that path.
> >
> > ahh, that makes sense (I was puzzled over what trickery you had done to
> > make the variables be thread-safe)
> >
> > >> If you have this (and use the filename as the lock) you also gain
> > >> protection against two different actions stepping on each other.
> > >>
> > >> I have a growing number of cases where I have things like
> > >> :hostname, isequal, "foo" /var/log/messages;fixup_format
> > >> & ~
> > >> *.* /var/log/messages
> > >>
> > >> this works today if I'm sending over the network instead of writing to
> > >> a file, but on my relay boxes (which do both) I have a number of
> > >> corrupted messages each day due to the different actions stepping on
> > >> each other.
> > >
> > > That is a bug I would be interested in finding. The threading model
> > > does NOT allow for that possibility (I mean from a design point of
> > > view; as you experience, it happens, but the design does not consider
> > > this valid). Still, I will keep myself focused on the performance
> > > optimization for now. It doesn't make sense, now that I have gained
> > > momentum and knowledge in that area again, to start another bug hunt
> > > and lose that momentum. But that's definitely something I am interested
> > > in; it shows something is fundamentally flawed.
> >
> > Ok, one thing at a time.
> >
> > >> note that if you do this output locking on files, it may be possible
> > >> to do strange things like
> > >>
> > >> =*.info /var/log/messages
> > >> =*.debug /var/log/messages
> > >> etc
> > >>
> > >> and allow these to have multiple worker threads running so that each
> > >> worker would be processing messages with different severity as
> > >> different actions in parallel (with just a write lock around the final
> > >> output to the file). This is far uglier than being able to do the
> > >> action processing in parallel, but may work.
> > >
> > > ah, OK, I guess I get the picture. You are writing to files with more
> > > than one action. That does not work well. Ruleset inclusion is the
> > > current solution to it. In the long term, it may be useful to have a
> > > single object that represents the file being written, no matter which
> > > rule is used to do it. I'd say that's something that would go together
> > > with the new config format...
> >
> > I think that it wouldn't need any change to the configs.
> > the more I think about it the more I think this is only really a
> > significant problem for file output (and there it should be pretty
> > trivial to implement), everything else can just have multiple sockets
> > open (one per thread)
> >
> > >> I don't see much here where threads handling one message instead of
> > >> multiple messages could speed things up much. Since writes are not
> > >> atomic, you still need the output locks (or multiple outputs) even if
> > >> only processing one message at a time.
> > >>
> > >> single thread, single message is a simpler case, but in that case the
> > >> locking will be very close to a no-op anyway (since there will never
> > >> be contention)
> > >
> > > One thing that I found out during my research and testing is that it
> > > pays to look at a far more granular level, and today's change is the
> > > first real-world approach to this. Not craft one method that does it
> > > all, but see the different config params and what they demand (same for
> > > transactions, etc, etc.). Then code "driver"-like functions for that
> > > specific case and call the right one for the config params given. That
> > > way it is possible to provide high speed where it is possible but
> > > provide some costly features as well. Then, they do not affect the
> > > majority of cases that do not need them (in other words: pay the
> > > performance penalty only if you also get some benefits from them). The
> > > same holds true for some other optimizations that can only be done when
> > > looking at a very fine-granular level. I think that it will be possible
> > > to even get rid of locks at all in some important cases. I will most
> > > probably try to introduce some lock-free alternative for the "mark"
> > > case, not only to cover it, but also to see how it works in practice.
> > > Out of my testing and research, it should provide superb performance.
> > > If that turns out to be true, I see much more potential for these
> > > methods.
> >
> > sounds good. lock free will almost always win.
> >
> > > I will try this at the moment, but at the expense of stability. The
> > > next days, I'll try out at least some ideas and only after that I will
> > > see what it takes to stabilize the engine in all cases again (getting a
> > > too-large delta may make this stabilization too hard, doing the
> > > stabilization too early distracts me from the real facts I intend to
> > > look at - but who said life is easy ;)).
> >
> > I'm going to get my lab set up again to test this.
> >
> > David Lang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com

