On Tue, 8 Jun 2010, Rainer Gerhards wrote:
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Mon, 7 Jun 2010 22:11:37 +0200, "Rainer Gerhards"
>> <[email protected]> wrote:
>
> this is not correct, but you are missing the action part.
>
> Remember that the output plugin interface specifies that only one thread may
> be inside an action at any time concurrently. This was introduced to
> facilitate writing output plugins. In theory, an output plugin can request to
> be called concurrently, but this is not yet implemented. So we need to hold
> on to the action lock (NOT queue lock) whenever we call an action.
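That contract ("only one thread inside an action at a time") can be pictured
with a toy sketch — the names here are mine, not rsyslog's actual plugin
interface: a per-action mutex that every worker must hold while calling into
the plugin.

```python
import threading

class Action:
    """Toy output action that only one thread may enter at a time."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # the action lock (NOT the queue lock)
        self.processed = []

    def do_action(self, msg):
        # every caller holds the action lock while inside the plugin,
        # so the plugin body never runs concurrently with itself
        with self.lock:
            self.processed.append(msg)

action = Action("file-writer")
threads = [threading.Thread(target=action.do_action, args=(i,)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(action.processed))  # [0, 1, 2, 3, 4, 5, 6, 7]
```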
>
> Even more, transactions mean that we must not interleave two or more batches.
> Let's say we had two batches A and B, each with 4 messages. Then calling the
> output as follows:
>
> Abegin
> Bbegin
> A1
> A2
> B1
> A3
> B2
> A4
> Acommit
> B3
> B4
> Bcommit
>
> would mean that at Acommit, messages A1,...,A4,B1,B2 would be committed. This
> could be worked around by far more complex output plugins. These would then
> need to not only support concurrency but also keep separate
> objects/connections for the various threads. This, if at all, makes sense
> only for database plugins. I don't see that the added overhead would make
> any sense at all for things like the file writer.
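The interleaving problem above can be demonstrated with a toy output plugin
that keeps a single shared transaction buffer (all the names here are invented
for illustration, not rsyslog's real interface): when Acommit fires, B's
already-submitted messages ride along.

```python
class TxOutput:
    """Toy output plugin with ONE shared pending buffer (no per-batch state)."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def begin(self):
        pass                  # nothing here can distinguish batch A from B

    def submit(self, msg):
        self.pending.append(msg)

    def commit(self):
        # commits EVERYTHING pending so far, regardless of which batch it's from
        self.committed.extend(self.pending)
        self.pending = []

out = TxOutput()
out.begin()                                   # Abegin
out.begin()                                   # Bbegin
for m in ["A1", "A2", "B1", "A3", "B2", "A4"]:
    out.submit(m)                             # the interleaving from above
out.commit()                                  # Acommit -- but B1, B2 ride along
print(out.committed)  # ['A1', 'A2', 'B1', 'A3', 'B2', 'A4']
```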
>
> But as we have already discussed, it is not so easy to keep the file writer
> problem free in that case as well -- because it may get interrupted during
> writes (which means we need a lock, even if we manage to permit more
> concurrency inside the file writer).
>
> So in essence, the area to look at is that we can restructure the output
> plugin interface in regard to its transaction support. I am currently looking
> at this area and have done some preliminary testing. My main concern at this
> time is to find those spots that actually are the primary bottlenecks (at
> this time, hopefully moving the border forward ;)). The past hours I
> thankfully was able to get some base results and match them with what I
> expect. At some other places, the results surprise me a bit. This is not
> unexpected -- I had no time to touch that code (from a performance point of
> view) for roughly a year, so I need to gain some new understanding. Also, the
> code has evolved, and it may be possible to refactor it into something
> simpler (which is good for both performance and maintainability).
>
> As one of the next things, I will probably use the "big memory, off sync"
> string generation, just to see the effects (it is rather complicated to get
> that in cleanly, because there was so much optimization in v4 on cache hit
> efficiency, parts of which must be undone). Along that way, I will also
> analyze the calling structure and search for simplifications.
hmm, I was thinking something along the lines of the following (crafting
details as I type, so there may be errors here):
queue: a1 a2 a3 a4 b1 b2 b3 b4 c1 .....

worker thread 1                  worker thread 2

lock queue
mark a1-a4 'in process'
unlock queue
start processing actions         lock queue
by creating output               find a1-a4 'in process',
strings (one per action)         so mark b1-b4 'in process'
                                 unlock queue
                                 start processing actions
                                 by creating output
                                 strings (one per action)
time passes                      time passes

                                 for each action
for each action                    lock output
  lock output                      send string
  send string                      unlock output
  unlock output                  lock queue
                                 mark b1-b4 complete
                                 find that b1 is not the beginning
                                 of the list, so do nothing further
                                 unlock queue
lock queue
mark a1-a4 complete
find that a1-b4 are all
marked as complete, so
move start-of-queue to c1
unlock queue
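The scheme above can be sketched in Python (a simplification under my own
naming; a real queue would hold message objects, not strings): workers take
contiguous ranges of queued messages, and the start-of-queue pointer only
advances once the range at the front of the list is fully complete.

```python
import threading

class BatchQueue:
    """Sketch: workers mark ranges 'in process'; the queue head only
    advances past a contiguous run of completed messages."""

    def __init__(self, msgs):
        self.lock = threading.Lock()
        self.msgs = list(msgs)
        self.state = {m: "queued" for m in msgs}  # queued -> in-process -> complete
        self.head = 0                             # start-of-queue

    def take_batch(self, n):
        with self.lock:
            batch = [m for m in self.msgs[self.head:]
                     if self.state[m] == "queued"][:n]
            for m in batch:
                self.state[m] = "in-process"
            return batch

    def mark_complete(self, batch):
        with self.lock:
            for m in batch:
                self.state[m] = "complete"
            # only move start-of-queue past messages that are complete
            while (self.head < len(self.msgs)
                   and self.state[self.msgs[self.head]] == "complete"):
                self.head += 1

q = BatchQueue(["a1","a2","a3","a4","b1","b2","b3","b4","c1"])
a = q.take_batch(4)     # worker 1: marks a1-a4 'in process'
b = q.take_batch(4)     # worker 2: finds a1-a4 taken, marks b1-b4
q.mark_complete(b)      # b1 is not the head of the list: head stays put
print(q.head)           # 0
q.mark_complete(a)      # a1-b4 all complete: head moves to c1
print(q.head)           # 8
```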
the locks on the output are a simple mutex for each output (very cheap when
nothing else is holding the lock, which should be the common case since only
the final write takes place within it). Which worker thread gets to a
particular output first doesn't matter, as long as it flushes all its
work before releasing the lock.
note that the output lock is only needed when the two threads really are
accessing the same thing (probably only for files, as you can have two
network connections to the same destination at the same time, in which
case you can use the path name as the lock id). For things like databases,
network relays (including relp) it would probably be better if each worker
thread opened its own connection. In these cases the destination is
designed to accept messages in parallel on multiple connections anyway.
The good news is that the more complex (and slower) sending methods also
tend to be the ones that can have multiple outbound connections.
for writing to a file, you need some sort of lock to be able to have
multiple threads without the threads stepping on top of each other with
their writes anyway.
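A per-destination lock registry keyed by path name might look like this
sketch (the names and structure are mine, purely to illustrate the idea):

```python
import threading

# one mutex per output file, keyed by path name; actions writing the
# same file share a lock, unrelated outputs never contend
_output_locks = {}
_registry_lock = threading.Lock()

def output_lock(path):
    with _registry_lock:
        if path not in _output_locks:
            _output_locks[path] = threading.Lock()
        return _output_locks[path]

def write_message(path, line, sink):
    # 'sink' stands in for the real file; the lock keeps each write whole
    with output_lock(path):
        sink.append(line)

# two actions targeting the same file get the SAME lock...
assert output_lock("/var/log/messages") is output_lock("/var/log/messages")
# ...while a different file gets its own
assert output_lock("/var/log/messages") is not output_lock("/var/log/secure")

sink = []
write_message("/var/log/messages", "line one", sink)
print(sink)  # ['line one']
```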
this assumes that the two worker threads can do everything (except
possibly output the data) for different messages in parallel.
I seem to remember reading in the module explanation that you do some
trickery to take fairly normal code written in the module and make it
thread-safe (by doing something with the variable access IIRC). A similar
trick for the actual output could have a flag to toggle between 'single
output with locking' and 'each worker thread gets a duplicate output with
no locking' so that it's not a huge complexity in each output module
('just' a one-time complexity to setup the handling)
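That flag could be one-time framework plumbing along these lines (entirely
hypothetical naming, just sketching the 'single output with locking' versus
'per-thread duplicate output' modes):

```python
import threading

class OutputFactory:
    """Sketch: either every worker shares one locked output instance,
    or each worker thread lazily gets its own unlocked duplicate."""

    def __init__(self, make_output, shared=True):
        self.make_output = make_output
        self.shared = shared
        if shared:
            self.instance = make_output()
            self.lock = threading.Lock()
        else:
            self.local = threading.local()   # one duplicate per thread

    def send(self, msg):
        if self.shared:
            with self.lock:                  # 'single output with locking'
                self.instance.append(msg)
        else:                                # 'duplicate output, no locking'
            if not hasattr(self.local, "out"):
                self.local.out = self.make_output()
            self.local.out.append(msg)

# shared mode: one instance, serialized by the lock
shared = OutputFactory(list, shared=True)
shared.send("m1"); shared.send("m2")
print(shared.instance)   # ['m1', 'm2']

# per-thread mode: each thread writes to its own duplicate
private = OutputFactory(list, shared=False)
private.send("x")
print(private.local.out)  # ['x']
```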
if all you are doing is to have an action lock that single-threads all
activity for that action, then this isn't possible.
If you have this (and use the filename as the lock) you also gain
protection against two different actions stepping on each other.
I have a growing number of cases where I have things like
:hostname, isequal, "foo" /var/log/messages;fixup_format
& ~
*.* /var/log/messages
this works today if I'm sending over the network instead of writing to a
file, but on my relay boxes (which do both) I have a number of corrupted
messages each day due to the different actions stepping on each other.
note that if you do this output locking on files, it may be possible to
do strange things like
=*.info /var/log/messages
=*.debug /var/log/messages
etc
and allow these to have multiple worker threads running, so that each
worker can be processing messages of a different severity as a different
action in parallel (with just a write lock around the final output to the
file).
This is far uglier than being able to do the action processing in
parallel, but may work.
Having re-read your message and my thoughts, I think I end up arguing for
changes to the output along the lines you were speculating about above.
I don't see much here where threads handling one message instead of
multiple messages could speed things up much. Since writes are not atomic,
you still need the output locks (or multiple outputs) even if only
processing one message at a time.
single thread, single message is a simpler case, but in that case the
locking will be very close to a no-op anyway (since there will never be
contention)
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com