On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>>
>> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
>>
>>> David,
>>>
>>> I'll start with some quick pointers. I think it makes sense to move the
>>> results of this discussion into a document, or alternatively move it to
>>> the wiki, if you (or others) find this useful. I have to admit that I am
>>> a bit skeptical about the wiki; I guess mail is better for discussion
>>> here. But I wanted to mention this option.
>>>
>>> Now on to the meat:
>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:rsyslog-
>>>> [email protected]] On Behalf Of [email protected]
>>
>>
>> hmm, I suspect that having the 'direct' mode able to do this IFF (if
>> and only if) all output modules are able to do the multi-message
>> handling would be a win.
>
> You can't do that, because in direct mode there is always at most one
> message inside the queue. You cannot operate on the main message queue
> "batch", as this is not yet filtered, so you do not know which message is for
> which action. So, from the action perspective, nothing is queued at this
> point. Thus, you need a queue running in a real queue mode. I hope it will
> become clearer once you have looked at the data flow (otherwise I need to
> write some big overview about it...).

I had not thought about the filtering issue

>>
>> specifically I expect to find that the locking process to deliver a
>> single message is expensive enough
>
> This is handled by the main queue batch. So even in direct mode, we get the
> benefit of the locking code improvement (I agree, potentially a *very big*
> gain). I guess you currently think of a single big queue inside rsyslog,
> which is the wrong picture. We have chained queues and you always need to
> look at which part of the message processing works on which queues. Very
> important implications!

this is a big difference. yes, I was thinking that there was one big queue 
(unless you defined action queues explicitly), I'll pay very careful 
attention to the tutorial and let you know if it explains this.
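
To make the chained-queue picture concrete, here is a minimal sketch in Python. It is not rsyslog code: the `Action` class, `process_main_queue`, and every parameter name are hypothetical and only illustrate the points made above. The main queue is dequeued as one unfiltered batch, filtering happens afterwards, and an action in direct mode only ever sees one message at a time, while a queued action can hand a multi-message batch to its output.

```python
from collections import deque


class Action:
    """Hypothetical output action (illustration only, not rsyslog's API)."""

    def __init__(self, name, matches, direct=False):
        self.name = name
        self.matches = matches   # filter predicate: message -> bool
        self.direct = direct     # direct mode: no queueing at all
        self.queue = deque()     # per-action queue, used only when not direct
        self.delivered = []      # stands in for the real output (file, DB, ...)

    def submit(self, msg):
        if self.direct:
            # Direct mode: the message goes straight to the output, so the
            # output never sees more than one message per call.
            self.delivered.append([msg])
        else:
            self.queue.append(msg)

    def flush(self, max_batch):
        # Queued mode: the output can be handed several messages at once.
        while self.queue:
            n = min(max_batch, len(self.queue))
            self.delivered.append([self.queue.popleft() for _ in range(n)])


def process_main_queue(main_queue, actions, max_batch):
    """Dequeue one unfiltered batch from the main queue and route it."""
    n = min(max_batch, len(main_queue))
    batch = [main_queue.popleft() for _ in range(n)]
    for msg in batch:
        for action in actions:
            if action.matches(msg):   # filtering happens only after dequeue
                action.submit(msg)
    return len(batch)
```

With two actions, one direct and one queued, the direct action ends up with single-message deliveries, while the queued action can deliver everything it collected in one batch once `flush` is called.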

>> that it's a big win even for the simple
>> default case of writing to a file. I also expect to see wins for moving
>> events from the main queue to the action queues.
>
> Yup, thus the direct mode of the action queue does not affect the main queue
> at all (and in direct mode we have no locking in the action queues, why should
> we ... nothing needs to be synchronized if you just stick the message into
> the output...)

and if you have multiple output threads?

>>> It gets messy when there is failure in the actions and it gets very
>>> complex if we think about the various shutdown scenarios (not to mention
>>> disk assisted queues actually running in DA mode). I have begun to look
>>> at these issues (part of today's and over-the-weekend thinking ;)), but
>>> this will probably need some more time to finally solve - plus some
>>> discussion, I guess...
>>
>> would it simplify things significantly to say that the multi-message
>> output and having multiple worker threads are exclusive?
>
> Unlikely (but I don't want to rule it out totally; probability less than 5%)

Ok, not an issue then

>>
>>>>
>>>> X=max_messages
>>>>
>>>> if (messages in queue)
>>>>    mark that it is going to process the next X messages
>>>>    grab the messages
>>>>    format them for output
>>>>    attempt to deliver the messages
>>>>    if (messages delivered successfully)
>>>>      mark messages in the queue as delivered
>>>>      X=max_messages (reset X in case it was reduced due to delivery errors)
>>>>    else (delivering this batch failed, reset and try to deliver the first half)
>>>
>>> I think, in our previous discussion (mailing list archive), we
>>> concluded that there is no value in re-trying with half of the batch.
>>
>> very possibly, I'm not remembering it.
>>
>> not doing so will simplify the code considerably, but the advantages of
>> retrying with half the batch are:
>>
>> 1. you deliver as much as you can
>>
>> 2. when you finally get stuck, you can pinpoint directly what message
>> you were stuck on (in case you have a failure based on the data, say
>> quotes in something that then gets formatted into a database, or slashes
>> in something that becomes a filename component)
>>
>> your call
>
> I need to refer you back to our previous discussion. Unfortunately, it was
> private. I dug the link out and sent it via private mail. Sorry to all others,
> please stand by for a moment. If I have not misread it, it boiled down to
> this: we had no non-transactional sources that were problematic, and we had
> not identified cases where it would be useful to retry with fewer elements.
>
> I'd provide a more complete description, but that would probably take me
> another 2...4 hours, and I hope to get around that (yes, it was a reeeaaaly
> long discussion). David, if you would like to quote anything from me, feel
> free to do so.

I'll dig through this today and tonight and review this

to be clear, I'm mostly concerned about the debugging/troubleshooting 
issues (which one of these 1000 messages made the database complain..). 
but I guess this can be addressed by stopping rsyslog and restarting it 
with a smaller batch size until you track it down. it should be rare 
enough to make that tolerable.
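
For reference, the retry-with-half idea from the pseudocode quoted above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not anything rsyslog implements: `deliver` is a hypothetical all-or-nothing callable that returns True when the whole batch was accepted, and `deliver_with_halving` is a made-up name. The sketch shows both claimed advantages: as much as possible gets delivered, and the loop ends on the single message that cannot be delivered.

```python
def deliver_with_halving(messages, deliver, max_messages):
    """Deliver in batches; on failure, retry with half the batch.

    `deliver` is a hypothetical callable: list of messages -> bool
    (True means the whole batch was accepted, False means none of it was).
    Returns (count delivered, index of the stuck message or None).
    """
    pos = 0
    x = max_messages
    while pos < len(messages):
        batch = messages[pos:pos + x]
        if deliver(batch):
            pos += len(batch)            # mark these messages as delivered
            x = max_messages             # reset X in case it was reduced
        elif len(batch) == 1:
            return pos, pos              # stuck on exactly this message
        else:
            x = max(1, len(batch) // 2)  # retry with the first half only
    return pos, None
```

The manual alternative described above (restart with a smaller batch size until the offending message is found) does the same narrowing by hand, which seems acceptable if such failures are rare.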

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
