Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

Rainer Gerhards Thu, 08 Oct 2015 06:09:02 -0700

2015-10-08 11:54 GMT+02:00 David Lang <[email protected]>:
> On Thu, 8 Oct 2015, Rainer Gerhards wrote:
>
>> 2015-10-08 7:07 GMT+02:00 Rainer Gerhards <[email protected]>:
>>>
>>> Sent from phone, thus brief.
>>> Am 07.10.2015 23:15 schrieb "David Lang" <[email protected]>:
>>>>
>>>>
>>>> I would have expected rsyslog to show errors in its logs and/or
>>>> problems
>>>> in impstats when maxopenfiles is hit and it can't open a file for
>>>> output.
>>>
>>>
>>> ACK, that strongly smells like a bug.
>>
>>
>> I have looked into it, and I now think I know why no error message was
>> generated: this can lead to an error message loop. This is the case if
>> the error message is written to a file that itself had the error. So
>> the next write will also generate one and then rsyslog is busy
>> reporting the error that happened during error reporting ... and so
>> on. There is the default rate-limiter which places some upper bound on
>> the iterations, but this still can get very ugly.
>>
>> Note that tracking the "already reported" state for each file is more
>> or less out of the question -- that would mean we would need to maintain a
>> list for all files that ever have been tried. Or is that acceptable?
>> In any case, this means we need to add a lot of code for this error
>> tracking.
>>
>> Any other ideas or suggestions in general?
>
>
> 1. it should be counted as a failure, not as processed, for that message in
> impstats


yeah, that's another (related) issue

> 2. it should report via stdout/stderr

not sure. If we do not run under systemd, nobody will ever see that
message. If we run under systemd, systemd will forward it to rsyslog
and so we have the loop again.

It would only help if someone runs it in a terminal session, but IMHO
we should try to avoid going to such lengths.
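
To make the loop concern and the per-file "already reported" idea from
earlier in the thread more concrete, here is a minimal sketch. It is not
rsyslog code: the class name, method names, and the bounded table size are
all assumptions made for illustration. The bound is one possible answer to
the "list for all files that ever have been tried" worry, at the cost of
occasionally re-reporting a file whose entry was evicted.

```python
from collections import OrderedDict


class ErrorReportGuard:
    """Hypothetical helper: report an open/write failure once per file."""

    def __init__(self, max_tracked=1000):
        # Assumed cap so the table cannot grow without limit.
        self.max_tracked = max_tracked
        self._reported = OrderedDict()  # file name -> True, oldest first

    def should_report(self, filename):
        """Return True only the first time a failure is seen for filename."""
        if filename in self._reported:
            # Already reported: refresh its position and stay silent,
            # which is what breaks the error-about-the-error loop.
            self._reported.move_to_end(filename)
            return False
        self._reported[filename] = True
        if len(self._reported) > self.max_tracked:
            self._reported.popitem(last=False)  # forget the oldest entry
        return True

    def clear(self, filename):
        """Call after a successful write so a later failure is reported again."""
        self._reported.pop(filename, None)
```

With something like this, the error message written to the failing file
itself would trigger should_report() a second time for the same name and
be dropped instead of generating yet another error.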

>
> 3. it should be reported as part of dynafile stats (since that's where it's
> most likely to be hit)

that's related to 1., need to see how this can be shuffled...

>
> 4. it would need to be rate limited, even if you don't have a message loop,
> having every new message generate an error message is not going to work. Can
> the standard backoff for suspended/failed outputs be used here?


well, that's an even broader question. If we suspend the action for
failure, it will be stalled in that case. Agreed, that makes some
sense. Not sure if it is good for most use cases, though.

Probably that's the best route to take, but that also means quite a
bit more work on omfile is required. Not sure if I can slip it "in
between" the other work. Well, let's see what it might take.
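
For reference, the suspend-and-back-off idea discussed above could look
roughly like the sketch below. This is a generic exponential backoff under
stated assumptions, not how rsyslog's action suspension is actually
implemented; the class, its parameters, and the doubling schedule are
hypothetical. The caller passes in the current time so the logic can be
exercised without sleeping.

```python
class SuspendedAction:
    """Hypothetical sketch: suspend an output on failure, retry with backoff."""

    def __init__(self, base_interval=1.0, max_interval=60.0):
        self.base_interval = base_interval  # assumed first retry delay (seconds)
        self.max_interval = max_interval    # assumed upper bound on the delay
        self.failures = 0
        self.retry_at = 0.0

    def may_try(self, now):
        """True if the action is not suspended at time `now`."""
        return now >= self.retry_at

    def record_failure(self, now):
        """Suspend the action; return True if this failure should be reported.

        Only the first failure of a streak is reported, so a stream of new
        messages does not produce one error message each.
        """
        self.failures += 1
        delay = min(self.base_interval * 2 ** (self.failures - 1),
                    self.max_interval)
        self.retry_at = now + delay
        return self.failures == 1

    def record_success(self):
        """Resume normal operation and reset the backoff."""
        self.failures = 0
        self.retry_at = 0.0
```

The trade-off is the one mentioned above: while the action is suspended,
everything routed to it is stalled, not only the file that failed.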

Rainer

>
>
> David Lang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.