Re: [rsyslog] Fwd: Re: rsyslog kills entire system => force reboot
On 09/27/2016 01:02 PM, Andre Lorbach wrote:
> So far it seems to be very difficult to reproduce this problem.
> Are you still able to reproduce the problem with 8.21?

As you can imagine, it's quite difficult for me to reproduce it as well,
and at the moment I won't upgrade my production systems to a later version.

> If yes could you send me the configuration you are using and the output
> of: ldd /sbin/rsyslogd
>
> I am interested to see against which libfastjson library rsyslog is using,
> it should be libfastjson.so.4

Yes, it's libfastjson.so.4

But I had further problems with syslog. Last Friday nearly every server
got a problem, and again it was syslog. I'm not sure if it was the same
problem, since it was on nearly every system. What I found out so far is
that nscd can block the system and go up to 100% CPU, and this problem is
also related to syslog.
(Short story: I've removed nscd from all systems since it's not really
required.)

What I really need is a configuration that keeps working and drops messages
when they cannot be stored, or whatever the problem is.
A call to syslog() must not block the entire system. I know it's not as
specified in the RFC but

Cheers
Raffi

> Best regards,
> Andre Lorbach
>
>> -----Original Message-----
>> From: rsyslog-boun...@lists.adiscon.com [mailto:rsyslog-
>> boun...@lists.adiscon.com] On Behalf Of singh.janmejay
>> Sent: Friday, September 16, 2016 10:46 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Fwd: Re: rsyslog kills entire system => force reboot
>>
>> How long does it take to go thru one cycle of verifying the problem exists?
>> I was wondering if bisecting would be viable?
>>
>> May not be required though, stats, entire config and all thread backtrace
>> will likely give you/us enough clues.
>>
>> On Sep 16, 2016 12:30 PM, "Raffael Sahli" wrote:
>>> yep, I can confirm that the problem is gone.
>>> Downgrade back to 8.20 solved the problem.
>>>
>>> Anybody with the same problem?
>>>
>>> -------- Forwarded Message --------
>>> Subject: Re: [rsyslog] rsyslog kills entire system => force reboot
>>> Date: Mon, 12 Sep 2016 11:03:58 +0200
>>> From: Raffael Sahli
>>> To: rsyslog@lists.adiscon.com
>>>
>>> fyi since the downgrade to 8.20 (from 8.21), we didn't notice any
>>> problems.
>>>
>>> On 09.09.2016 15:48, Raffael Sahli wrote:
>>>> On 09.09.2016 15:09, David Lang wrote:
>>>>> On Fri, 9 Sep 2016, Raffael Sahli wrote:
>>>>>> Actually I tried $ActionResumeRetryCount with a value 10, @see 2nd
>>>>>> configuration. But faced the same problem.
>>>>>>
>>>>>> Strange thing is, I deployed new rsyslog configs without the remote
>>>>>> forwarding, but this morning one server was unresponsive again, same
>>>>>> problem.
>>>>>>
>>>>>> Does anybody know, can this also happen without remote forwarding?
>>>>>
>>>>> where are your local logs being written? is there any chance that it's
>>>>> running out of space or otherwise falling behind (think of a slow NFS
>>>>> server)
>>>>>
>>>>> remember that even with retries = 10 rsyslog won't stop completely, but
>>>>> it will slow things down drastically so that it appears to be dead.
>>>>
>>>> No, just the local filesystem.
>>>> And the fs and disk i/o is fine.
>>>>
>>>>>> Maybe this is more a general syslog problem, as far as I know the RFC,
>>>>>> since syslog should never lose any messages by default.
>>>>>> I just like to know what rsyslog config I should use with remote
>>>>>> forwarding, but without any timeout for syslog services if syslog is
>>>>>> somehow unresponsive.
>>>>>
>>>>> per the syslog spec it should block forever if it can't deliver the
>>>>> message.
>>>>
>>>> Yeah that's the point, I don't get that
>>>>
>>>>> But to really see what's going on, configure impstats and have it write
>>>>> to a local file, that will let you see what's going on when it appears
>>>>> to stall.
>>>>
>>>> Mhm will try it out, or/and try downgrade to an earlier version since
>>>> I did not have such problems before.
>>> --
>>> Raffael Sahli
>>>
>>> ___
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>> DON'T LIKE THAT.
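For readers following along: a minimal sketch of the impstats setup David
Lang suggests in the quoted text, assuming the RainerScript syntax available
in rsyslog 8.x. The 60-second interval and the path
/var/log/rsyslog-stats.log are illustrative assumptions, not values taken
from this thread.

```
# Sketch only: interval and file path are assumptions, not from the thread.
module(load="impstats"
       interval="60"        # emit counters every 60 seconds
       severity="7"         # tag the stats records with debug severity
       log.syslog="off"     # keep stats out of the regular syslog stream
       log.file="/var/log/rsyslog-stats.log")  # write straight to a local file
```

Writing to a local file rather than through the syslog stream is meant to
keep the counters readable even when the queues themselves are what has
stalled. The queue size and discard counters in that file are the numbers
to look at when the system next appears to hang.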
Re: [rsyslog] Fwd: Re: rsyslog kills entire system => force reboot
For what it is worth, I am running rsyslog 8.21 on around 12,000 servers
and have not run into any issues with it.

Cheers,
Brian
Re: [rsyslog] Fwd: Re: rsyslog kills entire system => force reboot
On 09/28/2016 12:58 PM, Brian Knox wrote:
> For what it is worth, I am running rsyslog 8.21 on around 12,000 servers
> and have not run into any issues with it.

Which distro/version and kernel?
For sure my problems are related to something else, but in the end syslog is
the problem which leads to the crash or an unresponsive service.

>
> Cheers,
> Brian
Re: [rsyslog] Fwd: Re: rsyslog kills entire system => force reboot
2016-09-28 9:20 GMT+02:00 Raffael Sahli:
> What I really need is a configuration that keeps working and drops messages
> when they cannot be stored, or whatever the problem is.
> A call to syslog() must not block the entire system. I know it's not as
> specified in the RFC but

Well, the question is why things block when you have configured them not to
block. This works in all cases we know of and, most importantly, it works in
all test cases. As Andre said, we are currently unable to reproduce. You are
also unable.

In essence, this boils down to waiting until either we find a way to
reproduce it (we are currently trying, but I am only mildly optimistic TBH)
or someone else comes along and shows how to reproduce it. Without a repro,
I guess it will be impossible to address this issue.

Rainer
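For context on "configured them not to block": a hedged sketch of what a
drop-rather-than-block forwarding action can look like in RainerScript. This
is not the configuration used by anyone in this thread. The target host,
port, and queue sizes are placeholders, and since the problem was also
reported without remote forwarding, this may not touch the root cause.

```
# Sketch only: target, port, and all sizes are hypothetical placeholders.
action(type="omfwd"
       target="loghost.example.com"   # hypothetical remote collector
       port="514"
       protocol="tcp"
       action.resumeRetryCount="10"   # the retry count mentioned earlier in the thread
       queue.type="LinkedList"        # give this action its own in-memory queue
       queue.size="10000"             # upper bound on queued messages
       queue.timeoutEnqueue="0"       # when the queue is full, discard instead of waiting
       queue.discardMark="9000"       # begin discarding once this many messages are queued
       queue.discardSeverity="0")     # at that mark, discard regardless of severity
```

The idea is that a dedicated action queue decouples the slow or dead
destination from the main queue, and the enqueue timeout plus the discard
settings trade message loss for not stalling callers. Whether these
parameters behave this way on a given build is worth checking against the
queue documentation for that rsyslog version.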