Re: [rsyslog] 100% cpu lockup rsyslog 8.29 on FreeBSD 11.0Rp12 amd64 - bad config?
Hey Dave, can you check whether _this_ might be the issue: http://lists.adiscon.net/pipermail/rsyslog/2017-August/044683.html I ran into the same problem (rsyslog locking up the CPU). Regards. Andreas ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp
Since this is all "only just" a test-setup, I have not yet gotten into creating an optimized configuration. I am aware, that default queue sizes are overkill, however the problems I encountered here were happening on an idle system. Though I am not familiar with the internals, I would at least be surprised that a handful of messages (created using 'logger' in the shell) would consume all the memory. The segfault was actually caused by an optimization, turning off -fguess-branch-probability fixed the segfault right away. Unfortunately I don't have valgrind available but I am able to run it in a remote debugger and what I saw didn't make much sense to me (in my experience with this platform, this is either due to a multithreading or compiler issue). The segfault always ocurred in rsyslogd.c:1749 when trying to call the function janitorRun(). Note: The mere _attempt_ to just call the function at this point would cause the segfault, not something inside janitorRun(). Though this may seem weird, it's not the first compiler issue I encountered on this platform and not the first one that causes crashes as well... So once I turned off this particular optimization (found by trial and error), rsyslog actually ran quite well, everything seemed to be working fine. Remote and local logging did work, and memory consumption wasn't too high (I forgot what it was exactly, but I had ~38 MBytes of RAM left). However, when I stopped the remote syslog server (Winsyslog), I saw rsyslog taking 100% CPU. This stopped when omrelp was able to reconnect. While rsyslog was looping, it did neither crash, nor take up all system memory (memory consumption stayed the same), which was the point at which I started the debugger again to find where it was consuming all that time. This eventually lead me to doTransaction() inside action.c where I found what I was writing in my previous mails, that it would loop around and not stop until omrelp was able to reconnect. I then fired up a Ubuntu 16.04 VM (at that time I was naively thinking it would use a current version), because I thought to myself: "No way, this must be happening on a PC too". The rsyslogd running in the VM didn't have the issue, but I quickly found out, that it was using rsyslogd 8.16. With the knowledge of where it was looping, I then started to look at the history of the rsyslog source and found the change I mentioned before. This is how far I have dug until this point. To both of you: Thanks for the help and suggestions, I really appreciate it :-) Regards, Andreas So, I can actually reproduce this on a PC. I've built everything manually and encounter the exact same issue: As long as the remote end is unavailable, rsyslog hangs in a loop consuming time of an entire CPU core. The log output and configuration is the same as I have posted before. Regards, Andreas ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp
On 08/23/2017 01:13 AM, Thomas Deutschmann via rsyslog wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 2017-08-21 13:21, Andreas Wehrmann via rsyslog wrote: actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is -2123), meaning i will be decremented for re-submit and the loop starts all over again. But this is not the problem itself. It *should* retry. See action.resumeRetryCount and action.resumeInterval documentation [1]. So I ultimately traced it down to this change: https://github.com/rsyslog/rsyslog/commit/128214fffac7dcec69b5c8dffdb8222bbd29af27 Reverting this makes everything seem to work as expected, though it probably introduces the bug it was supposed to fix... This should only trigger the problem, not causing the problem. My current theory: _No_ msg can be delivered. Even internal messages nowadays [2]. This will eat up your memory until rsyslogd segfaults: For example, when rsyslogd launches it will already create a message like rsyslogd: [origin software="rsyslogd" swVersion="8.29.0" x-pid="31213" x-info="http://www.rsyslog.com;] start Now this message has to go through your queue. Once it reaches your omrelp action which is failing, rsyslogd will generate a rsyslogd: action 'action 1' suspended, next retry is message. This message will also go through your queue and be processed by the still failing omrelp action (which will be skipped like you can see in your logs: next retry (if applicable): 966135869 [now 966135839]). Due to the small amount of memory you just need a small bunch of message to run out of memory. And don't forget about the main queue [3]. Do you get some core dumps when rsyslogd segfaults? Maybe you have to enable this first on your system. Or could you rsyslogd through gdb/valgrind? What's about your dmesg? Any messages indicating OOM killer activity? See also: = [1] http://www.rsyslog.com/doc/v8-stable/configuration/actions.html [2] http://www.rsyslog.com/rsyslog-error-reporting-improved/ [3] http://www.rsyslog.com/doc/v8-stable/concepts/queues.html Since this is all "only just" a test-setup, I have not yet gotten into creating an optimized configuration. I am aware, that default queue sizes are overkill, however the problems I encountered here were happening on an idle system. Though I am not familiar with the internals, I would at least be surprised that a handful of messages (created using 'logger' in the shell) would consume all the memory. The segfault was actually caused by an optimization, turning off -fguess-branch-probability fixed the segfault right away. Unfortunately I don't have valgrind available but I am able to run it in a remote debugger and what I saw didn't make much sense to me (in my experience with this platform, this is either due to a multithreading or compiler issue). The segfault always ocurred in rsyslogd.c:1749 when trying to call the function janitorRun(). Note: The mere _attempt_ to just call the function at this point would cause the segfault, not something inside janitorRun(). Though this may seem weird, it's not the first compiler issue I encountered on this platform and not the first one that causes crashes as well... So once I turned off this particular optimization (found by trial and error), rsyslog actually ran quite well, everything seemed to be working fine. Remote and local logging did work, and memory consumption wasn't too high (I forgot what it was exactly, but I had ~38 MBytes of RAM left). However, when I stopped the remote syslog server (Winsyslog), I saw rsyslog taking 100% CPU. This stopped when omrelp was able to reconnect. While rsyslog was looping, it did neither crash, nor take up all system memory (memory consumption stayed the same), which was the point at which I started the debugger again to find where it was consuming all that time. This eventually lead me to doTransaction() inside action.c where I found what I was writing in my previous mails, that it would loop around and not stop until omrelp was able to reconnect. I then fired up a Ubuntu 16.04 VM (at that time I was naively thinking it would use a current version), because I thought to myself: "No way, this must be happening on a PC too". The rsyslogd running in the VM didn't have the issue, but I quickly found out, that it was using rsyslogd 8.16. With the knowledge of where it was looping, I then started to look at the history of the rsyslog source and found the change I mentioned before. This is how far I have dug until this point. To both of you: Thanks for the help and suggestions, I really appreciate it :-) Regards, Andreas ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing li
Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp
On 08/21/2017 12:03 PM, Andreas Wehrmann via rsyslog wrote: So, I was able to make some progress in the meantime: I turned off compiler optimizations and rsyslog doesn't crash anymore though I have yet to figure out, what optimization breaks the program. Nevertheless, an issue remains: When rsyslog cannot connect to a remote host (because the port is closed), (one of) the worker threads goes into an endless loop clogging the CPU, the following lines repeat over and over: 5839.355962950:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.356130840:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.356323210:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] I traced it down to doTransaction() in action.c. for(i = 0 ; i < wrkrInfo->p.tx.currIParam ; ++i) { /* Note: we provide the message's base iparam - actionProcessMessage() * uses this as *base* address. */ iRet = actionProcessMessage(pThis, (wrkrInfo->p.tx.iparams, pThis->iNumTpls, i, 0), pWti); if(iRet != RS_RET_DEFER_COMMIT && iRet != RS_RET_PREVIOUS_COMMITTED && iRet != RS_RET_OK) --i; /* we need to re-submit */ DBGPRINTF("doTransaction: action %d, processing msg %d, result %d\n", pThis->iActionNbr, i,iRet); } actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is -2123), meaning i will be decremented for re-submit and the loop starts all over again. Best regards, Andreas So I ultimately traced it down to this change: https://github.com/rsyslog/rsyslog/commit/128214fffac7dcec69b5c8dffdb8222bbd29af27 Reverting this makes everything seem to work as expected, though it probably introduces the bug it was supposed to fix... Regards, Andreas ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp
5839.353174748:main Q:Reg/w0 : iminternal.c: signaling new internal message via SIGTTOU: 'action 'action 0' suspended, next retry is Sun Aug 13 03:04:29 2000 [v8.29.0 try http://www.rsyslog.com/e/2007 ]' I am not sure but I have the feelings you are just running badly out of memory which will cause the segmentation fault. True, I noticed the message reschedule, but the console is filled with these messages: 5839.354769610:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.354994710:main Q:Reg/w0 : ../action.c: doTransaction: action 0, currIParam 1 5839.355200970:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.355398190:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.355571450:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.355778480:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.355962950:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.356130840:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.356323210:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] So, I was able to make some progress in the meantime: I turned off compiler optimizations and rsyslog doesn't crash anymore though I have yet to figure out, what optimization breaks the program. Nevertheless, an issue remains: When rsyslog cannot connect to a remote host (because the port is closed), (one of) the worker threads goes into an endless loop clogging the CPU, the following lines repeat over and over: 5839.355962950:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.356130840:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.356323210:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] I traced it down to doTransaction() in action.c. for(i = 0 ; i < wrkrInfo->p.tx.currIParam ; ++i) { /* Note: we provide the message's base iparam - actionProcessMessage() * uses this as *base* address. */ iRet = actionProcessMessage(pThis, (wrkrInfo->p.tx.iparams, pThis->iNumTpls, i, 0), pWti); if(iRet != RS_RET_DEFER_COMMIT && iRet != RS_RET_PREVIOUS_COMMITTED && iRet != RS_RET_OK) --i; /* we need to re-submit */ DBGPRINTF("doTransaction: action %d, processing msg %d, result %d\n", pThis->iActionNbr, i,iRet); } actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is -2123), meaning i will be decremented for re-submit and the loop starts all over again. Best regards, Andreas ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp
On 08/18/2017 03:22 PM, Thomas Deutschmann via rsyslog wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 2017-08-18 13:05, Andreas Wehrmann via rsyslog wrote: Compiling 8.29.0 with the option "--enable-imptcp" breaks the compilation process: CC imptcp_la-imptcp.lo imptcp.c: In function 'AcceptConnReq': imptcp.c:831:2: error: 'pThis' undeclared (first use in this function) imptcp.c:831:2: note: each undeclared identifier is reported only once for each function it appears in make[2]: *** [imptcp_la-imptcp.lo] Error 1 Can you please show us a complete build.log and give us some details about the embedded system? The device is a custom made board based on a Coldfire V4e with 64 MB RAM and flash. The toolchain comes from Codesourcery and uses gcc 4.5.2: m68k-linux-gnu-gcc (Sourcery CodeBench Lite 2011.03-98) 4.5.2. It runs a pretty old version of Linux 2.6.29. The attached build log contains a build without the imptcp module enabled (my main priority right now is RELP/TCP output). I'd really like to fix the crash issue first, afterwards I might dig deeper into why imptcp is failing here. However, instead of backing off for some time, it retries the connection _immediately_, which fails almost immediately again which in turn leads to an immediate retry and so it spins around very quickly. No, I think your observations are wrong. Each action which is failing will be suspended and you can see in your logs that's the case for your setup as well: 5839.353174748:main Q:Reg/w0 : iminternal.c: signaling new internal message via SIGTTOU: 'action 'action 0' suspended, next retry is Sun Aug 13 03:04:29 2000 [v8.29.0 try http://www.rsyslog.com/e/2007 ]' I am not sure but I have the feelings you are just running badly out of memory which will cause the segmentation fault. True, I noticed the message reschedule, but the console is filled with these messages: 5839.354769610:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.354994710:main Q:Reg/w0 : ../action.c: doTransaction: action 0, currIParam 1 5839.355200970:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.355398190:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.355571450:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.355778480:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] 5839.355962950:main Q:Reg/w0 : omrelp.c: omrelp: shutdownImmediate ptr now is 0x800a11e8 5839.356130840:main Q:Reg/w0 : ../action.c: doTransaction: action 0, processing msg -1, result -2123 5839.356323210:main Q:Reg/w0 : ../action.c: actionTryResume: action 0x800a4180 state: susp, next retry (if applicable): 966135869 [now 966135839] What is happening here and why? I don't think memory is an issue because when I disable omrelp (and only log locally) everything seems to work fine. When rsyslogd starts up it consumes less than 2 MB of memory. I've got the stupid feeling this might be a platform related issue... Processing: rsyslog = Build path taken because: directory build, build key set, spec file newer than rpm, rpmbuild --dbpath /home/work/rootfs//var/lib/rpm --target m68k --define '_unpackaged_files_terminate_build 0' --define '_target_cpu m68k' --define '__strip strip' --define '_topdir /home/work/rpm' --define '_prefix /usr' --define '_tmppath /home/work/tmp' --define '_rpmdir /home/work/rpm/RPMS' --define '_mandir /usr/share/man' --define '_sysconfdir /etc' --define '_localstatedir /var' -bc --short-circuit /home/work/dist/lfs-5.1/rsyslog/rsyslog.spec Building target platforms: m68k Building for target m68k Executing(%build): /bin/sh -e /home/work/tmp/rpm-tmp.85611 + umask 022 + cd /home/work/rpm/BUILD + cd rsyslog-8.29.0 + ./configure --prefix=/usr --host=m68k-linux-gnu --build=x86_64-unknown-linux-gnu --enable-shared --disable-static --enable-klog --enable-inet --enable-debug-symbols --enable-relp --enable-imfile --disable-imptcp --disable-libgcrypt --disable-uuid checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for m68k-linux-gnu-strip... m68k-linux-gnu-strip checking for a thread-safe mkdir -p... /bin/mkdir -p checking for gawk... no checking for mawk... mawk checking whether make sets $(MAKE)... yes checking whether make supports nested variables... yes checking whether make supports nested variables... (cached) yes checking for style of include used by make... GNU checking for m68k-linux-gnu-gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compil