Re: [rsyslog] 100% cpu lockup rsyslog 8.29 on FreeBSD 11.0Rp12 amd64 - bad config?

2017-09-25 Thread Andreas Wehrmann via rsyslog

Hey Dave,

can you check whether _this_ might be the issue:
http://lists.adiscon.net/pipermail/rsyslog/2017-August/044683.html

I ran into the same problem (rsyslog locking up the CPU).

Regards.
Andreas

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp

2017-08-29 Thread Andreas Wehrmann via rsyslog


Since this is all "only just" a test-setup, I have not yet gotten into 
creating an optimized configuration.
I am aware, that default queue sizes are overkill, however the 
problems I encountered here were happening on an idle system.
Though I am not familiar with the internals, I would at least be 
surprised that a handful of messages

(created using 'logger' in the shell) would consume all the memory.

The segfault was actually caused by an optimization, turning off 
-fguess-branch-probability fixed the segfault right away.
Unfortunately I don't have valgrind available but I am able to run it 
in a remote debugger and what I saw didn't make much sense to me
(in my experience with this platform, this is either due to a 
multithreading or compiler issue).


The segfault always ocurred in rsyslogd.c:1749 when trying to call the 
function janitorRun().
Note: The mere _attempt_ to just call the function at this point would 
cause the segfault, not something inside janitorRun().


Though this may seem weird, it's not the first compiler issue I 
encountered on this platform

and not the first one that causes crashes as well...

So once I turned off this particular optimization (found by trial and 
error), rsyslog actually ran quite well, everything seemed to be 
working fine.
Remote and local logging did work, and memory consumption wasn't too 
high (I forgot what it was exactly, but I had ~38 MBytes of RAM left).


However, when I stopped the remote syslog server (Winsyslog), I saw 
rsyslog taking 100% CPU.

This stopped when omrelp was able to reconnect.

While rsyslog was looping, it did neither crash, nor take up all 
system memory (memory consumption stayed the same),
which was the point at which I started the debugger again to find 
where it was consuming all that time.


This eventually lead me to doTransaction() inside action.c where I 
found what I was writing in my previous mails,
that it would loop around and not stop until omrelp was able to 
reconnect.


I then fired up a Ubuntu 16.04 VM (at that time I was naively thinking 
it would use a current version),
because I thought to myself: "No way, this must be happening on a PC 
too".


The rsyslogd running in the VM didn't have the issue, but I quickly 
found out, that it was using rsyslogd 8.16.
With the knowledge of where it was looping, I then started to look at 
the history of the rsyslog source

and found the change I mentioned before.

This is how far I have dug until this point.

To both of you: Thanks for the help and suggestions, I really 
appreciate it :-)


Regards,
Andreas


So, I can actually reproduce this on a PC.
I've built everything manually and encounter the exact same issue:

As long as the remote end is unavailable, rsyslog hangs in a loop 
consuming time of an entire CPU core.

The log output and configuration is the same as I have posted before.


Regards,
Andreas

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp

2017-08-23 Thread Andreas Wehrmann via rsyslog

On 08/23/2017 01:13 AM, Thomas Deutschmann via rsyslog wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 2017-08-21 13:21, Andreas Wehrmann via rsyslog wrote:

actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is
-2123),
meaning i will be decremented for re-submit and the loop starts all
over again.

But this is not the problem itself. It *should* retry. See
action.resumeRetryCount and action.resumeInterval
documentation [1].



So I ultimately traced it down to this change:
https://github.com/rsyslog/rsyslog/commit/128214fffac7dcec69b5c8dffdb8222bbd29af27


Reverting this makes everything seem to work as expected,
though it probably introduces the bug it was supposed to fix...

This should only trigger the problem, not causing the problem.


My current theory: _No_ msg can be delivered. Even internal messages
nowadays [2]. This will eat up your memory until rsyslogd segfaults:

For example, when rsyslogd launches it will already create a message
like


rsyslogd:  [origin software="rsyslogd" swVersion="8.29.0" x-pid="31213" 
x-info="http://www.rsyslog.com;] start

Now this message has to go through your queue. Once it reaches
your omrelp action which is failing, rsyslogd will generate a


rsyslogd: action 'action 1' suspended, next retry is 

message. This message will also go through your queue and be
processed by the still failing omrelp action (which will be skipped
like you can see in your logs: next retry (if applicable):
966135869 [now 966135839]).

Due to the small amount of memory you just need a small bunch of
message to run out of memory.

And don't forget about the main queue [3].


Do you get some core dumps when rsyslogd segfaults? Maybe you have to
enable this first on your system.

Or could you rsyslogd through gdb/valgrind?

What's about your dmesg? Any messages indicating OOM killer activity?


See also:
=
[1] http://www.rsyslog.com/doc/v8-stable/configuration/actions.html

[2] http://www.rsyslog.com/rsyslog-error-reporting-improved/

[3] http://www.rsyslog.com/doc/v8-stable/concepts/queues.html



Since this is all "only just" a test-setup, I have not yet gotten into 
creating an optimized configuration.
I am aware, that default queue sizes are overkill, however the problems 
I encountered here were happening on an idle system.
Though I am not familiar with the internals, I would at least be 
surprised that a handful of messages

(created using 'logger' in the shell) would consume all the memory.

The segfault was actually caused by an optimization, turning off 
-fguess-branch-probability fixed the segfault right away.
Unfortunately I don't have valgrind available but I am able to run it in 
a remote debugger and what I saw didn't make much sense to me
(in my experience with this platform, this is either due to a 
multithreading or compiler issue).


The segfault always ocurred in rsyslogd.c:1749 when trying to call the 
function janitorRun().
Note: The mere _attempt_ to just call the function at this point would 
cause the segfault, not something inside janitorRun().


Though this may seem weird, it's not the first compiler issue I 
encountered on this platform

and not the first one that causes crashes as well...

So once I turned off this particular optimization (found by trial and 
error), rsyslog actually ran quite well, everything seemed to be working 
fine.
Remote and local logging did work, and memory consumption wasn't too 
high (I forgot what it was exactly, but I had ~38 MBytes of RAM left).


However, when I stopped the remote syslog server (Winsyslog), I saw 
rsyslog taking 100% CPU.

This stopped when omrelp was able to reconnect.

While rsyslog was looping, it did neither crash, nor take up all system 
memory (memory consumption stayed the same),
which was the point at which I started the debugger again to find where 
it was consuming all that time.


This eventually lead me to doTransaction() inside action.c where I found 
what I was writing in my previous mails,

that it would loop around and not stop until omrelp was able to reconnect.

I then fired up a Ubuntu 16.04 VM (at that time I was naively thinking 
it would use a current version),

because I thought to myself: "No way, this must be happening on a PC too".

The rsyslogd running in the VM didn't have the issue, but I quickly 
found out, that it was using rsyslogd 8.16.
With the knowledge of where it was looping, I then started to look at 
the history of the rsyslog source

and found the change I mentioned before.

This is how far I have dug until this point.

To both of you: Thanks for the help and suggestions, I really appreciate 
it :-)


Regards,
Andreas

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing li

Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp

2017-08-21 Thread Andreas Wehrmann via rsyslog

On 08/21/2017 12:03 PM, Andreas Wehrmann via rsyslog wrote:

So, I was able to make some progress in the meantime:
I turned off compiler optimizations and rsyslog doesn't crash anymore
though I have yet to figure out, what optimization breaks the program.

Nevertheless, an issue remains:
When rsyslog cannot connect to a remote host (because the port is 
closed),
(one of) the worker threads goes into an endless loop clogging the 
CPU, the following lines repeat over and over:


5839.355962950:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate 
ptr now is 0x800a11e8
5839.356130840:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.356323210:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]


I traced it down to doTransaction() in action.c.

for(i = 0 ; i < wrkrInfo->p.tx.currIParam ; ++i) {
/* Note: we provide the message's base iparam - 
actionProcessMessage()

* uses this as *base* address.
*/
iRet = actionProcessMessage(pThis,
(wrkrInfo->p.tx.iparams, pThis->iNumTpls, i, 0), 
pWti);
if(iRet != RS_RET_DEFER_COMMIT && iRet != 
RS_RET_PREVIOUS_COMMITTED &&

iRet != RS_RET_OK)
--i; /* we need to re-submit */
DBGPRINTF("doTransaction: action %d, processing msg %d, result %d\n",
pThis->iActionNbr, i,iRet);
}


actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is 
-2123),
meaning i will be decremented for re-submit and the loop starts all 
over again.



Best regards,
Andreas


So I ultimately traced it down to this change:
https://github.com/rsyslog/rsyslog/commit/128214fffac7dcec69b5c8dffdb8222bbd29af27

Reverting this makes everything seem to work as expected,
though it probably introduces the bug it was supposed to fix...

Regards,
Andreas

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp

2017-08-21 Thread Andreas Wehrmann via rsyslog


5839.353174748:main Q:Reg/w0  : iminternal.c: signaling new internal 
message via SIGTTOU: 'action 'action 0' suspended, next retry is Sun 
Aug 13 03:04:29 2000 [v8.29.0 try http://www.rsyslog.com/e/2007 ]'

I am not sure but I have the feelings you are just running badly out
of memory which will cause the segmentation fault.



True, I noticed the message reschedule, but the console is filled with 
these messages:


5839.354769610:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.354994710:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
currIParam 1
5839.355200970:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.355398190:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate 
ptr now is 0x800a11e8
5839.355571450:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.355778480:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.355962950:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate 
ptr now is 0x800a11e8
5839.356130840:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.356323210:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]




So, I was able to make some progress in the meantime:
I turned off compiler optimizations and rsyslog doesn't crash anymore
though I have yet to figure out, what optimization breaks the program.

Nevertheless, an issue remains:
When rsyslog cannot connect to a remote host (because the port is closed),
(one of) the worker threads goes into an endless loop clogging the CPU, 
the following lines repeat over and over:


5839.355962950:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate ptr 
now is 0x800a11e8
5839.356130840:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.356323210:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]


I traced it down to doTransaction() in action.c.

for(i = 0 ; i < wrkrInfo->p.tx.currIParam ; ++i) {
/* Note: we provide the message's base iparam - actionProcessMessage()
* uses this as *base* address.
*/
iRet = actionProcessMessage(pThis,
(wrkrInfo->p.tx.iparams, pThis->iNumTpls, i, 0), 
pWti);

if(iRet != RS_RET_DEFER_COMMIT && iRet != RS_RET_PREVIOUS_COMMITTED &&
iRet != RS_RET_OK)
--i; /* we need to re-submit */
DBGPRINTF("doTransaction: action %d, processing msg %d, result %d\n",
pThis->iActionNbr, i,iRet);
}


actionProcessMessage() always returns RS_RET_ACTION_FAILED (which is -2123),
meaning i will be decremented for re-submit and the loop starts all over 
again.



Best regards,
Andreas

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] rsyslogd 8.29.0 - crash when omrelp server port is not open + build error in imptcp

2017-08-20 Thread Andreas Wehrmann via rsyslog

On 08/18/2017 03:22 PM, Thomas Deutschmann via rsyslog wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 2017-08-18 13:05, Andreas Wehrmann via rsyslog wrote:

Compiling 8.29.0 with the option "--enable-imptcp" breaks the
compilation process:

CC   imptcp_la-imptcp.lo imptcp.c: In function
'AcceptConnReq': imptcp.c:831:2: error: 'pThis' undeclared (first
use in this function) imptcp.c:831:2: note: each undeclared
identifier is reported only once for each function it appears in
make[2]: *** [imptcp_la-imptcp.lo] Error 1

Can you please show us a complete build.log and give us some details
about the embedded system?

The device is a custom made board based on a Coldfire V4e with 64 MB RAM 
and flash.

The toolchain comes from Codesourcery and uses gcc 4.5.2:
m68k-linux-gnu-gcc (Sourcery CodeBench Lite 2011.03-98) 4.5.2.
It runs a pretty old version of Linux 2.6.29.

The attached build log contains a build without the imptcp module 
enabled (my main priority right now is RELP/TCP output).
I'd really like to fix the crash issue first, afterwards I might dig 
deeper into why imptcp is failing here.



However, instead of backing off for some time, it retries the
connection _immediately_, which fails almost immediately again
which in turn leads to an immediate retry and so it spins around
very quickly.

No, I think your observations are wrong. Each action which is failing
will be suspended and you can see in your logs that's the case for
your setup as well:


5839.353174748:main Q:Reg/w0  : iminternal.c: signaling new internal message 
via SIGTTOU: 'action 'action 0' suspended, next retry is Sun Aug 13 03:04:29 
2000 [v8.29.0 try http://www.rsyslog.com/e/2007 ]'

I am not sure but I have the feelings you are just running badly out
of memory which will cause the segmentation fault.



True, I noticed the message reschedule, but the console is filled with 
these messages:


5839.354769610:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.354994710:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
currIParam 1
5839.355200970:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.355398190:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate ptr 
now is 0x800a11e8
5839.355571450:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.355778480:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]
5839.355962950:main Q:Reg/w0  : omrelp.c: omrelp: shutdownImmediate ptr 
now is 0x800a11e8
5839.356130840:main Q:Reg/w0  : ../action.c: doTransaction: action 0, 
processing msg -1, result -2123
5839.356323210:main Q:Reg/w0  : ../action.c: actionTryResume: action 
0x800a4180 state: susp, next retry (if applicable): 966135869 [now 
966135839]


What is happening here and why?
I don't think memory is an issue because when I disable omrelp (and only 
log locally) everything seems to work fine.

When rsyslogd starts up it consumes less than 2 MB of memory.

I've got the stupid feeling this might be a platform related issue...

Processing: rsyslog
=
Build path taken because: directory build, build key set, spec file newer than 
rpm,

rpmbuild --dbpath /home/work/rootfs//var/lib/rpm --target m68k --define 
'_unpackaged_files_terminate_build 0' --define '_target_cpu m68k' --define 
'__strip strip' --define '_topdir /home/work/rpm' --define '_prefix /usr' 
--define '_tmppath /home/work/tmp' --define '_rpmdir /home/work/rpm/RPMS'  
--define '_mandir /usr/share/man' --define '_sysconfdir /etc' --define 
'_localstatedir /var' -bc --short-circuit  
/home/work/dist/lfs-5.1/rsyslog/rsyslog.spec
Building target platforms: m68k
Building for target m68k
Executing(%build): /bin/sh -e /home/work/tmp/rpm-tmp.85611
+ umask 022
+ cd /home/work/rpm/BUILD
+ cd rsyslog-8.29.0
+ ./configure --prefix=/usr --host=m68k-linux-gnu 
--build=x86_64-unknown-linux-gnu --enable-shared --disable-static --enable-klog 
--enable-inet --enable-debug-symbols --enable-relp --enable-imfile 
--disable-imptcp --disable-libgcrypt --disable-uuid
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for m68k-linux-gnu-strip... m68k-linux-gnu-strip
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for style of include used by make... GNU
checking for m68k-linux-gnu-gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compil