On Tue, 28 Jan 2014, dormando wrote:
> > > > On Mon, Jan 13, 2014 at 9:22 AM, dormando <[email protected]> wrote: > > > > > > > > > There's no forwarder from the central server, if I understand what you > > > > > were saying. Also given the strace of the select (which is selecting > > > > > for > > > > > nothing and waiting forever?) is it checking for something inbetween > > > > > the > > > > > select runs? > > > > > > > > > > > > > > OK, the configs I think look good. What exactly it is waiting for is > > > > hard > > > > to tell without context. Can you gather a debug log while it is in this > > > > state. Details are here: > > > > > > > > http://www.rsyslog.com/doc/troubleshoot.html > > > > > > > > If a full debug log is too much to do, you may want to use debug on > > > > demand: > > > > > > > > http://www.rsyslog.com/how-to-use-debug-on-demand/ > > > > > > > > If that's also impossible, at least many more strace lines PLUS > > > > information > > > > on what the thread name is (obtained via top) may also help us get > > > > closer. > > > > > > on-demand might work. It'll take a few days to iterate through this due to > > > the infrequency of the hang. Thought it was hung just now but it started > > > working again... I guess it could also be intermittent. Maybe I'll set up > > > a script to watch for log gaps and snapshot the debug stuff. > > > > > > Thanks for your fast responses! > > > > > > > Well, it took a week to fail that time :/ > > > > strace was showing a select retry that was going to take over a day, so I > > didn't wait for it to catch the lines inbetween :/ I might have to patch > > the code to set an upper limit to whatever backoff that is? Or I could've > > tried to interrupt it... > > > > im:imtcp - just one thread, sitting in 100% CPU on one core. > > > > Local logging continues to work... I see log lines show up for stuff I do > > *on* the log box. no remote log lines appear to be incoming. > > > > I took some gdb backtrace snapshots since that's the best I could do this > > time. Will hopefully get more strace and the debug output next time... if > > the thread will wake up and start applying it. > > > > First below are three summary backtraces which show some threads changing > > state, and the rest not doing much. Below that is one full all-thread > > stack trace. > > > > These are taken from the hung process. > > > > summary of backtrace: > > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > > 1 select,runInput,thrdStarter,start_thread,clone,?? > > 1 select,mainloop,realMain,main > > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > > 1 > > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > > 1 > > IsReady,IsReady,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > > > > another snapshot: > > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > > 1 > > TCPSessGetNxtSess,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > > 1 select,runInput,thrdStarter,start_thread,clone,?? > > 1 select,mainloop,realMain,main > > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > > 1 > > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > > > > another snapshot: > > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > > 1 select,runInput,thrdStarter,start_thread,clone,?? > > 1 select,mainloop,realMain,main > > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > > 1 > > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > > 1 > > pNsd=0x7f80142e8990,,Add,Add,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > > > > > > Full stack trace: > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > > [New Thread 0x7f80137fd700 (LWP 31775)] > > [New Thread 0x7f8013bfe700 (LWP 31774)] > > [New Thread 0x7f8013fff700 (LWP 31773)] > > [New Thread 0x7f8018e81700 (LWP 31772)] > > [New Thread 0x7f8018a80700 (LWP 31771)] > > [New Thread 0x7f8019682700 (LWP 31770)] > > [New Thread 0x7f8019e83700 (LWP 31769)] > > [New Thread 0x7f801a684700 (LWP 31768)] > > 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > > > > Thread 9 (Thread 0x7f801a684700 (LWP 31768)): > > #0 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > > #1 0x00007f801cbf44cd in runInput (pThrd=0x2550060) at imuxsock.c:1363 > > #2 0x00000000004638b6 in thrdStarter (arg=0x2550060) at ../threads.c:212 > > #3 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #4 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #5 0x0000000000000000 in ?? () > > > > Thread 8 (Thread 0x7f8019e83700 (LWP 31769)): > > #0 0x00007f801e057d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x00007f801c9dd001 in readklog (pModConf=0x252f940) at bsd.c:228 > > #2 0x00007f801c9dd27e in klogLogKMsg (pModConf=0x252f940) at bsd.c:286 > > #3 0x00007f801c9db926 in runInput (pThrd=0x2559cd0) at imklog.c:282 > > #4 0x00000000004638b6 in thrdStarter (arg=0x2559cd0) at ../threads.c:212 > > #5 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #6 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #7 0x0000000000000000 in ?? () > > > > Thread 7 (Thread 0x7f8019682700 (LWP 31770)): > > #0 0x00007f801ae2a2a0 in Add (pNsdsel=0x7f80141e7d80, pNsd=0x7f80142e8990, > > waitOp=NSDSEL_RD) at nsdsel_ptcp.c:82 > > #1 0x00007f801bb94f46 in Add (pNsdsel=0x7f801417c020, pNsd=0x7f80140e4420, > > waitOp=NSDSEL_RD) at nsdsel_gtls.c:93 > > #2 0x00007f801bfa3536 in Add (pThis=0x7f80144b32a0, pStrm=0x7f80142a8300, > > waitOp=NSDSEL_RD) at nssel.c:142 > > #3 0x00007f801bd9bea5 in RunSelect (pThis=0x252b5e0, > > workset=0x7f8019681570, > > sizeWorkset=128) at tcpsrv.c:758 > > #4 0x00007f801bd9c29f in Run (pThis=0x252b5e0) at tcpsrv.c:868 > > #5 0x00007f801c1a8351 in runInput (pThrd=0x2559e90) at imtcp.c:584 > > #6 0x00000000004638b6 in thrdStarter (arg=0x2559e90) at ../threads.c:212 > > #7 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #8 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #9 0x0000000000000000 in ?? () > > > > Thread 6 (Thread 0x7f8018a80700 (LWP 31771)): > > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x0000000000450dab in doIdleProcessing (pThis=0x25555d0, pWtp=0x255f9f0, > > pbInactivityTOOccured=0x7f8018a7fd18) at wti.c:252 > > #2 0x0000000000451082 in wtiWorker (pThis=0x25555d0) at wti.c:325 > > #3 0x0000000000450205 in wtpWorker (arg=0x25555d0) at wtp.c:388 > > #4 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #5 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #6 0x0000000000000000 in ?? () > > > > Thread 5 (Thread 0x7f8018e81700 (LWP 31772)): > > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0bc0) at tcpsrv.c:631 > > #2 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x0000000000000000 in ?? () > > > > Thread 4 (Thread 0x7f8013fff700 (LWP 31773)): > > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c28) at tcpsrv.c:631 > > #2 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x0000000000000000 in ?? () > > > > Thread 3 (Thread 0x7f8013bfe700 (LWP 31774)): > > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c90) at tcpsrv.c:631 > > #2 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x0000000000000000 in ?? () > > > > Thread 2 (Thread 0x7f80137fd700 (LWP 31775)): > > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0cf8) at tcpsrv.c:631 > > #2 0x00007f801e050e9a in start_thread () from > > /lib/x86_64-linux-gnu/libpthread.so.0 > > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x0000000000000000 in ?? () > > > > Thread 1 (Thread 0x7f801e691740 (LWP 31767)): > > #0 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > > #1 0x000000000040f22c in mainloop () at syslogd.c:1286 > > #2 0x00000000004108da in realMain (argc=4, argv=0x7fffad7a59b8) at > > syslogd.c:2038 > > #3 0x00000000004109ba in main (argc=4, argv=0x7fffad7a59b8) at > > syslogd.c:2068 > > > > I didn't fatfinger the strace this time: the im:imtcp thread is still > hung. All other threads seem to be doing stuff via the strace. The "hung" > thread is spewing this: > 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource > temporarily unavailable) > 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource > temporarily unavailable) > 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource > temporarily unavailable) > 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource > temporarily unavailable) > > ... about as fast as it can execute them. > > It's always the same fd (58), which is just one random tcp connection from > some random client. ... hit send a bit too early, since I am a failure... A couple backtrace snapshots from the thread in question: Thread 7 (Thread 0x7f49fb25e700 (LWP 10730)): #0 0x00007f49fdb7f522 in Add (pThis=0x7f49f4285be0, pStrm=0x7f49f4027c60, waitOp=NSDSEL_RD) at nssel.c:142 #1 0x00007f49fd977ea5 in RunSelect (pThis=0x156f5e0, workset=0x7f49fb25d570, sizeWorkset=128) at tcpsrv.c:758 #2 0x00007f49fd97829f in Run (pThis=0x156f5e0) at tcpsrv.c:868 #3 0x00007f49fdd84351 in runInput (pThrd=0x159de90) at imtcp.c:584 #4 0x00000000004638b6 in thrdStarter (arg=0x159de90) at ../threads.c:212 #5 0x00007f49ffc2ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f49fef3c3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f49fb25e700 (LWP 10730)): #0 0x00007f49fd4e98c6 in gnutls_record_get_direction () from /usr/lib/x86_64-linux-gnu/libgnutls.so.26 #1 0x00007f49fd770ec3 in Add (pNsdsel=0x7f49f42ec1e0, pNsd=0x7f49f40e7700, waitOp=NSDSEL_RD) at nsdsel_gtls.c:83 #2 0x00007f49fdb7f536 in Add (pThis=0x7f49f4285be0, pStrm=0x7f49f40f46d0, waitOp=NSDSEL_RD) at nssel.c:142 #3 0x00007f49fd977ea5 in RunSelect (pThis=0x156f5e0, workset=0x7f49fb25d570, sizeWorkset=128) at tcpsrv.c:758 #4 0x00007f49fd97829f in Run (pThis=0x156f5e0) at tcpsrv.c:868 #5 0x00007f49fdd84351 in runInput (pThrd=0x159de90) at imtcp.c:584 #6 0x00000000004638b6 in thrdStarter (arg=0x159de90) at ../threads.c:212 #7 0x00007f49ffc2ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007f49fef3c3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f49fb25e700 (LWP 10730)): #0 0x00007f49fd976ca6 in TCPSessGetNxtSess (pThis=0x156f5e0, iCurr=161) at tcpsrv.c:257 #1 0x00007f49fd977ec6 in RunSelect (pThis=0x156f5e0, workset=0x7f49fb25d570, sizeWorkset=128) at tcpsrv.c:760 #2 0x00007f49fd97829f in Run (pThis=0x156f5e0) at tcpsrv.c:868 #3 0x00007f49fdd84351 in runInput (pThrd=0x159de90) at imtcp.c:584 #4 0x00000000004638b6 in thrdStarter (arg=0x159de90) at ../threads.c:212 #5 0x00007f49ffc2ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #6 0x00007f49fef3c3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f49fb25e700 (LWP 10730)): #0 gtlsHasRcvInBuffer (pThis=0x7f49f403a4b0) at nsd_gtls.h:84 #1 0x00007f49fd770e6f in Add (pNsdsel=0x7f49f42ec1e0, pNsd=0x7f49f403a4b0, waitOp=NSDSEL_RD) at nsdsel_gtls.c:75 #2 0x00007f49fdb7f536 in Add (pThis=0x7f49f4285be0, pStrm=0x7f49f403a1c0, waitOp=NSDSEL_RD) at nssel.c:142 #3 0x00007f49fd977ea5 in RunSelect (pThis=0x156f5e0, workset=0x7f49fb25d570, sizeWorkset=128) at tcpsrv.c:758 #4 0x00007f49fd97829f in Run (pThis=0x156f5e0) at tcpsrv.c:868 #5 0x00007f49fdd84351 in runInput (pThrd=0x159de90) at imtcp.c:584 #6 0x00000000004638b6 in thrdStarter (arg=0x159de90) at ../threads.c:212 #7 0x00007f49ffc2ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #8 0x00007f49fef3c3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x0000000000000000 in ?? () Thanks! Still no debug trace, but this should be pretty enlightening at this point? _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

