> > > On Mon, Jan 13, 2014 at 9:22 AM, dormando <[email protected]> wrote: > > > > > > > There's no forwarder from the central server, if I understand what you > > > > were saying. Also given the strace of the select (which is selecting for > > > > nothing and waiting forever?) is it checking for something inbetween the > > > > select runs? > > > > > > > > > > > OK, the configs I think look good. What exactly it is waiting for is hard > > > to tell without context. Can you gather a debug log while it is in this > > > state. Details are here: > > > > > > http://www.rsyslog.com/doc/troubleshoot.html > > > > > > If a full debug log is too much to do, you may want to use debug on > > > demand: > > > > > > http://www.rsyslog.com/how-to-use-debug-on-demand/ > > > > > > If that's also impossible, at least many more strace lines PLUS > > > information > > > on what the thread name is (obtained via top) may also help us get closer. > > > > on-demand might work. It'll take a few days to iterate through this due to > > the infrequency of the hang. Thought it was hung just now but it started > > working again... I guess it could also be intermittent. Maybe I'll set up > > a script to watch for log gaps and snapshot the debug stuff. > > > > Thanks for your fast responses! > > > > Well, it took a week to fail that time :/ > > strace was showing a select retry that was going to take over a day, so I > didn't wait for it to catch the lines inbetween :/ I might have to patch > the code to set an upper limit to whatever backoff that is? Or I could've > tried to interrupt it... > > im:imtcp - just one thread, sitting in 100% CPU on one core. > > Local logging continues to work... I see log lines show up for stuff I do > *on* the log box. no remote log lines appear to be incoming. > > I took some gdb backtrace snapshots since that's the best I could do this > time. Will hopefully get more strace and the debug output next time... if > the thread will wake up and start applying it. > > First below are three summary backtraces which show some threads changing > state, and the rest not doing much. Below that is one full all-thread > stack trace. > > These are taken from the hung process. > > summary of backtrace: > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > 1 select,runInput,thrdStarter,start_thread,clone,?? > 1 select,mainloop,realMain,main > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > 1 > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > 1 > IsReady,IsReady,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > > another snapshot: > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > 1 > TCPSessGetNxtSess,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > 1 select,runInput,thrdStarter,start_thread,clone,?? > 1 select,mainloop,realMain,main > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > 1 > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > > another snapshot: > 4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,?? > 1 select,runInput,thrdStarter,start_thread,clone,?? > 1 select,mainloop,realMain,main > 1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,?? > 1 > pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,?? > 1 > pNsd=0x7f80142e8990,,Add,Add,RunSelect,Run,runInput,thrdStarter,start_thread,clone,?? > > > Full stack trace: > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > [New Thread 0x7f80137fd700 (LWP 31775)] > [New Thread 0x7f8013bfe700 (LWP 31774)] > [New Thread 0x7f8013fff700 (LWP 31773)] > [New Thread 0x7f8018e81700 (LWP 31772)] > [New Thread 0x7f8018a80700 (LWP 31771)] > [New Thread 0x7f8019682700 (LWP 31770)] > [New Thread 0x7f8019e83700 (LWP 31769)] > [New Thread 0x7f801a684700 (LWP 31768)] > 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > > Thread 9 (Thread 0x7f801a684700 (LWP 31768)): > #0 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > #1 0x00007f801cbf44cd in runInput (pThrd=0x2550060) at imuxsock.c:1363 > #2 0x00000000004638b6 in thrdStarter (arg=0x2550060) at ../threads.c:212 > #3 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #4 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #5 0x0000000000000000 in ?? () > > Thread 8 (Thread 0x7f8019e83700 (LWP 31769)): > #0 0x00007f801e057d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00007f801c9dd001 in readklog (pModConf=0x252f940) at bsd.c:228 > #2 0x00007f801c9dd27e in klogLogKMsg (pModConf=0x252f940) at bsd.c:286 > #3 0x00007f801c9db926 in runInput (pThrd=0x2559cd0) at imklog.c:282 > #4 0x00000000004638b6 in thrdStarter (arg=0x2559cd0) at ../threads.c:212 > #5 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #6 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #7 0x0000000000000000 in ?? () > > Thread 7 (Thread 0x7f8019682700 (LWP 31770)): > #0 0x00007f801ae2a2a0 in Add (pNsdsel=0x7f80141e7d80, pNsd=0x7f80142e8990, > waitOp=NSDSEL_RD) at nsdsel_ptcp.c:82 > #1 0x00007f801bb94f46 in Add (pNsdsel=0x7f801417c020, pNsd=0x7f80140e4420, > waitOp=NSDSEL_RD) at nsdsel_gtls.c:93 > #2 0x00007f801bfa3536 in Add (pThis=0x7f80144b32a0, pStrm=0x7f80142a8300, > waitOp=NSDSEL_RD) at nssel.c:142 > #3 0x00007f801bd9bea5 in RunSelect (pThis=0x252b5e0, workset=0x7f8019681570, > sizeWorkset=128) at tcpsrv.c:758 > #4 0x00007f801bd9c29f in Run (pThis=0x252b5e0) at tcpsrv.c:868 > #5 0x00007f801c1a8351 in runInput (pThrd=0x2559e90) at imtcp.c:584 > #6 0x00000000004638b6 in thrdStarter (arg=0x2559e90) at ../threads.c:212 > #7 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #8 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #9 0x0000000000000000 in ?? () > > Thread 6 (Thread 0x7f8018a80700 (LWP 31771)): > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x0000000000450dab in doIdleProcessing (pThis=0x25555d0, pWtp=0x255f9f0, > pbInactivityTOOccured=0x7f8018a7fd18) at wti.c:252 > #2 0x0000000000451082 in wtiWorker (pThis=0x25555d0) at wti.c:325 > #3 0x0000000000450205 in wtpWorker (arg=0x25555d0) at wtp.c:388 > #4 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #5 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #6 0x0000000000000000 in ?? () > > Thread 5 (Thread 0x7f8018e81700 (LWP 31772)): > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0bc0) at tcpsrv.c:631 > #2 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x0000000000000000 in ?? () > > Thread 4 (Thread 0x7f8013fff700 (LWP 31773)): > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c28) at tcpsrv.c:631 > #2 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x0000000000000000 in ?? () > > Thread 3 (Thread 0x7f8013bfe700 (LWP 31774)): > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c90) at tcpsrv.c:631 > #2 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x0000000000000000 in ?? () > > Thread 2 (Thread 0x7f80137fd700 (LWP 31775)): > #0 0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0cf8) at tcpsrv.c:631 > #2 0x00007f801e050e9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #3 0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x0000000000000000 in ?? () > > Thread 1 (Thread 0x7f801e691740 (LWP 31767)): > #0 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6 > #1 0x000000000040f22c in mainloop () at syslogd.c:1286 > #2 0x00000000004108da in realMain (argc=4, argv=0x7fffad7a59b8) at > syslogd.c:2038 > #3 0x00000000004109ba in main (argc=4, argv=0x7fffad7a59b8) at syslogd.c:2068 >
I didn't fatfinger the strace this time: the im:imtcp thread is still hung. All other threads seem to be doing stuff via the strace. The "hung" thread is spewing this: 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) ... about as fast as it can execute them. It's always the same fd (58), which is just one random tcp connection from some random client. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

