> > > On Mon, Jan 13, 2014 at 9:22 AM, dormando <[email protected]> wrote:
> > >
> > > > There's no forwarder from the central server, if I understand what you
> > > > were saying. Also given the strace of the select (which is selecting for
> > > > nothing and waiting forever?) is it checking for something inbetween the
> > > > select runs?
> > > >
> > > >
> > > OK, the configs I think look good. What exactly it is waiting for is hard
> > > to tell without context. Can you gather a debug log while it is in this
> > > state. Details are here:
> > >
> > > http://www.rsyslog.com/doc/troubleshoot.html
> > >
> > > If a full debug log is too much to do, you may want to use debug on 
> > > demand:
> > >
> > > http://www.rsyslog.com/how-to-use-debug-on-demand/
> > >
> > > If that's also impossible, at least many more strace lines PLUS 
> > > information
> > > on what the thread name is (obtained via top) may also help us get closer.
> >
> > on-demand might work. It'll take a few days to iterate through this due to
> > the infrequency of the hang. Thought it was hung just now but it started
> > working again... I guess it could also be intermittent. Maybe I'll set up
> > a script to watch for log gaps and snapshot the debug stuff.
> >
> > Thanks for your fast responses!
> >
>
> Well, it took a week to fail that time :/
>
> strace was showing a select retry that was going to take over a day, so I
> didn't wait for it to catch the lines inbetween :/ I might have to patch
> the code to set an upper limit to whatever backoff that is? Or I could've
> tried to interrupt it...
>
> im:imtcp - just one thread, sitting in 100% CPU on one core.
>
> Local logging continues to work... I see log lines show up for stuff I do
> *on* the log box. no remote log lines appear to be incoming.
>
> I took some gdb backtrace snapshots since that's the best I could do this
> time. Will hopefully get more strace and the debug output next time... if
> the thread will wake up and start applying it.
>
> First below are three summary backtraces which show some threads changing
> state, and the rest not doing much. Below that is one full all-thread
> stack trace.
>
> These are taken from the hung process.
>
> summary of backtrace:
>       4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
>       1 select,runInput,thrdStarter,start_thread,clone,??
>       1 select,mainloop,realMain,main
>       1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
>       1
> pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??
>       1
> IsReady,IsReady,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??
>
> another snapshot:
>       4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
>       1
> TCPSessGetNxtSess,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??
>       1 select,runInput,thrdStarter,start_thread,clone,??
>       1 select,mainloop,realMain,main
>       1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
>       1
> pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??
>
> another snapshot:
>       4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
>       1 select,runInput,thrdStarter,start_thread,clone,??
>       1 select,mainloop,realMain,main
>       1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
>       1
> pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??
>       1
> pNsd=0x7f80142e8990,,Add,Add,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??
>
>
> Full stack trace:
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7f80137fd700 (LWP 31775)]
> [New Thread 0x7f8013bfe700 (LWP 31774)]
> [New Thread 0x7f8013fff700 (LWP 31773)]
> [New Thread 0x7f8018e81700 (LWP 31772)]
> [New Thread 0x7f8018a80700 (LWP 31771)]
> [New Thread 0x7f8019682700 (LWP 31770)]
> [New Thread 0x7f8019e83700 (LWP 31769)]
> [New Thread 0x7f801a684700 (LWP 31768)]
> 0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6
>
> Thread 9 (Thread 0x7f801a684700 (LWP 31768)):
> #0  0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f801cbf44cd in runInput (pThrd=0x2550060) at imuxsock.c:1363
> #2  0x00000000004638b6 in thrdStarter (arg=0x2550060) at ../threads.c:212
> #3  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 8 (Thread 0x7f8019e83700 (LWP 31769)):
> #0  0x00007f801e057d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f801c9dd001 in readklog (pModConf=0x252f940) at bsd.c:228
> #2  0x00007f801c9dd27e in klogLogKMsg (pModConf=0x252f940) at bsd.c:286
> #3  0x00007f801c9db926 in runInput (pThrd=0x2559cd0) at imklog.c:282
> #4  0x00000000004638b6 in thrdStarter (arg=0x2559cd0) at ../threads.c:212
> #5  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 7 (Thread 0x7f8019682700 (LWP 31770)):
> #0  0x00007f801ae2a2a0 in Add (pNsdsel=0x7f80141e7d80, pNsd=0x7f80142e8990,
> waitOp=NSDSEL_RD) at nsdsel_ptcp.c:82
> #1  0x00007f801bb94f46 in Add (pNsdsel=0x7f801417c020, pNsd=0x7f80140e4420,
> waitOp=NSDSEL_RD) at nsdsel_gtls.c:93
> #2  0x00007f801bfa3536 in Add (pThis=0x7f80144b32a0, pStrm=0x7f80142a8300,
> waitOp=NSDSEL_RD) at nssel.c:142
> #3  0x00007f801bd9bea5 in RunSelect (pThis=0x252b5e0, workset=0x7f8019681570,
> sizeWorkset=128) at tcpsrv.c:758
> #4  0x00007f801bd9c29f in Run (pThis=0x252b5e0) at tcpsrv.c:868
> #5  0x00007f801c1a8351 in runInput (pThrd=0x2559e90) at imtcp.c:584
> #6  0x00000000004638b6 in thrdStarter (arg=0x2559e90) at ../threads.c:212
> #7  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #8  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #9  0x0000000000000000 in ?? ()
>
> Thread 6 (Thread 0x7f8018a80700 (LWP 31771)):
> #0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000450dab in doIdleProcessing (pThis=0x25555d0, pWtp=0x255f9f0,
> pbInactivityTOOccured=0x7f8018a7fd18) at wti.c:252
> #2  0x0000000000451082 in wtiWorker (pThis=0x25555d0) at wti.c:325
> #3  0x0000000000450205 in wtpWorker (arg=0x25555d0) at wtp.c:388
> #4  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 5 (Thread 0x7f8018e81700 (LWP 31772)):
> #0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0bc0) at tcpsrv.c:631
> #2  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 4 (Thread 0x7f8013fff700 (LWP 31773)):
> #0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c28) at tcpsrv.c:631
> #2  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 3 (Thread 0x7f8013bfe700 (LWP 31774)):
> #0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c90) at tcpsrv.c:631
> #2  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 0x7f80137fd700 (LWP 31775)):
> #0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0cf8) at tcpsrv.c:631
> #2  0x00007f801e050e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7f801e691740 (LWP 31767)):
> #0  0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000040f22c in mainloop () at syslogd.c:1286
> #2  0x00000000004108da in realMain (argc=4, argv=0x7fffad7a59b8) at
> syslogd.c:2038
> #3  0x00000000004109ba in main (argc=4, argv=0x7fffad7a59b8) at syslogd.c:2068
>

I didn't fatfinger the strace this time: the im:imtcp thread is still
hung. All other threads seem to be doing stuff via the strace. The "hung"
thread is spewing this:
10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)
10730 recvfrom(58, 0x7f49f41835e0, 461, 0, 0, 0) = -1 EAGAIN (Resource
temporarily unavailable)

... about as fast as it can execute them.

It's always the same fd (58), which is just one random tcp connection from
some random client.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to