> > On Mon, Jan 13, 2014 at 9:22 AM, dormando <[email protected]> wrote:
> >
> > > There's no forwarder from the central server, if I understand what you
> > > were saying. Also given the strace of the select (which is selecting for
> > > nothing and waiting forever?) is it checking for something inbetween the
> > > select runs?
> > >
> > >
> > OK, the configs I think look good. What exactly it is waiting for is hard
> > to tell without context. Can you gather a debug log while it is in this
> > state. Details are here:
> >
> > http://www.rsyslog.com/doc/troubleshoot.html
> >
> > If a full debug log is too much to do, you may want to use debug on demand:
> >
> > http://www.rsyslog.com/how-to-use-debug-on-demand/
> >
> > If that's also impossible, at least many more strace lines PLUS information
> > on what the thread name is (obtained via top) may also help us get closer.
>
> on-demand might work. It'll take a few days to iterate through this due to
> the infrequency of the hang. Thought it was hung just now but it started
> working again... I guess it could also be intermittent. Maybe I'll set up
> a script to watch for log gaps and snapshot the debug stuff.
>
> Thanks for your fast responses!
>

Well, it took a week to fail that time :/

strace was showing a select retry that was going to take over a day, so I
didn't wait for it to catch the lines inbetween :/ I might have to patch
the code to set an upper limit to whatever backoff that is? Or I could've
tried to interrupt it...

im:imtcp - just one thread, sitting in 100% CPU on one core.

Local logging continues to work... I see log lines show up for stuff I do
*on* the log box. no remote log lines appear to be incoming.

I took some gdb backtrace snapshots since that's the best I could do this
time. Will hopefully get more strace and the debug output next time... if
the thread will wake up and start applying it.

First below are three summary backtraces which show some threads changing
state, and the rest not doing much. Below that is one full all-thread
stack trace.

These are taken from the hung process.

summary of backtrace:
      4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
      1 select,runInput,thrdStarter,start_thread,clone,??
      1 select,mainloop,realMain,main
      1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
      1
pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??
      1
IsReady,IsReady,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??

another snapshot:
      4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
      1
TCPSessGetNxtSess,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??
      1 select,runInput,thrdStarter,start_thread,clone,??
      1 select,mainloop,realMain,main
      1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
      1
pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??

another snapshot:
      4 pthread_cond_wait@@GLIBC_2.3.2,wrkr,start_thread,clone,??
      1 select,runInput,thrdStarter,start_thread,clone,??
      1 select,mainloop,realMain,main
      1 read,readklog,klogLogKMsg,runInput,thrdStarter,start_thread,clone,??
      1
pthread_cond_wait@@GLIBC_2.3.2,doIdleProcessing,wtiWorker,wtpWorker,start_thread,clone,??
      1
pNsd=0x7f80142e8990,,Add,Add,RunSelect,Run,runInput,thrdStarter,start_thread,clone,??


Full stack trace:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f80137fd700 (LWP 31775)]
[New Thread 0x7f8013bfe700 (LWP 31774)]
[New Thread 0x7f8013fff700 (LWP 31773)]
[New Thread 0x7f8018e81700 (LWP 31772)]
[New Thread 0x7f8018a80700 (LWP 31771)]
[New Thread 0x7f8019682700 (LWP 31770)]
[New Thread 0x7f8019e83700 (LWP 31769)]
[New Thread 0x7f801a684700 (LWP 31768)]
0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6

Thread 9 (Thread 0x7f801a684700 (LWP 31768)):
#0  0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f801cbf44cd in runInput (pThrd=0x2550060) at imuxsock.c:1363
#2  0x00000000004638b6 in thrdStarter (arg=0x2550060) at ../threads.c:212
#3  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f8019e83700 (LWP 31769)):
#0  0x00007f801e057d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f801c9dd001 in readklog (pModConf=0x252f940) at bsd.c:228
#2  0x00007f801c9dd27e in klogLogKMsg (pModConf=0x252f940) at bsd.c:286
#3  0x00007f801c9db926 in runInput (pThrd=0x2559cd0) at imklog.c:282
#4  0x00000000004638b6 in thrdStarter (arg=0x2559cd0) at ../threads.c:212
#5  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f8019682700 (LWP 31770)):
#0  0x00007f801ae2a2a0 in Add (pNsdsel=0x7f80141e7d80, pNsd=0x7f80142e8990,
waitOp=NSDSEL_RD) at nsdsel_ptcp.c:82
#1  0x00007f801bb94f46 in Add (pNsdsel=0x7f801417c020, pNsd=0x7f80140e4420,
waitOp=NSDSEL_RD) at nsdsel_gtls.c:93
#2  0x00007f801bfa3536 in Add (pThis=0x7f80144b32a0, pStrm=0x7f80142a8300,
waitOp=NSDSEL_RD) at nssel.c:142
#3  0x00007f801bd9bea5 in RunSelect (pThis=0x252b5e0, workset=0x7f8019681570,
sizeWorkset=128) at tcpsrv.c:758
#4  0x00007f801bd9c29f in Run (pThis=0x252b5e0) at tcpsrv.c:868
#5  0x00007f801c1a8351 in runInput (pThrd=0x2559e90) at imtcp.c:584
#6  0x00000000004638b6 in thrdStarter (arg=0x2559e90) at ../threads.c:212
#7  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f8018a80700 (LWP 31771)):
#0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000450dab in doIdleProcessing (pThis=0x25555d0, pWtp=0x255f9f0,
pbInactivityTOOccured=0x7f8018a7fd18) at wti.c:252
#2  0x0000000000451082 in wtiWorker (pThis=0x25555d0) at wti.c:325
#3  0x0000000000450205 in wtpWorker (arg=0x25555d0) at wtp.c:388
#4  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f8018e81700 (LWP 31772)):
#0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0bc0) at tcpsrv.c:631
#2  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f8013fff700 (LWP 31773)):
#0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c28) at tcpsrv.c:631
#2  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f8013bfe700 (LWP 31774)):
#0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0c90) at tcpsrv.c:631
#2  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f80137fd700 (LWP 31775)):
#0  0x00007f801e054d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f801bd9b97c in wrkr (myself=0x7f801bfa0cf8) at tcpsrv.c:631
#2  0x00007f801e050e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f801d3603fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f801e691740 (LWP 31767)):
#0  0x00007f801d359763 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000040f22c in mainloop () at syslogd.c:1286
#2  0x00000000004108da in realMain (argc=4, argv=0x7fffad7a59b8) at
syslogd.c:2038
#3  0x00000000004109ba in main (argc=4, argv=0x7fffad7a59b8) at syslogd.c:2068


Thanks,
-Dormando
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to