Hello:

First of all, apologies if this has been addressed before!

We're forwarding our logs to a central server using the following config:

$WorkDirectory /var/lib/rsyslog

$ActionQueueType LinkedList
$ActionQueueFileName remoteq
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
*.* @@172.16.0.75:514
# -- a bunch of rules also logging to local files follow

This normally works great; and the queuing works fine when the server rsyslog process is shut down. But, when we reboot the server, or shut it down, huge chunks of forwarded data fail to get queued. Looking at the debug logs, it looks like the client side thinks the server's still around, and isn't queuing the logs:

6939.784970000:42790940: Message from UNIX socket: #3
6939.785145000:42790940: logmsg: flags 4, from 'testclient', msg Aug 4 15:28:59 root: new test message # 47
6939.785236000:42790940: Message has legacy syslog format.
6939.785330000:42790940: main Q: entry added, size now 1 entries
6939.785440000:42790940: wtpAdviseMaxWorkers signals busy
6939.785530000:42790940: main Q: EnqueueMsg advised worker start
6939.785640000:42790940: --------imuxsock calling select, active file descriptors (max 3): 3
6939.786501000:41d8f940: main Q: entry deleted, state 0, size now 0 entries
6939.786616000:41d8f940: testing filter, f_pmask 255
6939.786699000:41d8f940: Called action, logging to builtin-fwd
6939.786793000:41d8f940: action 1 queue: entry added, size now 1 entries
6939.786905000:41d8f940: wtpAdviseMaxWorkers signals busy
6939.786989000:41d8f940: action 1 queue: EnqueueMsg advised worker start
6939.787099000:41d8f940: testing filter, f_pmask 127
6939.787186000:41d8f940: Called action, logging to builtin-file
6939.787274000:41d8f940: file to log to: /var/log/messages
6939.787360000:41d8f940: doWrite, pData->pStrm 0x2b237853a650, lenBuf 65
6939.787445000:41d8f940: strm 0x2b237853a650: file 6(messages) flush, buflen 65
6939.787575000:41d8f940: strm 0x2b237853a650: file 6 write wrote 65 bytes
6939.787690000:41d8f940: testing filter, f_pmask 0
6939.787775000:41d8f940: testing filter, f_pmask 0
6939.787855000:41d8f940: testing filter, f_pmask 0
6939.787934000:41d8f940: testing filter, f_pmask 1
6939.788014000:41d8f940: testing filter, f_pmask 0
6939.788096000:41d8f940: testing filter, f_pmask 0
6939.788195000:41d8f940: main Q:Reg/w0: worker IDLE, waiting for work.
6939.788288000:4138e940: action 1 queue: entry deleted, state 0, size now 0 entries
6939.788401000:4138e940:  172.16.0.75
6939.788481000:4138e940:  172.16.0.75:514/tcp
6939.788571000:4138e940: TCP sent 69 bytes, requested 69
6939.788659000:4138e940: action 1 queue:Reg/w0: worker IDLE, waiting for work.

Sending a log message every second from the client, on the server, looking at a combined log, this is what we see during a reboot scenario (only listing remote test messages and logging-related server messages):

Aug  4 16:25:49 testclient root: Test message # 5
Aug  4 16:25:50 testclient root: Test message # 6
Aug  4 16:25:51 logserver shutdown[2104]: shutting down for system reboot
Aug  4 16:25:51 logserver init: Switching to runlevel: 6
Aug  4 16:25:51 testclient root: Test message # 7
Aug  4 16:25:52 testclient root: Test message # 8
Aug  4 16:25:53 testclient root: Test message # 9
Aug 4 16:25:53 logserver rsyslogd: db error (2002): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug  4 16:25:54 testclient root: Test message # 10
Aug  4 16:25:55 testclient root: Test message # 11
Aug  4 16:25:56 testclient root: Test message # 12
Aug  4 16:25:57 testclient root: Test message # 13
Aug  4 16:25:58 logserver kernel: Kernel logging (proc) stopped.
Aug 4 16:25:58 logserver rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1282" x-info="http://www.rsyslog.com";] exiting on signal 15. Aug 4 16:26:44 logserver kernel: imklog 4.6.2, log source = /proc/kmsg started. Aug 4 16:26:44 logserver rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1282" x-info="http://www.rsyslog.com";] (re)start Aug 4 16:26:44 logserver kernel: Bootdata ok (command line is root=/dev/sys.vg/root.lv ro panic=30 time) Aug 4 16:26:44 logserver kernel: Linux version 2.6.18-194.32.1.1.el5.xen (mockbuild@*****.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri May 20 11:57:35 PDT 2011
---- whole boot process here ----
Aug 4 16:26:44 logserver rsyslogd: db error (2002): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug  4 16:26:50 testclient root: Test message # 66
Aug  4 16:26:51 testclient root: Test message # 67
---- back to normal ---

The DB errors look fine/unrelated - even with db logging turned off and only spitting logs out to file, we get this problem.

Is this just a consequence of using TCP? I don't mind losing a few log entries, but potentially losing a few hundred is another matter. Is the answer to switch to RELP? Or am I missing some key configuration setting?

We're using rsyslog 4.6.2 as packaged for RHEL 6, on CentOS 5.6 (we're preparing to move to CentOS 6, hence the version weirdness...). I certainly don't mind running a different version if that's what it takes though.

Thanks in advance for any ideas, feedback, etc.

-Jonas Courteau
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to