Hello:
First of all, apologies if this has been addressed before!
We're forwarding our logs to a central server using the following config:
$WorkDirectory /var/lib/rsyslog
$ActionQueueType LinkedList
$ActionQueueFileName remoteq
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
*.* @@172.16.0.75:514
# -- a bunch of rules also logging to local files follow
This normally works great; and the queuing works fine when the server
rsyslog process is shut down. But, when we reboot the server, or shut
it down, huge chunks of forwarded data fail to get queued. Looking at
the debug logs, it looks like the client side thinks the server's still
around, and isn't queuing the logs:
6939.784970000:42790940: Message from UNIX socket: #3
6939.785145000:42790940: logmsg: flags 4, from 'testclient', msg Aug 4
15:28:59 root: new test message # 47
6939.785236000:42790940: Message has legacy syslog format.
6939.785330000:42790940: main Q: entry added, size now 1 entries
6939.785440000:42790940: wtpAdviseMaxWorkers signals busy
6939.785530000:42790940: main Q: EnqueueMsg advised worker start
6939.785640000:42790940: --------imuxsock calling select, active file
descriptors (max 3): 3
6939.786501000:41d8f940: main Q: entry deleted, state 0, size now 0 entries
6939.786616000:41d8f940: testing filter, f_pmask 255
6939.786699000:41d8f940: Called action, logging to builtin-fwd
6939.786793000:41d8f940: action 1 queue: entry added, size now 1 entries
6939.786905000:41d8f940: wtpAdviseMaxWorkers signals busy
6939.786989000:41d8f940: action 1 queue: EnqueueMsg advised worker start
6939.787099000:41d8f940: testing filter, f_pmask 127
6939.787186000:41d8f940: Called action, logging to builtin-file
6939.787274000:41d8f940: file to log to: /var/log/messages
6939.787360000:41d8f940: doWrite, pData->pStrm 0x2b237853a650, lenBuf 65
6939.787445000:41d8f940: strm 0x2b237853a650: file 6(messages) flush,
buflen 65
6939.787575000:41d8f940: strm 0x2b237853a650: file 6 write wrote 65 bytes
6939.787690000:41d8f940: testing filter, f_pmask 0
6939.787775000:41d8f940: testing filter, f_pmask 0
6939.787855000:41d8f940: testing filter, f_pmask 0
6939.787934000:41d8f940: testing filter, f_pmask 1
6939.788014000:41d8f940: testing filter, f_pmask 0
6939.788096000:41d8f940: testing filter, f_pmask 0
6939.788195000:41d8f940: main Q:Reg/w0: worker IDLE, waiting for work.
6939.788288000:4138e940: action 1 queue: entry deleted, state 0, size
now 0 entries
6939.788401000:4138e940: 172.16.0.75
6939.788481000:4138e940: 172.16.0.75:514/tcp
6939.788571000:4138e940: TCP sent 69 bytes, requested 69
6939.788659000:4138e940: action 1 queue:Reg/w0: worker IDLE, waiting for
work.
Sending a log message every second from the client, on the server,
looking at a combined log, this is what we see during a reboot scenario
(only listing remote test messages and logging-related server messages):
Aug 4 16:25:49 testclient root: Test message # 5
Aug 4 16:25:50 testclient root: Test message # 6
Aug 4 16:25:51 logserver shutdown[2104]: shutting down for system reboot
Aug 4 16:25:51 logserver init: Switching to runlevel: 6
Aug 4 16:25:51 testclient root: Test message # 7
Aug 4 16:25:52 testclient root: Test message # 8
Aug 4 16:25:53 testclient root: Test message # 9
Aug 4 16:25:53 logserver rsyslogd: db error (2002): Can't connect to
local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 4 16:25:54 testclient root: Test message # 10
Aug 4 16:25:55 testclient root: Test message # 11
Aug 4 16:25:56 testclient root: Test message # 12
Aug 4 16:25:57 testclient root: Test message # 13
Aug 4 16:25:58 logserver kernel: Kernel logging (proc) stopped.
Aug 4 16:25:58 logserver rsyslogd: [origin software="rsyslogd"
swVersion="4.6.2" x-pid="1282" x-info="http://www.rsyslog.com"] exiting
on signal 15.
Aug 4 16:26:44 logserver kernel: imklog 4.6.2, log source = /proc/kmsg
started.
Aug 4 16:26:44 logserver rsyslogd: [origin software="rsyslogd"
swVersion="4.6.2" x-pid="1282" x-info="http://www.rsyslog.com"] (re)start
Aug 4 16:26:44 logserver kernel: Bootdata ok (command line is
root=/dev/sys.vg/root.lv ro panic=30 time)
Aug 4 16:26:44 logserver kernel: Linux version
2.6.18-194.32.1.1.el5.xen (mockbuild@*****.com) (gcc version 4.1.2
20080704 (Red Hat 4.1.2-48)) #1 SMP Fri May 20 11:57:35 PDT 2011
---- whole boot process here ----
Aug 4 16:26:44 logserver rsyslogd: db error (2002): Can't connect to
local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Aug 4 16:26:50 testclient root: Test message # 66
Aug 4 16:26:51 testclient root: Test message # 67
---- back to normal ---
The DB errors look fine/unrelated - even with db logging turned off and
only spitting logs out to file, we get this problem.
Is this just a consequence of using TCP? I don't mind losing a few log
entries, but potentially losing a few hundred is another matter. Is the
answer to switch to RELP? Or am I missing some key configuration setting?
We're using rsyslog 4.6.2 as packaged for RHEL 6, on CentOS 5.6 (we're
preparing to move to CentOS 6, hence the version weirdness...). I
certainly don't mind running a different version if that's what it takes
though.
Thanks in advance for any ideas, feedback, etc.
-Jonas Courteau
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com