On Mon, 7 Mar 2011, Todd Michael Bushnell wrote:
Appreciate the feedback. Sorry last night's message was so sparse - wasn't feeling
too great and wanted to crash. A few points followed by my config file:
1. Using TCP, not RELP, because I'm still using syslog-NG as central
loghost with rsyslog on servers.
Ok, this is still a mechanism that will stall if the server stops
accepting messages
2. Already have configured to queue locally in the event of outage.
See config below. I've tested successfully in the past, but yesterday,
when there were problems, I checked the local queue and did not see
local queueing occurring. Perhaps it was just slow enough to slow
things to a crawl, killing Apache, but not quite slow enough to result
in local queueing. Is that possible? Should I look to tune this?
no, if things are slow it will queue locally, and apache will only see
things slow down if the queue fills up (or if you are writing the queue to
disk, if the disk can't keep up)
what are you looking at to decide that rsyslog is not queuing messages?
3. Running "ancient" version of Rsyslog because this is the latest in
CentOS 5 repo. Figured this is because it's stable which is what I
want. No need for some of the newer bells at this point. If I guessed
wrong here and latest will give me better stability and performance I'll
build new RPMs.
the latest will definitely give you better performance, but the other part
of the problem is that since it is so old, getting help here is a bit
harder, simply because it's hard to remember back that far.
4. Have a number of admitted design deficiencies with Apache and Tomcat
that could be contributing to performance although this does not impact
sysklog which is why I proceeded as-is until I could get engineering to
fix.
4a. Apache uses logger to send to local syslog socket (where rsyslog
writes locally and sends to 2 remote servers) and also writes to its own
files locally so we're logging twice locally for every message. Not
good when traffic gets high, I presume. Just noticed this yesterday so
need to get fixed. To make matters worse, all logging is happening on
the same volume. Until fixed, maybe I should just have rsyslog write
local Apache logs to /dev/null and forward to remote syslog - nothing
else. Thoughts?
if you want rsyslog to write the queue to disk you will also have
performance issues (the rsyslog disk queue is not very efficient)
One question to ask is how critical it is that no logs get lost? you may
want to configure rsyslog to discard messages if it gets too many queued
rather than stopping apache.
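
As a rough illustration of the discard option, the action queue can be told to drop less important messages once it is mostly full, and to drop new messages rather than block when it is completely full. This is only a sketch: the queue size, discard mark, and severity threshold below are made-up numbers, and you should check that your rsyslog version supports these directives before relying on them.

```
# Sketch only: all numeric values here are illustrative assumptions.
$ActionQueueType LinkedList
$ActionQueueSize 10000
# once 8000 messages are queued, start discarding ...
$ActionQueueDiscardMark 8000
# ... messages of severity 5 (notice) or numerically higher (less important)
$ActionQueueDiscardSeverity 5
# if the queue is completely full, drop the new message instead of waiting
$ActionQueueTimeoutEnqueue 0
# the queue settings above apply to this action
*.* @@loghost1:5140
```

The trade-off is deliberate log loss during an outage in exchange for never making the sender (here, apache via logger) wait on a full queue.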
or you may want to have apache write log files and then have rsyslog use
imfile to read the file.
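
A minimal sketch of the imfile approach, using the legacy directives. The file path, tag, state file name, facility, and severity are assumptions for illustration, not taken from the config below. The state file is kept under $WorkDirectory, so that directive needs to be set as well.

```
# Sketch only: path, tag, state file name, facility and severity are assumptions.
$ModLoad imfile
# file that apache writes on its own
$InputFileName /var/log/httpd/access_log
# tag attached to each line read from the file
$InputFileTag access_log:
# remembers how far the file has been read (stored in $WorkDirectory)
$InputFileStateFile stat-httpd-access
$InputFileSeverity info
$InputFileFacility local5
# activates monitoring of the file defined above
$InputRunFileMonitor
```

With this setup apache never talks to the syslog socket, so a stalled forwarding action delays delivery of the log lines but does not hold up httpd itself.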
4b. Log4j sending directly to syslog servers, writing to its own local
files and sending to localhost:514 for local logging. Would prefer all
gets handed to rsyslog for local and remote logging. Need to get
engineering to fix that too. Like mentioned before, to reduce IO
contention and avoid duplication, might just configure rsyslog to write
to /dev/null as long as it's configured like this. Only question with
4a/b is that this never posed a problem with sysklog, but is a problem
with rsyslog. This is the reason I did not try to make any major
changes in phase 1.
remember that sysklog didn't do TCP logging, it only did UDP logging, so
it would send the messages out over the network as fast as it could, and
if the receiver can't keep up the message is lost.
you may want to do a test with rsyslog using UDP instead of TCP and see if
the behavior is what you expect. If it is, then you are losing logs
because your central server can't keep up with UDP, but with TCP you are
stalling and killing apache instead.
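
For the UDP test, the only change on the sending side is the forwarding line: a single @ selects UDP, a double @@ selects TCP. This sketch assumes the central syslog-NG server also listens for UDP on the same port, which may not be the case in your setup.

```
# Sketch only: assumes the central server accepts UDP on port 5140.
# TCP forwarding (current config):
#*.* @@loghost1:5140
# UDP forwarding (for the test):
*.* @loghost1:5140
```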
# Configuration File
# Provides kernel logging support (previously done by rklogd)
$ModLoad imklog
# Provides support for local system logging (e.g. via logger command)
$ModLoad imuxsock
# Max Message Size (default 2k)
$MaxMessageSize 8192
hmm, you may want to look at enabling jumbo packets on your network so
that each log message can be pushed in a single packet.
# Must listen on localhost for Log4j. Need engineering to change this
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
# Use traditional timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# ownership/permissions
$umask 0000
$FileOwner root
$FileGroup wheel
$FileCreateMode 0640
# include directory for breaking directives into separate files (future)
$IncludeConfig /etc/rsyslog.d/
# forward to remote host, queueing to local disk if host is down and memory fills up
# work (spool) files directory
$WorkDirectory /var/log/rsyslog
# loghost1
# in-memory queue; set for asynchronous processing (?)
$ActionQueueType LinkedList
in my testing LinkedList was slower than the default. everything does
asynchronous processing.
but both the default and LinkedList are limited to memory size and the
$MainMsgQueueSize or $ActionQueueSize (which I think default to 10000)
# failover queue filename; also enables disk mode
$ActionQueueFileName failqueue-loghost1
I don't think this enables disk mode, you also need to set the
$ActionQueueType to a disk related type.
David Lang
# infinite retries on insert failure
$ActionResumeRetryCount -1
# save in-memory data if rsyslog shuts down
$ActionQueueSaveOnShutdown on
# remote logging of everything
*.* @@loghost1:5140
# loghost2
# in-memory queue; set for asynchronous processing (?)
$ActionQueueType LinkedList
# failover queue filename; also enables disk mode
$ActionQueueFileName failqueue-loghost2
# infinite retries on insert failure
$ActionResumeRetryCount -1
# save in-memory data if rsyslog shuts down
$ActionQueueSaveOnShutdown on
# remote logging of everything
*.* @@loghost2:5140
# Log Filtering Rules
# Emergency Messages
if $syslogseverity <= '0' then *
if $syslogseverity <= '0' then /var/log/messages
if $syslogseverity <= '0' then ~
# Apache
if $programname == 'logger' and ($msg contains 'access_log' or $msg contains 'cookie_log' or $msg contains 'request_log') then /var/log/http
& ~
if $programname == 'httpd' and ($syslogfacility-text == 'local5' or $syslogfacility-text == 'local6') then /var/log/http_err
& ~
# Log4j (App Logs)
if $programname == 'com.redacted.infra.syslog.Log4jSystemLogger' then /var/log/log4j
& ~
# Kernel & IPTables
if $programname == 'kernel' and ($msg contains 'LOGACCEPT' or $msg contains 'LOGDROP') then /var/log/iptables
& ~
# Auth Messages
if $syslogfacility-text == 'auth' or $syslogfacility-text == 'authpriv' then /var/log/secure
& ~
# Mail
if $syslogfacility-text == 'mail' then /var/log/maillog
& ~
# Catchall for remaining log messages
*.* /var/log/messages
On Mar 6, 2011, at 10:43 PM, Todd Michael Bushnell wrote:
Been planning an rsyslog deployment for about a month. Everything performed as
expected in my limited use dev environment, but when I deployed rsyslog today
to my production environment multiple systems yielded similar disastrous
results:
After a few hours Apache jumped up to 250+ processes (max=256, normal=~50) and
then started hanging. At this time, rsyslog also stopped logging altogether.
As soon as I killed rsyslog and started sysklog, httpd processes dropped to 50
and everything went back to normal.
I'm not sure if this is a case where rsyslog froze and its state resulted in
Apache's inability to close processes or if there is a problem with Apache and
Rsyslog when a decent volume of traffic is passed through. I'm happy to
provide additional information if someone could give me some clues as to where
to start looking. At this point we're reverting until I can diagnose this
issue and assure my team that I've fixed the problem for good.
Version: rsyslog-3.22.1-3.el5_5.1
System: Linux ******* 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008
x86_64 x86_64 x86_64 GNU/Linux
Todd Michael Bushnell
[email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com