On Mon, 7 Mar 2011, Todd Michael Bushnell wrote:
Appreciate the feedback. Sorry last night's message was so sparse; I wasn't
feeling too great and wanted to crash. A few points, followed by my config file:
1. Using TCP, not RELP, because I'm still using syslog-NG as central loghost
with rsyslog on servers.
OK, this is still a mechanism that will stall if the server stops accepting
messages.
2. Already have configured to queue locally in the event of outage. See config
below. I've tested successfully in the past, but yesterday, when there were
problems, I checked the local queue and did not see local queueing occurring.
Perhaps it was just slow enough to slow things to a crawl, killing Apache, but
not quite slow enough to result in local queueing. Is that possible? Should I
look to tune this?
no, if things are slow it will queue locally, and apache will only see things
slow down if the queue fills up (or if you are writing the queue to disk, if
the disk can't keep up)
what are you looking at to decide that rsyslog is not queuing messages?
3. Running "ancient" version of Rsyslog because this is the latest in CentOS 5
repo. Figured this is because it's stable which is what I want. No need for some of the
newer bells at this point. If I guessed wrong here and latest will give me better
stability and performance I'll build new RPMs.
the latest will definitely give you better performance, but the other part of
the problem is that since it is so old, getting help here is a bit harder,
simply because it's hard to remember back that far.
4. Have a number of admitted design deficiencies with Apache and Tomcat that
could be contributing to performance although this does not impact sysklog
which is why I proceeded as-is until I could get engineering to fix.
4a. Apache uses logger to send to local syslog socket (where rsyslog writes
locally and sends to 2 remote servers) and also writes to its own files locally
so we're logging twice locally for every message. Not good when traffic gets
high, I presume. Just noticed this yesterday so need to get fixed. To make
matters worse, all logging is happening on the same volume. Until fixed, maybe
I should just have rsyslog write local Apache logs to /dev/null and forward to
remote syslog - nothing else. Thoughts?
if you want rsyslog to write the queue to disk you will also have performance
issues (the rsyslog disk queue is not very efficient)
One question to ask is how critical it is that no logs get lost? you may want
to configure rsyslog to discard messages if it gets too many queued rather than
stopping apache.
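
A minimal sketch of the discard approach, written in the same legacy directive style as the config further down. The queue size, thresholds and severity are illustrative assumptions rather than tuned values, and the directives should be checked against the documentation for the installed version. They apply to the next action defined after them.

```
# Sketch only: sizes and thresholds are illustrative assumptions
$ActionQueueType LinkedList
# cap the in-memory queue at this many messages
$ActionQueueSize 50000
# once this many messages are queued, start discarding...
$ActionQueueDiscardMark 45000
# ...messages of this numerical severity or higher, i.e. less important (4 = warning)
$ActionQueueDiscardSeverity 4
# wait at most 10 ms for room in a full queue before dropping the message
$ActionQueueTimeoutEnqueue 10
*.* @@loghost1:5140
```

With settings along these lines, a slow loghost should cost low-priority messages rather than blocking the processes that write to the local syslog socket.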
or you may want to have apache write log files and then have rsyslog use imfile
to read the file.
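
The imfile approach could look roughly like the following. The log path, tag, state-file name, facility and severity are all assumptions chosen for illustration; only the directive names come from the imfile module.

```
# Sketch only: path, tag, state file, facility and severity are assumptions
$ModLoad imfile
# file that Apache writes on its own
$InputFileName /var/log/httpd/access_log
# tag attached to every line read from the file
$InputFileTag httpd-access:
# state file (kept under $WorkDirectory) so rsyslog remembers its position
$InputFileStateFile stat-httpd-access
$InputFileSeverity info
$InputFileFacility local6
# activate monitoring for the file described above
$InputRunFileMonitor
# how often to check monitored files for new lines, in seconds
$InputFilePollInterval 10
```

Because Apache only ever writes to its own file in this setup, a stalled rsyslog delays forwarding but does not hold up httpd.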
4b. Log4j sending directly to syslog servers, writing to its own local files
and sending to localhost:514 for local logging. Would prefer all gets handed
to rsyslog for local and remote logging. Need to get engineering to fix that
too. Like mentioned before, to reduce IO contention and avoid duplication,
might just configure rsyslog to write to /dev/null as long as it's configured
like this. Only question with 4a/b is that this never posed a problem with
sysklog, but is a problem with rsyslog. This is the reason I did not try to
make any major changes in phase 1.
remember that sysklog didn't do TCP logging, it only did UDP logging, so it
would send the messages out over the network as fast as it could, and if the
receiver can't keep up the message is lost.
you may want to do a test with rsyslog using UDP instead of TCP and see if the
behavior is what you expect. If it is, then you are losing logs because your
central server can't keep up with UDP, but with TCP you are stalling and
killing apache instead.
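
For the UDP test, the only change needed on the sending side is the forwarding rule. This sketch assumes the central loghost is also listening on UDP port 5140, which the original message does not state.

```
# Sketch only: assumes the loghost accepts UDP on port 5140
# a single @ forwards over UDP; @@ (as used in the config below) forwards over TCP
*.* @loghost1:5140
```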
# Configuration File
# Provides kernel logging support (previously done by rklogd)
$ModLoad imklog
# Provides support for local system logging (e.g. via logger command)
$ModLoad imuxsock
# Max Message Size (default 2k)
$MaxMessageSize 8192
hmm, you may want to look at enabling jumbo packets on your network so that
each log message can be pushed in a single packet.
# Must listen on localhost for Log4j. Need engineering to change this
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
# Use traditional timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# ownership/permissions
$umask 0000
$FileOwner root
$FileGroup wheel
$FileCreateMode 0640
# include directory for breaking directives into separate files (future)
$IncludeConfig /etc/rsyslog.d/
# forward to remote host, queueing to local disk if host is down and memory fills up
# work (spool) files directory
$WorkDirectory /var/log/rsyslog
# loghost1
# in-memory queue; set for asynchronous processing (?)
$ActionQueueType LinkedList
in my testing LinkedList was slower than the default. everything does
asynchronous processing.
but both the default and LinkedList are limited to memory size and the
$MainMsgQueueSize or $ActionQueueSize (which I think default to 10000)
# failover queue filename; also enables disk mode
$ActionQueueFileName failqueue-loghost1
I don't think this enables disk mode, you also need to set the $ActionQueueType
to a disk related type.
David Lang
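
For reference, a sketch of an explicitly disk-based action queue. It assumes the behaviour described in the rsyslog queue documentation, where `Disk` gives a pure on-disk queue and an in-memory type combined with a queue file name is described as disk-assisted; that is worth verifying on 3.22. The disk-space cap is an illustrative assumption, and a pure disk queue carries the performance cost mentioned above.

```
# Sketch only: the disk-space cap is an illustrative assumption
# spool files are created under this directory
$WorkDirectory /var/log/rsyslog
# pure on-disk queue for the next action
$ActionQueueType Disk
$ActionQueueFileName failqueue-loghost1
# cap the spool so a long outage cannot fill the volume
$ActionQueueMaxDiskSpace 1g
# infinite retries on insert failure
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
*.* @@loghost1:5140
```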
# infinite retries on insert failure
$ActionResumeRetryCount -1
# save in-memory data if rsyslog shuts down
$ActionQueueSaveOnShutdown on
# remote logging of everything
*.* @@loghost1:5140
# loghost2
# in-memory queue; set for asynchronous processing (?)
$ActionQueueType LinkedList
# failover queue filename; also enables disk mode
$ActionQueueFileName failqueue-loghost2
# infinite retries on insert failure
$ActionResumeRetryCount -1
# save in-memory data if rsyslog shuts down
$ActionQueueSaveOnShutdown on
# remote logging of everything
*.* @@loghost2:5140
# Log Filtering Rules
# Emergency Messages
if $syslogseverity <= '0' then *
if $syslogseverity <= '0' then /var/log/messages
if $syslogseverity <= '0' then ~
# Apache
if $programname == 'logger' and ($msg contains 'access_log' or $msg contains 'cookie_log' or $msg contains 'request_log') then /var/log/http
& ~
if $programname == 'httpd' and ($syslogfacility-text == 'local5' or $syslogfacility-text == 'local6') then /var/log/http_err
& ~
# Log4j (App Logs)
if $programname == 'com.redacted.infra.syslog.Log4jSystemLogger' then /var/log/log4j
& ~
# Kernel & IPTables
if $programname == 'kernel' and ($msg contains 'LOGACCEPT' or $msg contains 'LOGDROP') then /var/log/iptables
& ~
# Auth Messages
if $syslogfacility-text == 'auth' or $syslogfacility-text == 'authpriv' then /var/log/secure
& ~
# Mail
if $syslogfacility-text == 'mail' then /var/log/maillog
& ~
# Catchall for remaining log messages
*.* /var/log/messages
On Mar 6, 2011, at 10:43 PM, Todd Michael Bushnell wrote:
Been planning an rsyslog deployment for about a month. Everything performed as
expected in my limited use dev environment, but when I deployed rsyslog today
to my production environment multiple systems yielded similar disastrous
results:
After a few hours Apache jumped up to 250+ processes (max=256, normal=~50) and
then started hanging. At this time, rsyslog also stopped logging altogether.
As soon as I killed rsyslog and started sysklog, httpd processes dropped to 50
and everything went back to normal.
I'm not sure if this is a case where rsyslog froze and its state resulted in
Apache's inability to close processes or if there is a problem with Apache and
Rsyslog when a decent volume of traffic is passed through. I'm happy to
provide additional information if someone could give me some clues as to where
to start looking. At this point we're reverting until I can diagnose this
issue and assure my team that I've fixed the problem for good.
Version: rsyslog-3.22.1-3.el5_5.1
System: Linux ******* 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008
x86_64 x86_64 x86_64 GNU/Linux
Todd Michael Bushnell
[email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com