On Fri, 11 Mar 2011, Todd Michael Bushnell wrote:
> Appreciate the stellar advice Rainer. I built a 5.6.x (latest stable), but
> before upgrading I wanted to do some tests with my existing version/config and
> then apply some of the rule syntax changes you recommended to gauge performance
> benefit. I used the syslog_caller tool to perform a few tests. Here are those
> results:
>
> Running the following command on a test box and simply recording realtime for
> comparison:
>
>   time ./syslog_caller -m 50000
>
>   Initial (Apache killing) Config w/ Remote TCP logging:   5m38.163s
>   Switch to UDP Remote Logging:                            4m22.497s
>   Disable Remote Logging:                                  3m23.322s
>   Initial w/ Amended Rules (see RULES section):            4m54.055s
>   Amended Rules w/ expression-based rules commented out:   2m36.023s
>   Amended Rules w/ MainMsg Disk Queue:                     7m43.498s
>   Sysklog (for comparison):                                2m13.986s
>
> Glad to see just changing my rules improved performance by about 13%. My
> initial reaction was to send this info and ask a series of questions based on
> the data, but instead decided to give it a whirl with the latest stable
> version: 5.6.4.
>
>   5.6.4 w/ Amended rules:                                  0m3.773s
>
> Wow - I almost fell off my chair! This is AMAZING! Thank you! Given these
> results, I just have a couple final questions:
we really mean it when we say that performance has improved with the later
versions ;-)
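
To put the quoted timings in perspective, here is a rough Python sketch that converts them to messages per second. It assumes that "-m 50000" means 50,000 messages per run; the dictionary keys and variable names are just labels made up for this illustration.

```python
# Convert the `time` results quoted above into throughput figures.
# Assumption: `syslog_caller -m 50000` sends 50,000 messages per run.
MESSAGES = 50_000


def seconds(minutes, secs):
    """Turn the m/s fields of a `time` result into seconds."""
    return minutes * 60 + secs


# Timings copied from the quoted test results (labels are informal).
timings = {
    "initial config, remote TCP": seconds(5, 38.163),
    "initial w/ amended rules": seconds(4, 54.055),
    "Sysklog": seconds(2, 13.986),
    "5.6.4 w/ amended rules": seconds(0, 3.773),
}

# Messages per second for each run.
rates = {name: MESSAGES / t for name, t in timings.items()}

# Fraction of wall-clock time saved by the rule changes alone.
rule_gain = 1 - timings["initial w/ amended rules"] / timings["initial config, remote TCP"]

# How many times faster 5.6.4 ran with the same amended rules.
speedup = timings["initial w/ amended rules"] / timings["5.6.4 w/ amended rules"]

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} msgs/sec")
print(f"rule changes alone: {rule_gain:.0%} less time")
print(f"5.6.4 vs. old version, same rules: about {speedup:.0f}x faster")
```

The rule changes account for the roughly 13% mentioned in the quoted text; the rest of the difference comes from the version upgrade.
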
now, given what you have been describing, I suspect that you are still
going to have problems, because I think that your central log server just
can't quite keep up, so with TCP logging you will still block eventually.
> In compliance heavy environments (which I'm in), I assume the
> recommendation is to add disk queuing for the main queue. Is this
> correct? Something like:
>
>   $MainMsgQueueFileName mainqueue
>   $MainMsgQueueType LinkedList
>   $MainMsgQueueSaveOnShutdown on
>
> I understand there is a performance tradeoff, but given PCI-DSS, it'll
> be worth it, I think.
the thing is that this isn't giving you the reliability that you think it
is.
the process of logging with rsyslog has many steps:

1. write the log to /dev/log

   if the system crashes here the log is lost. has the application already
   completed the action it's trying to log? if so you have no record of it.

2. rsyslog accepts the message and puts it in the main message queue

   unless the main message queue is a disk queue (not a disk assisted
   queue) and you have fsyncs enabled, if the box crashes at this point you
   lose the log

3. rsyslog decides if the message should go to a particular destination,
   if you have a separate action queue for this destination, the message is
   put into that queue.

   again, unless you are using a disk queue, a crash can lose the message

4a. rsyslog sends the log to the remote server and deletes it from the
    action queue.

    unless you are using RELP, rsyslog may send the message to the TCP
    stack, but it has no way of knowing if the remote server has received the
    message.

4b. rsyslog writes the log to a local file

    unless you have fsync enabled after each write, a crash at this time
    will lose the log message.
note that disk queues are very slow, and fsync on ext3 with other write
activity can stall for seconds at a time.
I did some testing a year or so ago with a very high performance solid
state drive (a fusion-io PCI card that cost >$5K for 80G of storage), with
that drive and ext2, without an action queue, I was able to process 8K
logs/sec, compared to 400K logs/sec with memory queues (at the time I
could only write out around 80K logs/sec, but faster bursts that fit in
memory were handled just fine, since then there has been more improvement
to rsyslog and people are reporting write rates of several hundred
thousand logs/sec)
doing the same test on a standard SATA drive resulted in around 10 (yes
TEN) logs/sec being processed.
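
For a sense of scale, the rates above can be applied to the 50,000-message test from earlier in the thread. This is only back-of-the-envelope arithmetic in Python: the rates were measured on different hardware from that test box, and the labels are informal descriptions rather than configuration names.

```python
# How long would 50,000 messages take at each of the rates mentioned above?
# Assumption: the rates carry over unchanged to a 50,000-message run.
MESSAGES = 50_000

rates_per_sec = {
    "disk-backed, fusion-io SSD with ext2": 8_000,
    "disk-backed, standard SATA drive": 10,
    "memory queues (burst rate)": 400_000,
}

# Seconds needed to process the whole run at each rate.
seconds_needed = {name: MESSAGES / rate for name, rate in rates_per_sec.items()}

for name, secs in seconds_needed.items():
    print(f"{name}: {secs:,.3f} s ({secs / 60:.1f} min)")
```

At the SATA rate the run would take well over an hour, which is why syncing every message to disk only makes sense for a low-volume stream.
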
I also operate in a PCI environment, there are limits to what is expected
of you in terms of preserving logs.
I would suggest that you end up with two copies of rsyslog running on your
servers.
the first copy for compliance critical logs.

  these should be a relatively low volume

  the application should be double-logging everything
    i.e. I am about to do X
         I just tried to do X and it succeeded/failed
  this way you can tell if something failed in the middle of a
  transaction and can investigate if the transaction took place or not

  this instance of rsyslog can be configured to sync everything, use RELP,
  write to mirrored drives, etc to do everything you can to make sure the
  log does not get lost.

  the application needs to either use RELP to talk to rsyslog, or use
  /dev/log (writes to /dev/log do not return until the log is in the queue)

  if this instance stops (runs out of disk space, crashes, etc) the
  application will halt.

the second copy is for normal activity logs (apache logs, etc)

  these will be a fairly high volume (especially by comparison)

  if systems fail you will lose some of these logs

  at this point you can decide what reliability measures you deem prudent
  for these logs.
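
The double-logging pattern described for the compliance-critical instance could look roughly like the following on the application side. This is a minimal Python sketch, not a recommendation of a particular API: the function name run_logged and the message wording are made up for illustration, and it assumes the application logs through the standard syslog module, which goes via the local syslog socket (normally /dev/log).

```python
import syslog


def run_logged(name, action, emit=syslog.syslog):
    """Log the intent, run the action, then log the outcome.

    `emit` takes (priority, message). It defaults to syslog.syslog and
    can be replaced, e.g. to capture messages in a test.
    """
    # First record: written before the action starts.
    emit(syslog.LOG_NOTICE, f"about to do {name}")
    try:
        result = action()
    except Exception as exc:
        # Second record, failure case; the error is still passed on.
        emit(syslog.LOG_ERR, f"tried to do {name}: failed ({exc})")
        raise
    # Second record, success case.
    emit(syslog.LOG_NOTICE, f"tried to do {name}: succeeded")
    return result
```

An "about to do" record with no matching outcome record is the signal that something died mid-transaction and needs investigating.
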
personally, for this second category, failover syslog servers running UDP
on a fairly quiet network are good enough for me, I've tested this setup
to hundreds of thousands of logs/sec without losing any logs in my tests
(and the tests have involved sending billions of log messages), so while
it is not guaranteed reliability, in practice it is 'good enough'
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com