On Fri, 11 Mar 2011, Todd Michael Bushnell wrote:

> Appreciate the stellar advice Rainer.  I built a 5.6.x (latest stable), but
> before upgrading I wanted to do some tests with my existing version/config
> and then apply some of the rule syntax changes you recommended to gauge the
> performance benefit.  I used the syslog_caller tool to perform a few tests.
> Here are those results:

> Running the following command on a test box and simply recording real time
> for comparison:
>
>     time ./syslog_caller -m 50000

> Initial (Apache killing) config w/ remote TCP logging:   5m38.163s
> Switch to UDP remote logging:                            4m22.497s
> Disable remote logging:                                  3m23.322s
> Initial w/ amended rules (see RULES section):            4m54.055s
> Amended rules w/ expression-based rules commented out:   2m36.023s
> Amended rules w/ MainMsg disk queue:                     7m43.498s
>
> Sysklogd (for comparison):                               2m13.986s

> Glad to see just changing my rules improved performance by about 13%.  My
> initial reaction was to send this info and ask a series of questions based on
> the data, but instead decided to give it a whirl with the latest stable
> version: 5.6.4.
>
> 5.6.4 w/ amended rules:  0m3.773s
>
> Wow - I almost fell off my chair!  This is AMAZING!  Thank you!  Given these
> results, I just have a couple of final questions:

we really mean it when we say that performance has improved with the later versions ;-)

now, given what you have been describing, I suspect that you are still going to have problems: I think your central log server just can't quite keep up, so with TCP logging you will still block eventually.

> In compliance-heavy environments (which I'm in), I assume the recommendation
> is to add disk queueing for the main queue. Is this correct? Something like:
>
> $MainMsgQueueFileName mainqueue
> $MainMsgQueueType LinkedList
> $MainMsgQueueSaveOnShutdown on
>
> I understand there is a performance tradeoff, but given PCI-DSS, it'll be
> worth it, I think.

the thing is that this isn't giving you the reliability that you think it is.

the process of logging with rsyslog has many steps:

1. write the log to /dev/log

if the system crashes here the log is lost. has the application already completed the action it's trying to log? if so you have no record of it.

2. rsyslog accepts the message and puts it in the main message queue

unless the main message queue is a disk queue (not a disk-assisted queue) and you have fsyncs enabled, if the box crashes at this point you lose the log

3. rsyslog decides if the message should go to a particular destination; if you have a separate action queue for that destination, the message is put into that queue.

  again, unless you are using a disk queue, a crash can lose the message

4a. rsyslog sends the log to the remote server and deletes it from the action queue.

unless you are using RELP, rsyslog may send the message to the TCP stack, but it has no way of knowing if the remote server has received the message.

4b. rsyslog writes the log to a local file

unless you have fsync enabled after each write, a crash at this time will lose the log message.
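
the gaps in steps 2, 4a and 4b map to specific settings. a sketch in legacy (v5) config syntax — the server name, port and file paths here are made up, and every one of these directives trades throughput for durability:

```
# step 2: a pure disk queue (not disk-assisted), with queue files synced
$MainMsgQueueType Disk
$MainMsgQueueFileName mainq
$MainMsgQueueSyncQueueFiles on
$MainMsgQueueSaveOnShutdown on

# step 4a: RELP, so the sender knows the receiver accepted each message
$ModLoad omrelp
*.* :omrelp:logserver.example.com:2514

# step 4b: allow sync on file actions (rsyslog ignores sync requests otherwise)
$ActionFileEnableSync on
*.* /var/log/messages
```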


note that disk queues are very slow, and fsync on ext3 with other write activity can stall for seconds at a time.

I did some testing a year or so ago with a very high-performance solid state drive (a Fusion-io PCI card that cost >$5K for 80G of storage). With that drive and ext2, without an action queue, I was able to process 8K logs/sec, compared to 400K logs/sec with memory queues (at the time I could only write out around 80K logs/sec, but faster bursts that fit in memory were handled just fine; since then there has been more improvement to rsyslog, and people are reporting write rates of several hundred thousand logs/sec).

doing the same test on a standard SATA drive resulted in around 10 (yes TEN) logs/sec being processed.

I also operate in a PCI environment; there are limits to what is expected of you in terms of preserving logs.

I would suggest that you end up with two copies of rsyslog running on your servers.

the first copy for compliance critical logs.

  these should be a relatively low volume

  the application should be double-logging everything

    i.e.  I am about to do X
          I just tried to do X and it succeeded/failed

this way you can tell if something failed in the middle of a transaction and can investigate if the transaction took place or not
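
as a minimal sketch of this double-logging pattern, in Python (the helper name and the optional `log` parameter are illustrative, just for testing — by default the messages go through the syslog module, i.e. /dev/log):

```python
import syslog

def do_transaction(desc, action, log=None):
    """Double-log a transaction: record intent before acting and the
    outcome after, so a crash mid-transaction still leaves the
    'about to' record behind for investigation."""
    if log is None:
        log = lambda msg: syslog.syslog(syslog.LOG_INFO, msg)
    log("about to do %s" % desc)
    try:
        result = action()
        log("just tried to do %s: succeeded" % desc)
        return result
    except Exception as e:
        log("just tried to do %s: failed (%s)" % (desc, e))
        raise
```

the point is that the "about to" record is written (and synced/acknowledged) before the action runs, so its absence or presence is meaningful after a crash.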

this instance of rsyslog can be configured to sync everything, use RELP, write to mirrored drives, etc to do everything you can to make sure the log does not get lost.
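
concretely, this compliance-critical instance might look something like the following (a sketch in legacy config syntax; the socket path, queue name and server name are made up):

```
# listen only on a dedicated socket for the compliance-critical apps
$ModLoad imuxsock
$AddUnixListenSocket /var/run/compliance-log

# pure disk queue, queue files synced, preserved across restarts
$MainMsgQueueType Disk
$MainMsgQueueFileName compliance-q
$MainMsgQueueSyncQueueFiles on
$MainMsgQueueSaveOnShutdown on

# acknowledged delivery to the central server
$ModLoad omrelp
*.* :omrelp:central.example.com:2514

# local copy on (ideally mirrored) disk, synced per write
$ActionFileEnableSync on
*.* /var/log/compliance.log
```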

the application needs to either use RELP to talk to rsyslog, or use /dev/log (writes to /dev/log do not return until the log is in the queue)

if this instance stops (runs out of disk space, crashes, etc) the application will halt.


the second copy is for normal activity logs (apache logs, etc)

  these will be a fairly high volume (especially by comparison)

  if systems fail you will lose some of these logs

at this point you can decide what reliability measures you deem prudent for these logs.


personally, for this second category, failover syslog servers running UDP on a fairly quiet network are good enough for me. I've tested this setup to hundreds of thousands of logs/sec without losing any logs (and the tests have involved sending billions of log messages), so while it is not guaranteed reliability, in practice it is 'good enough'.
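
since UDP gives the sender no failure signal (so suspend-based failover can't trigger), one simple way to get this kind of redundancy is to send every message to two collectors and tolerate the duplicates (host names here are made up):

```
# high-volume instance: fire-and-forget UDP to a pair of collectors
*.* @collector1.example.com
*.* @collector2.example.com
```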

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
