Aaron, I am replying on-list so that others can follow (and I hope they have some advise). I have stripped the sensitive part of your message. In short for all others: I got a debug log and have now reviewed it.
> During this debug run we saw NAS write volume increase from a "normal" > 2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the > test. I do understand how enabling file sync will increase iops but > I'm unclear on why the volume of data being written would increase so > much. If rsyslog has file sync "disabled" then I would expect bursts > of data and if file sync is "enabled" I would expect the same amount > of data to be written in much more frequent, but also much smaller > chunks. NFS could be a culprit here, but I'm curious what NFS > parameters would impact this behavior. The debug log does not show anything that looks specifically suspect. The incoming traffic is intense and the default queue size is overrun quickly, resulting in voluntary delays of the sender (via tcp). Looking at the traffic, I think this is perfectly legal and increasing the queue size would not bring any benefit (I think the end result would just be a larger memory footprint without any advantage). I do not see any unusual write behavior. Of course, the debug log could be more verbose as it does not specifically tell about the syncs (usually, the log is missing some information you'd like to see for a given instant, but you can't include everything...). But looking at the code base plus the log really does not reveal anything suspicious. One thing you might want to try is doing an strace so that we can look at it and see which exact OS calls are made and how long they last. > For the record, changing the sync behavior of rsyslog DOES fix the > original complaint. When we have file sync disabled we see up to 30 > second pauses between writes to files which get multiple messages per > second delivered to the rsyslog host. There is nothing in the log that points to rsyslog doing these delays. So my conclusion really is that you need to look at the NFS layer. To me, it looks like NFS does some local caching, and maybe 30 seconds is the timeout at which NFS writes data from files that are constantly being written in short intervals. With sync, NFS probably writes and rewrites each block as David has said, thus resulting in the far larger i/o footprint. But I am far from being a NFS expert, so it would probably make sense to ask one ;) > I have looked at the option > $OMFileFlushOnTXEnd and wonder if it would be a preferred method to > resolve this in the 4.1.1 version (or if it even works there). This makes no sense for v4 (I think it is not even available). Sorry I have no better answer. If get a strace, I'd gladly look at it. Rainer _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

