Aaron,

I am replying on-list so that others can follow (and I hope they have some
advise). I have stripped the sensitive part of your message. In short for all
others: I got a debug log and have now reviewed it. 

> During this debug run we saw NAS write volume increase from a "normal"
> 2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the
> test. I do understand how enabling file sync will increase iops but
> I'm unclear on why the volume of data being written would increase so
> much. If rsyslog has file sync "disabled" then I would expect bursts
> of data and if file sync is "enabled" I would expect the same amount
> of data to be written in much more frequent, but also much smaller
> chunks. NFS could be a culprit here, but I'm curious what NFS
> parameters would impact this behavior.

The debug log does not show anything that looks specifically suspect. The
incoming traffic is intense and the default queue size is overrun quickly,
resulting in voluntary delays of the sender (via tcp). Looking at the
traffic, I think this is perfectly legal and increasing the queue size would
not bring any benefit (I think the end result would just be a larger memory
footprint without any advantage).

I do not see any unusual write behavior. Of course, the debug log could be
more verbose as it does not specifically tell about the syncs (usually, the
log is missing some information you'd like to see for a given instant, but
you can't include everything...). But looking at the code base plus the log
really does not reveal anything suspicious.

One thing you might want to try is doing an strace so that we can look at it
and see which exact OS calls are made and how long they last.

> For the record, changing the sync behavior of rsyslog DOES fix the
> original complaint. When we have file sync disabled we see up to 30
> second pauses between writes to files which get multiple messages per
> second delivered to the rsyslog host. 

There is nothing in the log that points to rsyslog doing these delays.

So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)

> I have looked at the option
> $OMFileFlushOnTXEnd and wonder if it would be a preferred method to
> resolve this in the 4.1.1 version (or if it even works there).

This makes no sense for v4 (I think it is not even available).

Sorry I have no better answer. If get a strace, I'd gladly look at it.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to