Thank you everyone for your responses - rather than reply to each
post, here is some feedback.

The storage system - by all measures we have a "high end" system. It's
an EMC CLARiiON CX3-80 attached to a Celerra NS80 NAS head with 10 Gb
interfaces. The network file system is NFS; the Celerra can also
present CIFS shares, but I doubt those would work any better. This is
not easily replaced with a Linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct-attached disk and sharing it over
NFS... it is crumbling under our write/read load, so we had to come up
with another solution. This solution is understood to be a bit of a
compromise - the ideal situation would be direct-attached Fibre
Channel disk and a cluster filesystem, but that wasn't an option,
mostly for cost. For all the complaints about NFS, it performs just
fine - if we enable file sync the write load just goes through the
roof, but the NAS keeps up.
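
To illustrate why file sync multiplies the write load, here is a
minimal Python sketch that times appending log-like lines with and
without an fsync after each one. This is not what rsyslog does
internally; the function name, message text, line count and target
directory are all made up for illustration, and the directory would
need to point at the NFS mount for the numbers to mean anything.

```python
import os
import tempfile
import time


def time_writes(path, count, sync_each):
    """Append `count` log-like lines to `path`; return elapsed seconds."""
    # Hypothetical message, only here to give each write a realistic size.
    line = b"<13>Jan  1 00:00:00 host app: test message\n"
    start = time.perf_counter()
    with open(path, "ab") as fh:
        for _ in range(count):
            fh.write(line)
            if sync_each:
                fh.flush()             # push Python's buffer to the kernel
                os.fsync(fh.fileno())  # ask the kernel to commit to storage
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: a temp directory stands in for the real log directory.
    target_dir = tempfile.mkdtemp()
    for sync_each in (False, True):
        path = os.path.join(target_dir, "sync_%s.log" % sync_each)
        elapsed = time_writes(path, 1000, sync_each)
        print("sync_each=%s: %.3f s for 1000 lines" % (sync_each, elapsed))
```

Without the per-line fsync the writes are batched by the client and
the kernel; with it, every line becomes its own commit to the server.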

Client buffering (or throttling) is still possible. However, because
we are transitioning to this new system, many of the clients are
dual-logging to two destinations, so their logging behavior can be
observed on both systems. On the old system running syslog-ng these
messages are written to disk and visible in near real time, and the
clients don't seem to be doing any rate limiting or the like. The
debug log that was taken, however, had file sync enabled - and when
that's enabled this delay is not seen, so I'm not surprised it wasn't
visible. I can produce a debug log with file sync disabled for
comparison; that's probably something I should have done in the first
place.

I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of it, but that could
be down to my own limited knowledge of strace. `strace -p <pid>`
yields a line of information and then just stops, even though the
rsyslog process is still working fine and is obviously doing
something. I've tried attaching to the different threads as well,
with the same result.

I will see if I can get a comparison debug log with file sync
disabled, and if there is some guidance on getting a good strace I'm
happy to provide that output. Yesterday we had to move the server from
RHEL 5.3 to RHEL 5.1 due to some limits we discovered on TCP
connections, which are either intentionally or unintentionally
different in our installation of 5.3. The server is now performing
(with file sync disabled) at an even higher capacity on RHEL 5.1;
however, I haven't had a chance to review whether the write latency is
still observed. Hopefully I can look at that today.

Thanks again for all the information.

Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
