in our project, we have our own logging system, analogous to syslog.
it is udp based, and the writer (after a broadcast) waits for at least
3 acks from other nodes before going on. so across all the nodes, we
have a very reliable logging scheme. the question is: how does any one
node's log differ from the 'correct' log?
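
a minimal sketch of the write-then-wait-for-acks step described above, in python. the wire format ("LOG <id> ..." and "ACK <id>"), the function names, and the timeout are assumptions made for illustration; they are not taken from the actual system.

```python
import socket
import time

MIN_ACKS = 3  # from the description above: wait for at least 3 acks


def open_writer_socket():
    """Create a UDP socket that is allowed to send broadcast datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind(("", 0))
    return sock


def send_and_wait(sock, entry_id, message, dest, min_acks=MIN_ACKS, timeout=1.0):
    """Send one log entry, then block until min_acks distinct peers ack it.

    Returns the set of peer addresses that acknowledged the entry.
    Raises TimeoutError if too few acks arrive before the deadline.
    The packet layout is hypothetical.
    """
    packet = b"LOG %d " % entry_id + message
    expected_ack = b"ACK %d" % entry_id
    sock.sendto(packet, dest)

    acked = set()
    deadline = time.monotonic() + timeout
    while len(acked) < min_acks:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "entry %d: only %d of %d acks" % (entry_id, len(acked), min_acks)
            )
        sock.settimeout(remaining)
        try:
            data, peer = sock.recvfrom(2048)
        except socket.timeout:
            continue  # the deadline check at the top of the loop raises
        if data == expected_ack:
            acked.add(peer)  # count each peer address only once
    return acked
```

in a real deployment dest would be the subnet broadcast address and port, and the writer would probably retransmit rather than give up on a timeout; both choices are left open here.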

although in general they are the same, enough entries go missing that
we built a subsystem to aggregate and search the whole cluster's log
files. in quantitative terms, for a cluster of 20 nodes, we might lose
a few tens of entries per year (out of 150-200M entries).
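
one way to picture the comparison, as a rough python sketch: treat the union of every node's entries as a stand-in for the 'correct' log and report what each node lacks. it assumes each entry carries a unique id, which the description above does not state, and the function names are made up for illustration.

```python
def merge_logs(node_logs):
    """Union of the entry ids seen by any node.

    This only approximates the 'correct' log: an entry that every node
    missed cannot appear in the union.
    """
    merged = set()
    for entries in node_logs.values():
        merged |= set(entries)
    return merged


def missing_per_node(node_logs):
    """Map each node name to the entry ids present elsewhere but not on it."""
    merged = merge_logs(node_logs)
    return {node: merged - set(entries) for node, entries in node_logs.items()}


def loss_fraction(missing_count, total_entries):
    """Fraction of entries lost, e.g. a few tens out of 150-200M."""
    return missing_count / total_entries
```

for scale, 30 missing entries out of 150M is a loss fraction of 2e-7.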

On Dec 24, 2009, at 8:51 AM, Leon Towns-von Stauber wrote:


On Dec 23, 2009, at 7:27 PM, Mark McCullough wrote:

I manage the central log server framework for a large set of
servers.  We use UDP.  There is no evidence of significant packet
loss anywhere.  Yes, older networks will have packet loss, be it TCP
or UDP.  But my experience managing a hefty volume of log data is we
just don't see evidence of loss on the network.

[...]

Yes, we periodically review the logs that would be very obvious if
any events are missing, and have yet to find a missing log event.

My experience matches yours. We've had network congestion issues
affect other things, but rarely, if ever, have we lost UDP syslog
messages due to network issues. I haven't conducted a thorough audit
to say that it's never happened, just occasional spot checks, but if
it were a problem at all, the missing messages would occasionally have
blown one of numerous multi-message correlations and been noticed, and
that has never happened.

--------------------------------------------------------------------
Leon Towns-von Stauber                  http://www.occam.com/leonvs/
"We have not come to save you, but you will not die in vain!"

_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/

------------------
Andrew Hume  (best -> Telework) +1 732-886-1886
[email protected]  (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and LOPSA


