On 2009 Dec 23, at 23:13, Brad Knowles wrote:

> On Dec 23, 2009, at 9:27 PM, Mark McCullough wrote:
>
>> Yes, we periodically review the logs that would be very obvious if any
>> events are missing, and have yet to find a missing log event.
>
> I thought the same thing, while I was managing the network of Internet e-mail
> gateways for AOL that were using high-speed FDDI connections.  It wasn't
> until I compared local logs for a slice of time across a selection of
> gateways to the data that had been recorded on the central log server for
> those same machines for that same period of time, that I saw the data loss.
> The network never showed any errors -- no runts, drops, congestion, or
> anything else (it was FDDI after all), but the data loss was there.  And we
> were generating gigabytes of mail log traffic per day, even with the 75% data
> loss.
>
> Maybe you're not losing any data.  Maybe you are.  You won't know for sure
> unless you compare local logs that are written at the same time as the
> network logs for the same data, and then you do a comparison.  And even then,
> you might not see loss at a particular time, but perhaps there is a loss at
> another time that you don't see.
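As an illustration of the comparison described in the quoted text, here is a minimal sketch in Python. It assumes you have already pulled the same time window for the same host from both the local log and the central log server, and that each line has been reduced to its message text (timestamps and any relay-added prefixes stripped). The function name `loss_report` and the sample messages are hypothetical and only for illustration.

```python
from collections import Counter


def loss_report(local_lines, central_lines):
    """Compare messages written locally with those recorded centrally.

    Both arguments are iterables of normalized message strings for the
    same host and the same time window. Messages are counted rather than
    de-duplicated, because identical messages can legitimately repeat.
    """
    local = Counter(local_lines)
    central = Counter(central_lines)

    # Counter subtraction keeps only positive counts: messages seen
    # locally more often than they were seen on the central server.
    missing = local - central

    sent = sum(local.values())
    lost = sum(missing.values())
    return {
        "sent": sent,
        "lost": lost,
        "loss_fraction": lost / sent if sent else 0.0,
        "missing": missing,
    }


# Hypothetical sample data for one host and one slice of time.
local = ["msg A", "msg B", "msg B", "msg C"]
central = ["msg A", "msg B"]
report = loss_report(local, central)
```

This only reports messages that made it to local disk but not to the central server. It says nothing about events that neither side recorded, and a clean result for one time slice does not rule out loss in another.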

Do I know for certain that none of the contributed log events are lost unless I tell them to get lost?  No.  Have I looked regularly in areas where log event loss of even one event in a hundred would be apparent to a human review?  Yes.  In addition to those logs, I do periodic spot checks, and if there were significant loss, it would show up in the summary data that I review every week, because the numbers would be off for the servers I monitor.  I know what to expect in various categories for my log events, and if a number changes too much (including downward), my weekly review will catch that.

Networks today are rather different and, I dare say, much more reliable than they were several years ago.

As someone else suggested, things other than the network, such as buffers, can result in loss.  I've seen cases where the local syslog daemon failed to write to disk but the remote could capture the event.  (The system was writing both locally and remotely.)  The system was in a highly unusual state and I doubt I could replicate that case if I tried, but we had no reason to suspect malice in the difference.  That isn't to say that remote is more reliable than local, just that one should not presume complete reliability without a lot of additional work.

There are a number of techniques to look for evidence of significant loss.  Some of these include the obvious (periodic alive messages), and the not so obvious (log messages with a counter in the message, so any message not arriving would be obvious).

It doesn't matter whether it is TCP, UDP, or something else that is transmitting the logs; you have to decide what your desired reliability threshold is once you start seeing evidence of loss.  Since I haven't managed to find such evidence, I haven't had to make that decision.

----
"The speed of communications is wondrous to behold.  It is also true that speed can multiply the distribution of information that we know to be untrue."
Edward R Murrow (1964)

Mark McCullough
[email protected]
_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
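
The counter-in-the-message technique mentioned in the message above can be sketched as follows. This is a minimal illustration, not a description of any particular syslog setup. It assumes each sender embeds a monotonically increasing number in its messages using a tag of the form `seq=N`. That tag, the function name `find_gaps`, and the sample lines are all hypothetical.

```python
import re

# Hypothetical tag format; this is an assumption, not a syslog standard.
SEQ_RE = re.compile(r"\bseq=(\d+)\b")


def find_gaps(lines):
    """Return a list of (first_missing, last_missing) sequence ranges.

    Lines without a sequence tag are ignored. Sequence numbers are
    sorted before checking, so messages that merely arrive out of order
    (possible over UDP) are not reported as lost.
    """
    seen = set()
    for line in lines:
        match = SEQ_RE.search(line)
        if match:
            seen.add(int(match.group(1)))

    ordered = sorted(seen)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical lines as received on the central log server.
received = [
    "host1 app: seq=1 started",
    "host1 app: seq=2 ok",
    "host1 app: seq=5 ok",
    "host1 app: seq=4 ok",
    "host1 app: seq=8 done",
]
gaps = find_gaps(received)
```

A gap check like this only sees holes between messages that did arrive. Loss at the very start or end of a window, or a sender that has gone silent, needs the other technique mentioned above, periodic alive messages. The sketch also does not handle a counter that resets when the sender restarts.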
