On 2009 Dec 23, at 23:13, Brad Knowles wrote:

> On Dec 23, 2009, at 9:27 PM, Mark McCullough wrote:
> 
>> Yes, we periodically review the logs that would be very obvious if any 
>> events are missing, and have yet to find a missing log event.
> 
> I thought the same thing, while I was managing the network of Internet e-mail 
> gateways for AOL that were using high-speed FDDI connections.  It wasn't 
> until I compared local logs for a slice of time across a selection of 
> gateways to the data that had been recorded on the central log server for 
> those same machines for that same period of time, that I saw the data loss.  
> The network never showed any errors -- no runts, drops, congestion, or 
> anything else (it was FDDI after all), but the data loss was there.  And we 
> were generating gigabytes of mail log traffic per day, even with the 75% data 
> loss.
> 
> Maybe you're not losing any data.  Maybe you are.  You won't know for sure 
> unless you compare local logs that are written at the same time as the 
> network logs for the same data, and then you do a comparison.  And even then, 
> you might not see loss at a particular time, but perhaps there is a loss at 
> another time that you don't see.

Do I know for certain that none of the contributed log events are lost unless I 
tell them to get lost?  No.  Have I looked regularly in areas where the loss of 
even one log event in a hundred would be apparent to a human reviewer?  Yes.  
In addition to those logs, I do periodic spot checks, and if there were 
significant loss, it would show up in the summary data that I review every 
week because the numbers would be off for the servers I monitor.  I know what 
to expect in various categories for my log events, and if a number changes too 
much (including downward), my weekly review will catch that. 
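
A minimal sketch of that kind of weekly check, assuming per-category event 
counts are already available as dictionaries.  The category names, the counts, 
and the 25% tolerance are all made up for illustration, not taken from any 
real setup:

```python
def flag_deviations(expected, observed, tolerance=0.25):
    """Return {category: fractional_change} for counts outside tolerance.

    `expected` and `observed` map a category name to an event count.
    `tolerance` is a hypothetical threshold; 0.25 means +/-25%.
    """
    flagged = {}
    for category, baseline in expected.items():
        if baseline <= 0:
            continue  # no meaningful baseline to compare against
        count = observed.get(category, 0)  # a missing category counts as zero
        change = (count - baseline) / baseline
        if abs(change) > tolerance:  # catches drops as well as spikes
            flagged[category] = change
    return flagged


# Illustrative numbers only.
expected = {"auth": 1000, "mail": 50000, "cron": 700}
this_week = {"auth": 1010, "mail": 30000, "cron": 690}

for category, change in flag_deviations(expected, this_week).items():
    print(f"{category}: {change:+.0%} versus baseline")
```

A check like this only catches loss large enough to move the totals, so it 
complements spot checks rather than replacing them.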

Networks today are rather different, and I dare say, much more reliable, than 
they were several years ago.  As someone else suggested, things other than the 
network, such as buffers, can also result in loss.  I've seen cases where the 
local syslog daemon failed to write to disk but the remote could capture the 
event (we were writing both locally and remotely).  The system was in a highly 
unusual state and I doubt I could replicate that case if I tried, but we had 
no reason to suspect malice in the difference.  That isn't to say that remote 
is more reliable than local, just that one should not presume complete 
reliability without a lot of additional work.
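
When events are written both locally and remotely, the two copies can be 
compared for a slice of time.  A rough sketch follows, assuming the lines have 
already been normalized so that the same event is byte-identical on both 
sides (in practice the collector usually rewrites timestamps or hostnames, so 
some normalization step would be needed first).  The function name and sample 
lines are hypothetical:

```python
from collections import Counter


def compare_logs(local_lines, remote_lines):
    """Return (missing_on_remote, missing_on_local) as Counters.

    Counters are used instead of sets so that repeated identical
    lines are still counted correctly.
    """
    local = Counter(line.rstrip("\n") for line in local_lines)
    remote = Counter(line.rstrip("\n") for line in remote_lines)
    # Counter subtraction keeps only positive differences.
    return local - remote, remote - local


# Illustrative data only.
local_copy = [
    "mx1 sendmail: msg A accepted",
    "mx1 sendmail: msg B accepted",
    "mx1 sendmail: msg B accepted",
]
remote_copy = [
    "mx1 sendmail: msg A accepted",
    "mx1 sendmail: msg B accepted",
    "mx1 sendmail: msg C accepted",
]

missing_on_remote, missing_on_local = compare_logs(local_copy, remote_copy)
print("lost on the way to the log server:", dict(missing_on_remote))
print("never written locally:", dict(missing_on_local))
```

Reporting both directions matters here, since either copy can be the 
incomplete one.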

There are a number of techniques to look for evidence of significant loss.  
Some of these are obvious (periodic alive messages), and some are less 
obvious (log messages that carry a counter, so that any message that fails to 
arrive leaves a visible gap).  
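
The counter technique can be sketched as follows.  This assumes a single 
sender that appends a hypothetical `seq=<n>` field to each message, 
increments it by one per message, and never resets it.  Messages that merely 
arrive late (possible over UDP) would still be reported as a gap by this 
simple version:

```python
import re

# Hypothetical field format; the sender is assumed to add "seq=<integer>".
SEQ_RE = re.compile(r"\bseq=(\d+)\b")


def find_gaps(lines):
    """Return (first_missing, last_missing) ranges in the counter."""
    gaps = []
    last = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # line carries no counter; ignore it
        seq = int(match.group(1))
        if last is not None and seq > last + 1:
            gaps.append((last + 1, seq - 1))
        if last is None or seq > last:
            last = seq  # track the highest counter seen so far
    return gaps


# Illustrative data only.
received = [
    "Dec 23 23:13:01 mx1 mail: seq=1 accepted",
    "Dec 23 23:13:02 mx1 mail: seq=2 accepted",
    "Dec 23 23:13:05 mx1 mail: seq=5 accepted",
    "Dec 23 23:13:06 mx1 mail: seq=6 accepted",
]
print(find_gaps(received))  # [(3, 4)]
```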

Whether it is TCP, UDP, or something else that is transmitting the logs, you 
have to decide what your desired reliability threshold is once you start 
seeing evidence of loss.  Since I haven't managed to find such evidence, I 
haven't had to make that decision. 

----
"The speed of communications is wondrous to behold. It is also true that
speed can multiply the distribution of information that we know to be
untrue." Edward R Murrow (1964)

Mark McCullough
[email protected] 


_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
