Miguel Correia commented on ZOOKEEPER-816:

Let me give a longer explanation of the project. Practical experience with 
Zookeeper has shown that it sometimes fails in ways whose causes are hard to 
understand. Some of these failures may be caused by elusive bugs in the code; 
others may be due to faults rarer than crashes, such as corruption of data 
somewhere in a server.

Zookeeper's traces (i.e., logs at TRACE level) provide some information that 
can help in understanding what happened. For instance, they contain 
information about the clients that are connected, the operations issued, etc. 
However, in real deployments with many clients (say, hundreds), traces are 
typically turned off to avoid the high overhead that they cause. Furthermore, 
the data in the traces is probably not enough for our purposes because it does 
not include, e.g., the replies to operations or the data values.

The project involves 3 subtasks:

1- improve the efficiency of logging

2- extend the traces with the additional information needed

3- build the checking tool
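
To make subtasks 2 and 3 a bit more concrete, here is a rough sketch of the 
kind of check such a tool might run once the traces also carry replies and 
data values. Everything in it is an assumption made for illustration: the 
class TraceCheckSketch, the Entry fields, and the "getDataReply" label are 
hypothetical and do not exist in the Zookeeper code, and it assumes a single, 
totally ordered trace, which a real deployment would not give us for free.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Zookeeper code base.
final class TraceCheckSketch {

    /** Assumed extended trace entry that also records the data value. */
    static final class Entry {
        final String op;   // "setData" or "getDataReply" in this sketch
        final String path; // znode path the operation refers to
        final String data; // value written, or value returned in the reply

        Entry(String op, String path, String data) {
            this.op = op;
            this.path = path;
            this.data = data;
        }
    }

    /** Returns one message per read reply that disagrees with the last write seen. */
    static List<String> check(List<Entry> trace) {
        Map<String, String> lastWritten = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (Entry e : trace) {
            if (e.op.equals("setData")) {
                lastWritten.put(e.path, e.data);
            } else if (e.op.equals("getDataReply")) {
                String expected = lastWritten.get(e.path);
                // Only flag paths whose last write is present in the trace.
                if (expected != null && !expected.equals(e.data)) {
                    problems.add(e.path + ": expected " + expected + ", got " + e.data);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Entry> trace = new ArrayList<>();
        trace.add(new Entry("setData", "/a", "v1"));
        trace.add(new Entry("getDataReply", "/a", "v1")); // consistent read
        trace.add(new Entry("getDataReply", "/a", "vX")); // possible corruption
        for (String problem : check(trace)) {
            System.out.println(problem);
        }
    }
}
```

A check like this is only possible if the traces include the replies and the 
data values, which is why subtask 2 comes before subtask 3.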

> Detecting and diagnosing elusive bugs and faults in Zookeeper
> -------------------------------------------------------------
>                 Key: ZOOKEEPER-816
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
>             Project: Zookeeper
>          Issue Type: New Feature
>            Reporter: Miguel Correia
>            Priority: Minor
> Complex distributed systems like Zookeeper tend to fail in strange ways that 
> are hard to diagnose. The objective is to build a tool that helps understand 
> when and where these problems occurred based on Zookeeper's traces (i.e., 
> logs in TRACE level). Minor changes to the server code will be needed.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
