[ https://issues.apache.org/jira/browse/ZOOKEEPER-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665412#action_12665412 ]
Luca Telloli commented on ZOOKEEPER-276:
----------------------------------------

BookKeeper is a system to reliably log streams of records. In BookKeeper, servers are "bookies", log streams are "ledgers", and each unit of a log (aka record) is a "ledger entry". BookKeeper is designed to be reliable: bookies, the servers that store ledgers, can be byzantine, which means that some subset of the bookies can fail, corrupt data, or discard data, but as long as there are enough correctly behaving servers the service as a whole behaves correctly. The metadata for BookKeeper is stored in ZooKeeper.

The main motivation for this system comes from the namenode of the Hadoop Distributed File System (HDFS). Namenodes have to log operations in a reliable fashion so that recovery is possible in the case of failures. Currently, HDFS does write-ahead logging on its local storage. This allows the namenode to restart after a failure, but it does not allow recovery from a complete machine failure. A shared, fault-tolerant write-ahead log is needed so that the namenode can be started on another machine and recover its state properly.

We have found that the applications for BookKeeper extend beyond HDFS. In fact, any application that uses write-ahead logging can take advantage of BookKeeper.

Back to the namenode example: one potential solution is to replicate the log across a number of replicas and use an agreement protocol (e.g., 3PC) to guarantee that operations hit enough replicas. This approach doesn't take full advantage of the I/O parallelism that can be achieved by striping. We can also take advantage of a single writer and immutable ledger entries to simplify the protocol and obtain better performance and fault-tolerance guarantees.

The approach we propose is different. We propose a highly available system that receives entries from a client and stores them. Such a system can be used by several client systems, not only HDFS.

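
To make the striping idea mentioned above more concrete, here is a minimal sketch in Java. It is an illustration only, not code from the patch: the class name StripingSketch, the method bookiesFor, and the round-robin placement rule are all assumptions chosen for the example. The point it shows is that when each entry goes to a subset of the bookies, consecutive entries land on different servers and their disk writes can proceed in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from the BookKeeper patch.
class StripingSketch {

    /**
     * Returns the indexes of the bookies that would store the given entry,
     * assuming a simple round-robin rule: entry i starts at bookie
     * (i mod numBookies) and is copied to the next (replicas - 1) bookies.
     */
    static List<Integer> bookiesFor(long entryId, int numBookies, int replicas) {
        if (entryId < 0 || numBookies <= 0 || replicas <= 0 || replicas > numBookies) {
            throw new IllegalArgumentException("invalid striping parameters");
        }
        List<Integer> targets = new ArrayList<>();
        int start = (int) (entryId % numBookies);
        for (int i = 0; i < replicas; i++) {
            targets.add((start + i) % numBookies);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Three bookies, two copies of each entry (assumed values).
        for (long entryId = 0; entryId < 4; entryId++) {
            System.out.println("entry " + entryId + " -> bookies "
                    + bookiesFor(entryId, 3, 2));
        }
    }
}
```

With three bookies and two copies per entry, each bookie receives roughly two thirds of the entries instead of all of them, which is the I/O parallelism that full replication on every replica gives up.
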
There are several advantages in building such a service:

* We can use hardware that is optimized for such a service. We currently believe that such a system has to be optimized only for disk I/O;
* We can have a pool of servers implementing such a log system, shared among a number of servers. BookKeeper is hence a good candidate for a cloud service;
* We can have a higher degree of replication with such a pool, which makes sense if the hardware it requires is cheaper than the hardware the application uses.

In this patch, we provide the source code of BookKeeper for inclusion in the "contrib" folder of the ZooKeeper distribution. We note that this is still experimental code, but the community is free to evaluate, discuss, and hopefully contribute.

> Bookkeeper contribution
> -----------------------
>
>                 Key: ZOOKEEPER-276
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-276
>             Project: Zookeeper
>          Issue Type: New Feature
>            Reporter: Luca Telloli
>
> BookKeeper is a system to reliably log streams of records. In BookKeeper,
> servers are "bookies", log streams are "ledgers", and each unit of a log (aka
> record) is a "ledger entry". BookKeeper is designed to be reliable; bookies,
> the servers that store ledgers can be byzantine, which means that some subset
> of the bookies can fail, corrupt data, discard data, but as long as there are
> enough correctly behaving servers the service as a whole behaves correctly;
> the meta data for BookKeeper is stored in ZooKeeper.