[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665412#action_12665412
 ] 

Luca Telloli commented on ZOOKEEPER-276:
----------------------------------------

BookKeeper is a system to reliably log streams of records. In BookKeeper, 
servers are "bookies", log streams are "ledgers", and each unit of a log (aka 
record) is a "ledger entry". BookKeeper is designed to be reliable: bookies, 
the servers that store ledgers, can be Byzantine, which means that some subset 
of the bookies can fail, corrupt data, or discard data, but as long as there 
are enough correctly behaving servers, the service as a whole behaves 
correctly. The metadata for BookKeeper is stored in ZooKeeper.

The main motivation for this system comes from the namenode of the Hadoop 
Distributed File System (HDFS). Namenodes have to log operations in a reliable 
fashion so that recovery is possible in the case of failures. Currently, HDFS 
does write-ahead logging on its local storage. This allows the namenode to 
restart after a failure, but it does not allow recovery from a complete machine 
failure. A shared, fault-tolerant write-ahead log is needed so that the 
namenode can be started on another machine and recover state properly. We have 
found that the applications of BookKeeper extend beyond HDFS. In fact, any 
application that uses write-ahead logging can take advantage of BookKeeper.
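
To make the write-ahead logging pattern above concrete, here is a minimal 
Python sketch. It is an illustration only and not BookKeeper code: SharedLog 
and Namenode are hypothetical names, and an in-memory list stands in for the 
shared, fault-tolerant log.

```python
class SharedLog:
    """Hypothetical stand-in for a shared, fault-tolerant log (in-memory here)."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        # Store the record and return its position in the log.
        self._entries.append(record)
        return len(self._entries) - 1

    def read_all(self):
        return list(self._entries)


class Namenode:
    """Toy state machine: a namespace mapping path -> size (an assumption)."""

    def __init__(self, log):
        self.log = log
        self.namespace = {}

    def _apply(self, record):
        op, path, size = record
        if op == "create":
            self.namespace[path] = size
        elif op == "delete":
            self.namespace.pop(path, None)

    def execute(self, op, path, size=0):
        # Write-ahead: log the operation before changing in-memory state.
        record = (op, path, size)
        self.log.append(record)
        self._apply(record)

    @classmethod
    def recover(cls, log):
        # A replacement node rebuilds its state by replaying the shared log.
        node = cls(log)
        for record in log.read_all():
            node._apply(record)
        return node


log = SharedLog()
primary = Namenode(log)
primary.execute("create", "/a", 10)
primary.execute("create", "/b", 20)
primary.execute("delete", "/a")

# Simulate losing the primary's machine: only the shared log survives.
standby = Namenode.recover(log)
```

Because the log lives outside the failed machine, the standby ends up with the 
same namespace as the primary had, which a purely local log cannot guarantee.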

Returning to the namenode example, one potential solution is to replicate the 
log across a number of replicas and use an agreement protocol (e.g., 3PC) to 
guarantee that operations reach enough replicas. This approach doesn't take 
full advantage of the I/O parallelism that can be achieved by striping. We can 
also take advantage of a single writer and immutable ledger entries to simplify 
the protocol and obtain better performance and fault-tolerance guarantees.
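
As a rough illustration of how striping spreads I/O while still replicating 
each entry, consider the sketch below. The round-robin placement, the function 
name replicas_for, and the bookie names are assumptions made for this example; 
they are not taken from the patch.

```python
def replicas_for(entry_id, ensemble, quorum_size):
    """Pick quorum_size consecutive bookies, starting at entry_id mod ensemble size.

    Hypothetical placement rule: successive entries start on successive
    bookies, so writes are spread over the whole ensemble.
    """
    n = len(ensemble)
    if not 1 <= quorum_size <= n:
        raise ValueError("quorum_size must be between 1 and the ensemble size")
    start = entry_id % n
    return [ensemble[(start + i) % n] for i in range(quorum_size)]


ensemble = ["bookie0", "bookie1", "bookie2", "bookie3"]

# Place four consecutive entries, each on two bookies.
placement = {entry_id: replicas_for(entry_id, ensemble, 2) for entry_id in range(4)}

# Count how many entry copies each bookie stores.
load = {bookie: 0 for bookie in ensemble}
for bookies in placement.values():
    for bookie in bookies:
        load[bookie] += 1
```

With four bookies and two copies per entry, each bookie stores half of the 
copies that full replication on every server would require, and consecutive 
entries go to different disks.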

The approach we propose is different. We propose a highly available system that 
receives entries from a client and stores them. Such a system can be used by 
several client systems, not only HDFS. There are several advantages in 
building such a service:

* We can use hardware that is optimized for such a service. We currently 
believe that such a system has to be optimized only for disk I/O;
* We can have a pool of servers implementing such a log system, shared among 
a number of client systems. BookKeeper is hence a good candidate for a cloud 
service;
* We can have a higher degree of replication with such a pool, which makes 
sense if the hardware it requires is cheaper than the hardware the 
application uses.

This patch contains the source code of BookKeeper, to be included in the 
"contrib" folder of the ZooKeeper distribution. We note that this is still 
experimental code, but the community is free to evaluate, discuss, and, 
hopefully, contribute.


> Bookkeeper contribution
> -----------------------
>
>                 Key: ZOOKEEPER-276
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-276
>             Project: Zookeeper
>          Issue Type: New Feature
>            Reporter: Luca Telloli

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
