Re: Questions about Hedwig architecture
replicated ZK). Second, it knows the default server host/VIP for the hubs in each of the other regions. This is stored in the ServerConfiguration.

ZK is then the central place that knows about all the Hedwig hubs in a given region: which topics exist, which hub is the master of each topic, what load each hub currently has, and so on. Note that it doesn't know anything about other regions.

The Hedwig client knows only one thing: where the region's default server host is. It uses this to contact one of the hubs, which in turn can redirect it to whichever hub owns the topic. We didn't want clients to contact ZK directly for this information, in order to reduce load on the ZK server quorum. It also removes some complexity from the client, since we don't have to worry about caching the hub information, staleness, or a refresh policy.

The VIP is needed not as a load balancer but as a hardware way of knowing which server hubs are still alive. We have a few ideas for making this part of the setup configuration easier and more elegant, since not everyone has access to a hardware VIP server.

Hope this makes sense!
Erwin

On Thu, 2010-11-04 at 08:02 -0700, Mahadev Konar wrote:
> Flavio, Ben, Adam, Erwin?
> Can you guys please respond on the zookeeper-user mailing list?
>
> Thanks
> mahadev
>
> On 11/2/10 11:06 PM, amit jaiswal amit_...@yahoo.com wrote:
>> Hi,
>>
>> I am trying to understand Hedwig. I tried reading the documentation (user.txt, dev.txt) along with the code, but some design aspects are still not clear. Can someone please answer the following? (Let's say there are 2 regions, A and B.)
>>
>> 1. When a subscriber X subscribes to topic T in region A, does RegionManager automatically add a subscription (with id = __A) to topic T in B? The RegionManager class has a couple of callbacks and I was not able to understand them properly.
>> 2. What happens when X and Y in region A subscribe to topic T? Does RegionManager try to make a separate subscription for X and for Y in B? Since the RegionManager uses a static subscriber id, the second subscription request will be considered a duplicate.
>> 3. How does X get messages from region B? The RegionManager callbacks are a bit confusing and I was not able to understand them.
>> 4. What is the purpose of the classes in the org.apache.hedwig.server.proxy package (HedwigProxy etc.)? There is no documentation that explains them.
>> 5. What happens when one of the hubs dies? The publisher will try to contact another hub? But what about the subscribers? Do they need to do any error handling / recovery?
>> 6. The Hedwig architecture mandates a load balancer. As per my understanding, it is required because the zk instances of different regions are not shared. I would expect all host information to be maintained in zk, and even across colos the information should be shared through zk (maybe that requires SSL support in zk).
>>
>> -regards
>> Amit
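
A minimal sketch of the lookup flow Erwin describes in the reply above: the client knows only the region's default host, contacts it, and follows a redirect to the hub that owns the topic. This is illustrative Java, not Hedwig code. `Hub`, `RegionClient`, `redirectFor`, `resolveOwner`, the host names, and the shared ownership map (a stand-in for the region's ZK state) are all hypothetical names chosen for the example, and the sketch does not model how topic ownership is first assigned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical hub: answers "do I own this topic, or who does?". */
final class Hub {
    private final String address;
    // Stand-in for the region's ZK state (topic -> owning hub address).
    private final Map<String, String> topicOwners;

    Hub(String address, Map<String, String> topicOwners) {
        this.address = address;
        this.topicOwners = topicOwners;
    }

    /** Empty if this hub owns the topic, otherwise the owner's address. */
    Optional<String> redirectFor(String topic) {
        String owner = topicOwners.get(topic);
        if (owner == null) {
            // Assumption: ownership assignment is out of scope for this sketch.
            throw new IllegalArgumentException("no owner recorded for topic " + topic);
        }
        return owner.equals(address) ? Optional.<String>empty() : Optional.of(owner);
    }
}

/** Hypothetical client: configured with the default host only, never reads ZK. */
final class RegionClient {
    private static final int MAX_REDIRECTS = 3; // arbitrary guard against redirect loops
    private final String defaultHost;
    private final Map<String, Hub> network; // stand-in for real connections

    RegionClient(String defaultHost, Map<String, Hub> network) {
        this.defaultHost = defaultHost;
        this.network = network;
    }

    /** Starts at the default host and follows redirects until a hub claims the topic. */
    String resolveOwner(String topic) {
        String address = defaultHost;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            Hub hub = network.get(address);
            if (hub == null) {
                throw new IllegalStateException("hub unreachable: " + address);
            }
            Optional<String> redirect = hub.redirectFor(topic);
            if (!redirect.isPresent()) {
                return address; // this hub owns the topic
            }
            address = redirect.get();
        }
        throw new IllegalStateException("too many redirects for topic " + topic);
    }
}

final class RedirectDemo {
    public static void main(String[] args) {
        Map<String, String> owners = new HashMap<>();
        owners.put("T", "hub-2.region-a:4080");
        Map<String, Hub> network = new HashMap<>();
        network.put("hub-1.region-a:4080", new Hub("hub-1.region-a:4080", owners));
        network.put("hub-2.region-a:4080", new Hub("hub-2.region-a:4080", owners));
        RegionClient client = new RegionClient("hub-1.region-a:4080", network);
        System.out.println("owner of T: " + client.resolveOwner("T"));
    }
}
```

The client in this sketch keeps no cache of hub addresses, which mirrors the stated motivation: no staleness handling or refresh policy on the client side, at the cost of an extra hop whenever the default host is not the owner.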
Questions about Hedwig architecture
Hi,

I am trying to understand Hedwig. I tried reading the documentation (user.txt, dev.txt) along with the code, but some design aspects are still not clear. Can someone please answer the following? (Let's say there are 2 regions, A and B.)

1. When a subscriber X subscribes to topic T in region A, does RegionManager automatically add a subscription (with id = __A) to topic T in B? The RegionManager class has a couple of callbacks and I was not able to understand them properly.
2. What happens when X and Y in region A subscribe to topic T? Does RegionManager try to make a separate subscription for X and for Y in B? Since the RegionManager uses a static subscriber id, the second subscription request will be considered a duplicate.
3. How does X get messages from region B? The RegionManager callbacks are a bit confusing and I was not able to understand them.
4. What is the purpose of the classes in the org.apache.hedwig.server.proxy package (HedwigProxy etc.)? There is no documentation that explains them.
5. What happens when one of the hubs dies? The publisher will try to contact another hub? But what about the subscribers? Do they need to do any error handling / recovery?
6. The Hedwig architecture mandates a load balancer. As per my understanding, it is required because the zk instances of different regions are not shared. I would expect all host information to be maintained in zk, and even across colos the information should be shared through zk (maybe that requires SSL support in zk).

-regards
Amit
[hedwig] Can multiple publishers publish to the same topic simultaneously
Hi,

In Hedwig, how many publishers can publish to a particular topic simultaneously? Are there any concerns or important points regarding message ordering?

-regards
Amit
Re: Is it possible to read/write a ledger concurrently
Hi,

How does Hedwig handle this scenario? Since only one of the hubs has ownership of a topic, that hub is able to serve both publish and subscribe requests concurrently. Is my understanding correct?

Also, what is the purpose of the ReadAheadCache class in Hedwig? Is it used somewhere for this concurrent read/write problem?

-regards
Amit

----- Original Message -----
From: Benjamin Reed br...@yahoo-inc.com
To: zookeeper-user@hadoop.apache.org
Sent: Fri, 22 October, 2010 11:09:07 AM
Subject: Re: Is it possible to read/write a ledger concurrently

currently program1 can read and write to an open ledger, but program2 must wait for the ledger to be closed before doing the read. the problem is that program2 needs to know the last valid entry in the ledger. (there may be entries that are not yet valid.) for performance reasons, only program1 knows the end. so you need a way to propagate that information. we have talked about a way to push the last entry into the bookkeeper handle. flavio was working on it, but i don't think it has been implemented.

ben

On 10/21/2010 10:22 PM, amit jaiswal wrote:
> Hi,
>
> In the BookKeeper documentation, the sample program creates a ledger, writes some entries and then *closes* the ledger. Then a client program opens the ledger and reads the entries from it.
>
> Is it possible for program1 to write to a ledger and program2 to read from the same ledger at the same time? In the BookKeeper code, if a client tries to read from a ledger which has not been closed (as per its metadata in zk), a recovery process is started to check for consistency.
>
> Waiting for the ledger to be closed can introduce a lot of latency at the client side. Can somebody explain this functionality?
>
> -regards
> Amit
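
Ben's point in the message above is that a second program can only read safely if it learns the last valid entry from the writer. The Java sketch below models that idea in memory, purely as an illustration: the writer appends entries and separately publishes a "last confirmed" id, and a tailing reader never reads past it. Nothing here is BookKeeper code, and it does not describe how (or whether) BookKeeper propagates this information. `ToyLedger`, `TailReader`, `append`, `confirm`, and `poll` are hypothetical names, and using -1 to mean "nothing confirmed yet" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a ledger; not the BookKeeper API. */
final class ToyLedger {
    private final List<String> entries = new ArrayList<>();
    private long lastConfirmed = -1; // -1: no entry is readable yet

    /** Writer side: append an entry. It is not yet visible to readers. */
    synchronized long append(String data) {
        entries.add(data);
        return entries.size() - 1;
    }

    /** Writer side: publish the highest entry id that readers may rely on. */
    synchronized void confirm(long entryId) {
        if (entryId >= entries.size()) {
            throw new IllegalArgumentException("entry not written: " + entryId);
        }
        lastConfirmed = Math.max(lastConfirmed, entryId);
    }

    synchronized long lastConfirmed() {
        return lastConfirmed;
    }

    /** Reader side: only the confirmed range [first, last] may be read. */
    synchronized List<String> read(long first, long last) {
        if (first < 0 || first > last || last > lastConfirmed) {
            throw new IllegalArgumentException("range not confirmed: " + first + ".." + last);
        }
        return new ArrayList<>(entries.subList((int) first, (int) last + 1));
    }
}

/** Reader that follows the ledger while it is still being written. */
final class TailReader {
    private final ToyLedger ledger;
    private long next = 0; // first entry id not yet returned to the caller

    TailReader(ToyLedger ledger) {
        this.ledger = ledger;
    }

    /** Returns entries confirmed since the previous call (possibly none). */
    List<String> poll() {
        long last = ledger.lastConfirmed();
        if (last < next) {
            return new ArrayList<>();
        }
        List<String> batch = ledger.read(next, last);
        next = last + 1;
        return batch;
    }
}

final class TailDemo {
    public static void main(String[] args) {
        ToyLedger ledger = new ToyLedger();
        TailReader reader = new TailReader(ledger);
        long id = ledger.append("first");
        System.out.println("before confirm: " + reader.poll());
        ledger.confirm(id);
        System.out.println("after confirm:  " + reader.poll());
    }
}
```

Because the confirmed id only ever grows, a range that was valid when the reader sampled `lastConfirmed()` stays valid by the time it calls `read`, so the reader needs no extra coordination with the writer in this model.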
Question on production readiness, deployment, data of BookKeeper / Hedwig
Hi,

In the Hedwig talk (http://vimeo.com/13282102), it was mentioned that the primary use case for Hedwig comes from the distributed key-value store PNUTS at Yahoo!, but also that the work is new. Could you please comment on the following:

Production readiness / Deployment
1. What is the production readiness of Hedwig / BookKeeper? Is it being used anywhere (for example in PNUTS)?
2. Is Hedwig designed to be used as a generic message bus, or only for multi-datacenter operations?
3. Hedwig installation and deployment is done through a script, hw.bash, but that is difficult to use, especially in a production environment. Are there any other packages available that simplify the deployment of Hedwig?
4. How does BK/Hedwig handle ZooKeeper session expiry?

Data Deletion, Handling data loss, Quorum
1. Does BookKeeper support deletion of old log entries that have been consumed?
2. How does Hedwig handle the case when all subscribers have consumed all the messages? In the talk, it was said that a subscriber can come back after hours, days or weeks. Is there any data retention / expiration policy for published data?
3. How does Hedwig handle data loss? There is a replication factor, and a write operation must be accepted by a majority of the bookies, but how are data conflicts handled? Is there any possibility of data conflict at all? Is the replication only for recovery? When the hub is reading data from bookies, does it read from all the bookies to satisfy a quorum read?

Code
What is the difference between PubSubServer, HedwigSubscriber and HedwigHubSubscriber? Is there any HelloWorld program that simply illustrates how to instantiate a Hedwig client and publish/consume messages? (The HedwigBenchmark class is helpful, but I was looking for something like API documentation.)

-regards
Amit
Re: BookKeeper newbie question
Hi Flavio,

I am using ZooKeeper 3.2.2. The documentation on Apache (http://hadoop.apache.org/zookeeper/docs/r3.3.0/bookkeeperStarted.html) refers to an example that uses the LedgerSequence class.

Basically, I am trying out BookKeeper / Hedwig to see whether they can be used as a reliable message bus. Hedwig has only a video (http://vimeo.com/13282102) that explains the system overview, and installation is non-trivial. I tried building from trunk, but the trunk version is 3.4 and Hedwig bundles the zk 3.2 code within itself. In short, I was not able to build/install Hedwig.

1. Could you please give some pointers for Hedwig? It seems that the BookKeeper APIs need a higher-level abstraction, which Hedwig can provide.
2. How does BookKeeper handle zk session expiry? To be honest, zk itself should come up with a solution to recover from session expiry. I found this blog post, http://sna-projects.com/blog/2010/08/zookeeper-experience/, which lists all the issues that I also faced while working with zk.
3. Any comments on "Low Latency Message Bus With Scribe and HDFS" (http://sna-projects.com/blog/2010/09/scribe-and-hdfs/) and how it compares to BookKeeper / Hedwig?

-regards
Amit

----- Original Message -----
From: Flavio Junqueira f...@yahoo-inc.com
To: zookeeper-user@hadoop.apache.org
Sent: Fri, 1 October, 2010 2:37:35 PM
Subject: Re: BookKeeper newbie question

Thanks for your questions, Amit.

On Sep 28, 2010, at 6:37 PM, amit jaiswal wrote:
> Hi,
>
> I am experimenting with BookKeeper and have a question on the LedgerHandle class. The readEntries(firstEntry, lastEntry) method takes the indexes of the first and last entries. Also, the LedgerSequence object returned has the method hasMoreElements().

Which version are you using? I don't think we have LedgerSequence any longer.

> Question:
> 1. How does a client know the index of the last entry? I was expecting clients to make a call like readEntries(0, Integer.MAX_VALUE) and hasMoreElements() to return false the moment there are no more entries. Am I missing something in the way the API is supposed to be used?

I believe you should use public long getLastAddConfirmed().

> 2. LedgerSequence.hasMoreElements() returns true (even if there are no more entries), and nextEntry returns null.

readEntries currently returns Enumeration<LedgerEntry>, but I just noticed that the documentation is not correct, so I'll open a jira to fix it.

-Flavio
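
Flavio's suggestion above amounts to asking for the last entry id first and then reading a bounded range, instead of passing a sentinel upper bound. The sketch below shows that calling pattern against a small in-memory stand-in, so it runs without BookKeeper. The method names `getLastAddConfirmed` and `readEntries` are taken from the thread; the `EntryLog` interface, `InMemoryEntryLog`, `LedgerDump`, the use of `byte[]` for entries, and the choice of -1 for an empty log are assumptions made for the example and may not match the real classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical stand-in for a ledger handle; mirrors the method names in the thread. */
interface EntryLog {
    long getLastAddConfirmed();
    Enumeration<byte[]> readEntries(long firstEntry, long lastEntry);
}

/** In-memory implementation used only to make the sketch runnable. */
final class InMemoryEntryLog implements EntryLog {
    private final List<byte[]> entries;

    InMemoryEntryLog(List<byte[]> entries) {
        this.entries = new ArrayList<>(entries);
    }

    @Override
    public long getLastAddConfirmed() {
        return entries.size() - 1L; // assumption: -1 when the log is empty
    }

    @Override
    public Enumeration<byte[]> readEntries(long firstEntry, long lastEntry) {
        if (firstEntry < 0 || lastEntry < firstEntry || lastEntry > getLastAddConfirmed()) {
            throw new IllegalArgumentException("entry range out of bounds");
        }
        return Collections.enumeration(entries.subList((int) firstEntry, (int) lastEntry + 1));
    }
}

final class LedgerDump {
    /** Reads every confirmed entry by bounding the range with getLastAddConfirmed(). */
    static List<String> readAll(EntryLog log) {
        List<String> out = new ArrayList<>();
        long last = log.getLastAddConfirmed();
        if (last < 0) {
            return out; // nothing to read; do not call readEntries with an empty range
        }
        Enumeration<byte[]> it = log.readEntries(0, last);
        while (it.hasMoreElements()) {
            out.add(new String(it.nextElement(), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> data = new ArrayList<>();
        data.add("hello".getBytes(StandardCharsets.UTF_8));
        data.add("world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(new InMemoryEntryLog(data)));
    }
}
```

With an explicit upper bound, the enumeration contains exactly the requested entries, so `hasMoreElements()` turning false is the only termination signal the caller needs.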
Re: BookKeeper newbie question
Hi,

After going through the hedwig/doc section, I was able to run hedwig server and sample client (though still there are issues). Any more documentation that refers to the APIs will be helpful.

-regards
Amit
Re: BookKeeper newbie question
Hi,

Can somebody please answer this query?

-regards
Amit

----- Original Message -----
From: amit jaiswal amit_...@yahoo.com
To: zookeeper-user@hadoop.apache.org
Sent: Tue, 28 September, 2010 10:07:33 PM
Subject: BookKeeper newbie question

Hi,

I am experimenting with BookKeeper and have a question on the LedgerHandle class. The readEntries(firstEntry, lastEntry) method takes the indexes of the first and last entries. Also, the LedgerSequence object returned has the method hasMoreElements().

Question:
1. How does a client know the index of the last entry? I was expecting clients to make a call like readEntries(0, Integer.MAX_VALUE) and hasMoreElements() to return false the moment there are no more entries. Am I missing something in the way the API is supposed to be used?
2. LedgerSequence.hasMoreElements() returns true (even if there are no more entries), and nextEntry returns null.

Can someone please clarify the correct semantics of these APIs?

-amit