(A bit late reply).
Thanx for the response. This pretty much answers my doubts. I think hedwig has
lot of potential in terms of a reliable message bus/publish subscribe system in
the hadoop world which works across data centers (which most of the open source
products don't consider in the first cut).
More documentation and performance numbers would definitely help in
understanding it better.
----- Original Message ----
From: Erwin Tam <e...@yahoo-inc.com>
Sent: Fri, 5 November, 2010 4:12:39 AM
Subject: Re: Questions about Hedwig architecture
Hi Amit, sorry for the late responses to your Hedwig questions. I just
joined the mailing list so hopefully I can shed some light on things.
First, here's a response to a question you posted last month that I
don't think has been answered
> In hedwig, how many publishers can publish to a particular topic
> Any concerns / important points on message ordering?
You can have as many concurrent publishers publishing to the same topic.
The guarantee we make is that for each publisher, the messages are in
the same relative order they were published in. They can be interleaved
with other publishers though.
Here are the responses to your other questions.
1. Yes, each of the Hedwig server hubs has a RegionManager module that
implements the SubscriptionEventListener interface. So when a hub's
SubscriptionManager takes ownership for a topic the very first time, it
can invoke all of the listeners subscribed. The RegionManager will then
look up the list of all other regions it knows about (stored in the
ServerConfiguration), and subscribe to that topic using the "special"
topic subscriber name reserved for hub subscribers.
2. As described above in #1, the Subscription Manager is the one that
knows when the hub takes ownership of a topic. If subscriber X and Y
both simultaneously subscribe to a topic that the hub was previously not
the master of, the first subscribe call will cause the hub to become the
topic owner/master and trigger all of the registered
SubscriptionEventListeners. The second subscriber's call will not
trigger this anymore since the hub now has at least one local subscriber
for that topic it has just taken ownership of.
3. Region A will have a hub that itself is a subscriber to topics in all
other regions. So in this case, Subscriber X in Region A is interested
in topic T. The hub responsible for topic T in Region A will be a
remote subscriber to topic T in Region B. If messages are published to
topic T in Region B, they are first sent to all the local subscribers
there and then delivered to all remote subscribers (actually no real
ordering). The hub in Region A will receive the messages (like any
normal subscriber) and persist them via the PersistenceManager used.
Later on, the hub DeliveryManager will pick up that message and then
push it to all local subscribers. There is logic to prevent it from
being delivered to the RegionManager's hub subscription since it is the
one that "got" the message remotely in the first place.
4. The proxy stuff was an initial attempt to create something akin to a
web service call for C++ clients before we implemented a native Hedwig C
++ client. We didn't want to reproduce the logic in the java client so
the easiest solution would be a proxy type of service that the C++
clients can call which in turn will just call the java client stuff.
This is so the client apps that use Hedwig don't have to run a jvm on
5. When a Hedwig hub dies, the order of operations is this. First off,
all of the hubs are registered with ZK as ephemeral nodes to know which
are the "live" boxes. If one dies, the ephemeral node goes away and ZK
does not consider that hub as a live one. There is logic in the client
that caches where it thinks a certain topic's hub master is. If it
publishes to that hub directly and gets an error, it will redo the
initial "handshake" protocol where it publishes to the "default server
host" first. The default server host was designed to be a vip that
fronted all of the hubs. The idea is that it will just pick one of the
live hubs at random. So the client's publish request goes to a random
hub where it sees a publish request for a topic. This random hub checks
if it is the topic owner, if not, then it looks up who the topic owner
is in ZK. If nobody is the topic owner (topic is up for grabs now since
the last topic owner hub died), a random one out of all the live hubs is
chosen (weighted by each server's load). A redirect response is sent
back to the client (unless that particular hub is randomly chosen to be
the topic owner). The client will then send the publish request to the
redirected hub, and if successful, cache that locally as the hub that is
the owner for the given topic.
6. The current design is that every Hedwig hub knows two things. First
is where the local ZK server or quorum is (in case we're running
replicated ZK). Second, it knows where the default server host/VIP for
the hubs in all other regions are. This is stored in the
ServerConfiguration. ZK will then be the central place that knows about
all the Hedwig hubs in a given region. It knows what topics there are,
who is the master of which topic, what load each of the hubs currently
has, etc.. Note that it doesn't know anything about other regions.
The Hedwig client knows only one thing, where the region's default
server host is. This is so it can contact one of the hubs which in turn
can redirect it to whoever the appropriate hub is that owns the topic.
We didn't want the clients to directly contact ZK for this info to
reduce load on the ZK server quorum. It also reduces some complexity on
the client so we don't have to worry about caching the server hubs info,
staleness, and a refresh policy. The only point of needing a VIP is not
as a load balancer but as a hardware way of knowing which server hubs
are still alive. We have a few ideas to make this part of the setup
configuration easier and more elegant since not everyone has access to a
hardware VIP server.
Hope this makes sense!
On Thu, 2010-11-04 at 08:02 -0700, Mahadev Konar wrote:
> Flavio, Ben, Adam, Erwin?
> Can you guys please respond on zookeeper-user mailing list?
> On 11/2/10 11:06 PM, "amit jaiswal" <amit_...@yahoo.com> wrote:
> > Hi,
> > I am trying to understand hedwig. I tried reading the documentation
> > dev.txt along with the code but
> > still some design aspects are not clear.
> > Can someone please tell the following:
> > (Lets say there are 2 regions A and B)
> > 1. When a subscriber X subscribes to topic T in region A, then does
> > RegionManager automatically adds a subscription (with id = __A) to topic T
> > B.
> > The RegionManager class has couple of callbacks and I was not able to
> > understand
> > it properly.
> > 2. What happens when X and Y in region A subscribe to topic T. Does
> > RegionManager tries to do separate subscription for X and Y in B? Since the
> > RegionManager uses a static subscriber Id, the second subscription request
> > will
> > be considered duplicate.
> > 3. How does X gets messages from region B? The RegionManager callbacks are
> > confusing and I was not able to understand.
> > 4. What is the purpose of org.apache.hedwig.server.proxy package classes
> > (HedwigProxy etc.). There is no documentation to explain the same.
> > 5. What happens when one of the hub dies. The publisher will try to contact
> > another hub? But what about the subscribers? Do they need to do any error
> > handling / recovery?
> > 6. Hedwig architecture mandates the need for a load balancer. As per my
> > understanding it is required because the zk instances of different regions
> > not shared. I would expect all hosts information to be maintained in zk, and
> > even for cross colo, the information should be shared through zk (may be
> > requires SSL support in zk).
> > -regards
> > Amit