Re: Questions about Hedwig architecture

2010-11-16 Thread amit jaiswal
Hi,

(A somewhat late reply.)

Thanks for the response. This pretty much answers my doubts. I think Hedwig has 
a lot of potential as a reliable message bus / publish-subscribe system in the 
Hadoop world, one that works across data centers (which most open-source 
products don't consider in their first cut).

More documentation and performance numbers would definitely help in 
understanding it better.

-regards
Amit

- Original Message 
From: Erwin Tam e...@yahoo-inc.com
To: zookeeper-user@hadoop.apache.org
Sent: Fri, 5 November, 2010 4:12:39 AM
Subject: Re: Questions about Hedwig architecture

Hi Amit, sorry for the late responses to your Hedwig questions.  I just
joined the mailing list so hopefully I can shed some light on things.
First, here's a response to a question you posted last month that I
don't think has been answered

> In hedwig, how many publishers can publish to a particular topic
> simultaneously?
> Any concerns / important points on message ordering?

You can have any number of concurrent publishers publishing to the same
topic.  The guarantee we make is that for each publisher, the messages
are delivered in the same relative order they were published in.  They
can be interleaved with messages from other publishers, though.
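
A way to picture this guarantee is a merge of per-publisher FIFO queues. The following is a self-contained illustration, not Hedwig code; all names here are made up:

```java
import java.util.*;

// Illustrative sketch of Hedwig's ordering guarantee: the hub may interleave
// messages from different publishers, but each publisher's messages keep
// their own relative order. None of these names are real Hedwig APIs.
public class OrderingSketch {
    // Merge per-publisher FIFO queues into one delivery sequence, taking
    // the next message from whichever publisher the schedule names.
    static List<String> interleave(Map<String, List<String>> perPublisher,
                                   List<String> schedule) {
        Map<String, Iterator<String>> cursors = new HashMap<>();
        for (Map.Entry<String, List<String>> e : perPublisher.entrySet())
            cursors.put(e.getKey(), e.getValue().iterator());
        List<String> delivered = new ArrayList<>();
        for (String pub : schedule)          // any interleaving is legal...
            delivered.add(cursors.get(pub).next());
        return delivered;
    }

    // ...but each publisher's messages must appear as a subsequence
    // in their original publish order.
    static boolean preservesOrder(List<String> delivered, List<String> published) {
        int i = 0;
        for (String m : delivered)
            if (i < published.size() && m.equals(published.get(i))) i++;
        return i == published.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> pubs = new LinkedHashMap<>();
        pubs.put("A", Arrays.asList("a1", "a2", "a3"));
        pubs.put("B", Arrays.asList("b1", "b2"));
        List<String> d = interleave(pubs, Arrays.asList("A", "B", "A", "B", "A"));
        System.out.println(d);                              // [a1, b1, a2, b2, a3]
        System.out.println(preservesOrder(d, pubs.get("A"))); // true
    }
}
```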

Here are the responses to your other questions.

1. Yes, each of the Hedwig server hubs has a RegionManager module that
implements the SubscriptionEventListener interface.  So when a hub's
SubscriptionManager takes ownership of a topic for the very first time,
it notifies all of the registered listeners.  The RegionManager will
then look up the list of all other regions it knows about (stored in the
ServerConfiguration) and subscribe to that topic using the special topic
subscriber name reserved for hub subscribers.
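
The listener pattern described here can be sketched roughly as below. This is a simplified illustration, not the actual Hedwig source; the interface shape, method name, and the `__hub__` subscriber name are all assumptions:

```java
import java.util.*;

// Simplified sketch of the pattern described above: when a hub acquires a
// topic for the first time, the SubscriptionManager fires all registered
// listeners; the RegionManager reacts by subscribing to that topic in every
// other region it knows about, under a reserved hub-subscriber name.
public class RegionSketch {
    interface SubscriptionEventListener {
        void onFirstLocalSubscribe(String topic);
    }

    static class RegionManager implements SubscriptionEventListener {
        final List<String> otherRegions;              // from ServerConfiguration
        final List<String> remoteSubscriptions = new ArrayList<>();
        RegionManager(List<String> otherRegions) { this.otherRegions = otherRegions; }

        @Override public void onFirstLocalSubscribe(String topic) {
            // Subscribe in each remote region under the reserved hub name.
            for (String region : otherRegions)
                remoteSubscriptions.add(region + "/" + topic + "/__hub__");
        }
    }

    static class SubscriptionManager {
        final List<SubscriptionEventListener> listeners = new ArrayList<>();
        final Set<String> ownedTopics = new HashSet<>();
        void register(SubscriptionEventListener l) { listeners.add(l); }
        void acquireTopic(String topic) {
            if (ownedTopics.add(topic))               // first acquisition only
                for (SubscriptionEventListener l : listeners)
                    l.onFirstLocalSubscribe(topic);
        }
    }

    public static void main(String[] args) {
        RegionManager rm = new RegionManager(Arrays.asList("regionB", "regionC"));
        SubscriptionManager sm = new SubscriptionManager();
        sm.register(rm);
        sm.acquireTopic("topicT");
        System.out.println(rm.remoteSubscriptions);
        // [regionB/topicT/__hub__, regionC/topicT/__hub__]
    }
}
```

The `ownedTopics.add(topic)` guard is also what makes point 2 below work: only the first subscribe on a previously unowned topic fires the listeners.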

2. As described above in #1, the SubscriptionManager is the one that
knows when the hub takes ownership of a topic.  If subscribers X and Y
both simultaneously subscribe to a topic that the hub was previously not
the master of, the first subscribe call will cause the hub to become the
topic owner/master and trigger all of the registered
SubscriptionEventListeners.  The second subscriber's call will not
trigger this anymore since the hub now has at least one local subscriber
for that topic it has just taken ownership of.

3. Region A will have a hub that itself is a subscriber to topics in all
other regions.  So in this case, Subscriber X in Region A is interested
in topic T.  The hub responsible for topic T in Region A will be a
remote subscriber to topic T in Region B.  If messages are published to
topic T in Region B, they are first sent to all the local subscribers
there and then delivered to all remote subscribers (in practice there is
no real ordering guarantee between the two).  The hub in Region A will
receive the messages (like any normal subscriber) and persist them via
the PersistenceManager in use.
Later on, the hub DeliveryManager will pick up that message and then
push it to all local subscribers.  There is logic to prevent it from
being delivered to the RegionManager's hub subscription since it is the
one that got the message remotely in the first place.
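
That echo-suppression step can be sketched as a simple filter in the delivery loop. Again an illustration only; the reserved `__hub__` subscriber name is an assumption:

```java
import java.util.*;

// Sketch of the delivery step described above (illustrative, not Hedwig
// source): the DeliveryManager pushes a persisted message to every local
// subscriber except the RegionManager's own hub subscription, which is the
// one that pulled the message in from the remote region in the first place.
public class DeliverySketch {
    static final String HUB_SUBSCRIBER = "__hub__";   // reserved name (assumed)

    // Return the subscribers that should actually receive the message.
    static List<String> deliver(String message, List<String> localSubscribers) {
        List<String> receivers = new ArrayList<>();
        for (String sub : localSubscribers) {
            if (sub.equals(HUB_SUBSCRIBER)) continue; // don't echo back remotely
            receivers.add(sub);
        }
        return receivers;   // in a real hub, each receiver is pushed 'message'
    }

    public static void main(String[] args) {
        List<String> subs = Arrays.asList("X", "Y", HUB_SUBSCRIBER);
        System.out.println(deliver("m1 from region B", subs));  // [X, Y]
    }
}
```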

4. The proxy was an initial attempt to create something akin to a web
service call for C++ clients before we implemented a native Hedwig C++
client.  We didn't want to reproduce the logic in the Java client, so
the easiest solution was a proxy type of service that C++ clients can
call, which in turn just calls the Java client.  This way, client apps
that use Hedwig don't have to run a JVM on their box.

5. When a Hedwig hub dies, the order of operations is this.  First off,
all of the hubs are registered with ZK as ephemeral nodes to know which
are the live boxes.  If one dies, the ephemeral node goes away and ZK
does not consider that hub as a live one.  There is logic in the client
that caches where it thinks a certain topic's hub master is.  If it
publishes to that hub directly and gets an error, it will redo the
initial handshake protocol, where it publishes to the default server
host first.  The default server host was designed to be a VIP that
fronts all of the hubs; the idea is that it just picks one of the live
hubs at random.  So the client's publish request goes to a random hub,
which sees a publish request for a topic.  This random hub checks if it
is the topic owner; if not, it looks up who the topic owner is in ZK.
If nobody is the topic owner (the topic is up for grabs since the last
topic-owner hub died), a random one out of all the live hubs is chosen
(weighted by each server's load).  A redirect response is sent
back to the client (unless that particular hub is randomly chosen to be
the topic owner).  The client will then send the publish request to the
redirected hub, and if successful, cache that locally as the hub that is
the owner for the given topic.
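
The client side of this handshake can be sketched as below. This is a self-contained simulation of the flow described, not the real Hedwig client (which speaks a binary protocol); the maps standing in for ZK and the VIP are assumptions:

```java
import java.util.*;

// Sketch of the client-side publish/redirect handshake described in #5.
// 'topicOwner' stands in for the ownership records in ZK, and
// pickRandomLiveHub() stands in for the default-server-host VIP.
public class RedirectSketch {
    final Map<String, String> topicOwner;               // server-side truth (ZK)
    final Map<String, String> cache = new HashMap<>();  // client's topic->hub cache
    final List<String> liveHubs;

    RedirectSketch(Map<String, String> owners, List<String> hubs) {
        topicOwner = owners; liveHubs = hubs;
    }

    // Publish to a topic: try the cached hub, redo the handshake if it died,
    // follow at most one redirect, and cache the resulting owner.
    String publish(String topic) {
        String hub = cache.getOrDefault(topic, pickRandomLiveHub());
        if (!liveHubs.contains(hub))                    // cached hub is dead:
            hub = pickRandomLiveHub();                  // redo the handshake
        String owner = topicOwner.get(topic);
        if (owner == null || !liveHubs.contains(owner)) {
            owner = pickRandomLiveHub();                // topic is up for grabs
            topicOwner.put(topic, owner);
        }
        cache.put(topic, owner);                        // follow redirect + cache
        return owner;
    }

    String pickRandomLiveHub() {
        return liveHubs.get(new Random().nextInt(liveHubs.size()));
    }

    public static void main(String[] args) {
        RedirectSketch client =
            new RedirectSketch(new HashMap<>(), Arrays.asList("hub1", "hub2", "hub3"));
        String first = client.publish("topicT");   // elects an owner, caches it
        String second = client.publish("topicT");  // cached hub is still the owner
        System.out.println(first.equals(second));  // true
    }
}
```

A real client would of course learn the owner from the redirect response rather than from shared state, but the cache-then-retry shape is the same.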

6. The current design is that every Hedwig hub knows two things.  First
is where the local ZK server or quorum is (in case we're running
replicated ZK).  Second, it knows where the default server host/VIP for
the hubs in all other regions are.  This is stored in the
ServerConfiguration.  ZK will then be the central place that knows
about all the Hedwig hubs in a given region: what topics there are, who
is the master of which topic, what load each of the hubs currently has,
and so on.  Note that it doesn't know anything about other regions.
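
Those two pieces of per-hub knowledge could be sketched as a configuration object like the following. The field names are illustrative, not the real ServerConfiguration:

```java
import java.util.*;

// Sketch of the two things each hub knows, per the description above:
// (1) the local ZK server or quorum, and (2) the default server host/VIP
// for the hubs in every other region.
public class ConfigSketch {
    static class ServerConfiguration {
        final String localZkQuorum;                   // local ZK server/quorum
        final Map<String, String> remoteDefaultHosts; // region -> default VIP
        ServerConfiguration(String zk, Map<String, String> remotes) {
            localZkQuorum = zk;
            remoteDefaultHosts = remotes;
        }
    }

    public static void main(String[] args) {
        Map<String, String> remotes = new LinkedHashMap<>();
        remotes.put("regionB", "hedwig-vip.regionB.example.com:4080");
        ServerConfiguration conf =
            new ServerConfiguration("zk1.regionA.example.com:2181", remotes);
        System.out.println(conf.remoteDefaultHosts.keySet());  // [regionB]
    }
}
```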

The Hedwig client knows only one thing: where the region's default
server host is.  This is so it can contact one of the hubs, which in
turn can redirect it to whichever hub owns the topic in question.