RE: Config for destroying ephemeral node as soon as the client is disconnected

2010-11-16 Thread Fournier, Camille F. [Tech]
We have been discussing various implementation options for doing this, but 
right now this is not possible. If you are interested in this problem, take a
look at https://issues.apache.org/jira/browse/ZOOKEEPER-922, which is one 
proposed solution.

C

-Original Message-
From: Steve Gury [mailto:steve.g...@mimesis-republic.com] 
Sent: Tuesday, November 16, 2010 9:04 AM
To: zookeeper-user@hadoop.apache.org
Subject: Config for destroying ephemeral node as soon as the client is 
disconnected

Hi all,

I would like to configure ZooKeeper so that when a client disconnects, the
server immediately removes all ephemeral nodes associated with that client,
without waiting for the session timeout (or waiting only a very short time).
However, I would like to keep the current behavior for partitions: when a
client is partitioned from the server, the server should still wait the full
session timeout before removing that client's ephemeral nodes.

Scenario 1:
- Server is up
- Client starts and connects to the server
- Client creates some ephemeral nodes
- Client crashes and the operating system resets the connection
- Server sees the end of the connection
- Server removes all ephemeral nodes (without waiting for the session timeout)

Scenario 2:
- Server is up
- Client starts and connects to the server
- Client creates some ephemeral nodes
- Client is unplugged from the network
- Server stops receiving pings from the client
- Server waits for the session timeout
- Server removes all ephemeral nodes

Is there a way to configure ZooKeeper like this?

Thanks,
Steve
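
For reference, this is how the behavior described above maps onto the standard
ZooKeeper Java client today: an explicit close() ends the session and removes
its ephemeral nodes right away, while a crash or dropped connection leaves them
until the session timeout expires.  A minimal sketch, with the connect string,
znode path, and timeout chosen only for illustration:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class EphemeralLifetime {
        public static void main(String[] args) throws Exception {
            // 30000 ms session timeout: this is how long ephemeral nodes survive
            // after the server stops hearing from this session.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    // connection state changes arrive here; a real client would
                    // wait for SyncConnected before issuing requests
                }
            });

            zk.create("/my-ephemeral", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            // Explicit close(): the session ends right away and the server
            // deletes /my-ephemeral immediately, with no session-timeout wait.
            zk.close();

            // If the process had crashed instead of calling close(), the server
            // would keep /my-ephemeral until the 30 s session timeout expired --
            // exactly the delay being asked about above.
        }
    }

The session timeout passed to the constructor is the knob ZooKeeper exposes
here; per the reply above, removing ephemerals instantly on a raw disconnect
is not currently configurable.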


A timeout problem

2010-11-16 Thread 俊贤
Hello everyone, I recently ran into a problem in my app.
First this error occurred: "2010-11-17 09:45:00,089 INFO  zookeeper.ClientCnxn Client 
session timed out, have not heard from server in 3090ms for sessionid 
0x12c53b185d2000a, closing socket connection and attempting reconnect".
It was followed by "2010-11-17 09:45:03,204 INFO  zookeeper.ClientCnxn Unable to reconnect to 
ZooKeeper service, session 0x12c53b185d2000a has expired, closing socket 
connection".
In the end the application went down.

When this happened, my machines seem to have been transferring some big files 
to each other, so I am wondering whether the file transfer saturated the network 
bandwidth and left my ZooKeeper client with effectively zero bandwidth?
Thanks for any replies.
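
For what it's worth, the second log line means the session has expired, and an
expired session cannot be resumed on the same handle; the usual pattern is to
watch for the Expired state and open a brand new ZooKeeper connection, and to
size the session timeout above the longest stall you expect.  A rough sketch,
with the host list and timeout purely illustrative:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ReconnectingClient implements Watcher {
        private volatile ZooKeeper zk;

        public ReconnectingClient() throws Exception {
            connect();
        }

        private void connect() throws Exception {
            // A larger session timeout gives the client more slack when the
            // network is briefly saturated (e.g. by a big file transfer).
            zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 30000, this);
        }

        public void process(WatchedEvent event) {
            if (event.getState() == Event.KeeperState.Expired) {
                // An expired session never comes back; the only recovery is a
                // new session (recreating any ephemeral nodes it owned).
                try {
                    connect();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }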




Re: Questions about Hedwig architecture

2010-11-16 Thread amit jaiswal
Hi,

(A bit of a late reply.)

Thanks for the response. This pretty much answers my questions. I think Hedwig 
has a lot of potential as a reliable message bus / publish-subscribe system in 
the Hadoop world that works across data centers (which most of the open source 
products don't consider in their first cut).

More documentation and performance numbers would definitely help in 
understanding it better.

-regards
Amit

- Original Message 
From: Erwin Tam e...@yahoo-inc.com
To: zookeeper-user@hadoop.apache.org
Sent: Fri, 5 November, 2010 4:12:39 AM
Subject: Re: Questions about Hedwig architecture

Hi Amit, sorry for the late responses to your Hedwig questions.  I just
joined the mailing list so hopefully I can shed some light on things.
First, here's a response to a question you posted last month that I
don't think has been answered:

 In Hedwig, how many publishers can publish to a particular topic
simultaneously?
 Any concerns / important points on message ordering?

You can have as many concurrent publishers publishing to the same topic
as you like.  The guarantee we make is that for each publisher, the
messages stay in the same relative order they were published in.  They can
be interleaved with messages from other publishers, though.

Here are the responses to your other questions.

1. Yes, each of the Hedwig server hubs has a RegionManager module that
implements the SubscriptionEventListener interface.  So when a hub's
SubscriptionManager takes ownership of a topic for the very first time, it
invokes all of the registered listeners.  The RegionManager will then
look up the list of all other regions it knows about (stored in the
ServerConfiguration), and subscribe to that topic using the special
topic subscriber name reserved for hub subscribers.

2. As described above in #1, the Subscription Manager is the one that
knows when the hub takes ownership of a topic.  If subscribers X and Y
both simultaneously subscribe to a topic that the hub was previously not
the master of, the first subscribe call will cause the hub to become the
topic owner/master and trigger all of the registered
SubscriptionEventListeners.  The second subscriber's call will not
trigger this anymore since the hub now has at least one local subscriber
for that topic it has just taken ownership of.
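
If it helps to picture #1 and #2, here is a rough sketch of the "first local
subscriber triggers the listeners" behavior.  The SubscriptionEventListener
name comes from the description above, but its methods and the surrounding
class are made up for illustration and are not the actual Hedwig source:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: not the real Hedwig API.
    interface SubscriptionEventListener {
        void onFirstLocalSubscribe(String topic);
        void onLastLocalUnsubscribe(String topic);
    }

    class SubscriptionManagerSketch {
        private final Map<String, Integer> localSubscribers = new ConcurrentHashMap<>();
        private final List<SubscriptionEventListener> listeners;

        SubscriptionManagerSketch(List<SubscriptionEventListener> listeners) {
            this.listeners = listeners;
        }

        synchronized void subscribe(String topic, String subscriberId) {
            int count = localSubscribers.merge(topic, 1, Integer::sum);
            if (count == 1) {
                // First local subscriber: this hub has just taken ownership of
                // the topic, so fire the listeners.  The RegionManager's listener
                // would react by subscribing to the topic in every other region
                // it knows about, using the reserved hub subscriber name.
                for (SubscriptionEventListener l : listeners) {
                    l.onFirstLocalSubscribe(topic);
                }
            }
            // A second concurrent subscriber sees count > 1 and triggers
            // nothing, which is the behavior described in #2.
        }
    }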

3. Region A will have a hub that itself is a subscriber to topics in all
other regions.  So in this case, Subscriber X in Region A is interested
in topic T.  The hub responsible for topic T in Region A will be a
remote subscriber to topic T in Region B.  If messages are published to
topic T in Region B, they are first sent to all the local subscribers
there and then delivered to all remote subscribers (actually no real
ordering).  The hub in Region A will receive the messages (like any
normal subscriber) and persist them via the PersistenceManager used.
Later on, the hub DeliveryManager will pick up that message and then
push it to all local subscribers.  There is logic to prevent it from
being delivered to the RegionManager's hub subscription since it is the
one that got the message remotely in the first place.

4. The proxy stuff was an initial attempt to create something akin to a
web service call for C++ clients before we implemented a native Hedwig
C++ client.  We didn't want to reproduce the logic in the Java client, so
the easiest solution was a proxy type of service that the C++ clients
can call, which in turn just calls the Java client.  This is so the
client apps that use Hedwig don't have to run a JVM on their box.

5. When a Hedwig hub dies, the order of operations is this.  First off,
all of the hubs are registered with ZK as ephemeral nodes so we know
which boxes are live.  If one dies, its ephemeral node goes away and ZK
no longer considers that hub live.  There is logic in the client
that caches where it thinks a certain topic's hub master is.  If it
publishes to that hub directly and gets an error, it will redo the
initial handshake protocol where it publishes to the default server
host first.  The default server host was designed to be a VIP that
fronts all of the hubs; the idea is that it will just pick one of the
live hubs at random.  So the client's publish request goes to a random
hub, which sees a publish request for the topic.  This random hub checks
whether it is the topic owner; if not, it looks up who the topic owner
is in ZK.  If nobody is the topic owner (the topic is up for grabs since
the last topic owner hub died), a random one out of all the live hubs is
chosen (weighted by each server's load).  A redirect response is sent
back to the client (unless that particular hub is randomly chosen to be
the topic owner).  The client will then send the publish request to the
redirected hub, and if successful, cache that locally as the hub that is
the owner for the given topic.
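
As a side note, the "hubs registered with ZK as ephemeral nodes" part is plain
ZooKeeper usage; roughly something like the following, where the znode path is
made up for illustration and its parent path is assumed to already exist:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class HubRegistrationSketch {
        // Register this hub under an ephemeral znode.  If the hub process dies,
        // ZooKeeper deletes the node automatically, the hub is no longer seen
        // as live, and its topics become "up for grabs" as described above.
        static void register(ZooKeeper zk, String hostPort) throws Exception {
            zk.create("/hedwig/hubs/" + hostPort, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        }
    }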

6. The current design is that every Hedwig hub knows two things.  First
is where the local ZK server or quorum is (in case we're running