Thanks a lot Cameron. That makes sense a lot. So If I need to do it
properly I should have my executors declare like this at the top of my
class -
private static ExecutorService service =
Executors.newFixedThreadPool(15);
and then use the above "ExecutorService" like below -
cache.getListenable().addListener(listener, service);
Am I got everything right with the above code?
On Wed, Jul 23, 2014 at 2:49 PM, Cameron McKenzie <[email protected]>
wrote:
> When you call cache.getListenable().addListener(listener), if you don't
> provide an executor service, then it will use the default Curator one,
> which will only have a single thread. This means that your listener will
> only be called by a single thread, so if you do anything in that listener
> that blocks (i.e restarting the server) then you're not going to get any
> other events until after your blocking event has finished.
>
> You could either provide an executor that allows more than a single thread
> (and thus your childEvent method could get called concurrently for
> different events), or you could use your own executor to do the restarts,
> which would allow the curator event thread to keep processing stuff.
>
>
> On Thu, Jul 24, 2014 at 6:33 AM, Check Peck <[email protected]>
> wrote:
>
>> I am using Curator library for Zookeeper. I am using zookeeper to monitor
>> whether my app servers are up or not. If they are not up or shut down, then
>> bring them up. I need to keep a watch on three of my znodes on the
>> zookeeper. I am keeping a watch on ("/test/proc/phx/server",
>> ("/test/proc/slc/server") and ("/test/proc/lvs/server"))I have a znode
>> structure like below.
>>
>> /test/proc
>> /phx
>> /server
>> /h1
>> /h2
>> /h3
>> /h4
>> /h5
>> /slc
>> /server
>> /h1
>> /h2
>> /h3
>> /h4
>> /h5
>> /lvs
>> /server
>> /h1
>> /h2
>> /h3
>> /h4
>> /h5
>>
>> As you can see above, for "/test/proc/phx/server", we have 5 hosts
>> starting with "h", similary for slc and lvs as well. And all those hosts
>> starting with "h" are ephimeral znodes. Now as soon as any server dies,
>> let's say for PHX, h4 machine went down, then the "h4" ephemeral znodes
>> gets deleted from the "/test/proc/phx/server" and then I will try to
>> re-start h4 machine on PHX datacenter. Similarly with SLC and LVS.
>>
>> Below is my code by which I am keeping a watch and re-starting the
>> servers if any machine went down in any datacenters. With the below code
>> what I am seeing is, suppose if three machine went down in same datacenter,
>> then it restart those three one by one. Meaning let's say h1, h3, h5 went
>> down in PHX datacenter, then first it will restart h1 and as soon as h1 is
>> done, then it will restart h3 and then h5. So it is always waiting for one
>> to get finished and then restart another host. I am not sure why? Those
>> three should be restarted instantly right since it's a background thread ?
>>
>> And also sometimes what I am seeing if all the hosts went down instantly
>> then it doesn't restart anything? May be thread is getting stuck? Does my
>> below code looks right with the way I am keeping a watch on three
>> Datacenters PHX ("/test/proc/phx/server"), SLC("/test/proc/slc/server") and
>> LVS("/test/proc/lvs/server")
>>
>> List<String> datacenters = Arrays.asList("PHX", "SLC", "LVS");
>> for (String dc : datacenters) {
>> // in this example we will cache data. Notice that this is
>> optional.
>> PathChildrenCache cache = new
>> PathChildrenCache(zookClient.getClient(), "/test/proc" + "/" + dc + "/" +
>> "server", true);
>> cache.start();
>>
>> addListener(cache);
>> }
>>
>> private static void addListener(PathChildrenCache cache) {
>>
>> PathChildrenCacheListener listener = new
>> PathChildrenCacheListener() {
>> public void childEvent(CuratorFramework client,
>> PathChildrenCacheEvent event) throws Exception {
>> switch (event.getType()) {
>> case CHILD_ADDED: {
>> if (zookClient.isLeader()) {
>> String path =
>> ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>> String node =
>> ZKPaths.getNodeFromPath(event.getData().getPath());
>> String datacenter = path.split("/")[3];
>>
>> System.out.println("Node added: Path= ", path, ",
>> Actual Node= ", node, ", Datacenter= ", datacenter);
>>
>> break;
>> }
>> }
>>
>> case CHILD_UPDATED: {
>> if (zookClient.isLeader()) {
>> String path =
>> ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>> String node =
>> ZKPaths.getNodeFromPath(event.getData().getPath());
>> String datacenter = path.split("/")[3];
>>
>> System.out.println("Node updated: Path= ", path,
>> ", Actual Node= ", node, ", Datacenter= ", datacenter);
>>
>> break;
>> }
>> }
>>
>> case CHILD_REMOVED: {
>> if (zookClient.isLeader()) {
>> String path =
>> ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>> String node =
>> ZKPaths.getNodeFromPath(event.getData().getPath());
>> String datacenter = path.split("/")[3];
>>
>> System.out.println("Node removed: Path= ", path,
>> ", Actual Node= ", node, ", Datacenter= ", datacenter);
>>
>> // restart machine which goes down
>> // I am assuming as soon as any machine went
>> down, call will come here instantly without waiting for anything?
>>
>> break;
>> }
>> }
>> default:
>> break;
>>
>> }
>> }
>> };
>> cache.getListenable().addListener(listener);
>> }
>>
>
>