Re: How to handle Node does not exist error?
Try using the logs, stat command or JMX to verify that each ZK server is indeed a leader/follower as expected. You should have one leader and n-1 followers. Verify that you don't have any standalone servers (this is the most frequent error I see - misconfiguration of a server such that it thinks it's a standalone server; I often see where a user has 3 standalone servers which they think is a single quorum, all of the servers will therefore be inconsistent to each other).

Patrick

On 08/12/2010 05:42 PM, Ted Dunning wrote:

On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He h...@softtouchit.com wrote:

hi, Ted, I am a little bit confused here. So, is the node inconsistency problem that Vishal and I have seen here most likely caused by configurations or embedding? If it is the former, I'd appreciate if you can point out where those silly mistakes have been made and the correct way to embed ZK.

I think it is likely due to misconfiguration, but I don't know what the issue is exactly. I think that another poster suggested that you ape the normal ZK startup process more closely. That sounds good but it may be incompatible with your goals of integrating all configuration into a single XML file and not using the normal ZK configuration process. Your thought about forking ZK is a good one since there are calls to System.exit() that could wreak havoc.

Although I agree with your comments about the architectural issues that embedding may lead to and we are aware of those, I do not agree that embedding will always lead to those issues.

I agree that embedding won't always lead to those issues and your application is a reasonable counter-example. As is common, I think that the exception proves the rule since your system is really just another way to launch an independent ZK cluster rather than an example of ZK being embedded into an application.
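Patrick's check (one leader, n-1 followers, no standalone servers) can be scripted. The following is a minimal sketch, not an official tool: it assumes each server answers the four-letter command "stat" on its client port and that the reply contains a "Mode:" line. The class name, method names and the 3-second timeout are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: asks every server for its mode and flags suspicious ensembles.
public class ZkModeCheck {

    /** Sends "stat" to one server and returns the raw reply (server closes the socket when done). */
    public static String fetchStat(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("stat".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Extracts the value of the "Mode:" line, or "unknown" if the reply has none. */
    public static String parseMode(String statOutput) {
        for (String line : statOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Mode:")) {
                return trimmed.substring("Mode:".length()).trim();
            }
        }
        return "unknown";
    }

    /** Returns a list of problems; an empty list means exactly one leader and only followers otherwise. */
    public static List<String> checkEnsemble(Map<String, String> modeByServer) {
        List<String> problems = new ArrayList<>();
        int leaders = 0;
        for (Map.Entry<String, String> e : modeByServer.entrySet()) {
            String mode = e.getValue();
            if (mode.equals("leader")) {
                leaders++;
            } else if (!mode.equals("follower")) {
                problems.add(e.getKey() + " reports mode '" + mode + "'");
            }
        }
        if (leaders != 1) {
            problems.add("expected exactly 1 leader, found " + leaders);
        }
        return problems;
    }

    /** Usage: java ZkModeCheck host1:2181 host2:2181 host3:2181 */
    public static void main(String[] args) throws IOException {
        Map<String, String> modes = new LinkedHashMap<>();
        for (String hostPort : args) {
            String[] parts = hostPort.split(":");
            String reply = fetchStat(parts[0], Integer.parseInt(parts[1]), 3000);
            modes.put(hostPort, parseMode(reply));
        }
        System.out.println("modes: " + modes);
        for (String problem : checkEnsemble(modes)) {
            System.out.println("PROBLEM: " + problem);
        }
    }
}
```

Three servers that each report "standalone" would be exactly the misconfiguration described above: three independent servers rather than one quorum.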
Re: How to handle Node does not exist error?
In my case, I am pretty sure that the configuration was right. I will reproduce it and post more info later. Thanks.
Re: How to handle Node does not exist error?
It doesn't. But running a ZK cluster that is incorrectly configured can cause this problem and configuring ZK using setters is likely to be subject to changes in what configuration is needed. Thus, your style of code is more subject to decay over time than is nice. The rest of my comments detail *other* reasons why embedding a coordination layer in the code being coordinated is a bad idea.

On Thu, Aug 12, 2010 at 6:33 AM, Vishal K vishalm...@gmail.com wrote: Hi Ted, Can you explain why running ZK in embedded mode can cause znode inconsistencies? Thanks. -Vishal
Re: How to handle Node does not exist error?
Hi, I don't intend to hijack Dr. Hao's email thread here, but I would like to point out two things:

1. I use embedded server as well. But I don't use any setters. We extend QuorumPeerMain and call initializeAndRun() function. So we are doing pretty much the same thing that QuorumPeerMain is doing. However, note that I am seeing the same problem (in ZK 3.3.0) as Dr Hao is seeing. I haven't debugged the cause yet. I assumed that this was my implementation error (and it could still be). Nevertheless, this could turn out to be a bug as well.

2. With respect to Ted's point about backward compatibility, I would suggest to take an approach of having an API to support embedded ZK instead of asking users to not embed ZK.

-Vishal
Re: How to handle Node does not exist error?
I am not saying that the API shouldn't support embedded ZK. I am just saying that it is almost always a bad idea. It isn't that I am asking you to not do it, it is just that I am describing the experience I have had and that I have seen others have. In a nutshell, embedding leads to problems and it isn't hard to see why. On Thu, Aug 12, 2010 at 3:02 PM, Vishal K vishalm...@gmail.com wrote: 2. With respect to Ted's point about backward compatibility, I would suggest to take an approach of having an API to support embedded ZK instead of asking users to not embed ZK.
Re: How to handle Node does not exist error?
i thought there was a jira about supporting embedded zookeeper. (i remember rejecting a patch to fix it. one of the problems is that we have a couple of places that do System.exit().) i can't seem to find it though. one case that would be great for embedding is writing test cases, so i think it would be useful for that.

ben
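The System.exit() concern is one reason forking was suggested earlier in the thread: if ZooKeeper runs in its own JVM, an exit there cannot take the host application down with it. Below is a rough sketch of that approach, assuming the standard server entry point org.apache.zookeeper.server.quorum.QuorumPeerMain, which takes the path of a config file as its argument. The classpath entries, config path and log file name are placeholders, not values from this thread.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: runs the ZooKeeper server as a child process instead of embedding it.
public class ForkedZk {

    static final String MAIN_CLASS = "org.apache.zookeeper.server.quorum.QuorumPeerMain";

    /** Builds the command line: java -cp <classpath> QuorumPeerMain <configPath>. */
    public static List<String> buildCommand(String javaBin, String classpath, String configPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(MAIN_CLASS);
        cmd.add(configPath);
        return cmd;
    }

    /** Starts the child JVM; its stdout and stderr are appended to logFile. */
    public static Process start(List<String> command, File logFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return pb.start();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: adjust to wherever the ZooKeeper jars and zoo.cfg actually live.
        String classpath = String.join(File.pathSeparator, "zookeeper.jar", "lib/*", "conf");
        List<String> cmd = buildCommand("java", classpath, "conf/zoo.cfg");
        Process zk = start(cmd, new File("zookeeper.out"));
        // Stop the child when this JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(zk::destroy));
        int exitCode = zk.waitFor();
        System.out.println("ZooKeeper process exited with code " + exitCode);
    }
}
```

The shutdown hook still ties the server's lifetime to the parent process, so this isolates System.exit() but does not address Ted's point about taking the service down independently of ZK.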
How to handle Node does not exist error?
hi, All,

I have a 3-host cluster running ZooKeeper 3.2.2. On one of the hosts, there are a number of nodes that I can get and ls using zkCli.sh. However, when I tried to delete any of them, I got a "Node does not exist" error. Those nodes do not exist on the other two hosts. Any idea how we should handle this type of error and what might have caused this problem?

Dr Hao He
XPE - the truly SOA platform
h...@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile
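To see exactly which znodes differ between hosts, the child listings from each server can be compared. The sketch below assumes the listings have already been collected, for example by running ls in zkCli.sh against each host in turn. The class name, method name and sample data are invented for illustration. A follower can lag briefly behind the leader, so only differences that persist are meaningful.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reports children that are not visible on every server.
public class ChildListingDiff {

    /**
     * For each child name that is missing on at least one server,
     * returns the set of servers that do list it.
     */
    public static Map<String, Set<String>> findInconsistent(Map<String, List<String>> childrenByServer) {
        Map<String, Set<String>> seenOn = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : childrenByServer.entrySet()) {
            for (String child : entry.getValue()) {
                seenOn.computeIfAbsent(child, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // Children listed by every server are consistent; drop them.
        int serverCount = childrenByServer.size();
        seenOn.values().removeIf(servers -> servers.size() == serverCount);
        return seenOn;
    }

    public static void main(String[] args) {
        // Made-up sample listings; real ones would come from each server's own view.
        Map<String, List<String>> listings = new LinkedHashMap<>();
        listings.put("server1", Arrays.asList("msg000001", "msg000002"));
        listings.put("server2", Arrays.asList("msg000001", "msg000002"));
        listings.put("server3", Arrays.asList("msg000001", "msg000002", "msg000003"));
        for (Map.Entry<String, Set<String>> e : findInconsistent(listings).entrySet()) {
            System.out.println(e.getKey() + " only on " + e.getValue());
        }
    }
}
```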
Re: How to handle Node does not exist error?
hi, Ted,

Thanks for the reply. Here is what I did:

[zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002948
[]
[zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
[msg002807, msg002700, msg002701, msg002804, msg002704, msg002706, msg002601, msg001849, msg001847, msg002508, msg002609, msg001841, msg002607, msg002606, msg002604, msg002809, msg002817, msg001633, msg002812, msg002814, msg002711, msg002815, msg002713, msg002716, msg001772, msg002811, msg001635, msg001774, msg002515, msg002610, msg001838, msg002517, msg002612, msg002519, msg001973, msg001835, msg001974, msg002619, msg001831, msg002510, msg002512, msg002615, msg002614, msg002617, msg002104, msg002106, msg001769, msg001768, msg002828, msg002822, msg001760, msg002820, msg001963, msg001961, msg002110, msg002118, msg002900, msg002836, msg001757, msg002907, msg001753, msg001752, msg001755, msg001952, msg001958, msg001852, msg001956, msg001854, msg002749, msg001608, msg001609, msg002747, msg002882, msg001743, msg002888, msg001605, msg002885, msg001487, msg001746, msg002330, msg001749, msg001488, msg001489, msg001881, msg001491, msg002890, msg001889, msg002758, msg002241, msg002892, msg002852, msg002759, msg002898, msg002850, msg001733, msg002751, msg001739, msg002753, msg002756, msg002332, msg001872, msg002233, msg001721, msg001627, msg001720, msg001625, msg001628, msg001629, msg001729, msg002350, msg001727, msg002352, msg001622, msg001726, msg001623, msg001723, msg001724, msg001621, msg002736, msg002738, msg002363, msg001717, msg002878, msg002362, msg002361, msg001611, msg001894, msg002357, msg002218, msg002358, msg002355, msg001895, msg002356, msg001898, msg002354, msg001996, msg001990, msg002093, msg002880, msg002576, msg002579, msg002267, msg002266, msg002366, msg001901, msg002365, msg001903, msg001799, msg001906, msg002368, msg001597, msg002679, msg002166, msg001595, msg002481, msg002482, msg002373, msg002374,
msg002371, msg001599, msg002773, msg002274, msg002275, msg002270, msg002583, msg002271, msg002580, msg002067, msg002277, msg002278, msg002376, msg002180, msg002467, msg002378, msg002182, msg002377, msg002184, msg002379, msg002187, msg002186, msg002665, msg002666, msg002381, msg002382, msg002661, msg002662, msg002663, msg002385, msg002284, msg002766, msg002282, msg002190, msg002599, msg002054, msg002596, msg002453, msg002459, msg002457, msg002456, msg002191, msg002652, msg002395, msg002650, msg002656, msg002655, msg002189, msg002047, msg002658, msg002659, msg002796, msg002250, msg002255, msg002589, msg002257, msg002061, msg002064, msg002585, msg002258, msg002587, msg002444, msg002446, msg002447, msg002450, msg002646, msg001501, msg002591, msg002592, msg001503, msg001506, msg002260, msg002594, msg002262, msg002263, msg002264, msg002590, msg002132, msg002130, msg002530, msg002931, msg001559, msg001808, msg002024, msg001553, msg002939, msg002937, msg001556, msg002935, msg002933, msg002140, msg001937, msg002143, msg002520, msg002522, msg002429, msg002524, msg002920, msg002035, msg001561, msg002134, msg002138, msg002925, msg002151, msg002287, msg002555, msg002010, msg002002, msg002290, msg001537, msg002005, msg002147, msg002145, msg002698, msg001592, msg001810, msg002690, msg002691, msg001911, msg001910, msg002693, msg001812, msg001817, msg001547, msg002012, msg002015, msg002941, msg001688, msg002018, msg002684, msg002944, msg001540, msg002686, msg001541, msg002946, msg002688, msg001584, msg002948]
[zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002948
Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002948

When I
Re: How to handle Node does not exist error?
What do your nodes have in their logs during startup? Are you sure you have them configured correctly? Are the files ephemeral? Could they have disappeared on their own?
Re: How to handle Node does not exist error?
Hi Dr Hao,

Can you please post the configuration of all the 3 zookeeper servers? I suspect it might be misconfigured clusters and they might not belong to the same ensemble.

Just to be clear: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002807 and other such nodes exist on one of the zookeeper servers and the same node does not exist on other servers?

Also, as Ted pointed out, can you please post the output of

    echo "stat" | nc localhost 2181

(on all the 3 servers) to the list?

Thanks
mahadev
Re: How to handle Node does not exist error?
Try running the server in non-embedded mode.

Also, you are assuming that you know everything about how to configure the quorumPeer. That is going to change and your code will break at that time. If you use a non-embedded cluster, this won't be a problem and you will be able to upgrade ZK version without having to restart your service.

My own opinion is that running an embedded ZK is a serious architectural error. Since I don't know your particular situation, it might be different, but there is an inherent contradiction involved in running a coordination layer as part of the thing being coordinated. Whatever your software does, it isn't what ZK does. As such, it is better to factor out the ZK functionality and make it completely stable. That gives you a much simpler world and will make it easier for you to troubleshoot your system. The simple fact that you can't take down your service without affecting the reliability of your ZK layer makes this a very bad idea.

The problems you are having now are only a preview of what this architectural error leads to. There will be more problems and many of them are likely to be more subtle and lead to service interruptions and lots of wasted time.
On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He h...@softtouchit.com wrote:

hi, Ted and Mahadev,

Here are some more details about my setup:

I run zookeeper in the embedded mode with the following code:

    quorumPeer = new QuorumPeer();
    quorumPeer.setClientPort(getClientPort());
    quorumPeer.setTxnFactory(new FileTxnSnapLog(new File(getDataLogDir()), new File(getDataDir())));
    quorumPeer.setQuorumPeers(getServers());
    quorumPeer.setElectionType(getElectionAlg());
    quorumPeer.setMyid(getServerId());
    quorumPeer.setTickTime(getTickTime());
    quorumPeer.setInitLimit(getInitLimit());
    quorumPeer.setSyncLimit(getSyncLimit());
    quorumPeer.setQuorumVerifier(getQuorumVerifier());
    quorumPeer.setCnxnFactory(cnxnFactory);
    quorumPeer.start();

The configuration values are read from the following XML document for server 1:

    <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
      <member id="1" host="192.168.2.6:2888:3888"/>
      <member id="2" host="192.168.2.3:2888:3888"/>
      <member id="3" host="192.168.2.4:2888:3888"/>
    </cluster>

The other servers have the same configuration except that their ids are changed to 2 and 3.

The error occurred on server 3 when I batch loaded some messages to server 1. However, this error does not always happen. I am not sure exactly what triggered this error yet.

I also performed the stat operation on one of the "does not exist" nodes and got:

    stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg001583
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
    [...@t43 zookeeper-3.2.2]$ bin/zkCli.sh

Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are deleted by the last server that has read them.

If I remove the troubled server's zookeeper log directory and restart the server, then everything is ok.

I will try to get the nc result next time I see this problem.

Dr Hao He
XPE - the truly SOA platform
h...@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile
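For anyone who wants to keep a single source of settings but still use the normal startup path that was recommended in this thread, the values in the XML document above map directly onto the standard zoo.cfg keys (tickTime, initLimit, syncLimit, clientPort, dataDir, server.N) plus a myid file in the data directory. The following is only a sketch of that mapping: the class name, method names and the data directory path are assumptions, and parsing of the XML itself is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: turns already-parsed settings into zoo.cfg and myid.
public class ZooCfgWriter {

    /** Renders zoo.cfg text; members maps server id to "host:peerPort:electionPort". */
    public static String render(int tickTime, int initLimit, int syncLimit, int clientPort,
                                String dataDir, Map<Integer, String> members) {
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=").append(tickTime).append('\n');
        sb.append("initLimit=").append(initLimit).append('\n');
        sb.append("syncLimit=").append(syncLimit).append('\n');
        sb.append("clientPort=").append(clientPort).append('\n');
        sb.append("dataDir=").append(dataDir).append('\n');
        for (Map.Entry<Integer, String> m : members.entrySet()) {
            sb.append("server.").append(m.getKey()).append('=').append(m.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Writes conf/zoo.cfg and the myid file that tells this server which server.N it is. */
    public static void write(Path confDir, Path dataDir, int myId, String cfgText) throws IOException {
        Files.createDirectories(confDir);
        Files.createDirectories(dataDir);
        Files.write(confDir.resolve("zoo.cfg"), cfgText.getBytes(StandardCharsets.UTF_8));
        Files.write(dataDir.resolve("myid"), (myId + "\n").getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Values taken from the XML for server 1; the data directory is a placeholder.
        Map<Integer, String> members = new LinkedHashMap<>();
        members.put(1, "192.168.2.6:2888:3888");
        members.put(2, "192.168.2.3:2888:3888");
        members.put(3, "192.168.2.4:2888:3888");
        String dataDir = "/var/lib/zookeeper";
        String cfg = render(1000, 10, 5, 2181, dataDir, members);
        write(Paths.get("conf"), Paths.get(dataDir), 1, cfg);
        System.out.print(cfg);
    }
}
```

A server started without any server.N entries in its config runs standalone, which is the failure mode Patrick describes, so checking the generated file on all three hosts is a cheap first step.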