Hi Ted,

Can you explain why running ZK in embedded mode can cause znode inconsistencies?

Thanks.
-Vishal

On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Try running the server in non-embedded mode.
>
> Also, you are assuming that you know everything about how to configure the
> quorumPeer. That is going to change and your code will break at that time.
> If you use a non-embedded cluster, this won't be a problem and you will be
> able to upgrade the ZK version without having to restart your service.
>
> My own opinion is that running an embedded ZK is a serious architectural
> error. Since I don't know your particular situation, it might be different,
> but there is an inherent contradiction involved in running a coordination
> layer as part of the thing being coordinated. Whatever your software does,
> it isn't what ZK does. As such, it is better to factor out the ZK
> functionality and make it completely stable. That gives you a much simpler
> world and will make it easier for you to troubleshoot your system. The
> simple fact that you can't take down your service without affecting the
> reliability of your ZK layer makes this a very bad idea.
>
> The problems you are having now are only a preview of what this
> architectural error leads to. There will be more problems and many of them
> are likely to be more subtle and lead to service interruptions and lots of
> wasted time.
>
> On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <h...@softtouchit.com> wrote:
>
> > hi, Ted and Mahadev,
> >
> > Here are some more details about my setup:
> >
> > I run zookeeper in the embedded mode with the following code:
> >
> >     quorumPeer = new QuorumPeer();
> >     quorumPeer.setClientPort(getClientPort());
> >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> >             new File(getDataLogDir()), new File(getDataDir())));
> >     quorumPeer.setQuorumPeers(getServers());
> >     quorumPeer.setElectionType(getElectionAlg());
> >     quorumPeer.setMyid(getServerId());
> >     quorumPeer.setTickTime(getTickTime());
> >     quorumPeer.setInitLimit(getInitLimit());
> >     quorumPeer.setSyncLimit(getSyncLimit());
> >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> >     quorumPeer.setCnxnFactory(cnxnFactory);
> >     quorumPeer.start();
> >
> > The configuration values are read from the following XML document for
> > server 1:
> >
> >     <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> >              serverId="1">
> >       <member id="1" host="192.168.2.6:2888:3888"/>
> >       <member id="2" host="192.168.2.3:2888:3888"/>
> >       <member id="3" host="192.168.2.4:2888:3888"/>
> >     </cluster>
> >
> > The other servers have the same configuration except that their ids are
> > changed to 2 and 3.
> >
> > The error occurred on server 3 when I batch loaded some messages to server
> > 1. However, this error does not always happen. I am not sure exactly what
> > triggered this error yet.
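
For anyone trying the non-embedded mode suggested earlier in the thread, the cluster settings quoted above map fairly directly onto a zoo.cfg file. The following is only a sketch of that mapping: the class name ZooCfgSketch and the dataDir/dataLogDir paths are placeholders that do not come from the thread, while the numeric values and member addresses are copied from the XML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: render the quoted <cluster> settings as zoo.cfg text for a
 * standalone (non-embedded) ensemble. Class name and paths are hypothetical.
 */
class ZooCfgSketch {

    /** members maps server id -> "host:quorumPort:electionPort". */
    static String render(int tickTime, int initLimit, int syncLimit, int clientPort,
                         String dataDir, String dataLogDir, Map<Integer, String> members) {
        StringBuilder cfg = new StringBuilder();
        cfg.append("tickTime=").append(tickTime).append('\n');
        cfg.append("initLimit=").append(initLimit).append('\n');
        cfg.append("syncLimit=").append(syncLimit).append('\n');
        cfg.append("clientPort=").append(clientPort).append('\n');
        cfg.append("dataDir=").append(dataDir).append('\n');
        cfg.append("dataLogDir=").append(dataLogDir).append('\n');
        // One server.N line per ensemble member, in insertion order.
        for (Map.Entry<Integer, String> m : members.entrySet()) {
            cfg.append("server.").append(m.getKey()).append('=')
               .append(m.getValue()).append('\n');
        }
        return cfg.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> members = new LinkedHashMap<>();
        members.put(1, "192.168.2.6:2888:3888");
        members.put(2, "192.168.2.3:2888:3888");
        members.put(3, "192.168.2.4:2888:3888");
        // The two paths below are placeholders; use whatever the hosts really have.
        System.out.print(render(1000, 10, 5, 2181,
                "/var/lib/zookeeper", "/var/lib/zookeeper/log", members));
    }
}
```

With a standalone ensemble the same zoo.cfg can be used on all three hosts; the per-host serverId attribute is replaced by a file named myid in dataDir that contains just that host's id.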
> >
> > I also performed the "stat" operation on one of the "Node does not exist"
> > nodes and got:
> >
> >     stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> >     Exception in thread "main" java.lang.NullPointerException
> >         at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> >         at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> >         at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> >         at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> >         at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> >         at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> >     [...@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> >
> > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are
> > deleted by the last server that has read them.
> >
> > If I remove the troubled server's zookeeper log directory and restart the
> > server, then everything is ok.
> >
> > I will try to get the nc result next time I see this problem.
> >
> > Dr Hao He
> >
> > XPE - the truly SOA platform
> >
> > h...@softtouchit.com
> > http://softtouchit.com
> > http://itunes.com/apps/Scanmobile
> >
> > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> >
> > > Hi Dr Hao,
> > > Can you please post the configuration of all the 3 zookeeper servers? I
> > > suspect it might be misconfigured clusters and they might not belong to
> > > the same ensemble.
> > >
> > > Just to be clear:
> > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > >
> > > And other such nodes exist on one of the zookeeper servers and the same
> > > node does not exist on other servers?
> > >
> > > Also, as Ted pointed out, can you please post the output of
> > > echo "stat" | nc localhost 2181 (on all the 3 servers) to the list?
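
If nc is not available on a host, the same four-letter "stat" request can be sent over a plain socket. The sketch below is one way to do that from Java and pull out a few fields for a side-by-side comparison. The class name StatProbe is made up for illustration, and the field labels it looks for ("Mode", "Zxid", "Node count") are an assumption about what this server version prints, so check them against the raw output first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Sketch: send a four-letter command to each server and compare the replies. */
class StatProbe {

    /** Sends cmd and returns everything the server writes before it closes the connection. */
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 3000);
            s.setSoTimeout(3000);
            s.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Returns the value of the first "key: value" line, or null if there is none. */
    static String field(String statOutput, String key) {
        for (String line : statOutput.split("\\r?\\n")) {
            if (line.startsWith(key + ":")) {
                return line.substring(key.length() + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Addresses and client port are the ones given in the quoted XML config.
        String[] hosts = {"192.168.2.6", "192.168.2.3", "192.168.2.4"};
        for (String h : hosts) {
            try {
                String out = send(h, 2181, "stat");
                // Field labels are assumed; print 'out' itself if they come back null.
                System.out.println(h + " mode=" + field(out, "Mode")
                        + " zxid=" + field(out, "Zxid")
                        + " nodes=" + field(out, "Node count"));
            } catch (IOException e) {
                System.out.println(h + " unreachable: " + e.getMessage());
            }
        }
    }
}
```

In a healthy ensemble one would expect exactly one leader and matching node counts once the servers are in sync; a server that reports a very different node count is the one to look at first.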
> > >
> > > Thanks
> > > mahadev
> > >
> > > On 8/11/10 12:10 AM, "Dr Hao He" <h...@softtouchit.com> wrote:
> > >
> > >> hi, Ted,
> > >>
> > >> Thanks for the reply. Here is what I did:
> > >>
> > >> [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> []
> > >> [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704,
> > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508,
> > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604,
> > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814,
> > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772,
> > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610,
> > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973,
> > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510,
> > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104,
> > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822,
> > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110,
> > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907,
> > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958,
> > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608,
> > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888,
> > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330,
> > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491,
> > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892,
> > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733,
> > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332,
> > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720,
> > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350,
> > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623,
> > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738,
> > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361,
> > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358,
> > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898, msg0000002354,
> > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576,
> > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901,
> > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368,
> > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481,
> > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599,
> > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583,
> > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278,
> > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182,
> > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186,
> > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661,
> > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766,
> > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596,
> > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191,
> > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655,
> > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796,
> > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061,
> > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444,
> > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501,
> > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260,
> > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590,
> > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559,
> > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937,
> > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937,
> > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524,
> > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138,
> > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010,
> > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005, msg0000002147,
> > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690,
> > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812,
> > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941,
> > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540,
> > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584,
> > >> msg0000002948]
> > >>
> > >> [zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >>
> > >> When I performed the same operations on another node, none of those nodes
> > >> existed.
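
The symptom in the transcript above (children listed on one host that the other hosts do not have) can be checked mechanically by collecting the "ls" output from each host and diffing the sets. This is a minimal sketch of that comparison only: the class name ChildDiff and the host labels are hypothetical, and the two znode names in the sample are taken from the listing above purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: find children that are not listed by every server in the ensemble. */
class ChildDiff {

    /** For each server, returns the children it lists that some other server lacks. */
    static Map<String, Set<String>> notEverywhere(Map<String, List<String>> listings) {
        // Intersection of all listings = children every server agrees on.
        Set<String> common = null;
        for (List<String> children : listings.values()) {
            if (common == null) {
                common = new TreeSet<>(children);
            } else {
                common.retainAll(children);
            }
        }
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : listings.entrySet()) {
            Set<String> extra = new TreeSet<>(e.getValue());
            if (common != null) {
                extra.removeAll(common);
            }
            result.put(e.getKey(), extra);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: in practice paste each host's "ls" result here.
        Map<String, List<String>> listings = new LinkedHashMap<>();
        listings.put("hostA", Collections.<String>emptyList());
        listings.put("hostB", Collections.<String>emptyList());
        listings.put("hostC", Arrays.asList("msg0000002807", "msg0000002948"));
        System.out.println(notEverywhere(listings));
    }
}
```

An empty set for every host means the listings agree; anything else names the znodes worth checking with "stat" and "delete" on each host in turn.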
> > >>
> > >>
> > >> Dr Hao He
> > >>
> > >> XPE - the truly SOA platform
> > >>
> > >> h...@softtouchit.com
> > >> http://softtouchit.com
> > >> http://itunes.com/apps/Scanmobile
> > >>
> > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > >>
> > >>> Can you provide some more information? The output of some of the four
> > >>> letter commands and a transcript of what you are doing would be very
> > >>> helpful.
> > >>>
> > >>> Also, there is no way for znodes to exist on one node of a properly
> > >>> operating ZK cluster and not on either of the other two. Something has to
> > >>> be wrong and I would vote for operator error (not to cast aspersions, it is
> > >>> just that humans like you and *me* make more errors than ZK does).
> > >>>
> > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <h...@softtouchit.com> wrote:
> > >>>
> > >>>> hi, All,
> > >>>>
> > >>>> I have a 3-host cluster running ZooKeeper 3.2.2. On one of the hosts,
> > >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh.
> > >>>> However, when I tried to "delete" any of them, I got a "Node does not
> > >>>> exist" error. Those nodes do not exist on the other two hosts.
> > >>>>
> > >>>> Any idea how we should handle this type of error and what might have
> > >>>> caused this problem?
> > >>>>
> > >>>> Dr Hao He
> > >>>>
> > >>>> XPE - the truly SOA platform
> > >>>>
> > >>>> h...@softtouchit.com
> > >>>> http://softtouchit.com
> > >>>> http://itunes.com/apps/Scanmobile