Hi,

I don't intend to hijack Dr. Hao's email thread here, but I would like to
point out two things:

1. I use an embedded server as well, but I don't use any setters. We extend
QuorumPeerMain and call its initializeAndRun() function, so we are doing pretty
much the same thing that QuorumPeerMain does. However, note that I am
seeing the same problem (in ZK 3.3.0) that Dr. Hao is seeing. I haven't
debugged the cause yet. I assumed that this was an error in my implementation
(and it still could be). Nevertheless, this could turn out to be a bug as well.

2. With respect to Ted's point about backward compatibility, I would suggest
providing an API that supports embedded ZK instead of asking users not to
embed ZK.
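
To make point 1 concrete, here is a minimal sketch (an illustration under
stated assumptions, not the code we actually run). It writes the settings
Dr. Hao posted as a zoo.cfg-style properties file plus a myid file, which is
the kind of input a QuorumPeerMain-based startup reads instead of setters.
The class EmbeddedZkConfig and its methods are hypothetical names; only the
property keys (dataDir, clientPort, tickTime, initLimit, syncLimit, server.N)
and the myid file are standard ZooKeeper configuration.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of ZooKeeper. */
final class EmbeddedZkConfig {
    private EmbeddedZkConfig() {}

    /** Builds zoo.cfg-style properties for one ensemble member. */
    static Properties build(Path dataDir, int clientPort, int tickTime,
                            int initLimit, int syncLimit,
                            Map<Integer, String> servers) {
        Properties p = new Properties();
        p.setProperty("dataDir", dataDir.toString());
        p.setProperty("clientPort", Integer.toString(clientPort));
        p.setProperty("tickTime", Integer.toString(tickTime));
        p.setProperty("initLimit", Integer.toString(initLimit));
        p.setProperty("syncLimit", Integer.toString(syncLimit));
        for (Map.Entry<Integer, String> e : servers.entrySet()) {
            // server.<id>=<host>:<quorumPort>:<electionPort>
            p.setProperty("server." + e.getKey(), e.getValue());
        }
        return p;
    }

    /** Writes dataDir/myid and dataDir/zoo.cfg; returns the config path. */
    static Path write(Properties p, int myId) throws IOException {
        Path dataDir = Paths.get(p.getProperty("dataDir"));
        Files.createDirectories(dataDir);
        // The myid file holds only this server's id.
        Files.write(dataDir.resolve("myid"),
                (myId + "\n").getBytes(StandardCharsets.UTF_8));
        Path cfg = dataDir.resolve("zoo.cfg");
        try (Writer w = Files.newBufferedWriter(cfg, StandardCharsets.UTF_8)) {
            // Properties.store escapes ':' in values; Properties.load undoes it.
            p.store(w, "generated for an embedded quorum peer");
        }
        return cfg;
    }
}
```

The returned path would then be passed as the single argument to the
QuorumPeerMain subclass that calls initializeAndRun(). That step needs the
ZooKeeper jar and is left out of the sketch.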

-Vishal

On Thu, Aug 12, 2010 at 3:18 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> It doesn't.
>
> But running a ZK cluster that is incorrectly configured can cause this
> problem and configuring ZK using setters is likely to be subject to changes
> in what configuration is needed.  Thus, your style of code is more subject
> to decay over time than is nice.
>
> The rest of my comments detail *other* reasons why embedding a coordination
> layer in the code being coordinated is a bad idea.
>
> On Thu, Aug 12, 2010 at 6:33 AM, Vishal K <vishalm...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Can you explain why running ZK in embedded mode can cause znode
> > inconsistencies?
> > Thanks.
> >
> > -Vishal
> >
> > On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Try running the server in non-embedded mode.
> > >
> > > Also, you are assuming that you know everything about how to configure the
> > > quorumPeer.  That is going to change and your code will break at that time.
> > > If you use a non-embedded cluster, this won't be a problem and you will be
> > > able to upgrade ZK version without having to restart your service.
> > >
> > > My own opinion is that running an embedded ZK is a serious architectural
> > > error.  Since I don't know your particular situation, it might be different,
> > > but there is an inherent contradiction involved in running a coordination
> > > layer as part of the thing being coordinated.  Whatever your software does,
> > > it isn't what ZK does.  As such, it is better to factor out the ZK
> > > functionality and make it completely stable.  That gives you a much simpler
> > > world and will make it easier for you to troubleshoot your system.  The
> > > simple fact that you can't take down your service without affecting the
> > > reliability of your ZK layer makes this a very bad idea.
> > >
> > > The problems you are having now are only a preview of what this
> > > architectural error leads to.  There will be more problems and many of them
> > > are likely to be more subtle and lead to service interruptions and lots of
> > > wasted time.
> > >
> > > On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <h...@softtouchit.com> wrote:
> > >
> > > > hi, Ted and Mahadev,
> > > >
> > > >
> > > > Here are some more details about my setup:
> > > >
> > > > I run zookeeper in the embedded mode with the following code:
> > > >
> > > >     quorumPeer = new QuorumPeer();
> > > >     quorumPeer.setClientPort(getClientPort());
> > > >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> > > >             new File(getDataLogDir()), new File(getDataDir())));
> > > >     quorumPeer.setQuorumPeers(getServers());
> > > >     quorumPeer.setElectionType(getElectionAlg());
> > > >     quorumPeer.setMyid(getServerId());
> > > >     quorumPeer.setTickTime(getTickTime());
> > > >     quorumPeer.setInitLimit(getInitLimit());
> > > >     quorumPeer.setSyncLimit(getSyncLimit());
> > > >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> > > >     quorumPeer.setCnxnFactory(cnxnFactory);
> > > >     quorumPeer.start();
> > > >
> > > >
> > > > The configuration values are read from the following XML document for
> > > > server 1:
> > > >
> > > > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181"
> > > > serverId="1">
> > > >                  <member id="1" host="192.168.2.6:2888:3888"/>
> > > >                  <member id="2" host="192.168.2.3:2888:3888"/>
> > > >                  <member id="3" host="192.168.2.4:2888:3888"/>
> > > > </cluster>
> > > >
> > > >
> > > > The other servers have the same configuration except that their ids are
> > > > changed to 2 and 3.
> > > >
> > > > The error occurred on server 3 when I batch loaded some messages to server
> > > > 1.  However, this error does not always happen.  I am not sure exactly what
> > > > triggered this error yet.
> > > >
> > > > I also performed the "stat" operation on one of the "Node does not exist"
> > > > nodes and got:
> > > >
> > > > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > > > Exception in thread "main" java.lang.NullPointerException
> > > >        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> > > >        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> > > >        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> > > >        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> > > >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> > > >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > > > [...@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> > > >
> > > >
> > > > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are
> > > > deleted by the last server who has read them.
> > > >
> > > > If I remove the troubled server's zookeeper log directory and restart the
> > > > server, then everything is ok.
> > > >
> > > > I will try to get the nc result next time I see this problem.
> > > >
> > > >
> > > > Dr Hao He
> > > >
> > > > XPE - the truly SOA platform
> > > >
> > > > h...@softtouchit.com
> > > > http://softtouchit.com
> > > > http://itunes.com/apps/Scanmobile
> > > >
> > > > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> > > >
> > > > > Hi Dr Hao,
> > > > >  Can you please post the configuration of all the 3 zookeeper servers? I
> > > > > suspect it might be misconfigured clusters and they might not belong to the
> > > > > same ensemble.
> > > > >
> > > > > Just to be clear:
> > > > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > > > >
> > > > > And other such nodes exist on one of the zookeeper servers and the same node
> > > > > does not exist on other servers?
> > > > >
> > > > > Also, as Ted pointed out, can you please post the output of
> > > > > echo "stat" | nc localhost 2181 (on all the 3 servers) to the list?
> > > > >
> > > > > Thanks
> > > > > mahadev
> > > > >
> > > > >
> > > > >
> > > > > On 8/11/10 12:10 AM, "Dr Hao He" <h...@softtouchit.com> wrote:
> > > > >
> > > > >> hi, Ted,
> > > > >>
> > > > >> Thanks for the reply.  Here is what I did:
> > > > >>
> > > > >> [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >> []
> > > > >> [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > > > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > > > msg0000002704,
> > > > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > > > msg0000002508,
> > > > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > > > msg0000002604,
> > > > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > > > msg0000002814,
> > > > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > > > msg0000001772,
> > > > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > > > msg0000002610,
> > > > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > > > msg0000001973,
> > > > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > > > msg0000002510,
> > > > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > > > msg0000002104,
> > > > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > > > msg0000002822,
> > > > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > > > msg0000002110,
> > > > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > > > msg0000002907,
> > > > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > > > msg0000001958,
> > > > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > > > msg0000001608,
> > > > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > > > msg0000002888,
> > > > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > > > msg0000002330,
> > > > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > > > msg0000001491,
> > > > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > > > msg0000002892,
> > > > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > > > msg0000001733,
> > > > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > > > msg0000002332,
> > > > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > > > msg0000001720,
> > > > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > > > msg0000002350,
> > > > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > > > msg0000001623,
> > > > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > > > msg0000002738,
> > > > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > > > msg0000002361,
> > > > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > > > msg0000002358,
> > > > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > > > msg0000002354,
> > > > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > > > msg0000002576,
> > > > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > > > msg0000001901,
> > > > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > > > msg0000002368,
> > > > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > > > msg0000002481,
> > > > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > > > msg0000001599,
> > > > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > > > msg0000002583,
> > > > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > > > msg0000002278,
> > > > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > > > msg0000002182,
> > > > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > > > msg0000002186,
> > > > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > > > msg0000002661,
> > > > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > > > msg0000002766,
> > > > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > > > msg0000002596,
> > > > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > > > msg0000002191,
> > > > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > > > msg0000002655,
> > > > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > > > msg0000002796,
> > > > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > > > msg0000002061,
> > > > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > > > msg0000002444,
> > > > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > > > msg0000001501,
> > > > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > > > msg0000002260,
> > > > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > > > msg0000002590,
> > > > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > > > msg0000001559,
> > > > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > > > msg0000002937,
> > > > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > > > msg0000001937,
> > > > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > > > msg0000002524,
> > > > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > > > msg0000002138,
> > > > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > > > msg0000002010,
> > > > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > > > msg0000002147,
> > > > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > > > msg0000002690,
> > > > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > > > msg0000001812,
> > > > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > > > msg0000002941,
> > > > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > > > msg0000001540,
> > > > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > > > msg0000001584,
> > > > >> msg0000002948]
> > > > >>
> > > > >> [zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >> Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > > > >>
> > > > >> When I performed the same operations on another node, none of those nodes
> > > > >> existed.
> > > > >>
> > > > >>
> > > > >> Dr Hao He
> > > > >>
> > > > >> XPE - the truly SOA platform
> > > > >>
> > > > >> h...@softtouchit.com
> > > > >> http://softtouchit.com
> > > > >> http://itunes.com/apps/Scanmobile
> > > > >>
> > > > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > > > >>
> > > > >>> Can you provide some more information?  The output of some of the four
> > > > >>> letter commands and a transcript of what you are doing would be very
> > > > >>> helpful.
> > > > >>>
> > > > >>> Also, there is no way for znodes to exist on one node of a properly
> > > > >>> operating ZK cluster and not on either of the other two.  Something has to
> > > > >>> be wrong and I would vote for operator error (not to cast aspersions, it is
> > > > >>> just that humans like you and *me* make more errors than ZK does).
> > > > >>>
> > > > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <h...@softtouchit.com> wrote:
> > > > >>>
> > > > >>>> hi, All,
> > > > >>>>
> > > > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
> > > > >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh.
> > > > >>>> However, when I tried to "delete" any of them, I got a "Node does not
> > > > >>>> exist" error.  Those nodes do not exist on the other two hosts.
> > > > >>>>
> > > > >>>> Any idea how we should handle this type of error and what might have
> > > > >>>> caused this problem?
> > > > >>>>
> > > > >>>> Dr Hao He
> > > > >>>>
> > > > >>>> XPE - the truly SOA platform
> > > > >>>>
> > > > >>>> h...@softtouchit.com
> > > > >>>> http://softtouchit.com
> > > > >>>> http://itunes.com/apps/Scanmobile
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
