A "reconfig is in process" error means that something failed during reconfiguration and it couldn't complete. Perhaps the new server disconnected in the middle and never came back up. Note that the second server's config file gets overwritten after it connects to the leader. If it reboots at this stage, it won't be able to connect again unless you manually restore its config file (since in the overwritten config server 2 is not part of the ensemble).
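One way to spot that state before restarting server 2 is to check whether its own entry is still present in the file that holds its server list. This is only a minimal sketch, assuming a POSIX shell. The helper name has_server_entry is hypothetical (it is not part of ZooKeeper), and the path is an assumption: pass whichever file holds the server.N lines on your setup, which may be the file named by dynamicConfigFile rather than the cfg itself.

```shell
# Hypothetical helper (not part of ZooKeeper): succeeds if the given
# file contains a "server.<id>=" line, fails otherwise.
has_server_entry() {
  cfg_file="$1"
  server_id="$2"
  grep -q "^server\.${server_id}=" "$cfg_file"
}

# Assumed path; override CFG to point at the file that actually holds
# the server.N lines (for example the one named by dynamicConfigFile).
CFG="${CFG:-conf/zoo_replicated2.cfg}"

# Warn before restarting server 2 if its own entry is gone.
if [ -f "$CFG" ] && ! has_server_entry "$CFG" 2; then
  echo "server.2 is missing from $CFG; restore it before restarting"
fi
```

If the warning is printed, put the server.2 line back by hand before starting the server again.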
I checked it locally (running both servers on my laptop), and it worked. Perhaps start from that? Like you said, I disabled ACLs by adding "-Dzookeeper.skipACL=yes".

Here's the first server's config file, conf/zoo_replicated1.cfg:

    dataDir=/Users/shralex/my-zookeeper/zookeeper1
    syncLimit=2
    initLimit=5
    tickTime=2000
    clientPort=2791
    reconfigEnabled=true
    standaloneEnabled=false
    server.1=localhost:2721:2731:participant;localhost:2791

The second server's, conf/zoo_replicated2.cfg:

    dataDir=/Users/shralex/my-zookeeper/zookeeper2
    syncLimit=2
    initLimit=5
    tickTime=2000
    clientPort=2792
    reconfigEnabled=true
    standaloneEnabled=false
    server.1=localhost:2721:2731:participant;localhost:2791
    server.2=localhost:2741:2751:participant;localhost:2792

Create two directories for the servers, zookeeper1 and zookeeper2, and create a myid file in each:

    echo 1 > zookeeper1/myid
    echo 2 > zookeeper2/myid

I find it easier for debugging to allow zkServer.sh to log to stdout. You can do this by changing zkServer.sh:
- change nohup "$JAVA" to just "$JAVA"
- remove " > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null"

In two shells, start both servers with:

    export ZOOCFG=zoo_replicated1.cfg    (change for server 2)
    ./bin/zkServer.sh start

In a third shell I start the client by connecting it to server 2, as you did:

    ./bin/zkCli.sh -server 127.0.0.1:2792

I run the following in the client shell:

    [zk: 127.0.0.1:2792(CONNECTED) 2] config
    server.1=localhost:2721:2731:participant;localhost:2791
    version=100000000
    [zk: 127.0.0.1:2792(CONNECTED) 2] reconfig -add "server.2=localhost:2741:2751:participant;localhost:2792"
    Committed new configuration:
    server.1=localhost:2721:2731:participant;localhost:2791
    server.2=localhost:2741:2751:participant;localhost:2792
    version=200000003

On Thu, Dec 30, 2021 at 10:47 AM Eric Edgar <eric.ed...@smartthings.com.invalid> wrote:

> I am a little closer I think. I disabled auth for testing using the server flags ..
> but now I am getting a different error that the reconfig is in process and I see a zookeeper.dynamic.next file on both servers but nothing happens after that.
> What would cause that file to not be merged into a new cfg.
> Eric
>
> On Thu, Dec 30, 2021 at 11:47 AM Eric Edgar <eric.ed...@smartthings.com> wrote:
>
> > Alex,
> > so I have 2 nodes .. the first has itself in the dynamic list with an id of 1.
> > server.1=10.1.1.104:2888:3888:participant;0.0.0.0:2181
> >
> > I have brought the second node up with an id of 2
> > server.1=10.1.1.104:2888:3888:participant;0.0.0.0:2181
> > server.2=10.1.1.40:2888:3888:participant;2181
> >
> > then i am trying to run from the second node.
> > zkCli.sh -server 10.1.1.104
> > reconfig -add "server.2=10.1.1.40:2888:3888:participant;2181"
> >
> > I get this error on the first server
> > 2021-12-30 17:37:02,880 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@461] - Incremental reconfig
> > 2021-12-30 17:37:02,880 [myid:1] - WARN [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@532] - Reconfig failed - there must be a connected and synced quorum in new configuration
> > 2021-12-30 17:37:02,880 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@935] - Got user-level KeeperException when processing sessionid:0x1002dfe65610014 type:reconfig cxid:0x1 zxid:0x1600000033 txntype:-1 reqpath:n
> >
> > on the second server issuing the reconfig command I get this error
> > No quorum of new config is connected and up-to-date with the leader of last commmitted config - try invoking reconfiguration after new servers are connected and synced
> >
> > I have not set any security at this point.
> >
> > I am not sure what I am missing at this point, assuming I don't need 2 nodes fully clustered in advance as mentioned by Chris.
> >
> > Thanks,
> > Eric
> >
> > On Thu, Dec 30, 2021 at 11:03 AM Alexander Shraer <shra...@gmail.com> wrote:
> >
> >> This is already possible, since the 3.5.0 release:
> >>
> >> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#sc_reconfig_standaloneEnabled
> >>
> >> After your single node is up and running, you can connect other nodes to it as described in the reconfig manual. See "Adding servers" in the link above.
> >> Essentially, you need to specify the new server's initial config files so that they can find some existing server and start syncing data. Once a quorum of the new config is up to date, you can invoke the reconfig command to officially make them part of the configuration.
> >>
> >> Thanks,
> >> Alex
> >>
> >> On Thu, Dec 30, 2021 at 8:57 AM Eric Edgar <eric.ed...@smartthings.com.invalid> wrote:
> >>
> >> > Also would it be possible to update the code for this edge case, eg if the current quorum is 1, and you want to add a node then add a flag saying I trust the single master and reconfigure itself into a 2 node cluster?
> >> > Thanks,
> >> > Eric
> >> >
> >> > On Thu, Dec 30, 2021 at 10:49 AM Eric Edgar <eric.ed...@smartthings.com> wrote:
> >> >
> >> > > Are there any examples with a k8 orchestrator or some sort of docker init scripts handling the initial cluster configuration?
> >> > > Thanks,
> >> > > Eric
> >> > >
> >> > > On Thu, Dec 30, 2021 at 9:44 AM Chris T. <c.turks...@gmail.com> wrote:
> >> > >
> >> > >> If you want to run a zookeeper cluster you have to start with at least 2 members. From there you can scale up with the dynamic reconfig commands.
> >> > >> Regards
> >> > >> Chris
> >> > >>
> >> > >> On 30 December 2021 16:40:40 Eric Edgar <eric.ed...@smartthings.com.INVALID> wrote:
> >> > >>
> >> > >> > I am experimenting with zk and the reconfig feature and trying to understand if I can start a single zk node and then reconfig/bootstrap the other 2 nodes into the ensemble. The reconfig command is throwing an error that there isn't a quorum yet. Is this line of thinking possible? or do I need to setup the first 3 nodes manually the first time?
> >> > >> > I am basing this experiment off of this web page.
> >> > >> >
> >> > >> > https://blog.container-solutions.com/dynamic-zookeeper-cluster-with-docker
> >> > >> >
> >> > >> > /opt/zookeeper/zookeeper/bin/zkCli.sh -server 10.1.1.104:2181 reconfig -add "server.2=10.1.1.40:2888:3888:participant;2181"
> >> > >> > No quorum of new config is connected and up-to-date with the leader of last commmitted config - try invoking reconfiguration after new servers are connected and synced
> >> > >> >
> >> > >> > /opt/zookeeper/zookeeper/bin/zkCli.sh -server 10.1.1.104:2181 config
> >> > >> > server.1=10.1.1.104:2888:3888:participant;0.0.0.0:2181
> >> > >> >
> >> > >> > cat ./zoo.cfg
> >> > >> > autopurge.purgeInterval=1
> >> > >> > initLimit=10
> >> > >> > syncLimit=5
> >> > >> > autopurge.snapRetainCount=6
> >> > >> > tickTime=2000
> >> > >> > dataDir=/mnt/zookeeper/data
> >> > >> > reconfigEnabled=true
> >> > >> > standaloneEnabled=false
> >> > >> > dynamicConfigFile=/opt/zookeeper/zookeeper/conf/zoo.cfg.dynamic.1600000000
> >> > >> >
> >> > >> > What is the best solution for an unattended bootstrap setup of a new cluster from scratch?
> >> > >> >
> >> > >> > This was something that we were able to accomplish with exhibitor on older versions of zookeeper in the past.