You told the master it needed a quorum of 2 and it's the only one online, so it's bombing out. That's the expected behaviour.
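To put the arithmetic behind that in concrete terms, here's a rough sketch. The helper names `quorum_for` and `has_quorum` are made up for illustration; they are not Mesos commands. It assumes the usual rule that the quorum should be a strict majority of the masters, and that the registrar needs at least `--quorum` masters running before it can recover:

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): smallest strict majority of N masters.
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical check: are enough masters up to satisfy --quorum?
# $1 = masters currently running, $2 = value passed to --quorum
has_quorum() {
  [ "$1" -ge "$2" ]
}

masters=3
quorum=$(quorum_for "$masters")
echo "masters=$masters quorum=$quorum"

# Walk through 1, 2 and 3 masters online against that quorum.
for online in 1 2 3; do
  if has_quorum "$online" "$quorum"; then
    echo "$online online: enough masters for the registrar to recover"
  else
    echo "$online online: below quorum, recovery is expected to stall"
  fi
done
```

With 3 masters this gives a quorum of 2, so a single master on its own falls into the "below quorum" case.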
You need to start at least 2 ZooKeepers before it will be a functional group, same for the masters. You haven't mentioned how you set up your ZooKeeper cluster, so I'm assuming that's working correctly (3 nodes, all aware of the other 2 in their config). If not, you need to sort that out first.

Also I think your zk URL is wrong - you want to list all 3 ZooKeeper nodes like this:

sudo ./bin/mesos-master.sh --zk=zk://host1:2181,host2:2181,host3:2181/mesos --quorum=2 --work_dir=/var/lib/mesos/master

When you've run that command on 2 hosts things should start working; you'll want all 3 up for redundancy.

On 4 June 2016 at 16:42, Qian Zhang <zhq527...@gmail.com> wrote:
> Hi Folks,
>
> I am trying to set up a Mesos HA env with 3 nodes, each of nodes has a Zookeeper running, so they form a Zookeeper cluster. And then when I started the first Mesos master in one node with:
> sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2 --work_dir=/var/lib/mesos/master
>
> I found it will hang here for 60 seconds:
> I0604 23:39:56.488219 15330 zookeeper.cpp:259] A new leading master (UPID=master@192.168.122.132:5050) is detected
> I0604 23:39:56.489080 15337 master.cpp:1951] The newly elected leader is master@192.168.122.132:5050 with id 40d387a6-4d61-49d6-af44-51dd41457390
> I0604 23:39:56.489791 15337 master.cpp:1964] Elected as the leading master!
> I0604 23:39:56.490401 15337 master.cpp:1651] Recovering from registrar
> I0604 23:39:56.491706 15330 registrar.cpp:332] Recovering registrar
> I0604 23:39:56.496448 15332 log.cpp:524] Attempting to start the writer
>
> And after 60s, master will fail:
> F0604 23:40:56.499596 15337 master.cpp:1640] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> *** Check failure stack trace: ***
> @ 0x7f4b81372f4e google::LogMessage::Fail()
> @ 0x7f4b81372e9a google::LogMessage::SendToLog()
> @ 0x7f4b8137289c google::LogMessage::Flush()
> @ 0x7f4b813757b0 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f4b8040eea0 mesos::internal::master::fail()
> @ 0x7f4b804dbeb3 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f4b804ba453 _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
> @ 0x7f4b804898d7 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
> @ 0x7f4b804dbf80 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
> @ 0x49d257 std::function<>::operator()()
> @ 0x49837f _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
> @ 0x493024 process::Future<>::fail()
> @ 0x7f4b8015ad20 process::Promise<>::fail()
> @ 0x7f4b804d9295 process::internal::thenf<>()
> @ 0x7f4b8051788f _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f4b8050fa3b std::_Bind<>::operator()<>()
> @ 0x7f4b804f94e3 std::_Function_handler<>::_M_invoke()
> @ 0x7f4b8050fc69 std::function<>::operator()()
> @ 0x7f4b804f9609 _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
> @ 0x7f4b80517936 _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
> @ 0x7f4b8050fc69 std::function<>::operator()()
> @ 0x7f4b8056b1b4 process::internal::run<>()
> @ 0x7f4b80561672 process::Future<>::fail()
> @ 0x7f4b8059bf5f std::_Mem_fn<>::operator()<>()
> @ 0x7f4b8059757f _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f4b8058fad1 _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
> @ 0x7f4b80585a41 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
> @ 0x7f4b80597605 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
> @ 0x49d257 std::function<>::operator()()
> @ 0x49837f _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
> @ 0x7f4b8056164a process::Future<>::fail()
> @ 0x7f4b8055a378 process::Promise<>::fail()
>
> I tried both Zookeeper 3.4.8 and 3.4.6 with latest code of Mesos, but no luck for both. Any ideas about what happened? Thanks.
>
> Thanks,
> Qian Zhang