You should never specify a quorum of 0. The quorum should be a majority of your masters: for 1 master, specify a quorum of 1; for 3 masters, 2; for 5 masters, 3; for 7 masters, 4. The quorum dictates how many masters (log replicas) have to agree on a fact to win a vote. If you have a quorum of 0, then no masters vote, so nobody wins. On a related note, you should always have an odd number of masters, so that the vote is never tied.
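To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes the quorum is simply the smallest strict majority, floor(n/2) + 1, which reproduces the values listed above. The names `quorum_for` and `check_quorum` are made up for illustration and are not part of Mesos.

```python
def quorum_for(num_masters: int) -> int:
    """Return the smallest strict majority of num_masters.

    Assumption: quorum = floor(n / 2) + 1, which matches the
    1 -> 1, 3 -> 2, 5 -> 3, 7 -> 4 values given in this thread.
    """
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def check_quorum(quorum: int, num_masters: int) -> None:
    """Hypothetical pre-flight check to run before starting a master.

    This is an illustration only; it is not how Mesos validates flags.
    """
    if quorum < 1:
        raise ValueError("quorum must be >= 1")
    expected = quorum_for(num_masters)
    if quorum != expected:
        raise ValueError(
            f"quorum {quorum} is not a majority of {num_masters} masters "
            f"(expected {expected})"
        )


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} master(s): quorum {quorum_for(n)}")
```

Running a check like this in a deployment script would catch a quorum of 0 before the master is ever started.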
I will admit that the master shouldn't crash with --quorum=0; it should just exit with an error saying that quorum must be >=1. Want to file a JIRA?

On Tue, Dec 29, 2015 at 3:43 PM, Mehrotra, Ashish <ashish.mehro...@emc.com> wrote:

> Hi All,
>
> I am running CentOS 7.1, ZooKeeper version 3.4.7 and Mesos version 0.26.0.
> After starting ZooKeeper, when I tried to start the mesos-master with
> quorum 0 (everything being run on the same machine, not as local but as a
> distributed setup), the server crashed.
> This happened immediately after the fresh installs.
> When I changed to quorum=1, the mesos master ran fine and the slave could
> connect.
> Then on restarting the mesos master, there was no issue. *The issue was
> seen the very first time only.*
> The error stack is incomprehensible.
>
> Anyone seen this issue previously?
> The error log was:
>
> [root@abc123 build]# ./bin/mesos-master.sh --ip=10.10.10.118
> --work_dir=/var/lib/mesos --zk=zk://10.10.10.118:2181/mesos *--quorum=0*
> I1229 13:41:24.925851 3345 main.cpp:232] Build: 2015-12-29 12:29:36 by
> root
> I1229 13:41:24.925983 3345 main.cpp:234] Version: 0.26.0
> I1229 13:41:24.929131 3345 main.cpp:255] Using 'HierarchicalDRF' allocator
> I1229 13:41:24.953929 3345 leveldb.cpp:176] Opened db in 24.529078ms
> I1229 13:41:24.955523 3345 leveldb.cpp:183] Compacted db in 1.525191ms
> I1229 13:41:24.955688 3345 leveldb.cpp:198] Created db iterator in
> 107413ns
> I1229 13:41:24.955724 3345 leveldb.cpp:204] Seeked to beginning of db in
> 4553ns
> I1229 13:41:24.955737 3345 leveldb.cpp:273] Iterated through 0 keys in
> the db in 224ns
> I1229 13:41:24.956120 3345 replica.cpp:780] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1229 13:41:24.961802 3345 main.cpp:464] Starting Mesos master
> I1229 13:41:24.965438 3345 master.cpp:367] Master
> a38658f7-89c1-4b1f-84f9-5796234b2104 (localhost) started on
> 10.10.10.118:5050
> I1229 13:41:24.965459 3345 master.cpp:369] Flags at startup:
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate="false" --authenticate_slaves="false"
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf"
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true"
> --ip="10.10.10.118" --log_auto_initialize="true" --logbufsecs="0"
> --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050"
> --quiet="false" --quorum="0" --recovery_slave_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="5secs" --registry_strict="false"
> --root_submissions="true" --slave_ping_timeout="15secs"
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
> --webui_dir="/home/admin/mesos/build/../src/webui"
> --work_dir="/var/lib/mesos" --zk="zk://10.10.10.118:2181/mesos"
> --zk_session_timeout="10secs"
> I1229 13:41:24.965761 3345 master.cpp:416] Master allowing
> unauthenticated frameworks to register
> I1229 13:41:24.965772 3345 master.cpp:421] Master allowing
> unauthenticated slaves to register
> I1229 13:41:24.965837 3345 master.cpp:458] Using default 'crammd5'
> authenticator
> W1229 13:41:24.965867 3345 authenticator.cpp:513] No credentials
> provided, authentication requests will be refused
> I1229 13:41:24.965881 3345 authenticator.cpp:520] Initializing server SASL
> I1229 13:41:24.966788 3364 log.cpp:238] Attempting to join replica to
> ZooKeeper group
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc0038c0 flags=0
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc003a70 flags=0
> I1229 13:41:24.971629 3360 recover.cpp:449] Starting replica recovery
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc0078b0 flags=0
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc007f00 flags=0
> I1229 13:41:24.973780 3362 recover.cpp:475] Replica is in EMPTY status
> I1229 13:41:24.979076 3362 replica.cpp:676] Replica in EMPTY status
> received a broadcasted recover request from (4)@10.10.10.118:5050
> I1229 13:41:24.979863 3362 recover.cpp:195] Received a recover response
> from a replica in EMPTY status
> F1229 13:41:24.980000 3362 recover.cpp:219]
> CHECK_SOME(lowestBeginPosition): is NONE
> *** Check failure stack trace: ***
> @ 0x7f211143a6a2 google::LogMessage::Fail()
> @ 0x7f211143a601 google::LogMessage::SendToLog()
> 2015-12-29 13:41:24,995:3345(0x7f20f37fe700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f21008d9700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f21018db700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f20f27fc700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> @ 0x7f211143a012 google::LogMessage::Flush()
> @ 0x7f211143cd46 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f211056d44c _CheckFatal::~_CheckFatal()
> @ 0x7f211125a243
> mesos::internal::log::RecoverProtocolProcess::received()
> @ 0x7f2111265ae6
> _ZZN7process8dispatchI6OptionIN5mesos8internal3log15RecoverResponseEENS4_22RecoverProtocolProcessERKNS_6FutureIS5_EES9_EENS8_IT_EERKNS_3PIDIT0_EEMSF_FSD_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESO_
> @ 0x7f211127c5d5
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionIN5mesos8internal3log15RecoverResponseEENS8_22RecoverProtocolProcessERKNS0_6FutureIS9_EESD_EENSC_IT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f21113c0d7d std::function<>::operator()()
> @ 0x7f21113a8b95 process::ProcessBase::visit()
> @ 0x7f21113ac960 process::DispatchEvent::visit()
> @ 0x471dd8 process::ProcessBase::serve()
> I1229 13:41:25.136451 3366 contender.cpp:149] Joining the ZK group
> @ 0x7f21113a4f81 process::ProcessManager::resume()
> @ 0x7f21113a21b2
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f21113ac18c
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f21113ac13c
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f21113ac0ce
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f21113ac025
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f21113abfbe
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f210ce3f220 (unknown)
> @ 0x7f210d099dc5 start_thread
> @ 0x7f210c5a721d __clone
> *Aborted (core dumped)*