You should never specify a quorum of 0.
For 1 master, specify a quorum of 1.
For 3 masters, the quorum is 2.
For 5 masters, the quorum is 3.
For 7 masters, the quorum is 4.
The quorum dictates how many masters (log replicas) have to agree on a fact
to win a vote. If you have a quorum of 0, then no masters vote, so nobody
wins. On a related note, you should always have an odd number of masters,
so that the vote is never tied.
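
The numbers above are just a strict majority of the masters, i.e.
floor(N/2) + 1. A minimal sketch of that arithmetic, assuming a
hypothetical helper named majority_quorum (it is not part of Mesos, only
an illustration):

```python
def majority_quorum(num_masters: int) -> int:
    """Return the smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    # Integer division rounds down, so this is floor(N/2) + 1.
    return num_masters // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} masters -> --quorum={majority_quorum(n)}")
```

The same formula gives 3 for 4 masters, which tolerates only one failed
master, the same as a 3-master setup. That is another reason to stick to
odd counts.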

I will admit that the master shouldn't crash with --quorum=0; it should
just exit with an error that quorum must be >=1. Want to file a JIRA?
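
Until something like that exists in the master itself, a wrapper script
could do the check before launching it. This is only a sketch of the
idea; check_quorum is a hypothetical name and none of this is Mesos
code:

```python
def check_quorum(quorum: int) -> None:
    """Reject quorum values that can never win a vote."""
    if quorum < 1:
        raise ValueError(f"--quorum must be >= 1 (got {quorum})")


if __name__ == "__main__":
    import sys

    try:
        # Hypothetical wrapper usage: pass the intended quorum as argv[1].
        check_quorum(int(sys.argv[1]) if len(sys.argv) > 1 else 0)
    except ValueError as err:
        # Exit with a readable error instead of a stack trace.
        sys.exit(str(err))
```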

On Tue, Dec 29, 2015 at 3:43 PM, Mehrotra, Ashish <ashish.mehro...@emc.com>
wrote:

> Hi All,
>
> I am running CentOS 7.1, ZooKeeper version 3.4.7 and Mesos version 0.26.0.
> After starting ZooKeeper, when I tried to start the
> mesos-master with quorum 0 (everything being run on the same machine, not
> as local but as a distributed setup), the server crashed.
> This happened immediately after the fresh installs.
> When I changed quorum=1, the mesos master ran fine and the slave could get
> connected.
> Then on restarting the mesos master, there was no issue. * The issue was
> seen the very first time only.*
> The error stack is incomprehensible.
>
> Anyone seen this issue previously?
> The error log was:
>
> [root@abc123 build]# ./bin/mesos-master.sh --ip=10.10.10.118
> --work_dir=/var/lib/mesos --zk=zk://10.10.10.118:2181/mesos *--quorum=0*
> I1229 13:41:24.925851  3345 main.cpp:232] Build: 2015-12-29 12:29:36 by
> root
> I1229 13:41:24.925983  3345 main.cpp:234] Version: 0.26.0
> I1229 13:41:24.929131  3345 main.cpp:255] Using 'HierarchicalDRF' allocator
> I1229 13:41:24.953929  3345 leveldb.cpp:176] Opened db in 24.529078ms
> I1229 13:41:24.955523  3345 leveldb.cpp:183] Compacted db in 1.525191ms
> I1229 13:41:24.955688  3345 leveldb.cpp:198] Created db iterator in
> 107413ns
> I1229 13:41:24.955724  3345 leveldb.cpp:204] Seeked to beginning of db in
> 4553ns
> I1229 13:41:24.955737  3345 leveldb.cpp:273] Iterated through 0 keys in
> the db in 224ns
> I1229 13:41:24.956120  3345 replica.cpp:780] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1229 13:41:24.961802  3345 main.cpp:464] Starting Mesos master
> I1229 13:41:24.965438  3345 master.cpp:367] Master
> a38658f7-89c1-4b1f-84f9-5796234b2104 (localhost) started on
> 10.10.10.118:5050
> I1229 13:41:24.965459  3345 master.cpp:369] Flags at startup:
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate="false" --authenticate_slaves="false"
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf"
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true"
> --ip="10.10.10.118" --log_auto_initialize="true" --logbufsecs="0"
> --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050"
> --quiet="false" --quorum="0" --recovery_slave_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="5secs" --registry_strict="false"
> --root_submissions="true" --slave_ping_timeout="15secs"
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
> --webui_dir="/home/admin/mesos/build/../src/webui"
> --work_dir="/var/lib/mesos" --zk="zk://10.10.10.118:2181/mesos"
> --zk_session_timeout="10secs"
> I1229 13:41:24.965761  3345 master.cpp:416] Master allowing
> unauthenticated frameworks to register
> I1229 13:41:24.965772  3345 master.cpp:421] Master allowing
> unauthenticated slaves to register
> I1229 13:41:24.965837  3345 master.cpp:458] Using default 'crammd5'
> authenticator
> W1229 13:41:24.965867  3345 authenticator.cpp:513] No credentials
> provided, authentication requests will be refused
> I1229 13:41:24.965881  3345 authenticator.cpp:520] Initializing server SASL
> I1229 13:41:24.966788  3364 log.cpp:238] Attempting to join replica to
> ZooKeeper group
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,970:3345(0x7f21042f3700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc0038c0 flags=0
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,970:3345(0x7f2103af2700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc003a70 flags=0
> I1229 13:41:24.971629  3360 recover.cpp:449] Starting replica recovery
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,972:3345(0x7f21062f7700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc0078b0 flags=0
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@716: Client
> environment:host.name=abc.def.com
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.10.0-229.el7.x86_64
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 6 11:36:42 UTC 2015
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@733: Client
> environment:user.name=root
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@741: Client
> environment:user.home=/root
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/admin/mesos/build
> 2015-12-29 13:41:24,972:3345(0x7f2105af6700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.10.10.118:2181 sessionTimeout=10000
> watcher=0x7f2110e3f16c sessionId=0 sessionPasswd=<null>
> context=0x7f20fc007f00 flags=0
> I1229 13:41:24.973780  3362 recover.cpp:475] Replica is in EMPTY status
> I1229 13:41:24.979076  3362 replica.cpp:676] Replica in EMPTY status
> received a broadcasted recover request from (4)@10.10.10.118:5050
> I1229 13:41:24.979863  3362 recover.cpp:195] Received a recover response
> from a replica in EMPTY status
> F1229 13:41:24.980000  3362 recover.cpp:219]
> CHECK_SOME(lowestBeginPosition): is NONE
> *** Check failure stack trace: ***
>     @     0x7f211143a6a2  google::LogMessage::Fail()
>     @     0x7f211143a601  google::LogMessage::SendToLog()
> 2015-12-29 13:41:24,995:3345(0x7f20f37fe700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f21008d9700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f21018db700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
> 2015-12-29 13:41:25,004:3345(0x7f20f27fc700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.10.118:2181]
>     @     0x7f211143a012  google::LogMessage::Flush()
>     @     0x7f211143cd46  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f211056d44c  _CheckFatal::~_CheckFatal()
>     @     0x7f211125a243
> mesos::internal::log::RecoverProtocolProcess::received()
>     @     0x7f2111265ae6
> _ZZN7process8dispatchI6OptionIN5mesos8internal3log15RecoverResponseEENS4_22RecoverProtocolProcessERKNS_6FutureIS5_EES9_EENS8_IT_EERKNS_3PIDIT0_EEMSF_FSD_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESO_
>     @     0x7f211127c5d5
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionIN5mesos8internal3log15RecoverResponseEENS8_22RecoverProtocolProcessERKNS0_6FutureIS9_EESD_EENSC_IT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f21113c0d7d  std::function<>::operator()()
>     @     0x7f21113a8b95  process::ProcessBase::visit()
>     @     0x7f21113ac960  process::DispatchEvent::visit()
>     @           0x471dd8  process::ProcessBase::serve()
> I1229 13:41:25.136451  3366 contender.cpp:149] Joining the ZK group
>     @     0x7f21113a4f81  process::ProcessManager::resume()
>     @     0x7f21113a21b2
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
>     @     0x7f21113ac18c
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
>     @     0x7f21113ac13c
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
>     @     0x7f21113ac0ce
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
>     @     0x7f21113ac025
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
>     @     0x7f21113abfbe
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>     @     0x7f210ce3f220  (unknown)
>     @     0x7f210d099dc5  start_thread
>     @     0x7f210c5a721d  __clone
> *Aborted (core dumped)*
>
>
