My Mesos version is 1.2.0. Sorry.
________________________________
From: Qi Feng <[email protected]>
Sent: Friday, September 15, 2017 10:14 AM
To: [email protected]; Bayou
Subject: Re: [ISSUE] Check failed: slave.maintenance.isSome()
This case could be reproduced by calling `for i in {1..8}; do python call.py;
done` (call.py gist:
https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49 ).
It looks like something goes wrong when /maintenance/schedule is called
concurrently.
We hit this case because we wrote an Ansible-based service that manages the
Mesos cluster. When we create a task to update slave configs, it proceeds
through a fixed number of workers at a time, like this:
1. Call schedule for three machines: a, b, c.
2. When machine a is done, the maintenance window updates to: b, c.
3. If another machine d is assigned immediately after a, the window updates
to: b, c, d.
These updates sometimes happen within a very short interval, and then we see
the fatal log from Bayou's mail.
What is the right way to update the maintenance window? Thanks for any reply.
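For reference, the schedule update described above can be sketched against the
Mesos v0 maintenance HTTP API. This is a minimal sketch, not our actual
service: the master URL is assumed to be localhost:5050, the hostnames are
placeholders, and the helper names (`build_schedule`, `post_schedule`) are
mine. Note that each POST replaces the whole schedule, which is why
overlapping concurrent updates are racy.

```python
import json
import time
import urllib.request

MASTER = "http://localhost:5050"  # assumed master endpoint (placeholder)

def build_schedule(hostnames, start_ns=None, duration_ns=3600 * 10**9):
    """Build a schedule with one window covering `hostnames`, starting now by default."""
    if start_ns is None:
        start_ns = int(time.time() * 10**9)
    return {
        "windows": [{
            "machine_ids": [{"hostname": h} for h in hostnames],
            "unavailability": {
                "start": {"nanoseconds": start_ns},
                "duration": {"nanoseconds": duration_ns},
            },
        }]
    }

def post_schedule(hostnames):
    """Replace the master's maintenance schedule with one covering `hostnames`."""
    req = urllib.request.Request(
        MASTER + "/maintenance/schedule",
        data=json.dumps(build_schedule(hostnames)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Serializing these calls (e.g. behind a lock in the Ansible service) rather
than issuing them concurrently should at least avoid triggering the race.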
________________________________
From: Bayou <[email protected]>
Sent: Thursday, September 14, 2017 12:06 PM
To: [email protected]
Subject: [ISSUE] Check failed: slave.maintenance.isSome()
Hi all,
I'm repeatedly running mesos-maintenance-schedule, machine-down, machine-up,
and mesos-maintenance-schedule-cancel in a three-slave cluster, with no other
operations, just using the Mesos API to schedule the three slaves
asynchronously. At first it worked well, but after a few hundred iterations
there was always a check failure on slave.maintenance.isSome() and the Mesos
master crashed. The failing code is at
https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983
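For clarity, one iteration of the cycle I am running can be sketched like
this. It is a minimal sketch, not my actual test harness: the master URL is
assumed to be localhost:5050, the helper names are mine, and cancelling by
posting an empty schedule is my reading of the maintenance API, so treat that
line as an assumption.

```python
import json
import urllib.request

MASTER = "http://localhost:5050"  # assumed master endpoint (placeholder)

def post(path, payload):
    """POST `payload` as JSON to the master and return the HTTP status."""
    req = urllib.request.Request(
        MASTER + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def machine_ids(hostnames):
    """MachineID list in the shape /machine/down and /machine/up expect."""
    return [{"hostname": h} for h in hostnames]

def cycle(hostnames, start_ns, duration_ns):
    """One schedule -> down -> up -> cancel round, as described above."""
    window = {
        "machine_ids": machine_ids(hostnames),
        "unavailability": {
            "start": {"nanoseconds": start_ns},
            "duration": {"nanoseconds": duration_ns},
        },
    }
    post("/maintenance/schedule", {"windows": [window]})
    post("/machine/down", machine_ids(hostnames))
    post("/machine/up", machine_ids(hostnames))
    # Assumption: posting an empty schedule cancels the remaining windows.
    post("/maintenance/schedule", {"windows": []})
```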
Some log output from the Mesos master follows:
2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527
hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack
trace: ***
2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6
google::LogMessage::Fail()
2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05
google::LogMessage::SendToLog()
2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516
google::LogMessage::Flush()
2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a
google::LogMessageFatal::~LogMessageFatal()
2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32
mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9
_ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049
std::function<>::operator()()
2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285
process::ProcessBase::visit()
2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a
process::DispatchEvent::visit()
2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e
process::ProcessBase::serve()
2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281
process::ProcessManager::resume()
2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c
_ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0
(unknown)
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5
start_thread
2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced
__clone
2017-09-12 16:39:07.486 notice systemd[1]: mesos-master.service: main process
exited, code=killed, status=6/ABRT
Is this a bug, or did I do something wrong? I hope someone can help me work
this out. Thank you.