This looks similar to https://issues.apache.org/jira/browse/MESOS-7966. Can you add your information and logs to that ticket?
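In the meantime, since the crash appears to be triggered by concurrent POSTs to /maintenance/schedule, funneling all updates through a single writer that does a read-modify-write of the schedule should avoid it. A minimal, untested sketch in Python 3 (the GET/POST /maintenance/schedule endpoints and the JSON window format follow the maintenance documentation; the master address, `set_window`, and its arguments are placeholders of mine, not anything from your setup):

    import json
    import urllib.request

    MASTER = "http://127.0.0.1:5050"  # placeholder master address

    def get_schedule():
        # GET /maintenance/schedule returns the current schedule as JSON.
        with urllib.request.urlopen(MASTER + "/maintenance/schedule") as resp:
            return json.load(resp)

    def post_schedule(schedule):
        # POST /maintenance/schedule replaces the entire schedule.
        req = urllib.request.Request(
            MASTER + "/maintenance/schedule",
            data=json.dumps(schedule).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    def set_window(hostnames, start_ns, duration_ns):
        # Read-modify-write from one caller only, so no two updates race.
        schedule = get_schedule()
        schedule.setdefault("windows", []).append({
            "machine_ids": [{"hostname": h} for h in hostnames],
            "unavailability": {
                "start": {"nanoseconds": start_ns},
                "duration": {"nanoseconds": duration_ns},
            },
        })
        post_schedule(schedule)

Because each POST replaces the whole schedule, two concurrent writers can interleave in exactly the way described below; serializing them sidesteps that until the ticket is resolved.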
On Fri, Sep 15, 2017 at 3:18 AM, Qi Feng <athlum5...@outlook.com> wrote:

> My mesos version is 1.2.0. Sorry.
>
> ------------------------------
> *From:* Qi Feng <athlum5...@outlook.com>
> *Sent:* Friday, September 15, 2017 10:14 AM
> *To:* user@mesos.apache.org; Bayou
> *Subject:* Re: [ISSUE] Check failed: slave.maintenance.isSome()
>
> This case can be reproduced by running `for i in {1..8}; do python
> call.py; done` (call.py gist:
> https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49).
>
> It looks like something goes wrong when /maintenance/schedule is called
> concurrently.
>
> We hit this case because we wrote a service based on Ansible that
> manages the Mesos cluster. When we create a task to update slave configs
> on a certain number of workers, it goes like this:
>
> 1. Call schedule for 3 machines: a, b, c.
> 2. When machine a is done, the maintenance window updates to: b, c.
> 3. When another machine "d" is assigned right after a, the window
> updates to: b, c, d.
>
> These changes sometimes happen within a very short interval, and then we
> see the fatal log shown in Bayou's mail.
>
> What's the right way to update a maintenance window? Thanks for any
> reply.
>
>
> ------------------------------
> *From:* Bayou <zkcresc...@gmail.com>
> *Sent:* Thursday, September 14, 2017 12:06 PM
> *To:* user@mesos.apache.org
> *Subject:* [ISSUE] Check failed: slave.maintenance.isSome()
>
> Hi all,
> I'm trying to continuously run mesos-maintenance-schedule, machine-down,
> machine-up, and mesos-maintenance-schedule-cancel over and over again in
> a three-slave cluster, with no other operations, just using the Mesos API
> to schedule the three slaves asynchronously. At the beginning it worked
> well, but after many tries, a few hundred times, there was always a
> `Check failed: slave.maintenance.isSome()` and the mesos master crashed.
> The relevant code is at
> https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983
> Some logs from the mesos master are below:
> 2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527 hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
> 2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack trace: ***
> 2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6 google::LogMessage::Fail()
> 2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05 google::LogMessage::SendToLog()
> 2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516 google::LogMessage::Flush()
> 2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a google::LogMessageFatal::~LogMessageFatal()
> 2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
> 2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9 _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
> 2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> 2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049 std::function<>::operator()()
> 2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285 process::ProcessBase::visit()
> 2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a process::DispatchEvent::visit()
> 2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e process::ProcessBase::serve()
> 2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281 process::ProcessManager::resume()
> 2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> 2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> 2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> 2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> 2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0 (unknown)
> 2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5 start_thread
> 2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced __clone
> 2017-09-12 16:39:07.486 notice systemd[1]: mesos-master.service: main process exited, code=killed, status=6/ABRT
>
> Is this an issue, or did I do something wrong? Hope someone can help me
> work this out. Thank you.