Re: [ISSUE] Check failed: slave.maintenance.isSome()
This case can be reproduced by running `for i in {1..8}; do python call.py; done` (call.py gist: https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49). It looks like something goes wrong when /maintenance/schedule is called concurrently.

We hit this case because we wrote a service based on Ansible that manages the Mesos cluster. We create a task that updates slave configs with a certain number of workers, like this:

1. Call schedule for 3 machines: a, b, c.
2. When machine a is done, the maintenance window is updated to: b, c.
3. When another machine "d" is assigned immediately after a, the window is updated to: b, c, d.

These changes sometimes happen within a very short interval. Then we found the same fatal log as in Bayou's mail.

What's the right way to update the maintenance window? Thanks for any reply.

From: Bayou
Sent: Thursday, September 14, 2017 12:06 PM
To: user@mesos.apache.org
Subject: [ISSUE] Check failed: slave.maintenance.isSome()

Hi all,

I’m trying to continuously run mesos-maintenance-schedule, machine-down, machine-up, and mesos-maintenance-schedule-cancel over and over again in a three-slave cluster. There are no other operations; I'm just using the Mesos API to schedule the three slaves asynchronously.
At the beginning it worked well. After I tried many times, about hundreds of times, there was always a Check failed on slave.maintenance.isSome() and the Mesos master crashed. The original code is at https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983

Some log output from the Mesos master is below:

2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527 hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack trace: ***
2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6 google::LogMessage::Fail()
2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05 google::LogMessage::SendToLog()
2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516 google::LogMessage::Flush()
2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a google::LogMessageFatal::~LogMessageFatal()
2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9 _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049 std::function<>::operator()()
2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285 process::ProcessBase::visit()
2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a process::DispatchEvent::visit()
2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e process::ProcessBase::serve()
2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281 process::ProcessManager::resume()
2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0 (unknown)
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5 start_thread
2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced __clone
2017-09-12 16:39:07.486 notice systemd[1]: mesos-master.service: main process exited, code=killed, status=6/ABRT

Is this an issue, or did I do something wrong? Hope someone could help me to work
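
The window updates described in this thread (a, b, c, then b, c, then b, c, d) could be serialized on the client side so that two POSTs to /maintenance/schedule never overlap. Below is a minimal Python sketch of that idea, not a confirmed fix for the check failure. The class `MaintenanceWindow` and the injected `post` callable are hypothetical names; identifying machines by hostname only is a simplification; and the JSON body (`windows`, `machine_ids`, `unavailability`) follows the schedule layout as I understand it from the Mesos maintenance documentation, so please verify it against your Mesos version.

```python
import json
import threading


class MaintenanceWindow:
    """Hypothetical helper: serializes updates to one maintenance window."""

    def __init__(self, post, start_ns, duration_ns):
        # post(path, body) is an injected callable that performs the HTTP POST
        # against the master (hypothetical; supply your own HTTP client).
        self._post = post
        self._start_ns = start_ns
        self._duration_ns = duration_ns
        self._hosts = []
        self._lock = threading.Lock()

    def _schedule(self):
        # Assumption: an empty list of windows is an acceptable schedule
        # when no machines remain.
        if not self._hosts:
            return {"windows": []}
        return {
            "windows": [{
                "machine_ids": [{"hostname": h} for h in self._hosts],
                "unavailability": {
                    "start": {"nanoseconds": self._start_ns},
                    "duration": {"nanoseconds": self._duration_ns},
                },
            }]
        }

    def update(self, add=(), remove=()):
        # The lock is held across the POST, so a second update cannot be
        # sent until the previous one has returned.
        with self._lock:
            hosts = [h for h in self._hosts if h not in remove]
            hosts += [h for h in add if h not in hosts]
            self._hosts = hosts
            self._post("/maintenance/schedule", json.dumps(self._schedule()))
            return list(hosts)
```

Note that a `threading.Lock` only serializes callers inside one process. If several processes update the schedule, as in the `for i in {1..8}` loop above, they would need a shared lock or a single service that owns the schedule.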
Re: Question on Mesos 1.1.0 LaunchGroup
Thanks for the reply. To be honest, we did not start out with the Mesos containerizer. We used Docker first and then adopted Mesos as a scheduler. It's true that I get more control with a Mesos framework, but it is much more costly than k8s. Switching to another containerizer technology may be simple and easy for me, but it seems impossible for my company for some time.

From: haosdent <haosd...@gmail.com>
Sent: Monday, November 21, 2016 9:08:23 AM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi Feng. Actually, you can continue to use Docker images via the Mesos container. You could refer to https://github.com/apache/mesos/blob/master/docs/container-image.md for more details.

On Mon, Nov 21, 2016 at 5:04 PM, Qi Feng <athlum5...@outlook.com> wrote:

I don't understand why we should leave Docker. Could we have launch_group for Docker in the future? Or can we only write an executor for that? Thanks.

From: haosdent <haosd...@gmail.com>
Sent: Monday, November 21, 2016 8:18:13 AM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Yep, only the Mesos container is supported.

On Mon, Nov 21, 2016 at 4:14 PM, Qi Feng <athlum5...@outlook.com> wrote:

Thanks haosdent. I tried to use a Docker containerInfo to launch a group task, but got "Docker ContainerInfo is not supported on the task". Does it support the Mesos container only?

From: haosdent <haosd...@gmail.com>
Sent: Friday, November 18, 2016 4:54:07 PM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi. You may refer to `mesos-executor` for how to build a `LaunchGroup`: https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L498-L524

```
operation->set_type(Offer::Operation::LAUNCH_GROUP);

ExecutorInfo* executorInfo =
  operation->mutable_launch_group()->mutable_executor();

executorInfo->set_type(ExecutorInfo::DEFAULT);
executorInfo->mutable_executor_id()->set_value("default-executor");
...
```

As you see, the executor ID is a string here, and you can use any string to identify the executor.

On Fri, Nov 18, 2016 at 3:47 PM, Qi Feng <athlum5...@outlook.com> wrote:

I'm trying the LaunchGroup feature, but I find that an executorInfo is required.

message LaunchGroup {
  required ExecutorInfo executor = 1;
  required TaskGroupInfo task_group = 2;
}

What's more, an executor ID is required in the executorInfo. How would I build the executorInfo if I use the default executor of Mesos?

https://github.com/apache/mesos/blob/1.1.x/include/mesos/mesos.proto#L566

Thanks for any reply.

--
Best Regards,
Haosdent Huang
Re: Question on Mesos 1.1.0 LaunchGroup
I don't understand why we should leave Docker. Could we have launch_group for Docker in the future? Or can we only write an executor for that? Thanks.

From: haosdent <haosd...@gmail.com>
Sent: Monday, November 21, 2016 8:18:13 AM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Yep, only the Mesos container is supported.

On Mon, Nov 21, 2016 at 4:14 PM, Qi Feng <athlum5...@outlook.com> wrote:

Thanks haosdent. I tried to use a Docker containerInfo to launch a group task, but got "Docker ContainerInfo is not supported on the task". Does it support the Mesos container only?

From: haosdent <haosd...@gmail.com>
Sent: Friday, November 18, 2016 4:54:07 PM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi. You may refer to `mesos-executor` for how to build a `LaunchGroup`: https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L498-L524

```
operation->set_type(Offer::Operation::LAUNCH_GROUP);

ExecutorInfo* executorInfo =
  operation->mutable_launch_group()->mutable_executor();

executorInfo->set_type(ExecutorInfo::DEFAULT);
executorInfo->mutable_executor_id()->set_value("default-executor");
...
```

As you see, the executor ID is a string here, and you can use any string to identify the executor.

On Fri, Nov 18, 2016 at 3:47 PM, Qi Feng <athlum5...@outlook.com> wrote:

I'm trying the LaunchGroup feature, but I find that an executorInfo is required.

message LaunchGroup {
  required ExecutorInfo executor = 1;
  required TaskGroupInfo task_group = 2;
}

What's more, an executor ID is required in the executorInfo. How would I build the executorInfo if I use the default executor of Mesos?

https://github.com/apache/mesos/blob/1.1.x/include/mesos/mesos.proto#L566

Thanks for any reply.

--
Best Regards,
Haosdent Huang
Re: Question on Mesos 1.1.0 LaunchGroup
Thanks haosdent. I tried to use a Docker containerInfo to launch a group task, but got "Docker ContainerInfo is not supported on the task". Does it support the Mesos container only?

From: haosdent <haosd...@gmail.com>
Sent: Friday, November 18, 2016 4:54:07 PM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi. You may refer to `mesos-executor` for how to build a `LaunchGroup`: https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L498-L524

```
operation->set_type(Offer::Operation::LAUNCH_GROUP);

ExecutorInfo* executorInfo =
  operation->mutable_launch_group()->mutable_executor();

executorInfo->set_type(ExecutorInfo::DEFAULT);
executorInfo->mutable_executor_id()->set_value("default-executor");
...
```

As you see, the executor ID is a string here, and you can use any string to identify the executor.

On Fri, Nov 18, 2016 at 3:47 PM, Qi Feng <athlum5...@outlook.com> wrote:

I'm trying the LaunchGroup feature, but I find that an executorInfo is required.

message LaunchGroup {
  required ExecutorInfo executor = 1;
  required TaskGroupInfo task_group = 2;
}

What's more, an executor ID is required in the executorInfo. How would I build the executorInfo if I use the default executor of Mesos?

https://github.com/apache/mesos/blob/1.1.x/include/mesos/mesos.proto#L566

Thanks for any reply.

--
Best Regards,
Haosdent Huang
max_executors_per_agent does not take effect on mesos docker executor
I built mesos-1.0.0-rc2 with the network isolator and set max_executors_per_agent=10 to test whether Docker tasks would be limited to 10 on every Mesos agent. My test case launches 40 tasks (0.1 core, 0.1M mem each) on three different agent machines, and the agents launched more than 10 tasks each.

I found that the Mesos master holds executor data in a hashmap whose key is the ExecutorID:

https://github.com/apache/mesos/blob/1.0.x/src/master/master.hpp#L306
https://github.com/apache/mesos/blob/1.0.x/src/master/master.cpp#L5747

Then I fetched state.json from the Mesos master to look for any executor information, and found that executor_id is an empty string in the taskInfo JSON. Is there any relation between the empty executor ID and the max_executors_per_agent issue?

My OS is CentOS 7.2. Thanks for any reply.
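
To check how many tasks on each agent carry an empty executor ID, the state.json output could be tallied with something like the Python sketch below. It does not show that the empty executor ID causes the limit to be ignored; it only makes the observation above easier to quantify. The function name `count_tasks_by_agent` is hypothetical, and the layout it reads (`frameworks[].tasks[]` with `slave_id` and `executor_id` fields) is my assumption about state.json and may differ between Mesos versions.

```python
from collections import Counter


def count_tasks_by_agent(state):
    """Tally tasks per agent, split by whether executor_id is set.

    `state` is the parsed state.json document (a dict). The field names
    used here are assumptions about its layout.
    """
    with_executor = Counter()
    without_executor = Counter()
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            agent = task.get("slave_id", "")
            if task.get("executor_id"):
                with_executor[agent] += 1
            else:
                # Empty string or missing field: no executor ID reported.
                without_executor[agent] += 1
    return with_executor, without_executor
```

For example, after loading the file with `json.load`, `count_tasks_by_agent(state)[1]` gives the per-agent count of tasks that report no executor ID, which can be compared against the configured limit of 10.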