Re: [ISSUE] Check failed: slave.maintenance.isSome()

2017-09-15 Thread Qi Feng
This can be reproduced by running `for i in {1..8}; do python call.py; done`
(call.py gist:
https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49 ).

It looks like something goes wrong when /maintenance/schedule is called
concurrently.

We ran into this because we wrote an Ansible-based service that manages the
Mesos cluster. When we create a task to update slave configs, it runs with a
certain number of workers, like this:

  1.  Call schedule for 3 machines: a, b, c.
  2.  When machine a is done, the maintenance window is updated to: b, c.
  3.  As another machine "d" is assigned right after a, the window is updated
to: b, c, d.

These changes sometimes happen within a short interval, and then we see the
same fatal log as in Bayou's mail.
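One possible client-side mitigation, sketched under assumptions: route every window change through a single object that holds a lock while it posts, so the service never has two /maintenance/schedule requests in flight. `MaintenanceWindow`, `Poster` and `replace` are hypothetical names, the HTTP POST is replaced by an injected callback, and the unavailability intervals a real schedule carries are left out. This only serializes requests coming from one process; whether it avoids the master's CHECK failure is untested.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical client-side helper (not part of Mesos): owns the set of
// machines currently in maintenance and sends the *whole* window on every
// change, one request at a time.
class MaintenanceWindow {
 public:
  // Stand-in for the HTTP POST to /maintenance/schedule; injected so the
  // sketch has no network dependency. Time windows are omitted.
  using Poster = std::function<void(const std::set<std::string>&)>;

  explicit MaintenanceWindow(Poster post) : post_(std::move(post)) {}

  // Adds machines (new work handed to the workers) and posts the window.
  void add(const std::vector<std::string>& machines) {
    std::lock_guard<std::mutex> lock(mutex_);
    machines_.insert(machines.begin(), machines.end());
    post_(machines_);  // posted under the lock: at most one update in flight
  }

  // Removes a machine whose work is done and posts the window.
  void remove(const std::string& machine) {
    std::lock_guard<std::mutex> lock(mutex_);
    machines_.erase(machine);
    post_(machines_);
  }

  // Swaps a finished machine for the next one in a single update, so the
  // master never sees the intermediate window.
  void replace(const std::string& done, const std::string& next) {
    std::lock_guard<std::mutex> lock(mutex_);
    machines_.erase(done);
    machines_.insert(next);
    post_(machines_);
  }

 private:
  std::mutex mutex_;
  std::set<std::string> machines_;
  Poster post_;
};
```

With the three steps listed above, `add({"a", "b", "c"})`, `remove("a")` and `add({"d"})` post a,b,c, then b,c, then b,c,d, strictly in that order. `replace("a", "d")` would post b,c,d directly, turning two updates into one when a worker moves straight to its next machine. Holding the lock during the post is deliberate: it serializes requests, at the cost of workers blocking on a slow master.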

What's the right way to update the maintenance window? Thanks for any reply.


From: Bayou 
Sent: Thursday, September 14, 2017 12:06 PM
To: user@mesos.apache.org
Subject: [ISSUE] Check failed: slave.maintenance.isSome()

Hi all,
   I'm trying to repeatedly run mesos-maintenance-schedule, machine-down,
machine-up and mesos-maintenance-schedule-cancel, over and over again, in a
three-slave cluster with no other operations, just using the Mesos API to
schedule the three slaves asynchronously. At first it worked well, but after
many attempts (a few hundred), the check slave.maintenance.isSome() always
eventually failed and the Mesos master crashed. The failing check is at
https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983
Some of the Mesos master log is below:
2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527 hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack trace: ***
2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6  google::LogMessage::Fail()
2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05  google::LogMessage::SendToLog()
2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516  google::LogMessage::Flush()
2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a  google::LogMessageFatal::~LogMessageFatal()
2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9  _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049  std::function<>::operator()()
2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285  process::ProcessBase::visit()
2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a  process::DispatchEvent::visit()
2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e  process::ProcessBase::serve()
2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281  process::ProcessManager::resume()
2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c  _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0  (unknown)
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5  start_thread
2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced  __clone
2017-09-12 16:39:07.486 notice systemd[1]: mesos-master.service: main process exited, code=killed, status=6/ABRT
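To make the cycle above easier to follow, here is a simplified, single-threaded model of the three machine modes used by Mesos maintenance (UP, DRAINING, DOWN). The class and method names are hypothetical, the validation rules are approximations rather than the master's actual code, and the cancel step is assumed to mean posting a schedule without the machine. It does not reproduce the crash, which presumably depends on how the master and allocator interleave concurrent updates.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// The three modes a machine can be in under Mesos maintenance.
enum class Mode { UP, DRAINING, DOWN };

// Hypothetical model of the schedule/down/up/cancel cycle; the rules below
// are approximations for illustration, not the master's implementation.
class MaintenanceModel {
 public:
  // ~ POST /maintenance/schedule: replaces the whole schedule. Rejected if a
  // DOWN machine would be dropped; dropped DRAINING machines go back to UP;
  // newly scheduled machines start DRAINING.
  bool setSchedule(const std::set<std::string>& scheduled) {
    for (const auto& entry : modes_) {
      if (entry.second == Mode::DOWN && scheduled.count(entry.first) == 0) {
        return false;
      }
    }
    for (auto it = modes_.begin(); it != modes_.end();) {
      if (scheduled.count(it->first) == 0) {
        it = modes_.erase(it);  // no longer scheduled: back to UP
      } else {
        ++it;
      }
    }
    for (const auto& id : scheduled) {
      modes_.emplace(id, Mode::DRAINING);  // no-op if already tracked
    }
    return true;
  }

  // ~ POST /machine/down: only a DRAINING machine can go DOWN.
  bool down(const std::string& id) {
    auto it = modes_.find(id);
    if (it == modes_.end() || it->second != Mode::DRAINING) return false;
    it->second = Mode::DOWN;
    return true;
  }

  // ~ POST /machine/up: a DOWN machine returns to UP and leaves the schedule.
  bool up(const std::string& id) {
    auto it = modes_.find(id);
    if (it == modes_.end() || it->second != Mode::DOWN) return false;
    modes_.erase(it);
    return true;
  }

  // Machines not tracked here are UP.
  Mode mode(const std::string& id) const {
    auto it = modes_.find(id);
    return it == modes_.end() ? Mode::UP : it->second;
  }

 private:
  std::map<std::string, Mode> modes_;  // scheduled machines only
};
```

One pass of the loop for a single slave is `setSchedule({"s1"})` (DRAINING), `down("s1")` (DOWN), `up("s1")` (UP, unscheduled), then `setSchedule({})` as the cancel step.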

Is this a bug, or am I doing something wrong? I hope someone can help me to
work

Re: Question on Mesos 1.1.0 LaunchGroup

2016-11-21 Thread Qi Feng
Thanks for the reply.


To be honest, we didn't start in the Mesos container era. We used Docker first,
and then tried Mesos as the scheduler.

It's true that I could have more control with a Mesos framework, but it's much
more costly than k8s.

Switching to another containerizer technology may be simple and easy for me,
but seems impossible for my company for some time.


From: haosdent <haosd...@gmail.com>
Sent: Monday, November 21, 2016 9:08:23 AM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi Feng. Actually, you can keep using Docker images with the Mesos
containerizer. See
https://github.com/apache/mesos/blob/master/docs/container-image.md for more
details.



On Mon, Nov 21, 2016 at 5:04 PM, Qi Feng <athlum5...@outlook.com> wrote:

I don't understand why Docker is being left out. Could we have launch_group for
Docker in the future?
Or can we only write our own executor for that?

Thanks.


From: haosdent <haosd...@gmail.com>
Sent: Monday, November 21, 2016 8:18:13 AM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Yep, only the Mesos containerizer is supported.

On Mon, Nov 21, 2016 at 4:14 PM, Qi Feng <athlum5...@outlook.com> wrote:

Thanks haosdent.

I tried to use a Docker ContainerInfo to launch a task group, but got "Docker
ContainerInfo is not supported on the task".
Is only the Mesos container supported?


From: haosdent <haosd...@gmail.com>
Sent: Friday, November 18, 2016 4:54:07 PM
To: user
Subject: Re: Question on Mesos 1.1.0 LaunchGroup

Hi, @Qi. You can look at `mesos-execute` to see how to build a `LaunchGroup`:
https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L498-L524

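```cpp
// Excerpt from src/cli/execute.cpp: build a LAUNCH_GROUP operation.
operation->set_type(Offer::Operation::LAUNCH_GROUP);

// LaunchGroup requires an ExecutorInfo for the executor that runs the group.
ExecutorInfo* executorInfo =
  operation->mutable_launch_group()->mutable_executor();

// DEFAULT selects the built-in default executor.
executorInfo->set_type(ExecutorInfo::DEFAULT);

// The executor ID is a plain string chosen by the framework.
executorInfo->mutable_executor_id()->set_value("default-executor");
...
```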
As you can see, the executor ID is just a string here, and you can use any
string to identify the executor.

On Fri, Nov 18, 2016 at 3:47 PM, Qi Feng <athlum5...@outlook.com> wrote:

I'm trying the LaunchGroup feature, but I find that an ExecutorInfo is
required:


```
message LaunchGroup {
  required ExecutorInfo executor = 1;
  required TaskGroupInfo task_group = 2;
}
```

What's more, an executor ID is required in ExecutorInfo. How should I build
the ExecutorInfo if I use the Mesos default executor?
https://github.com/apache/mesos/blob/1.1.x/include/mesos/mesos.proto#L566

Thanks for any reply.



--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang


Re: Question on Mesos 1.1.0 LaunchGroup

2016-11-21 Thread Qi Feng
I don't understand why Docker is being left out. Could we have launch_group for
Docker in the future?
Or can we only write our own executor for that?

Thanks.


Re: Question on Mesos 1.1.0 LaunchGroup

2016-11-21 Thread Qi Feng
Thanks haosdent.

I tried to use a Docker ContainerInfo to launch a task group, but got "Docker
ContainerInfo is not supported on the task".
Is only the Mesos container supported?


max_executors_per_agent does not take effect on mesos docker executor

2016-07-21 Thread Qi Feng
I built mesos-1.0.0-rc2 with the network isolator and set
max_executors_per_agent=10 to test whether Docker tasks would be limited to 10
on every Mesos agent. In my test I launched 40 tasks (0.1 core and 0.1 MB mem
each) on three different agent machines, and each agent launched more than 10
tasks.


I found that the Mesos master holds executor data in a hashmap keyed by ExecutorID.

https://github.com/apache/mesos/blob/1.0.x/src/master/master.hpp#L306

https://github.com/apache/mesos/blob/1.0.x/src/master/master.cpp#L5747


Then I fetched state.json from the Mesos master to look for executor
information, and found that executor_id is an empty string in the TaskInfo
JSON. Is there any relation between the empty executor ID and the
max_executors_per_agent issue?
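A hypothetical illustration of the question above, not Mesos's actual bookkeeping: if executors were counted as the size of a map keyed by executor ID, every task reporting an empty executor_id would land on the same key, so a limit such as max_executors_per_agent=10 would never trip. `AgentExecutors` and `underLimit` are made-up names; whether the master counts Docker tasks this way is exactly what is being asked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-agent bookkeeping: tasks grouped by executor ID.
struct AgentExecutors {
  std::unordered_map<std::string, std::vector<std::string>> tasksByExecutor;

  void addTask(const std::string& executorId, const std::string& taskId) {
    tasksByExecutor[executorId].push_back(taskId);
  }

  // Number of distinct executor IDs seen; all empty IDs count once.
  std::size_t executorCount() const { return tasksByExecutor.size(); }
};

// Hypothetical limit check: may one more executor start on this agent?
bool underLimit(const AgentExecutors& agent, std::size_t maxExecutors) {
  return agent.executorCount() < maxExecutors;
}
```

Under this model, 14 tasks with an empty executor ID count as one executor and stay under a limit of 10, while 14 tasks with distinct IDs exceed it.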


My OS is CentOS 7.2.


Thanks for any reply [] .