[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2020-01-20 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019740#comment-17019740
 ] 

Charles commented on MESOS-1807:


BTW, how come this problem doesn't affect the command executor?

> Disallow executors with cpu only or memory only resources
> -
>
> Key: MESOS-1807
> URL: https://issues.apache.org/jira/browse/MESOS-1807
> Project: Mesos
> Issue Type: Improvement
> Reporter: Vinod Kone
> Priority: Major
> Attachments: Screenshot 2015-07-28 14.40.35.png
>
>
> Currently the master allows executors to be launched with either only cpus or
> only memory, but we shouldn't allow that.
> This is because an executor is an actual Unix process that is launched by the
> slave. If an executor doesn't specify cpus, what should the cpu limits be for
> that executor when there are no tasks running on it? If no cpu limits are set,
> it might starve other executors/tasks on the slave, violating isolation
> guarantees. The same goes for memory. Moreover, the current
> containerizer/isolator code will fail when using such an executor,
> e.g., when the last task on the executor finishes and Containerizer::update()
> is called with 0 cpus or 0 mem.
> According to a [TODO | 
> https://github.com/apache/mesos/blob/0226620747e1769434a1a83da547bfc3470a9549/src/master/validation.cpp#L400]
> in the source code, this should also include checking whether the requested
> resources are greater than MIN_CPUS/MIN_BYTES.
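
For illustration, the check described in the issue could look roughly like the sketch below. It is not the actual code in src/master/validation.cpp: the ExecutorResources struct, the validateExecutorResources function, and the threshold values are hypothetical stand-ins (the issue names MIN_CPUS/MIN_BYTES but does not give their values), and whether a value exactly equal to the minimum is accepted is also an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for an executor's resources.
// An unset field means the executor did not specify that resource at all.
struct ExecutorResources {
  std::optional<double> cpus;
  std::optional<double> memMegabytes;
};

// Assumed thresholds, for illustration only; the issue refers to
// MIN_CPUS/MIN_BYTES without stating their values.
constexpr double kMinCpus = 0.01;
constexpr double kMinMemMegabytes = 32.0;

// Returns an error message if the executor should be rejected,
// or std::nullopt if its resources look acceptable.
std::optional<std::string> validateExecutorResources(
    const ExecutorResources& resources)
{
  if (!resources.cpus.has_value()) {
    return std::string("Executor does not specify 'cpus'");
  }
  if (!resources.memMegabytes.has_value()) {
    return std::string("Executor does not specify 'mem'");
  }
  if (*resources.cpus < kMinCpus) {
    return std::string("Executor 'cpus' is below the minimum");
  }
  if (*resources.memMegabytes < kMinMemMegabytes) {
    return std::string("Executor 'mem' is below the minimum");
  }
  return std::nullopt;
}
```

Rejecting the executor at this point would surface the problem when the task is launched, rather than later when the last task finishes and the container update fails.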



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (MESOS-1807) Disallow executors with cpu only or memory only resources

2020-01-20 Thread Charles (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019733#comment-17019733
 ] 

Charles edited comment on MESOS-1807 at 1/20/20 8:52 PM:
-

Is there any way I could help this move forward?

I just got bitten by this: my custom executor led to random errors of the kind 
[~vinodkone] described, "when the last task on the executor finishes and 
Containerizer::update() is called with 0 cpus or 0 mem." See, for example, 
https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 
slave.cpp:2344] Failed to update resources for container 
867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running 
task ct:1428972109061:0:my-chronos-job on status update for terminal task, 
destroying container: Collect failed: No cpus resource given
{noformat}


In the meantime, what's the proper workaround? Always define CPU and memory 
resources for the executor? It's a bit annoying because it effectively means 
arbitrarily limiting the CPU usage of the task (e.g. if there's 1 core and we 
allocate 0.01 CPU to the executor, we only have 0.99 left for the task), but I 
guess there's not really any way around that.

In any case, returning an error before accepting the tasks is better than 
accepting them with a warning and then randomly failing at a later point, when 
the last task on the executor finishes.

Maybe [~bmahler] has an idea?



was (Author: charle):
Is there any way I could help this move forward?

I just got bitten by this: my custom executor led to random errors of the kind 
[~vinodkone] described, "when the last task on the executor finishes and 
Containerizer::update() is called with 0 cpus or 0 mem." See, for example, 
https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 
slave.cpp:2344] Failed to update resources for container 
867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running 
task ct:1428972109061:0:my-chronos-job on status update for terminal task, 
destroying container: Collect failed: No cpus resource given
{noformat}


In the meantime, what's the proper workaround? Always define CPU and memory 
resources for the executor? It's a bit annoying because it effectively means 
arbitrarily limiting the CPU usage of the task (e.g. if there's 1 core and we 
allocate 0.01 CPU to the executor, we only have 0.99 left for the task), but I 
guess there's not really any way around that. Maybe [~bmahler] has an idea?









[jira] [Assigned] (MESOS-8537) Default executor doesn't wait for status updates to be ack'd before shutting down

2020-01-20 Thread Andrei Budnik (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-8537:


Assignee: Andrei Budnik

> Default executor doesn't wait for status updates to be ack'd before shutting 
> down
> -
>
> Key: MESOS-8537
> URL: https://issues.apache.org/jira/browse/MESOS-8537
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Gastón Kleiman
> Assignee: Andrei Budnik
> Priority: Major
> Labels: containerization, default-executor, mesosphere
>
> The default executor doesn't wait for pending status updates to be 
> acknowledged before shutting down; instead, it sleeps for one second and then 
> terminates:
> {code}
>   void _shutdown()
>   {
>     const Duration duration = Seconds(1);
>     LOG(INFO) << "Terminating after " << duration;
>     // TODO(qianzhang): Remove this hack since the executor now receives
>     // acknowledgements for status updates. The executor can terminate
>     // after it receives an ACK for a terminal status update.
>     os::sleep(duration);
>     terminate(self());
>   }
> {code}
> The event handler should exit if, upon receiving an {{Event::ACKNOWLEDGED}}, 
> the executor is shutting down, no tasks are running anymore, and all pending 
> status updates have been acknowledged.
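
The exit condition proposed above can be sketched as a small predicate evaluated on each acknowledgement. This is only an illustration under simplifying assumptions, not the default executor's real code: ExecutorState, canTerminate, and onAcknowledged are hypothetical names, and status updates are identified here by plain strings rather than the executor's actual UUID type.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, simplified view of the executor's bookkeeping.
struct ExecutorState {
  bool shuttingDown = false;
  std::size_t runningTasks = 0;
  // Identifiers of status updates that have been sent but not yet ACKed.
  std::set<std::string> unacknowledgedUpdates;
};

// True only when all three conditions from the issue description hold.
bool canTerminate(const ExecutorState& state)
{
  return state.shuttingDown &&
         state.runningTasks == 0 &&
         state.unacknowledgedUpdates.empty();
}

// Records an acknowledgement and reports whether the executor may now
// terminate, instead of sleeping for a fixed duration.
bool onAcknowledged(ExecutorState& state, const std::string& updateId)
{
  state.unacknowledgedUpdates.erase(updateId);
  return canTerminate(state);
}
```

With a check like this, the executor would terminate as soon as the last pending update is acknowledged. A real implementation would presumably still need some upper bound on the wait in case an acknowledgement never arrives.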





[jira] [Comment Edited] (MESOS-10053) Update Docker executor to set Docker container’s resource limits and `oom_score_adj`

2020-01-20 Thread Qian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017766#comment-17017766
 ] 

Qian Zhang edited comment on MESOS-10053 at 1/20/20 8:59 AM:
-

RR:

[https://reviews.apache.org/r/72022/]

[https://reviews.apache.org/r/72027/]


was (Author: qianzhang):
RR:

[https://reviews.apache.org/r/72022/]

> Update Docker executor to set Docker container’s resource limits and 
> `oom_score_adj`
> 
>
> Key: MESOS-10053
> URL: https://issues.apache.org/jira/browse/MESOS-10053
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
>
> This is to set resource limits for a command task that runs as a Docker 
> container.


