[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019740#comment-17019740 ]

Charles commented on MESOS-1807:

BTW, how come this problem doesn't affect the command executor?

> Disallow executors with cpu only or memory only resources
> ---------------------------------------------------------
>
> Key: MESOS-1807
> URL: https://issues.apache.org/jira/browse/MESOS-1807
> Project: Mesos
> Issue Type: Improvement
> Reporter: Vinod Kone
> Priority: Major
> Attachments: Screenshot 2015-07-28 14.40.35.png
>
> Currently master allows executors to be launched with either only cpus or
> only memory but we shouldn't allow that.
> This is because executor is an actual unix process that is launched by the
> slave. If an executor doesn't specify cpus, what should the cpu limits be for
> that executor when there are no tasks running on it? If no cpu limits are set
> then it might starve other executors/tasks on the slave violating isolation
> guarantees. Same goes with memory. Moreover, the current
> containerizer/isolator code will throw failures when using such an executor,
> e.g., when the last task on the executor finishes and Containerizer::update()
> is called with 0 cpus or 0 mem.
> According to a source code
> [TODO | https://github.com/apache/mesos/blob/0226620747e1769434a1a83da547bfc3470a9549/src/master/validation.cpp#L400]
> this should also include checking whether requested resources are greater
> than MIN_CPUS/MIN_BYTES.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
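The check requested in the issue description above could look roughly like the following. This is only a minimal sketch, not the code in `src/master/validation.cpp`: `ExecutorResources`, `validateExecutorResources`, `kMinCpus` and `kMinBytes` are hypothetical names, and the two minimum values are assumptions chosen for illustration rather than the actual MIN_CPUS/MIN_BYTES constants.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the resources an executor declares.
// This is NOT the Mesos `Resources` class.
struct ExecutorResources
{
  std::optional<double> cpus;        // Requested CPUs, if any were given.
  std::optional<std::uint64_t> mem;  // Requested memory in bytes, if given.
};

// Assumed minimums, for illustration only. The real MIN_CPUS / MIN_BYTES
// values live in the Mesos source tree.
constexpr double kMinCpus = 0.01;
constexpr std::uint64_t kMinBytes = 32ull * 1024 * 1024;

// Returns an error message if the executor's resources should be rejected,
// or std::nullopt if they pass validation.
std::optional<std::string> validateExecutorResources(
    const ExecutorResources& resources)
{
  // Reject "memory only" executors.
  if (!resources.cpus.has_value()) {
    return std::string("Executor does not specify 'cpus'");
  }

  // Reject "cpu only" executors.
  if (!resources.mem.has_value()) {
    return std::string("Executor does not specify 'mem'");
  }

  // Reject requests below the assumed minimums.
  if (*resources.cpus < kMinCpus) {
    return std::string("Executor 'cpus' is below the minimum");
  }

  if (*resources.mem < kMinBytes) {
    return std::string("Executor 'mem' is below the minimum");
  }

  return std::nullopt;
}
```

Rejecting the executor at validation time means the framework gets an error before the task is accepted, instead of a container failure later when the last task finishes and the executor is left with 0 cpus or 0 mem.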
[jira] [Comment Edited] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019733#comment-17019733 ]

Charles edited comment on MESOS-1807 at 1/20/20 8:52 PM:

Is there any way I could help move this forward? I just got bitten by this: my custom executor led to random errors, described by [~vinodkone] as "when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem." See for example https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 slave.cpp:2344] Failed to update resources for container 867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running task ct:1428972109061:0:my-chronos-job on status update for terminal task, destroying container: Collect failed: No cpus resource given
{noformat}

In the meantime, what's the proper workaround? Always define CPU and memory resources for the executor? It's a bit annoying because it effectively means arbitrarily limiting the CPU usage of the task (e.g. if there's 1 core and we allocate 0.01 CPU to the executor, we only have 0.99 left for the task), but I guess there's not really any way around that. In any case, returning an error before accepting the tasks is better than accepting them with a warning and then randomly failing at a later point when the last task on the executor finishes.

Maybe [~bmahler] has an idea?

was (Author: charle):

Is there any way I could help move this forward? I just got bitten by this: my custom executor led to random errors, described by [~vinodkone] as "when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem." See for example https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 slave.cpp:2344] Failed to update resources for container 867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running task ct:1428972109061:0:my-chronos-job on status update for terminal task, destroying container: Collect failed: No cpus resource given
{noformat}

In the meantime, what's the proper workaround? Always define CPU and memory resources for the executor? It's a bit annoying because it effectively means arbitrarily limiting the CPU usage of the task (e.g. if there's 1 core and we allocate 0.01 CPU to the executor, we only have 0.99 left for the task), but I guess there's not really any way around that.

Maybe [~bmahler] has an idea?
[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019733#comment-17019733 ]

Charles commented on MESOS-1807:

Is there any way I could help move this forward? I just got bitten by this: my custom executor led to random errors, described by [~vinodkone] as "when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem." See for example https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 slave.cpp:2344] Failed to update resources for container 867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running task ct:1428972109061:0:my-chronos-job on status update for terminal task, destroying container: Collect failed: No cpus resource given
{noformat}

In the meantime, what's the proper workaround? Always define CPU and memory resources for the executor? It's a bit annoying because it effectively means arbitrarily limiting the CPU usage of the task (e.g. if there's 1 core and we allocate 0.01 CPU to the executor, we only have 0.99 left for the task), but I guess there's not really any way around that.

Maybe [~bmahler] has an idea?
[jira] [Assigned] (MESOS-8537) Default executor doesn't wait for status updates to be ack'd before shutting down
[ https://issues.apache.org/jira/browse/MESOS-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik reassigned MESOS-8537:

Assignee: Andrei Budnik

> Default executor doesn't wait for status updates to be ack'd before shutting down
> ---------------------------------------------------------------------------------
>
> Key: MESOS-8537
> URL: https://issues.apache.org/jira/browse/MESOS-8537
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Gastón Kleiman
> Assignee: Andrei Budnik
> Priority: Major
> Labels: containerization, default-executor, mesosphere
>
> The default executor doesn't wait for pending status updates to be
> acknowledged before shutting down; instead, it sleeps for one second and then
> terminates:
> {code}
> void _shutdown()
> {
>   const Duration duration = Seconds(1);
>   LOG(INFO) << "Terminating after " << duration;
>   // TODO(qianzhang): Remove this hack since the executor now receives
>   // acknowledgements for status updates. The executor can terminate
>   // after it receives an ACK for a terminal status update.
>   os::sleep(duration);
>   terminate(self());
> }
> {code}
> The event handler should exit if, upon receiving an {{Event::ACKNOWLEDGED}},
> the executor is shutting down, no tasks are running anymore, and all pending
> status updates have been acknowledged.
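The exit condition proposed in MESOS-8537 can be sketched as a small predicate evaluated each time an acknowledgement arrives. This is a simplified illustration under stated assumptions, not the default executor's actual code: `ExecutorState`, its fields, and `onAcknowledged` are hypothetical names, and status updates are identified here by plain string UUIDs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, simplified executor state. This is NOT the Mesos default
// executor class; it only tracks what the exit condition needs.
struct ExecutorState
{
  bool shuttingDown = false;             // Set once shutdown has begun.
  std::size_t runningTasks = 0;          // Tasks that are still running.
  std::set<std::string> unackedUpdates;  // UUIDs of unacknowledged updates.
};

// Records the acknowledgement of the status update with the given UUID and
// returns true if the executor may terminate now, i.e. it is shutting down,
// no tasks are running, and no status update is still awaiting an ACK.
bool onAcknowledged(ExecutorState& state, const std::string& uuid)
{
  state.unackedUpdates.erase(uuid);

  return state.shuttingDown &&
         state.runningTasks == 0 &&
         state.unackedUpdates.empty();
}
```

With a check like this, termination is driven by the last acknowledgement rather than by a fixed one-second sleep, so a slow acknowledgement is not cut off and a fast one does not wait out the full second.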
[jira] [Comment Edited] (MESOS-10053) Update Docker executor to set Docker container’s resource limits and `oom_score_adj`
[ https://issues.apache.org/jira/browse/MESOS-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017766#comment-17017766 ]

Qian Zhang edited comment on MESOS-10053 at 1/20/20 8:59 AM:

RR:
[https://reviews.apache.org/r/72022/]
[https://reviews.apache.org/r/72027/]

was (Author: qianzhang):

RR:
[https://reviews.apache.org/r/72022/]

> Update Docker executor to set Docker container’s resource limits and `oom_score_adj`
>
> Key: MESOS-10053
> URL: https://issues.apache.org/jira/browse/MESOS-10053
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
>
> This is to set resource limits for a command task that will run as a Docker
> container.