[jira] [Commented] (MESOS-3435) Add containerizer support for hyper
[ https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284173#comment-15284173 ] Abhishek Dasgupta commented on MESOS-3435: -- Thanks, I'm checking it. > Add containerizer support for hyper > --- > > Key: MESOS-3435 > URL: https://issues.apache.org/jira/browse/MESOS-3435 > Project: Mesos > Issue Type: Story >Reporter: Deshi Xiao >Assignee: haosdent > > As secure as a hypervisor, and as fast and easy to use as Docker: this is Hyper. > https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement > this as a module once MESOS-3709 is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5386) Add `HANDLE` overloads for functions that take a file descriptor
Alex Clemmer created MESOS-5386: --- Summary: Add `HANDLE` overloads for functions that take a file descriptor Key: MESOS-5386 URL: https://issues.apache.org/jira/browse/MESOS-5386 Project: Mesos Issue Type: Bug Components: stout Reporter: Alex Clemmer Assignee: Alex Clemmer
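MESOS-5386 asks for `HANDLE` overloads alongside the existing file-descriptor overloads in stout. A rough sketch of the overloading pattern (hypothetical names, not stout's actual API; on Windows, `HANDLE` is a `void*` typedef, modeled here with a stand-in alias so the example builds anywhere):

```cpp
#include <string>

// Stand-in for the Windows HANDLE typedef (void*), so this compiles on
// any platform; the function names below are illustrative only.
using Handle = void*;

// POSIX-style overload: operates on an integer file descriptor.
std::string describe(int fd)
{
  return "posix fd " + std::to_string(fd);
}

// Windows-style overload: the same operation, selected by the HANDLE
// type. Overload resolution is unambiguous because an integer literal
// never converts to void*, and a pointer never converts to int.
std::string describe(Handle handle)
{
  return handle == nullptr ? "invalid handle" : "windows handle";
}
```

Callers then use one name regardless of platform, e.g. `describe(3)` picks the fd overload and `describe(nullptr)` picks the `Handle` one.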
[jira] [Assigned] (MESOS-5381) Network portmapping isolator disable IPv6 failed
[ https://issues.apache.org/jira/browse/MESOS-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sha Zhengju reassigned MESOS-5381: -- Assignee: Sha Zhengju > Network portmapping isolator disable IPv6 failed > > > Key: MESOS-5381 > URL: https://issues.apache.org/jira/browse/MESOS-5381 > Project: Mesos > Issue Type: Bug > Components: isolation > Environment: CentOS7 >Reporter: Sha Zhengju >Assignee: Sha Zhengju > > We observed this error in our environment: > 1. enable --isolation=network/port_mapping for mesos 0.28.0 on CentOS7.2 > with kernel version: 3.10.0-327.10.1.el7.x86_64 > 2. create a simple application on the Marathon framework with a command such as > "echo hello" > 3. the Mesos executor fails with these error logs in the sandbox stderr file: > {quote} > + mount --make-rslave /var/run/netns > + echo 1 > sh: line 3: /proc/sys/net/ipv6/conf/all/disable_ipv6: No such file or > directory Failed to execute a preparation shell command > {quote} > The reason is that we should add an IPv6 check in > isolators/network/port_mapping.cpp:PortMappingIsolatorProcess::scripts(): if > the ipv6 module was not loaded at kernel boot (as on CentOS7/RHEL7), we > don't need to disable IPv6 at all.
[jira] [Comment Edited] (MESOS-5380) Killing a queued task can cause the corresponding command executor to never terminate.
[ https://issues.apache.org/jira/browse/MESOS-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283944#comment-15283944 ] Vinod Kone edited comment on MESOS-5380 at 5/15/16 8:08 PM: Phase 2: https://reviews.apache.org/r/47402/ was (Author: vinodkone): https://reviews.apache.org/r/47402/ > Killing a queued task can cause the corresponding command executor to never > terminate. > -- > > Key: MESOS-5380 > URL: https://issues.apache.org/jira/browse/MESOS-5380 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.28.0, 0.28.1 >Reporter: Jie Yu >Assignee: Vinod Kone >Priority: Blocker > Labels: mesosphere > Fix For: 0.29.0, 0.28.2 > > > We observed this in our testing environment. Sequence of events: > 1) A command task is queued since the executor has not registered yet. > 2) The framework issues a killTask. > 3) Since the executor is in the REGISTERING state, the agent calls > `statusUpdate(TASK_KILLED, UPID())`. > 4) `statusUpdate` now calls `containerizer->status()` before calling > `executor->terminateTask(status.task_id(), status);`, which removes the > queued task. (Introduced in this patch: https://reviews.apache.org/r/43258.) > 5) Since the above is async, it's possible that the task is still among the > queued tasks when we try to see whether we need to kill the unregistered > executor in `killTask`: > {code} > // TODO(jieyu): Here, we kill the executor if it no longer has > // any task to run and has not yet registered. This is a > // workaround for those single task executors that do not have a > // proper self terminating logic when they haven't received the > // task within a timeout. 
> if (executor->queuedTasks.empty()) { > CHECK(executor->launchedTasks.empty()) > << " Unregistered executor '" << executor->id > << "' has launched tasks"; > LOG(WARNING) << "Killing the unregistered executor " << *executor > << " because it has no tasks"; > executor->state = Executor::TERMINATING; > containerizer->destroy(executor->containerId); > } > {code} > 6) Consequently, the executor will never be terminated by Mesos. > Attaching the relevant agent log: > {noformat} > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.640527 1342 slave.cpp:1361] Got assigned task > mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework > a3ad8418-cb77-4705-b353-4b514ceca52c- > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.641034 1342 slave.cpp:1480] Launching task > mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework > a3ad8418-cb77-4705-b353-4b514ceca52c- > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.641440 1342 paths.cpp:528] Trying to chown > '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a' > to user 'root' > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.644664 1342 slave.cpp:5389] Launching executor > mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework > a3ad8418-cb77-4705-b353-4b514ceca52c- with resources cpus(*):0.1; > mem(*):32 in work directory > '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a' > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.645195 1342 slave.cpp:1698] Queuing task > 
'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' for executor > 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework > a3ad8418-cb77-4705-b353-4b514ceca52c- > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.645491 1338 containerizer.cpp:671] Starting container > '24762d43-2134-475e-b724-caa72110497a' for executor > 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework > 'a3ad8418-cb77-4705-b353-4b514ceca52c-' > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.647897 1345 cpushare.cpp:389] Updated 'cpu.shares' to 1126 > (cpus 1.1) for container 24762d43-2134-475e-b724-caa72110497a > May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: > I0513 15:36:13.648619 1345 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to > 100ms and 'cpu.cfs_quota_us' to 110ms
[jira] [Commented] (MESOS-5380) Killing a queued task can cause the corresponding command executor to never terminate.
[ https://issues.apache.org/jira/browse/MESOS-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283944#comment-15283944 ] Vinod Kone commented on MESOS-5380: --- https://reviews.apache.org/r/47402/ > Killing a queued task can cause the corresponding command executor to never > terminate.
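The ordering problem in MESOS-5380 (an asynchronous continuation removes the queued task, so a concurrent `killTask` check can still see it) can be illustrated with a minimal sketch. These are hypothetical stand-in types, not the agent's code; the point is that removing the task from `queuedTasks` synchronously, before scheduling any asynchronous work, lets the later emptiness check observe the truth:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the agent's per-executor bookkeeping (hypothetical type).
struct Executor
{
  std::vector<std::string> queuedTasks;
  bool terminating = false;
};

// Sketch of a race-free statusUpdate for a killed queued task: the task
// is removed from the queue synchronously, and only afterwards is the
// asynchronous work (stand-in for containerizer->status()) scheduled.
void statusUpdateKilled(
    Executor& executor,
    const std::string& taskId,
    const std::function<void()>& asyncContinuation)
{
  std::vector<std::string>& queue = executor.queuedTasks;
  queue.erase(std::remove(queue.begin(), queue.end(), taskId), queue.end());

  asyncContinuation();  // May run arbitrarily later in the real agent.
}

// Mirrors the "kill unregistered executor" check quoted in the report:
// with the synchronous removal above, it sees an empty queue even if
// the asynchronous continuation has not run yet.
void maybeKillUnregisteredExecutor(Executor& executor)
{
  if (executor.queuedTasks.empty()) {
    executor.terminating = true;
  }
}
```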
[jira] [Commented] (MESOS-3435) Add containerizer support for hyper
[ https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283899#comment-15283899 ] haosdent commented on MESOS-3435: - Just published the draft at https://github.com/haosdent/mesos/commit/ac918a23f7148411925e07d6e1ab5eff7c36a97c [~a10gupta] > Add containerizer support for hyper > --- > > Key: MESOS-3435 > URL: https://issues.apache.org/jira/browse/MESOS-3435 > Project: Mesos > Issue Type: Story >Reporter: Deshi Xiao >Assignee: haosdent
[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.
[ https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283843#comment-15283843 ] Guangya Liu commented on MESOS-5278: [~haosd...@gmail.com], we need to put {{mesos-*}} under {{$PATH}} if we want the {{mesos}} command to pick up those subcommands; after putting {{mesos-*}} under {{$PATH}}, I can get all subcommands of {{mesos}}. > Add a CLI allowing a user to enter a container. > --- > > Key: MESOS-5278 > URL: https://issues.apache.org/jira/browse/MESOS-5278 > Project: Mesos > Issue Type: Improvement >Reporter: Jie Yu >Assignee: Guangya Liu > > Containers created by the unified containerizer (Mesos containerizer) use > various namespaces (e.g., mount, network, etc.). > To improve debuggability, we should create a CLI that allows an operator or a > user to enter the namespaces associated with a container, and execute an > arbitrary command in that container (similar to `docker exec`).
[jira] [Commented] (MESOS-5377) Improve DRF behavior with scarce resources.
[ https://issues.apache.org/jira/browse/MESOS-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283837#comment-15283837 ] Qian Zhang commented on MESOS-5377: --- Can we introduce a weight for each resource allocated by the Mesos master? {{Each resource's weight = the number of agents that have this resource / the total number of agents}} And then when we calculate the resource share for each role/framework in the DRF sorter, we can take this weight into account: {{resource share = resource weight * (allocation / total)}}. So for the example in the description of this ticket, the weight of GPU will be 0.001, and the GPU share of the role which consumes the only GPU will be 0.001 rather than 1. This can be the default behavior, and we may consider introducing a flag to the Mesos master with which an operator can explicitly set the weight for each resource, overriding the default calculation. > Improve DRF behavior with scarce resources. > --- > > Key: MESOS-5377 > URL: https://issues.apache.org/jira/browse/MESOS-5377 > Project: Mesos > Issue Type: Epic > Components: allocation >Reporter: Benjamin Mahler > > The allocator currently uses the notion of Weighted [Dominant Resource > Fairness|https://www.cs.berkeley.edu/~alig/papers/drf.pdf] (WDRF) to > establish a linear notion of fairness across allocation roles. > DRF behaves well for resources that are present within each machine in a > cluster (e.g. CPUs, memory, disk). However, some resources (e.g. GPUs) are > only present on a subset of machines in the cluster. > Consider the behavior when there are the following agents in a cluster: > 1000 agents with (cpus:4,mem:1024,disk:1024) > 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024) > If a role wishes to use both GPU and non-GPU resources for tasks, consuming 1 > GPU will lead DRF to consider the role to have a 100% share of the cluster, > since it consumes 100% of the GPUs in the cluster. 
This framework will then > not receive any other offers. > Among possible improvements, fairness can incorporate an understanding of resource > packages. In a sense there is 1 GPU package that is competed on and 1000 > non-GPU packages competed on, and ideally a role's consumption of the single > GPU package does not have a large effect on the role's access to the other > 1000 non-GPU packages. > In the interim, we should consider having a recommended way to deal with > scarce resources in the current model.
[jira] [Commented] (MESOS-4672) Implement aufs based provisioner backend.
[ https://issues.apache.org/jira/browse/MESOS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283823#comment-15283823 ] Shuai Lin commented on MESOS-4672: -- [~jieyu] The code is mostly copy-paste-modified from the overlay backend; do you think it's worth refactoring out a common base class for overlay/aufs? Pros: Better abstraction Cons: The subclasses may be only overlay and aufs, and no more. The remaining possible backends are btrfs and devicemapper, which don't work like this. > Implement aufs based provisioner backend. > - > > Key: MESOS-4672 > URL: https://issues.apache.org/jira/browse/MESOS-4672 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Shuai Lin > > Overlay fs support wasn't merged until kernel 3.18. Docker's default > storage backend for Ubuntu 14.04 is aufs. We should consider adding an aufs-based > backend for the unified containerizer as well to efficiently provide a > union fs (instead of relying on the copy backend, which is not space efficient).
[jira] [Commented] (MESOS-3435) Add containerizer support for hyper
[ https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283812#comment-15283812 ] haosdent commented on MESOS-3435: - Cool, actually I have some feedback about your APIs. XD > Add containerizer support for hyper > --- > > Key: MESOS-3435 > URL: https://issues.apache.org/jira/browse/MESOS-3435 > Project: Mesos > Issue Type: Story >Reporter: Deshi Xiao >Assignee: haosdent