[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2016-05-15 Thread Abhishek Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284173#comment-15284173
 ] 

Abhishek Dasgupta commented on MESOS-3435:
--

Thanks, I'm checking it.

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> Hyper is as secure as a hypervisor, and as fast and easy to use as Docker: 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5386) Add `HANDLE` overloads for functions that take a file descriptor

2016-05-15 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-5386:
---

 Summary: Add `HANDLE` overloads for functions that take a file 
descriptor
 Key: MESOS-5386
 URL: https://issues.apache.org/jira/browse/MESOS-5386
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Alex Clemmer
Assignee: Alex Clemmer








[jira] [Assigned] (MESOS-5381) Network portmapping isolator disable IPv6 failed

2016-05-15 Thread Sha Zhengju (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sha Zhengju reassigned MESOS-5381:
--

Assignee: Sha Zhengju

> Network portmapping isolator disable IPv6 failed
> 
>
> Key: MESOS-5381
> URL: https://issues.apache.org/jira/browse/MESOS-5381
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
> Environment: CentOS7
>Reporter: Sha Zhengju
>Assignee: Sha Zhengju
>
> We observed this error in our environment:
> 1. enable --isolation=network/port_mapping for Mesos 0.28.0 on CentOS 7.2 
> with kernel version 3.10.0-327.10.1.el7.x86_64
> 2. create a simple application on the Marathon framework with a command such 
> as "echo hello"
> 3. the Mesos executor fails with these error logs in the sandbox stderr file:
> {quote}
> + mount --make-rslave /var/run/netns
> + echo 1
> sh: line 3: /proc/sys/net/ipv6/conf/all/disable_ipv6: No such file or 
> directory Failed to execute a preparation shell command
> {quote}
> The reason is that we should add an IPv6 check in 
> isolators/network/port_mapping.cpp:PortMappingIsolatorProcess::scripts(): if 
> the ipv6 module was not loaded at kernel boot (as in CentOS7/RHEL7), we 
> don't need to disable IPv6 at all. 





[jira] [Comment Edited] (MESOS-5380) Killing a queued task can cause the corresponding command executor to never terminate.

2016-05-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283944#comment-15283944
 ] 

Vinod Kone edited comment on MESOS-5380 at 5/15/16 8:08 PM:


Phase 2: https://reviews.apache.org/r/47402/


was (Author: vinodkone):
https://reviews.apache.org/r/47402/

> Killing a queued task can cause the corresponding command executor to never 
> terminate.
> --
>
> Key: MESOS-5380
> URL: https://issues.apache.org/jira/browse/MESOS-5380
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>Assignee: Vinod Kone
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> We observed this in our testing environment. Sequence of events:
> 1) A command task is queued since the executor has not registered yet.
> 2) The framework issues a killTask.
> 3) Since the executor is in the REGISTERING state, the agent calls 
> `statusUpdate(TASK_KILLED, UPID())`.
> 4) `statusUpdate` now calls `containerizer->status()` before calling 
> `executor->terminateTask(status.task_id(), status);`, which removes the 
> queued task. (Introduced in this patch: https://reviews.apache.org/r/43258.)
> 5) Since the above is asynchronous, it's possible that the task is still 
> queued when we check whether we need to kill the unregistered executor in 
> `killTask`:
> {code}
>   // TODO(jieyu): Here, we kill the executor if it no longer has
>   // any task to run and has not yet registered. This is a
>   // workaround for those single task executors that do not have a
>   // proper self terminating logic when they haven't received the
>   // task within a timeout.
>   if (executor->queuedTasks.empty()) {
>     CHECK(executor->launchedTasks.empty())
>       << " Unregistered executor '" << executor->id
>       << "' has launched tasks";
>
>     LOG(WARNING) << "Killing the unregistered executor " << *executor
>                  << " because it has no tasks";
>
>     executor->state = Executor::TERMINATING;
>     containerizer->destroy(executor->containerId);
>   }
> {code}
> 6) Consequently, the executor will never be terminated by Mesos.
> Attaching the relevant agent log:
> {noformat}
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.640527  1342 slave.cpp:1361] Got assigned task 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.641034  1342 slave.cpp:1480] Launching task 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.641440  1342 paths.cpp:528] Trying to chown 
> '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a'
>  to user 'root'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.644664  1342 slave.cpp:5389] Launching executor 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c- with resources cpus(*):0.1; 
> mem(*):32 in work directory 
> '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.645195  1342 slave.cpp:1698] Queuing task 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' for executor 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.645491  1338 containerizer.cpp:671] Starting container 
> '24762d43-2134-475e-b724-caa72110497a' for executor 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
> 'a3ad8418-cb77-4705-b353-4b514ceca52c-'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.647897  1345 cpushare.cpp:389] Updated 'cpu.shares' to 1126 
> (cpus 1.1) for container 24762d43-2134-475e-b724-caa72110497a
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.648619  1345 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 
> 100ms and 'cpu.cfs_quota_us' to 110ms 

[jira] [Commented] (MESOS-5380) Killing a queued task can cause the corresponding command executor to never terminate.

2016-05-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283944#comment-15283944
 ] 

Vinod Kone commented on MESOS-5380:
---

https://reviews.apache.org/r/47402/

> Killing a queued task can cause the corresponding command executor to never 
> terminate.
> --
>
> Key: MESOS-5380
> URL: https://issues.apache.org/jira/browse/MESOS-5380
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Jie Yu
>Assignee: Vinod Kone
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> We observed this in our testing environment. Sequence of events:
> 1) A command task is queued since the executor has not registered yet.
> 2) The framework issues a killTask.
> 3) Since the executor is in the REGISTERING state, the agent calls 
> `statusUpdate(TASK_KILLED, UPID())`.
> 4) `statusUpdate` now calls `containerizer->status()` before calling 
> `executor->terminateTask(status.task_id(), status);`, which removes the 
> queued task. (Introduced in this patch: https://reviews.apache.org/r/43258.)
> 5) Since the above is asynchronous, it's possible that the task is still 
> queued when we check whether we need to kill the unregistered executor in 
> `killTask`:
> {code}
>   // TODO(jieyu): Here, we kill the executor if it no longer has
>   // any task to run and has not yet registered. This is a
>   // workaround for those single task executors that do not have a
>   // proper self terminating logic when they haven't received the
>   // task within a timeout.
>   if (executor->queuedTasks.empty()) {
>     CHECK(executor->launchedTasks.empty())
>       << " Unregistered executor '" << executor->id
>       << "' has launched tasks";
>
>     LOG(WARNING) << "Killing the unregistered executor " << *executor
>                  << " because it has no tasks";
>
>     executor->state = Executor::TERMINATING;
>     containerizer->destroy(executor->containerId);
>   }
> {code}
> 6) Consequently, the executor will never be terminated by Mesos.
> Attaching the relevant agent log:
> {noformat}
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.640527  1342 slave.cpp:1361] Got assigned task 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.641034  1342 slave.cpp:1480] Launching task 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 for framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.641440  1342 paths.cpp:528] Trying to chown 
> '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a'
>  to user 'root'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.644664  1342 slave.cpp:5389] Launching executor 
> mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6 of framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c- with resources cpus(*):0.1; 
> mem(*):32 in work directory 
> '/var/lib/mesos/slave/slaves/a3ad8418-cb77-4705-b353-4b514ceca52c-S0/frameworks/a3ad8418-cb77-4705-b353-4b514ceca52c-/executors/mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6/runs/24762d43-2134-475e-b724-caa72110497a'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.645195  1342 slave.cpp:1698] Queuing task 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' for executor 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
> a3ad8418-cb77-4705-b353-4b514ceca52c-
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.645491  1338 containerizer.cpp:671] Starting container 
> '24762d43-2134-475e-b724-caa72110497a' for executor 
> 'mesosvol.6ccd993c-1920-11e6-a722-9648cb19afd6' of framework 
> 'a3ad8418-cb77-4705-b353-4b514ceca52c-'
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.647897  1345 cpushare.cpp:389] Updated 'cpu.shares' to 1126 
> (cpus 1.1) for container 24762d43-2134-475e-b724-caa72110497a
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal mesos-slave[1304]: 
> I0513 15:36:13.648619  1345 cpushare.cpp:411] Updated 'cpu.cfs_period_us' to 
> 100ms and 'cpu.cfs_quota_us' to 110ms (cpus 1.1) for container 
> 24762d43-2134-475e-b724-caa72110497a
> May 13 15:36:13 ip-10-0-2-74.us-west-2.compute.internal 

[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2016-05-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283899#comment-15283899
 ] 

haosdent commented on MESOS-3435:
-

Just published the draft at 
https://github.com/haosdent/mesos/commit/ac918a23f7148411925e07d6e1ab5eff7c36a97c
 [~a10gupta]

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> Hyper is as secure as a hypervisor, and as fast and easy to use as Docker: 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.





[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-05-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283843#comment-15283843
 ] 

Guangya Liu commented on MESOS-5278:


[~haosd...@gmail.com] , we need to put {{mesos-*}} under {{$PATH}} if we want 
the {{mesos}} command to pick up those subcommands; after putting {{mesos-*}} 
under {{$PATH}}, I can get all subcommands for {{mesos}}.

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Guangya Liu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with a container and execute an 
> arbitrary command in that container (similar to `docker exec`).





[jira] [Commented] (MESOS-5377) Improve DRF behavior with scarce resources.

2016-05-15 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283837#comment-15283837
 ] 

Qian Zhang commented on MESOS-5377:
---

Can we introduce a weight for each resource allocated by the Mesos master?
{{Each resource's weight = the number of agents that have this resource / the 
total number of agents}}

And then when we calculate the resource share for each role/framework in the DRF 
sorter, we can take this weight into account: {{resource share = resource 
weight * (allocation / total)}}. So for the example in the description of this 
ticket, the weight of GPU will be ~0.001, and the GPU share of the role which 
consumes the only GPU will be ~0.001 rather than 1. This could be the default 
behavior, and we may consider introducing a flag to the Mesos master with which 
an operator can explicitly set a weight for each resource, overriding the 
default calculation.

> Improve DRF behavior with scarce resources.
> ---
>
> Key: MESOS-5377
> URL: https://issues.apache.org/jira/browse/MESOS-5377
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>
> The allocator currently uses the notion of Weighted [Dominant Resource 
> Fairness|https://www.cs.berkeley.edu/~alig/papers/drf.pdf] (WDRF) to 
> establish a linear notion of fairness across allocation roles.
> DRF behaves well for resources that are present within each machine in a 
> cluster (e.g. CPUs, memory, disk). However, some resources (e.g. GPUs) are 
> only present on a subset of machines in the cluster.
> Consider the behavior when there are the following agents in a cluster:
> 1000 agents with (cpus:4,mem:1024,disk:1024)
> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> If a role wishes to use both GPU and non-GPU resources for tasks, consuming 1 
> GPU will lead DRF to consider the role to have a 100% share of the cluster, 
> since it consumes 100% of the GPUs in the cluster. This framework will then 
> not receive any other offers.
> Among possible improvements, fairness could incorporate an understanding of 
> resource packages. In a sense, there is 1 GPU package and 1000 non-GPU 
> packages being competed for, and ideally a role's consumption of the single 
> GPU package should not have a large effect on the role's access to the other 
> 1000 non-GPU packages.
> In the interim, we should consider having a recommended way to deal with 
> scarce resources in the current model.





[jira] [Commented] (MESOS-4672) Implement aufs based provisioner backend.

2016-05-15 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283823#comment-15283823
 ] 

Shuai Lin commented on MESOS-4672:
--

[~jieyu] The code is mostly copy-pasted and modified from the overlay backend. 
Do you think it's worth refactoring out a common base class for overlay/aufs?

Pros: better abstraction.

Cons: the subclasses would likely only ever be overlay and aufs. The remaining 
possible backends are btrfs and devicemapper, which don't work this way.

> Implement aufs based provisioner backend.
> -
>
> Key: MESOS-4672
> URL: https://issues.apache.org/jira/browse/MESOS-4672
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Shuai Lin
>
> Overlay fs support wasn't merged until kernel 3.18. Docker's default 
> storage backend on Ubuntu 14.04 is aufs. We should consider adding an 
> aufs-based backend for the unified containerizer as well, to efficiently 
> provide a union fs (instead of relying on the copy backend, which is not 
> space efficient).





[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2016-05-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283812#comment-15283812
 ] 

haosdent commented on MESOS-3435:
-

Cool, actually I have some feedback about your APIs. XD

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> Hyper is as secure as a hypervisor, and as fast and easy to use as Docker: 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.


