[jira] [Comment Edited] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-08-09 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414576#comment-15414576
 ] 

Sunzhe edited comment on MESOS-6010 at 8/10/16 1:47 AM:


The image is {{redis}}, and I didn't use the agent flag {{-\-docker_registry}}, 
that is to say, I used the default Docker registry. With the default Docker 
registry I was consistently getting this error.
But if I set the agent flag {{-\-docker_registry}} to a local 
path (e.g. {{/tmp/docker/images}}) in which Docker image archives (the result of 
{{docker save}}) are stored, {{mesos-execute}} works well.


was (Author: sunzhe):
The image is {{redis}}, and I didn't use the flag {{-\-docker_registry}}, that 
is to say, I used the default docker registry. If using the default Docker 
registry I was consistently getting this error.
But if I used the flag {{-\-docker_registry}} is a local 
path(i.e:{{/tmp/docker/images}}) in which Docker image archives(result of 
{{docker save}}) are stored, the {{mesos-execute}} works well.

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following appeared:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
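The 401 in the log above is the expected first step of the Docker Registry v2 token handshake: a client is supposed to parse the {{Www-Authenticate}} challenge, fetch a Bearer token from the advertised realm, and retry the request, whereas the puller here stops at the first response. A minimal sketch of that parsing step (the challenge string is copied from the log; the actual HTTP round trips are omitted):

```python
import re

# Challenge text as it appears in the 401 response above.
challenge = ('Bearer realm="https://auth.docker.io/token",'
             'service="registry.docker.io",'
             'scope="repository:library/redis:pull"')

# Extract the key="value" parameters from the challenge.
params = dict(re.findall(r'(\w+)="([^"]*)"', challenge))

# The token endpoint a registry client is expected to call before retrying:
token_url = "{realm}?service={service}&scope={scope}".format(**params)
print(token_url)
```

Only after fetching a token from this URL and resending the manifest request with an {{Authorization: Bearer ...}} header would the pull proceed for an anonymous public image like {{library/redis}}.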



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (MESOS-6018) ExamplesTest.V1JavaFramework fails on OSX

2016-08-09 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6018:
-
Attachment: ExamplesTest.V1Java.txt

> ExamplesTest.V1JavaFramework fails on OSX
> -
>
> Key: MESOS-6018
> URL: https://issues.apache.org/jira/browse/MESOS-6018
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
> Environment: OSX 10.10.5
>Reporter: Greg Mann
>  Labels: mesosphere
> Attachments: ExamplesTest.V1Java.txt
>
>
> Find attached verbose logs from a test run of 
> {{ExamplesTest.V1JavaFramework}} on OSX 10.10.5. First I ran the whole test 
> suite and {{ExamplesTest.V1JavaFramework}} failed, but the V0 version 
> passed. When I re-ran the V0 version by itself several times, it failed once 
> and succeeded the rest of the time. When I re-ran the V1 version by itself, 
> it failed every time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6018) ExamplesTest.V1JavaFramework fails on OSX

2016-08-09 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6018:


 Summary: ExamplesTest.V1JavaFramework fails on OSX
 Key: MESOS-6018
 URL: https://issues.apache.org/jira/browse/MESOS-6018
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
 Environment: OSX 10.10.5
Reporter: Greg Mann


Find attached verbose logs from a test run of {{ExamplesTest.V1JavaFramework}} 
on OSX 10.10.5. First I ran the whole test suite and 
{{ExamplesTest.V1JavaFramework}} failed, but the V0 version passed. When I 
re-ran the V0 version by itself several times, it failed once and succeeded the 
rest of the time. When I re-ran the V1 version by itself, it failed every time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6017) Introduce `PortMapping` protobuf.

2016-08-09 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-6017:


 Summary: Introduce `PortMapping` protobuf.
 Key: MESOS-6017
 URL: https://issues.apache.org/jira/browse/MESOS-6017
 Project: Mesos
  Issue Type: Task
  Components: containerization
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Currently we have a `PortMapping` message defined for `DockerInfo`. This can be 
used only by the `DockerContainerizer`. We need to introduce a new Protobuf 
message in `NetworkInfo` which will allow frameworks to specify port mapping 
when using CNI with the `MesosContainerizer`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6016) Expose the unversioned Call and Event Scheduler/Executor Protobufs.

2016-08-09 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6016:
-

 Summary: Expose the unversioned Call and Event Scheduler/Executor 
Protobufs.
 Key: MESOS-6016
 URL: https://issues.apache.org/jira/browse/MESOS-6016
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Currently, we don't expose the unversioned (v0) {{Call}}/{{Event}} 
scheduler/executor protobufs externally to framework authors. This is a bit 
disjoint, since we already expose the unversioned Mesos protos. The reasoning 
for not doing so earlier was that Mesos would use the v0 protobufs internally, 
as an alternative to having separate internal protobufs.

However, that is not going to work. Eventually, when we introduce a 
backward-incompatible change in the {{v1}} protobufs, we will create new {{v2}} 
protobufs. But we would then need to ensure that {{v2}} protobufs can somehow 
be translated to {{v0}} without breaking existing users. That's a pretty hard 
thing to do! In the interim, to help framework authors migrate their frameworks 
(they might be storing old protobufs in ZK or other reliable storage), we 
should expose the v0 scheduler/executor protobufs too and create another 
internal translation layer for Mesos.
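The translation layer described above can be illustrated with a toy round-trip sketch; plain dicts stand in for protobufs, and all names here ({{evolve}}, {{devolve}}, the field names) are hypothetical stand-ins rather than the actual Mesos helpers:

```python
def evolve(unversioned_call):
    """Translate an internal (unversioned) Call into a versioned v1 Call."""
    v1 = dict(unversioned_call)
    v1["api_version"] = "v1"
    return v1

def devolve(v1_call):
    """Translate a v1 Call back into the internal representation."""
    internal = dict(v1_call)
    internal.pop("api_version", None)
    return internal

call = {"type": "SUBSCRIBE", "framework_id": "fw-1"}

# The key property the layer must preserve: translating out to a versioned
# message and back must not lose or corrupt the internal message.
assert devolve(evolve(call)) == call
```

The same round-trip property is what would make a future v2 layer workable: as long as each versioned representation can be evolved from and devolved to the internal one, frameworks holding old stored messages can still be migrated.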



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6012:
--
Fix Version/s: (was: 1.0.1)

> Cannot compile resource estimator modules against installed Mesos headers
> -
>
> Key: MESOS-6012
> URL: https://issues.apache.org/jira/browse/MESOS-6012
> Project: Mesos
>  Issue Type: Bug
>  Components: modules
>Affects Versions: 1.0.0
>Reporter: Matthias Bach
>Priority: Blocker
>  Labels: mesosphere
>
> As of version 1.0.0 it is no longer possible to compile custom resource 
> estimator modules against the installed Mesos headers. The error message that 
> occurs on the attempt is: 
> {{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
> mesos/slave/resource_estimator.hpp: No such file or directory}}.
> The root cause for this seems to be that on installation headers get moved 
> from {{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
> {{mesos/module/resource_estimator.hpp}} will resolve correctly during the 
> Mesos build, but not when compiling code against the installed headers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5967) Add support for 'docker image inspect' in our docker abstraction

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5967:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Add support for 'docker image inspect' in our docker abstraction
> 
>
> Key: MESOS-5967
> URL: https://issues.apache.org/jira/browse/MESOS-5967
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, mesosphere
> Fix For: 1.1.0
>
>
> The current {{docker inspect}} support in our docker abstraction only covers 
> inspecting containers (not images). We should expand it to cover images as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5221) Add Documentation for Nvidia GPU support

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5221:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere 
Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 33, Mesosphere Sprint 
35, Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, 
Mesosphere Sprint 39, Mesosphere Sprint 40)

> Add Documentation for Nvidia GPU support
> 
>
> Key: MESOS-5221
> URL: https://issues.apache.org/jira/browse/MESOS-5221
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>
> https://reviews.apache.org/r/46220/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5051) Create helpers for manipulating Linux capabilities.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5051:
-
Sprint: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere 
Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
32, Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere 
Sprint 40)

> Create helpers for manipulating Linux capabilities.
> ---
>
> Key: MESOS-5051
> URL: https://issues.apache.org/jira/browse/MESOS-5051
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> These helpers can either be based on an existing library (e.g. libcap), or 
> use system calls directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5275) Add capabilities support for unified containerizer.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5275:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere 
Sprint 41  (was: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 
37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40)

> Add capabilities support for unified containerizer.
> ---
>
> Key: MESOS-5275
> URL: https://issues.apache.org/jira/browse/MESOS-5275
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Add capabilities support for the unified containerizer. 
> Requirements:
> 1. Use the Mesos capabilities API.
> 2. Frameworks should be able to request capabilities for containers.
> 3. Agents should be able to set the maximum capabilities allowed for all 
> containers launched.
> Design document: 
> https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd
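The requirements above reduce to a simple set computation; the capability names and the clamping policy in this sketch are illustrative assumptions, not the design in the linked document:

```python
# Toy sketch of requirements (2) and (3): a framework requests capabilities,
# and the agent clamps the request to its configured maximum allowed set.
agent_max = {"NET_BIND_SERVICE", "CHOWN"}      # agent's maximum allowed set
requested = {"NET_BIND_SERVICE", "SYS_ADMIN"}  # framework's request

granted = requested & agent_max  # capabilities the container receives
denied = requested - agent_max   # capabilities the agent refuses

assert granted == {"NET_BIND_SERVICE"}
assert denied == {"SYS_ADMIN"}
```

Whether a denied capability fails the launch or is silently dropped is a policy choice the design document would settle; the sketch only shows the clamping itself.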



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5582) Create a `cgroups/devices` isolator.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5582:
-
Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, 
Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41  (was: 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere 
Sprint 39, Mesosphere Sprint 40)

> Create a `cgroups/devices` isolator.
> 
>
> Key: MESOS-5582
> URL: https://issues.apache.org/jira/browse/MESOS-5582
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: gpu, isolator, mesosphere
>
> Currently, all the logic for the `cgroups/devices` isolator is bundled into 
> the Nvidia GPU Isolator. We should abstract it out into its own component 
> and remove the redundant logic from the Nvidia GPU Isolator. Assuming the 
> guaranteed ordering between isolators from MESOS-5581, we can be sure that 
> the dependency order between the `cgroups/devices` and `gpu/nvidia` isolators 
> is met.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41  
(was: Mesosphere Sprint 39, Mesosphere Sprint 40)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can verify this manually by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certifications,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5303) Add capabilities support for mesos execute cli.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5303:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere 
Sprint 41  (was: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 
37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40)

> Add capabilities support for mesos execute cli.
> ---
>
> Key: MESOS-5303
> URL: https://issues.apache.org/jira/browse/MESOS-5303
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Add support for `user` and `capabilities` to execute cli. This will help in 
> testing the `capabilities` feature for unified containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4766) Improve allocator performance.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4766:
-
Sprint: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere 
Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41  
(was: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere 
Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40)

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5792) Add mesos tests to CMake (make check)

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5792:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Add mesos tests to CMake (make check)
> -
>
> Key: MESOS-5792
> URL: https://issues.apache.org/jira/browse/MESOS-5792
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Srinivas
>Assignee: Srinivas
>  Labels: build, mesosphere
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Provide CMakeLists.txt and configuration files to build mesos tests using 
> CMake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5901) Make the command executor unversioned

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5901:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Make the command executor unversioned
> -
>
> Key: MESOS-5901
> URL: https://issues.apache.org/jira/browse/MESOS-5901
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently, the command executor in {{src/launcher/executor.cpp}} is in the 
> {{v1}} namespace. As referenced in the versioning design doc, we had agreed 
> to keep the mesos internal code in the unversioned namespace and use 
> {{evolve/devolve}} helpers for requests/responses. 
> Following this pattern, we should bring the command executor into the 
> {{mesos::internal}} namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3934) Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3934:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess
> 
>
> Key: MESOS-3934
> URL: https://issues.apache.org/jira/browse/MESOS-3934
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> Related to this 
> [TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L949-L950].
> The {{MetricsProcess}} and {{ReaperProcess}} are global processes 
> (singletons) which are initialized upon first use.  The two processes could 
> be initialized alongside the {{gc}}, {{help}}, {{logging}}, {{profiler}}, and 
> {{system}} (statistics) processes inside {{process::initialize}}.
> This is also necessary for libprocess re-initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5878) Strict/RegistrarTest.UpdateQuota/0 is flaky

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5878:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Strict/RegistrarTest.UpdateQuota/0 is flaky
> ---
>
> Key: MESOS-5878
> URL: https://issues.apache.org/jira/browse/MESOS-5878
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
> Attachments: strict_registrar_update_quota.log
>
>
> Observed on ASF CI 
> (https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/2539/consoleFull).
>  Log file is attached. Note that this might have been uncovered due to the 
> recent removal of {{os::sleep}} from {{Clock::settle}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4778) Add appc/runtime isolator for runtime isolation for appc images.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4778:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Add appc/runtime isolator for runtime isolation for appc images.
> 
>
> Key: MESOS-4778
> URL: https://issues.apache.org/jira/browse/MESOS-4778
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Srinivas
>  Labels: containerizer, isolator, mesosphere
>
> Appc images also contain runtime information like 'exec', 'env', 
> 'workingDirectory', etc.
> https://github.com/appc/spec/blob/master/spec/aci.md
> Similar to docker images, we need to support a subset of them (mainly 'exec', 
> 'env' and 'workingDirectory').



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6006) Abstract mesos-style.py to allow future linters to be added more easily

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6006:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Abstract mesos-style.py to allow future linters to be added more easily
> ---
>
> Key: MESOS-6006
> URL: https://issues.apache.org/jira/browse/MESOS-6006
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: cli, mesosphere
> Fix For: 1.1.0
>
>
> Currently, mesos-style.py is just a collection of functions that
> check the style of relevant files in the mesos code base.  However,
> the script assumes that we always want to run cpplint over every
> file we are checking. Since we are planning to add a python linter
> to the codebase soon, it makes sense to abstract the common
> functionality from this script into a class so that a cpp-based linter
> and a python-based linter can inherit the same set of common
> functionality.
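One possible shape for that abstraction (all names here are illustrative, not the actual mesos-style.py interface): a base class owns file selection and the run loop, and each concrete linter supplies its extensions and check logic.

```python
class LinterBase:
    """Common functionality shared by per-language linters."""
    extensions = ()  # file suffixes this linter handles

    def matches(self, path):
        return path.endswith(self.extensions)

    def check(self, path):
        # Overridden by each concrete linter; returns an error count.
        raise NotImplementedError

    def run(self, paths):
        # Lint only the files this linter is responsible for.
        return sum(self.check(p) for p in paths if self.matches(p))

class CppLinter(LinterBase):
    extensions = ('.cpp', '.hpp')
    def check(self, path):
        return 0  # would invoke cpplint here; stubbed for illustration

class PyLinter(LinterBase):
    extensions = ('.py',)
    def check(self, path):
        return 0  # would invoke a python linter here; stubbed

files = ['a.cpp', 'b.py', 'c.md']
total = sum(l.run(files) for l in (CppLinter(), PyLinter()))
print(total)  # -> 0: each file is routed to at most one linter, none fail
```

Adding a new linter then means adding one subclass, without touching the shared selection and run logic.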



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5966:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
40)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5228) Add tests for Capability API.

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5228:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere 
Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 33, Mesosphere Sprint 
34, Mesosphere Sprint 35, Mesosphere Sprint 37, Mesosphere Sprint 38, 
Mesosphere Sprint 39, Mesosphere Sprint 40)

> Add tests for Capability API.
> -
>
> Key: MESOS-5228
> URL: https://issues.apache.org/jira/browse/MESOS-5228
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Benjamin Bannier
>  Labels: mesosphere, unified-containerizer-mvp
>
> Add basic tests for the capability API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4690) Reorganize 3rdparty directory

2016-08-09 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4690:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere 
Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41  (was: Mesosphere Sprint 
33, Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 36, 
Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere 
Sprint 40)

> Reorganize 3rdparty directory
> -
>
> Key: MESOS-4690
> URL: https://issues.apache.org/jira/browse/MESOS-4690
> Project: Mesos
>  Issue Type: Epic
>  Components: build, libprocess, stout
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> This issue is currently being discussed in the dev mailing list:
> http://www.mail-archive.com/dev@mesos.apache.org/msg34349.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6015) Design for port-mapper CNI plugin

2016-08-09 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6015:
-
Sprint: Mesosphere Sprint 41

> Design for port-mapper CNI plugin
> -
>
> Key: MESOS-6015
> URL: https://issues.apache.org/jira/browse/MESOS-6015
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Create a design doc for port-mapper CNI plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5988) PollSocketImpl can write to a stale fd.

2016-08-09 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414303#comment-15414303
 ] 

Greg Mann commented on MESOS-5988:
--

Review here: https://reviews.apache.org/r/50936/

> PollSocketImpl can write to a stale fd.
> ---
>
> Key: MESOS-5988
> URL: https://issues.apache.org/jira/browse/MESOS-5988
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.1
>
>
> While tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar], we 
> were curious why PollSocketImpl avoids the same issue. It seems that 
> PollSocketImpl has a similar race; however, in the case of PollSocketImpl we 
> will simply write to a stale file descriptor.
> One example is {{PollSocketImpl::send(const char*, size_t)}}:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
> {code}
> Future<size_t> PollSocketImpl::send(const char* data, size_t size)
> {
>   return io::poll(get(), io::WRITE)
>     .then(lambda::bind(&internal::socket_send_data, get(), data, size));
> }
>
> Future<size_t> socket_send_data(int s, const char* data, size_t size)
> {
>   CHECK(size > 0);
>   while (true) {
>     ssize_t length = send(s, data, size, MSG_NOSIGNAL);
> #ifdef __WINDOWS__
>     int error = WSAGetLastError();
> #else
>     int error = errno;
> #endif // __WINDOWS__
>     if (length < 0 && net::is_restartable_error(error)) {
>       // Interrupted, try again now.
>       continue;
>     } else if (length < 0 && net::is_retryable_error(error)) {
>       // Might block, try again later.
>       return io::poll(s, io::WRITE)
>         .then(lambda::bind(&internal::socket_send_data, s, data, size));
>     } else if (length <= 0) {
>       // Socket error or closed.
>       if (length < 0) {
>         const string error = os::strerror(errno);
>         VLOG(1) << "Socket error while sending: " << error;
>       } else {
>         VLOG(1) << "Socket closed while sending";
>       }
>       if (length == 0) {
>         return length;
>       } else {
>         return Failure(ErrnoError("Socket send failed"));
>       }
>     } else {
>       CHECK(length > 0);
>       return length;
>     }
>   }
> }
> {code}
> If the last reference to the {{Socket}} goes away before the 
> {{socket_send_data}} loop completes, then we will write to a stale fd!
> It turns out that we have avoided this issue because in libprocess we happen 
> to keep a reference to the {{Socket}} around when sending:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
> {code}
> void send(Encoder* encoder, Socket socket)
> {
>   switch (encoder->kind()) {
>     case Encoder::DATA: {
>       size_t size;
>       const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
>       socket.send(data, size)
>         .onAny(lambda::bind(
>             &internal::_send,
>             lambda::_1,
>             socket,
>             encoder,
>             size));
>       break;
>     }
>     case Encoder::FILE: {
>       off_t offset;
>       size_t size;
>       int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
>       socket.sendfile(fd, offset, size)
>         .onAny(lambda::bind(
>             &internal::_send,
>             lambda::_1,
>             socket,
>             encoder,
>             size));
>       break;
>     }
>   }
> }
> {code}
> However, this may not be true in all call-sites going forward. Currently, it 
> appears that http::Connection can trigger this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414287#comment-15414287
 ] 

Jie Yu commented on MESOS-6012:
---

Hmm, are you sure you're using 1.0.0?

We create a symlink from slave -> agent in include/mesos. Can you double check 
if it's there or not?
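A quick way to see whether the layout behaves as expected is to reproduce it in a scratch directory; the paths below are illustrative (a real installation lives under the configured include prefix, e.g. /usr/include/mesos):

```shell
# Recreate the expected installed layout and verify that a
# "slave -> agent" symlink makes the old include path resolve.
tmp=$(mktemp -d)
mkdir -p "$tmp/mesos/agent"
touch "$tmp/mesos/agent/resource_estimator.hpp"
ln -s agent "$tmp/mesos/slave"

# The old mesos/slave/... path should now resolve through the symlink.
test -f "$tmp/mesos/slave/resource_estimator.hpp" && echo "symlink ok"
rm -rf "$tmp"
```

If the equivalent check fails against the real install prefix, the symlink was likely not created during installation.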

> Cannot compile resource estimator modules against installed Mesos headers
> -
>
> Key: MESOS-6012
> URL: https://issues.apache.org/jira/browse/MESOS-6012
> Project: Mesos
>  Issue Type: Bug
>  Components: modules
>Affects Versions: 1.0.0
>Reporter: Matthias Bach
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.1
>
>
> As of version 1.0.0 it is no longer possible to compile custom resource 
> estimator modules against the installed Mesos headers. The error message that 
> occurs on the attempt is: 
> {{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
> mesos/slave/resource_estimator.hpp: No such file or directory}}.
> The root cause for this seems to be that on installation headers get moved 
> from {{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
> {{mesos/module/resource_estimator.hpp}} will resolve correctly during the 
> Mesos build, but not when compiling code against the installed headers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414278#comment-15414278
 ] 

Kevin Klues commented on MESOS-6012:


Can you show us the contents of the folder where your installation headers are 
installed?
Is the symlink from {{agent --> slave}} missing from that folder?

> Cannot compile resource estimator modules against installed Mesos headers
> -
>
> Key: MESOS-6012
> URL: https://issues.apache.org/jira/browse/MESOS-6012
> Project: Mesos
>  Issue Type: Bug
>  Components: modules
>Affects Versions: 1.0.0
>Reporter: Matthias Bach
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.1
>
>
> As of version 1.0.0 it is no longer possible to compile custom resource 
> estimator modules against the installed Mesos headers. The error message that 
> occurs on the attempt is: 
> {{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
> mesos/slave/resource_estimator.hpp: No such file or directory}}.
> The root cause for this seems to be that on installation headers get moved 
> from {{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
> {{mesos/module/resource_estimator.hpp}} will resolve correctly during the 
> Mesos build, but not when compiling code against the installed headers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6015) Design for port-mapper CNI plugin

2016-08-09 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-6015:


 Summary: Design for port-mapper CNI plugin
 Key: MESOS-6015
 URL: https://issues.apache.org/jira/browse/MESOS-6015
 Project: Mesos
  Issue Type: Task
  Components: containerization
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


Create a design doc for port-mapper CNI plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6014) Create a CNI plugin that provides port mapping functionality for various CNI plugins

2016-08-09 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-6014:


 Summary: Create a CNI plugin that provides port mapping 
functionality for various CNI plugins
 Key: MESOS-6014
 URL: https://issues.apache.org/jira/browse/MESOS-6014
 Project: Mesos
  Issue Type: Epic
  Components: containerization
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan
 Fix For: 1.1.0


Currently there is no CNI plugin that supports port mapping. Given that the 
unified containerizer is starting to become the de-facto container runtime, 
having a CNI plugin that provides port mapping is a must-have. This is 
primarily required to support BRIDGE networking mode, similar to the Docker 
bridge networking that users expect when using Docker containers. 

While the most obvious use case is that of using the port-mapper plugin with 
the bridge plugin, the port-mapping functionality itself is generic and should 
be usable with any CNI plugin that needs it.

Keeping port-mapping as a CNI plugin gives operators the ability to use the 
default port-mapper (CNI plugin) that Mesos provides, or use their own plugin.
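For illustration only, a chained network configuration of the kind the CNI spec (0.3.0 and later) allows might look like the following; the network name, bridge name, plugin type strings, and port values are all hypothetical, not the actual Mesos plugin's interface:

```json
{
  "cniVersion": "0.3.0",
  "name": "mesos-bridge",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "mesos-cni0",
      "ipam": { "type": "host-local", "subnet": "192.168.0.0/16" }
    },
    {
      "type": "port-mapper",
      "portMappings": [
        { "hostPort": 31000, "containerPort": 80, "protocol": "tcp" }
      ]
    }
  ]
}
```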



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6012:
--
Labels: mesosphere  (was: )

> Cannot compile resource estimator modules against installed Mesos headers
> -
>
> Key: MESOS-6012
> URL: https://issues.apache.org/jira/browse/MESOS-6012
> Project: Mesos
>  Issue Type: Bug
>  Components: modules
>Affects Versions: 1.0.0
>Reporter: Matthias Bach
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.1
>
>
> As of version 1.0.0 it is no longer possible to compile custom resource 
> estimator modules against the installed Mesos headers. The error message that 
> occurs on the attempt is: 
> {{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
> mesos/slave/resource_estimator.hpp: No such file or directory}}.
> The root cause for this seems to be that on installation headers get moved 
> from {{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
> {{mesos/module/resource_estimator.hpp}} will resolve correctly during the 
> Mesos build, but not when compiling code against the installed headers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-08-09 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414263#comment-15414263
 ] 

Gilbert Song commented on MESOS-6011:
-

The license issue does not allow us to do this in Mesos.

I will fix it. Thanks for catching it [~nfnt]. :)


> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.
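One direction for a fix, sketched below under the assumption that resolving dependencies at test time is acceptable: populate the test rootfs from `ldd` output instead of a hard-coded library list, so the copied versions always match the host. The rootfs path is illustrative:

```shell
# Build a minimal rootfs for /bin/sh by copying the binary plus every
# shared library it actually resolves to on this host (sketch only).
rootfs=$(mktemp -d)
for bin in /bin/sh; do
  mkdir -p "$rootfs$(dirname "$bin")"
  cp "$bin" "$rootfs$bin"
  # ldd prints resolved absolute paths; extract and copy each one.
  for lib in $(ldd "$bin" | grep -o '/[^ ]*'); do
    mkdir -p "$rootfs$(dirname "$lib")"
    cp "$lib" "$rootfs$lib" 2>/dev/null
  done
done
echo "populated $rootfs"
```

This sidesteps the version skew entirely: on Fedora 24 the loop would copy libtinfo.so.6 because that is what `ldd /bin/sh` reports there.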



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-08-09 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-6011:
---

Assignee: Gilbert Song

> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>Assignee: Gilbert Song
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6012:
--
Priority: Blocker  (was: Major)

> Cannot compile resource estimator modules against installed Mesos headers
> -
>
> Key: MESOS-6012
> URL: https://issues.apache.org/jira/browse/MESOS-6012
> Project: Mesos
>  Issue Type: Bug
>  Components: modules
>Affects Versions: 1.0.0
>Reporter: Matthias Bach
>Priority: Blocker
>
> As of version 1.0.0 it is no longer possible to compile custom resource 
> estimator modules against the installed Mesos headers. The error message that 
> occurs on the attempt is: 
> {{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
> mesos/slave/resource_estimator.hpp: No such file or directory}}.
> The root cause for this seems to be that on installation headers get moved 
> from {{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
> {{mesos/module/resource_estimator.hpp}} will resolve correctly during the 
> Mesos build, but not when compiling code against the installed headers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3370) Deprecate the external containerizer

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-3370:
--
Assignee: Gilbert Song

> Deprecate the external containerizer
> 
>
> Key: MESOS-3370
> URL: https://issues.apache.org/jira/browse/MESOS-3370
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Gilbert Song
> Fix For: 1.1.0
>
>
> To our knowledge, no one is using the external containerizer, and we could 
> clean up code paths in the slave and containerizer interface (the dual 
> launch() signatures).
> In a deprecation cycle, we can move this code into a module (dependent on 
> containerizer modules landing) and from there, move it into its own repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5879) cgroups/net_cls isolator causing agent recovery issues

2016-08-09 Thread Silas Snider (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414016#comment-15414016
 ] 

Silas Snider commented on MESOS-5879:
-

[~jieyu] -- are you willing to shepherd this?

> cgroups/net_cls isolator causing agent recovery issues
> --
>
> Key: MESOS-5879
> URL: https://issues.apache.org/jira/browse/MESOS-5879
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation, slave
>Reporter: Silas Snider
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any 
> agent process in a cluster running an experimental custom isolator as well, 
> the agents are unable to recover from checkpoint, because net_cls reports 
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our 
> custom isolator), it's also a problem that the net_cls isolator fails 
> recovery just for duplicate handles in cgroups that it is literally about to 
> unconditionally destroy during recovery. Can this be fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-08-09 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413941#comment-15413941
 ] 

Gilbert Song commented on MESOS-6010:
-

BTW [~Sunzhe], I am assuming you are using the default Docker registry. Then, 
which image are you trying to pull (most likely not relevant)? I am trying to 
reproduce this error.

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when {{./src/mesos-execute}}, the errors like below:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
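Note that the trace above embeds two HTTP status lines back to back: a proxy-style "200 Connection established" followed by the registry's 401. A decoder that expects exactly one response per request can reject such input. The following is an illustrative parse only, not the libprocess decoder:

```python
# Two concatenated responses, shaped like the agent's error output
# (headers abbreviated; this mimics what a proxy can prepend).
blob = (
    "HTTP/1.1 200 Connection established\r\n"
    "\r\n"
    "HTTP/1.1 401 Unauthorized\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

# A naive scan finds two status lines where a one-response-per-request
# decoder expects a single one.
status_lines = [l for l in blob.split("\r\n") if l.startswith("HTTP/1.1")]
print(status_lines)
```

If an HTTP proxy sits between the agent and the registry, this shape of input would be consistent with the "No response decoded" failure; whether that is the actual cause here would need confirmation from the environment.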



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-08-09 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413930#comment-15413930
 ] 

Gilbert Song commented on MESOS-6010:
-

Hmm, interesting. Obviously the responses are not empty, but we still cannot 
decode them.

[~Sunzhe], are you getting this error consistently, or only occasionally?

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when {{./src/mesos-execute}}, the errors like below:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6013) Consider avoiding readdir_r

2016-08-09 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6013:
--

 Summary: Consider avoiding readdir_r
 Key: MESOS-6013
 URL: https://issues.apache.org/jira/browse/MESOS-6013
 Project: Mesos
  Issue Type: Bug
  Components: stout
 Environment: Linux archlinux.vagrant.vm 4.6.4-1-ARCH #1 SMP PREEMPT 
Mon Jul 11 19:12:32 CEST 2016 x86_64 GNU/Linux
Reporter: Neil Conway


Mesos doesn't build on recent Arch Linux:

{noformat}
/bin/sh ../libtool  --tag=CXX   --mode=compile ccache g++ 
-DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"1.1.0\" -DPACKAGE_STRING=\"mesos\ 1.1.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_LIBSASL2=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. -I../../mesos/src   -Wall -Werror 
-Wsign-compare -DLIBDIR=\"/usr/local/lib\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" 
-DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../mesos/include 
-I../include -I../include/mesos -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS 
-isystem ../3rdparty/boost-1.53.0 -I../3rdparty/elfio-3.1 
-I../3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.4/include 
-I../../mesos/3rdparty/libprocess/include -I../3rdparty/nvml-352.79 
-I../3rdparty/picojson-1.3.0 -I../3rdparty/protobuf-2.6.1/src 
-I../../mesos/3rdparty/stout/include 
-I../3rdparty/zookeeper-3.4.8/src/c/include 
-I../3rdparty/zookeeper-3.4.8/src/c/generated -DHAS_AUTHENTICATION=1 
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  
-pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT 
appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF 
appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c -o 
appc/libmesos_no_3rdparty_la-spec.lo `test -f 'appc/spec.cpp' || echo 
'../../mesos/src/'`appc/spec.cpp
libtool: compile:  ccache g++ -DPACKAGE_NAME=\"mesos\" 
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" 
"-DPACKAGE_STRING=\"mesos 1.1.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
-DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 
-DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 
-DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 
-DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-I../../mesos/src -Wall -Werror -Wsign-compare -DLIBDIR=\"/usr/local/lib\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" 
-DPKGMODULEDIR=\"/usr/local/lib/mesos/modules\" -I../../mesos/include 
-I../include -I../include/mesos -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS 
-isystem ../3rdparty/boost-1.53.0 -I../3rdparty/elfio-3.1 
-I../3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.4/include 
-I../../mesos/3rdparty/libprocess/include -I../3rdparty/nvml-352.79 
-I../3rdparty/picojson-1.3.0 -I../3rdparty/protobuf-2.6.1/src 
-I../../mesos/3rdparty/stout/include 
-I../3rdparty/zookeeper-3.4.8/src/c/include 
-I../3rdparty/zookeeper-3.4.8/src/c/generated -DHAS_AUTHENTICATION=1 
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 
-pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 -MT 
appc/libmesos_no_3rdparty_la-spec.lo -MD -MP -MF 
appc/.deps/libmesos_no_3rdparty_la-spec.Tpo -c ../../mesos/src/appc/spec.cpp  
-fPIC -DPIC -o appc/.libs/libmesos_no_3rdparty_la-spec.o
In file included from ../../mesos/3rdparty/stout/include/stout/os.hpp:52:0,
 from ../../mesos/src/appc/spec.cpp:17:
../../mesos/3rdparty/stout/include/stout/os/ls.hpp: In function 
‘Try<std::list<std::string> > os::ls(const std::string&)’:
../../mesos/3rdparty/stout/include/stout/os/ls.hpp:56:19: error: ‘int 
readdir_r(DIR*, dirent*, dirent**)’ is deprecated 
[-Werror=deprecated-declarations]
   while ((error = readdir_r(dir, temp, &entry)) == 0 && entry != nullptr) {
   ^
In file included from ../../mesos/3rdparty/stout/include/stout/os/ls.hpp:19:0,
 from ../../mesos/3rdparty/stout/include/stout/os.hpp:52,
 from ../../mesos/src/appc/spec.cpp:17:
/usr/include/dirent.h:183:12: note: declared here
 extern int readdir_r (DIR *__restrict 

[jira] [Commented] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend

2016-08-09 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413863#comment-15413863
 ] 

Gilbert Song commented on MESOS-6004:
-

Yeah, the overlayfs backend is currently limited to images of around ~35 
layers, which is caused by the kernel overlayfs/aufs limit on the mount 
argument string size. We need some workaround for that. Please see MESOS-6000 
for possible workarounds.
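The rough arithmetic behind a ceiling of a few dozen layers, assuming the kernel caps the whole mount-option string at one page (4096 bytes) and the "lowerdir=" option concatenates one path per layer. The path shape below is illustrative of a provisioner store layout, not the exact Mesos paths:

```python
# One page bounds the overlayfs mount-option string on Linux.
PAGE_SIZE = 4096

# Hypothetical per-layer path: store prefix + 64-char layer ID + rootfs.
layer_path = "/mnt/mesos/store/docker/layers/" + "a" * 64 + "/rootfs"

def max_layers(path_len, page_size=PAGE_SIZE):
    # +1 accounts for the ':' separator between consecutive lower layers.
    return page_size // (path_len + 1)

# With ~100-character layer paths this lands in the high 30s,
# consistent with the ~35-layer ceiling observed in practice.
print(max_layers(len(layer_path)))
```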

>  Tasks fail when provisioning multiple containers with large docker images 
> using copy backend
> -
>
> Key: MESOS-6004
> URL: https://issues.apache.org/jira/browse/MESOS-6004
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.2, 1.0.0
> Environment: h4. Agent Platform
> - Ubuntu 16.04
> - AWS g2.x2large instance
> - Nvidia support enabled
> h4. Agent Configuration
> {noformat}
> --containerizers=mesos,docker
> --docker_config=
> --docker_store_dir=/mnt/mesos/store/docker
> --executor_registration_timeout=3mins
> --hostname=
> --image_providers=docker
> --image_provisioner_backend=copy
> --isolation=filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia
> --switch_user=false
> --work_dir=/mnt/mesos
> {noformat}
> h4. Framework
> - custom framework written in python
> - using unified containerizer with docker images
> h4. Test Setup
> * 1 master
> * 1 agent
> * 5 tasks scheduled at the same time:
> ** resources: cpus: 0.1, mem: 128
> ** command: `echo test`
> ** docker image: custom docker image, based on nvidia/cuda ~5gb
> ** the same docker image was for all tasks, already pulled.
>Reporter: Michael Thomas
>  Labels: containerizer, docker, performance
>
> When scheduling more than one task on the same agent, all tasks fail, as 
> containers seem to be destroyed during provisioning.
> Specifically, the errors on the agent logs are:
> {noformat}
>  E0808 15:53:09.691315 30996 slave.cpp:3976] Container 
> 'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework 
> c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being 
> destroyed during provisioning
> {noformat}
> and 
> {noformat}
> I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of 
> framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not 
> register within 3mins
> {noformat}
> As the default provisioning method {{copy}} is being used, I assume this is 
> due to the provisioning of multiple containers taking too long, while the 
> agent will not wait. For large images, this method is simply not performant.
> The issue did not occur when only one task was scheduled.
> Increasing the {{executor_registration_timeout}} parameter seemed to help a 
> bit, as it allowed scheduling at least 2 tasks at the same time, but it still 
> fails with more (5 in this case).
> h4. Complete logs
> (with GLOG_v=1)
> {noformat}
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800375  
> 3738 slave.cpp:198] Agent started on 1)@172.31.23.17:5051
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800403  
> 3738 slave.cpp:199] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos,docker" --default_role="*" 
> --disk_watch_interval="1mins" --docker="docker" --docker_config="XXX" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" --docker_store_dir="/mnt/t" --docker_volume_checkp
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 
> oint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" 
> --hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_command_executor="false" --image_providers="docker" 
> --image_provisioner_backend="copy" --initialize_driver_logging="true" 
> --isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" 
> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" 
> 

[jira] [Commented] (MESOS-5981) task failed in windows Server 2012 client, test-framwork example

2016-08-09 Thread Sergei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413799#comment-15413799
 ] 

Sergei commented on MESOS-5981:
---

One more question: if I try to start a simple command ("echo hello", for 
example) without a custom executor on the Windows agent, it fails. As I 
understand it, the default executor tries to launch my simple task with 'sh'. 
Is this a bug, or am I misunderstanding something?
Thanks

> task failed in windows Server 2012 client, test-framwork example
> 
>
> Key: MESOS-5981
> URL: https://issues.apache.org/jira/browse/MESOS-5981
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0
> Environment: Master: Ubuntu 14.04 (VirtualBox)
> Agent: Win Server 2012 (VirtualBox)
>Reporter: Sergei
>
> I tried to launch test-framework example. master on ubuntu machine.
> if start agent on ubuntu too, test-framework complete fine.
> but agent can't execute task on windows server 2012.
> agent info log:
> W0803 15:21:55.749181  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> W0803 15:21:55.749181  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> I0803 15:21:55.749181  2972 slave.cpp:1095] Registered with master 
> master@10.10.81.91:5050; given agent ID 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3
> I0803 15:21:55.749181  2972 slave.cpp:1155] Forwarding total oversubscribed 
> resources 
> I0803 15:21:55.749181  2972 status_update_manager.cpp:184] Resuming sending 
> status updates
> W0803 15:21:55.811792  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> W0803 15:22:06.905844  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> I0803 15:22:06.921710  2532 slave.cpp:1495] Got assigned task 0 for framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 slave.cpp:1614] Launching task 0 for framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 slave.cpp:5764] Launching executor default of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 with resources  in work 
> directory 
> 'C:\TEMP\mesosTemp\slaves\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3\frameworks\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002\executors\default\runs\8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> I0803 15:22:06.921710  2532 slave.cpp:1840] Queuing task '0' for executor 
> 'default' of framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 containerizer.cpp:786] Starting container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab' for executor 'default' of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.936492  2532 launcher.cpp:126] Forked child with pid '2172' 
> for container '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> I0803 15:22:06.936492  2532 containerizer.cpp:1334] Checkpointing executor's 
> forked pid 2172 to 
> 'C:\TEMP\mesosTemp\meta\slaves\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3\frameworks\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002\executors\default\runs\8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab\pids\forked.pid'
> I0803 15:22:07.046447  2532 containerizer.cpp:1878] Executor for container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab' has exited
> I0803 15:22:07.046447  2532 containerizer.cpp:1637] Destroying container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> W0803 15:22:07.108556  2532 status_utils.hpp:31] `WSTRINGIFY` has been 
> called, but it is not implemented.
> I0803 15:22:07.108556  2532 slave.cpp:4142] Executor 'default' of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 
> I0803 15:22:07.108556  2532 slave.cpp:3264] Handling status update 
> TASK_FAILED (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 from @0.0.0.0:0
> W0803 15:22:07.108556  2532 containerizer.cpp:1466] Ignoring update for 
> unknown container: 8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab
> I0803 15:22:07.108556  2532 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:07.108556  2532 status_update_manager.cpp:828] Checkpointing 
> UPDATE for status update TASK_FAILED (UUID: 
> 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:07.124591  2532 slave.cpp:3657] Forwarding the update TASK_FAILED 
> (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 to master@10.10.81.91:5050
> I0803 15:22:07.140297  2532 slave.cpp:2218] Asked to shut down framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 by master@10.10.81.91:5050
> I0803 15:22:07.140297  2532 

[jira] [Updated] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6010:
--
Summary: Docker registry puller shows decode error "No response decoded".  
(was: Docker registry puller shows decode error ")

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when {{./src/mesos-execute}}, the errors like below:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6010) Docker registry puller shows decode error "

2016-08-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6010:
--
Summary: Docker registry puller shows decode error "  (was: Can't pull 
Docker images when executing mesos-execute)

> Docker registry puller shows decode error "
> ---
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when {{./src/mesos-execute}}, the errors like below:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Updated] (MESOS-5996) Windows mesos-containerizer crashes

2016-08-09 Thread Lior Zeno (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lior Zeno updated MESOS-5996:
-
Description: 
I've been trying to run a Mesos cluster with a Windows agent. However, I can't 
run a task on Windows since the container always fails with the following 
message: "failed to parse the command flag, syntax error at line 1", followed 
by a JSON that represents the Marathon job. The JSON did not have a command 
key and did not contain the command I was trying to run (notepad).

I followed the instructions in the getting started section, and cloned the 
following repository: https://git-wip-us.apache.org/repos/asf/mesos.git. I did 
not use the 1.0 release tarball since it does not include the bootstrap batch 
script for windows.

Steps to reproduce:
# Start a mesos cluster with 4 nodes (3 Ubuntu 14.04 LTS nodes, and 1 Windows 
Server 2012 R2 node).
# Submit an application via Marathon, using a hostname constraint with 
"notepad" as command.


  was:
I've been trying to run a Mesos cluster with a Windows agent. However, I can't 
run a task on Windows since the container always fails with the following 
message: "failed to parse the command flag", followed by a JSON. I don't have 
the exact message right now, but I'll update the ticket with it as soon as I 
have it. The JSON did not have a command key and did not contain the command 
I was trying to run (notepad).

I followed the instructions in the getting started section, and cloned the 
following repository: https://git-wip-us.apache.org/repos/asf/mesos.git. I did 
not use the 1.0 release tarball since it does not include the bootstrap batch 
script for windows.

Steps to reproduce:
# Start a mesos cluster with 4 nodes (3 Ubuntu 14.04 LTS nodes, and 1 Windows 
Server 2012 R2 node).
# Submit an application via Marathon, using a hostname constraint with 
"notepad" as command.



> Windows mesos-containerizer crashes
> ---
>
> Key: MESOS-5996
> URL: https://issues.apache.org/jira/browse/MESOS-5996
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Windows Server 2012 R2
> Marathon 1.2.0 RC6
>Reporter: Lior Zeno
>  Labels: windows
>
> I've been trying to run a Mesos cluster with a Windows agent. However, I 
> can't run a task on Windows since the container always fails with the 
> following message: "failed to parse the command flag, syntax error at line 
> 1", followed by a JSON that represents the Marathon job. The JSON did not 
> have a command key and did not contain the command I was trying to run 
> (notepad). 
> I followed the instructions in the getting started section, and cloned the 
> following repository: https://git-wip-us.apache.org/repos/asf/mesos.git. I 
> did not use the 1.0 release tarball since it does not include the bootstrap 
> batch script for windows.
> Steps to reproduce:
> # Start a mesos cluster with 4 nodes (3 Ubuntu 14.04 LTS nodes, and 1 Windows 
> Server 2012 R2 node).
> # Submit an application via Marathon, using a hostname constraint with 
> "notepad" as command.





[jira] [Created] (MESOS-6012) Cannot compile resource estimator modules against installed Mesos headers

2016-08-09 Thread Matthias Bach (JIRA)
Matthias Bach created MESOS-6012:


 Summary: Cannot compile resource estimator modules against 
installed Mesos headers
 Key: MESOS-6012
 URL: https://issues.apache.org/jira/browse/MESOS-6012
 Project: Mesos
  Issue Type: Bug
  Components: modules
Affects Versions: 1.0.0
Reporter: Matthias Bach


As of version 1.0.0 it is no longer possible to compile custom resource 
estimator modules against the installed Mesos headers. The error that occurs 
on the attempt is: 
{{/usr/include/mesos/module/resource_estimator.hpp:23:46: fatal error: 
mesos/slave/resource_estimator.hpp: No such file or directory}}.

The root cause seems to be that, on installation, headers are moved from 
{{mesos/slave}} to {{mesos/agent}}. Thus, the header path in 
{{mesos/module/resource_estimator.hpp}} resolves correctly during the Mesos 
build, but not when compiling code against the installed headers.





[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-08-09 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3937:

Assignee: (was: Jan Schlicht)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
> I1117 15:08:09.296187 26399 master.cpp:1379] 

[jira] [Commented] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-08-09 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413442#comment-15413442
 ] 

Jan Schlicht commented on MESOS-6011:
-

How about we use something like {{busybox}} in the rootfs? It would provide 
static binaries for many commands.

> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.





[jira] [Created] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-08-09 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-6011:
---

 Summary: Tests relying on LinuxRootfs::create fail with Fedora 24 
(and probably others)
 Key: MESOS-6011
 URL: https://issues.apache.org/jira/browse/MESOS-6011
 Project: Mesos
  Issue Type: Bug
  Components: containerization, tests
Reporter: Jan Schlicht


Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
{{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
the binaries provided by the rootfs link to certain versions of shared 
libraries. Because Fedora 24 has newer versions of some of these libraries, 
tests using the binaries will fail. E.g.
{noformat}
$ ldd /bin/sh
linux-vdso.so.1 (0x7ffc98bfb000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
/lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
{noformat}
but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.6}} into the 
rootfs.
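One way around the hard-coded library list would be to resolve a binary's dependencies at test-setup time with {{ldd}}. A minimal sketch of that idea (a hypothetical helper, not the actual {{LinuxRootfs}} code), assuming a Linux host with a dynamically linked /bin/sh:

```shell
# Copy /bin/sh plus whatever libraries ldd resolves on *this* host into a
# throwaway rootfs, instead of assuming fixed versions like libtinfo.so.5.
ROOTFS=$(mktemp -d)
mkdir -p "$ROOTFS/bin" "$ROOTFS/lib64"
cp /bin/sh "$ROOTFS/bin/"
# ldd lines look like "libc.so.6 => /lib64/libc.so.6 (0x...)" or
# "/lib64/ld-linux-x86-64.so.2 (0x...)"; pick out the absolute paths.
ldd /bin/sh \
  | awk '$2 == "=>" && $3 ~ /^\// {print $3} $1 ~ /^\// {print $1}' \
  | while read -r lib; do
      cp "$lib" "$ROOTFS/lib64/"
    done
```

Because the list comes from {{ldd}} on the running host, the rootfs always matches the host's library versions, which is exactly what the hard-coded copy breaks on Fedora 24.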





[jira] [Updated] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-08-09 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-6011:

Description: 
Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
{{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
the binaries provided by the rootfs link to certain versions of shared 
libraries. Because Fedora 24 has newer versions of some of these libraries, 
tests using the binaries will fail. E.g.
{noformat}
$ ldd /bin/sh
linux-vdso.so.1 (0x7ffc98bfb000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
/lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
{noformat}
but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into the 
rootfs.

  was:
Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
{{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
the binaries provided by the rootfs link to certain versions of shared 
libraries. Because Fedora 24 has newer versions of some of these libraries, 
tests using the binaries will fail. E.g.
{noformat}
$ ldd /bin/sh
linux-vdso.so.1 (0x7ffc98bfb000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
/lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
{noformat}
but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.6}} into the 
rootfs.


> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.





[jira] [Commented] (MESOS-5981) task failed in windows Server 2012 client, test-framwork example

2016-08-09 Thread Sergei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413424#comment-15413424
 ] 

Sergei commented on MESOS-5981:
---

Is there any way to compile a Mesos client library for Windows (libmesos.dll)?
What path should be passed as the -Djava.library.path={?} parameter in 
TestExecutor?


> task failed in windows Server 2012 client, test-framwork example
> 
>
> Key: MESOS-5981
> URL: https://issues.apache.org/jira/browse/MESOS-5981
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.1.0
> Environment: Master: Ubuntu 14.04 (VirtualBox)
> Agent: Win Server 2012 (VirtualBox)
>Reporter: Sergei
>
> I tried to launch test-framework example. master on ubuntu machine.
> if start agent on ubuntu too, test-framework complete fine.
> but agent can't execute task on windows server 2012.
> agent info log:
> W0803 15:21:55.749181  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> W0803 15:21:55.749181  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> I0803 15:21:55.749181  2972 slave.cpp:1095] Registered with master 
> master@10.10.81.91:5050; given agent ID 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3
> I0803 15:21:55.749181  2972 slave.cpp:1155] Forwarding total oversubscribed 
> resources 
> I0803 15:21:55.749181  2972 status_update_manager.cpp:184] Resuming sending 
> status updates
> W0803 15:21:55.811792  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> W0803 15:22:06.905844  2468 fcntl.hpp:82] `os::isNonblock` has been called, 
> but is a stub on Windows
> I0803 15:22:06.921710  2532 slave.cpp:1495] Got assigned task 0 for framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 slave.cpp:1614] Launching task 0 for framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 slave.cpp:5764] Launching executor default of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 with resources  in work 
> directory 
> 'C:\TEMP\mesosTemp\slaves\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3\frameworks\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002\executors\default\runs\8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> I0803 15:22:06.921710  2532 slave.cpp:1840] Queuing task '0' for executor 
> 'default' of framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.921710  2532 containerizer.cpp:786] Starting container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab' for executor 'default' of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:06.936492  2532 launcher.cpp:126] Forked child with pid '2172' 
> for container '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> I0803 15:22:06.936492  2532 containerizer.cpp:1334] Checkpointing executor's 
> forked pid 2172 to 
> 'C:\TEMP\mesosTemp\meta\slaves\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-S3\frameworks\e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002\executors\default\runs\8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab\pids\forked.pid'
> I0803 15:22:07.046447  2532 containerizer.cpp:1878] Executor for container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab' has exited
> I0803 15:22:07.046447  2532 containerizer.cpp:1637] Destroying container 
> '8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab'
> W0803 15:22:07.108556  2532 status_utils.hpp:31] `WSTRINGIFY` has been 
> called, but it is not implemented.
> I0803 15:22:07.108556  2532 slave.cpp:4142] Executor 'default' of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 
> I0803 15:22:07.108556  2532 slave.cpp:3264] Handling status update 
> TASK_FAILED (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 from @0.0.0.0:0
> W0803 15:22:07.108556  2532 containerizer.cpp:1466] Ignoring update for 
> unknown container: 8bc8d68a-e42d-4189-a6d4-de4d7a7be2ab
> I0803 15:22:07.108556  2532 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of 
> framework e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:07.108556  2532 status_update_manager.cpp:828] Checkpointing 
> UPDATE for status update TASK_FAILED (UUID: 
> 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:07.124591  2532 slave.cpp:3657] Forwarding the update TASK_FAILED 
> (UUID: 4ef373ff-4f93-4e40-80c4-9335f2b463a5) for task 0 of framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 to master@10.10.81.91:5050
> I0803 15:22:07.140297  2532 slave.cpp:2218] Asked to shut down framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002 by master@10.10.81.91:5050
> I0803 15:22:07.140297  2532 slave.cpp:2243] Shutting down framework 
> e1b7f9cc-c8df-492d-9e28-9b1c1423ed83-0002
> I0803 15:22:07.140297  2532 

[jira] [Commented] (MESOS-5688) `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23.

2016-08-09 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413406#comment-15413406
 ] 

Jan Schlicht commented on MESOS-5688:
-

Also affects Fedora 24. {{host}} is part of the {{bind-utils}} package, which 
may not be installed by default. Installing that package resolves the problem.
It would be preferable to disable this test when {{host}} is not available, 
similar to how it's done for other tests (e.g. the {{ROOT_NS_\*}} tests).
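The suggested guard could be sketched as a plain PATH lookup; this is a hedged illustration, not the actual Mesos test-filter code (which uses its own environment checks), and {{command_available}} is an illustrative name:

```python
# Hedged sketch (not actual Mesos test-filter code): detect whether the
# `host` binary from bind-utils is on the PATH, so a DNS-dependent test
# such as ROOT_DNS could be skipped when it is missing.
import shutil

def command_available(name="host"):
    # shutil.which returns the resolved path, or None when not found.
    return shutil.which(name) is not None
```

A test harness could then mark {{ROOT_DNS}} as skipped whenever this returns false, mirroring how the {{ROOT_NS_\*}} tests are filtered.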

> `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23.
> --
>
> Key: MESOS-5688
> URL: https://issues.apache.org/jira/browse/MESOS-5688
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
> Environment: Fedora 23 with network isolation
>Reporter: Gilbert Song
>  Labels: isolation, mesosphere, network, tests
>
> Here is the log:
> {noformat}
> [20:18:04] :   [Step 10/10] [ RUN  ] PortMappingIsolatorTest.ROOT_DNS
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.877822 28395 
> port_mapping_tests.cpp:229] Using eth0 as the public interface
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.878106 28395 
> port_mapping_tests.cpp:237] Using lo as the loopback interface
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.891363 28395 resources.cpp:572] 
> Parsing resources as JSON failed: 
> cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
> [20:18:04]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.892331 28395 
> port_mapping.cpp:1557] Using eth0 as the public interface
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.892638 28395 
> port_mapping.cpp:1582] Using lo as the loopback interface
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893723 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893770 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893806 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384   4194304'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893838 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893875 28395 
> port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893908 28395 
> port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893937 28395 
> port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893968 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380   6291456'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.893999 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894029 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894060 28395 
> port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894093 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894124 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894153 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.894186 28395 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.902745 28395 
> linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.902940 28395 resources.cpp:572] 
> Parsing resources as JSON failed: ports:[31000-31499]
> [20:18:04]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.903404 28412 
> port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and 
> ephemeral ports [30016,30032) for container container1 of executor ''
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.904423 28395 
> linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | 
> CLONE_NEWNET
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.977530 28416 
> port_mapping.cpp:2576] Bind mounted '/proc/15781/ns/net' to 
> '/run/netns/15781' for container container1
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.977715 28416 
> port_mapping.cpp:2607] Created network namespace handle symlink 
> '/var/run/mesos/netns/container1' -> '/run/netns/15781'
> [20:18:04]W:   [Step 10/10] I0622 20:18:04.978752 28416 
> 

[jira] [Commented] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend

2016-08-09 Thread Michael Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413374#comment-15413374
 ] 

Michael Thomas commented on MESOS-6004:
---


1. I don't think it's the downloading, as the image was already cached locally. 
The {{overlay}} image provisioner backend does not work for me, as my kernel is 
missing the required overlayfs support.
2. I updated the logs with GLOG_v=1.

I also tried with a smaller image (nvidia/cuda, ~1.5 GB, 21 layers); this works 
fine and seems reasonably fast.
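The overlayfs availability mentioned above can be checked from {{/proc/filesystems}}; a hedged sketch, assuming a Linux host ({{overlayfs_supported}} is an illustrative helper, not Mesos code):

```python
# Hedged sketch: the `overlay` image provisioner backend requires
# overlayfs support in the running kernel; /proc/filesystems lists the
# filesystems the kernel knows about (the path is Linux-specific).
def overlayfs_supported(path="/proc/filesystems"):
    try:
        with open(path) as listing:
            # Each non-empty line ends with a filesystem name, e.g.
            # "nodev\toverlay" on kernels with overlayfs built in.
            return any(line.split()[-1] == "overlay"
                       for line in listing if line.strip())
    except OSError:
        return False
```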

>  Tasks fail when provisioning multiple containers with large docker images 
> using copy backend
> -
>
> Key: MESOS-6004
> URL: https://issues.apache.org/jira/browse/MESOS-6004
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.2, 1.0.0
> Environment: h4. Agent Platform
> - Ubuntu 16.04
> - AWS g2.x2large instance
> - Nvidia support enabled
> h4. Agent Configuration
> {noformat}
> --containerizers=mesos,docker
> --docker_config=
> --docker_store_dir=/mnt/mesos/store/docker
> --executor_registration_timeout=3mins
> --hostname=
> --image_providers=docker
> --image_provisioner_backend=copy
> --isolation=filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia
> --switch_user=false
> --work_dir=/mnt/mesos
> {noformat}
> h4. Framework
> - custom framework written in python
> - using unified containerizer with docker images
> h4. Test Setup
> * 1 master
> * 1 agent
> * 5 tasks scheduled at the same time:
> ** resources: cpus: 0.1, mem: 128
> ** command: `echo test`
> ** docker image: custom docker image, based on nvidia/cuda ~5gb
> ** the same docker image was for all tasks, already pulled.
>Reporter: Michael Thomas
>  Labels: containerizer, docker, performance
>
> When scheduling more than one task on the same agent, all tasks fail, as 
> containers seem to be destroyed during provisioning.
> Specifically, the errors on the agent logs are:
> {noformat}
>  E0808 15:53:09.691315 30996 slave.cpp:3976] Container 
> 'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework 
> c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being 
> destroyed during provisioning
> {noformat}
> and 
> {noformat}
> I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of 
> framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not 
> register within 3mins
> {noformat}
> As the default provisioning method {{copy}} is being used, I assume this is 
> due to the provisioning of multiple containers taking so long that the agent 
> will not wait. For large images, this method is simply not performant.
> The issue did not occur when only one task was scheduled.
> Increasing the {{executor_registration_timeout}} parameter seemed to help a 
> bit, as it allowed scheduling at least 2 tasks at the same time, but it still 
> fails with more (5 in this case).
> h4. Complete logs
> (with GLOG_v=1)
> {noformat}
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800375  
> 3738 slave.cpp:198] Agent started on 1)@172.31.23.17:5051
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800403  
> 3738 slave.cpp:199] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://; 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos,docker" --default_role="*" 
> --disk_watch_interval="1mins" --docker="docker" --docker_config="XXX" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" --docker_store_dir="/mnt/t" --docker_volume_checkp
> Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 
> oint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" 
> --hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_command_executor="false" --image_providers="docker" 
> --image_provisioner_backend="copy" --initialize_driver_logging="true" 
> 

[jira] [Updated] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend

2016-08-09 Thread Michael Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Thomas updated MESOS-6004:
--
Description: 
When scheduling more than one task on the same agent, all tasks fail, as 
containers seem to be destroyed during provisioning.

Specifically, the errors on the agent logs are:

{noformat}
 E0808 15:53:09.691315 30996 slave.cpp:3976] Container 
'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework 
c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being 
destroyed during provisioning
{noformat}

and 

{noformat}
I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of 
framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not 
register within 3mins
{noformat}

As the default provisioning method {{copy}} is being used, I assume this is due 
to the provisioning of multiple containers taking so long that the agent will 
not wait. For large images, this method is simply not performant.

The issue did not occur when only one task was scheduled.

Increasing the {{executor_registration_timeout}} parameter seemed to help a 
bit, as it allowed scheduling at least 2 tasks at the same time, but it still 
fails with more (5 in this case).
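The timing argument above can be sketched with a back-of-envelope estimate; the bandwidth figure and sequential-copy assumption are illustrative, not measured:

```python
# Back-of-envelope sketch of the suspected failure mode: with the copy
# backend, provisioning N containers from an image of S GB at an
# effective disk bandwidth of B MB/s takes roughly N * S * 1024 / B
# seconds (sequential copying and the bandwidth are assumptions).
def copy_provision_seconds(containers, image_gb, disk_mb_per_s):
    total_mb = containers * image_gb * 1024
    return total_mb / disk_mb_per_s
```

For the setup in this report (5 tasks, a ~5 GB image), even an optimistic 100 MB/s gives about 256 seconds, past both the default 1 min and the raised 3 min registration timeout.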



h4. Complete logs

(with GLOG_v=1)

{noformat}
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800375  3738 
slave.cpp:198] Agent started on 1)@172.31.23.17:5051
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800403  3738 
slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://; 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos,docker" --default_role="*" 
--disk_watch_interval="1mins" --docker="docker" --docker_config="XXX" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/mnt/t" --docker_volume_checkp
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 
oint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_command_executor="false" --image_providers="docker" 
--image_provisioner_backend="copy" --initialize_driver_logging="true" 
--isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" 
--launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --master="zk://172.31.19.240:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --re
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: covery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/mnt/mesos"
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 2016-08-09 
10:11:41,800:3738(0x7f71172a1700):ZOO_INFO@check_events@1728: initiated 
connection to server [172.31.19.240:2181]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801007  3738 
slave.cpp:519] Agent resources: gpus(*):1; cpus(*):8; mem(*):14014; 
disk(*):60257; ports(*):[31000-32000]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801059  3738 
slave.cpp:527] Agent attributes: [  ]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801077  3738 
slave.cpp:532] Agent hostname: 
ec2-52-59-113-0.eu-central-1.compute.amazonaws.com
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.802232  3772 
state.cpp:57] Recovering state from '/mnt/mesos/meta'
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.803539  3774 
status_update_manager.cpp:200] Recovering status update manager
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.803745  3772 
containerizer.cpp:522] Recovering containerizer
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.803788  3770 
docker.cpp:775] 

[jira] [Updated] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend

2016-08-09 Thread Michael Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Thomas updated MESOS-6004:
--
Description: 
When scheduling more than one task on the same agent, all tasks fail, as 
containers seem to be destroyed during provisioning.

Specifically, the errors on the agent logs are:

{noformat}
 E0808 15:53:09.691315 30996 slave.cpp:3976] Container 
'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework 
c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being 
destroyed during provisioning
{noformat}

and 

{noformat}
I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of 
framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not 
register within 3mins
{noformat}

As the default provisioning method {{copy}} is being used, I assume this is due 
to the provisioning of multiple containers taking so long that the agent will 
not wait. For large images, this method is simply not performant.

The issue did not occur when only one task was scheduled.

Increasing the {{executor_registration_timeout}} parameter seemed to help a 
bit, as it allowed scheduling at least 2 tasks at the same time, but it still 
fails with more (5 in this case).



h4. Complete logs

(with GLOG_v=1)

{noformat}
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800375  3738 
slave.cpp:198] Agent started on 1)@172.31.23.17:5051
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800403  3738 
slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://; 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos,docker" --default_role="*" 
--disk_watch_interval="1mins" --docker="docker" 
--docker_config="{"auths":{"https:\/\/index.docker.io\/v1\/":{"auth":"dGVycmFsb3VwZTpUYWxFWUFOSXR5","email":"sebastian.ge...@terraloupe.com"}}}"
 --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/mnt/t" --docker_volume_checkp
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 
oint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_command_executor="false" --image_providers="docker" 
--image_provisioner_backend="copy" --initialize_driver_logging="true" 
--isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" 
--launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --master="zk://172.31.19.240:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --re
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: covery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/mnt/mesos"
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: 2016-08-09 
10:11:41,800:3738(0x7f71172a1700):ZOO_INFO@check_events@1728: initiated 
connection to server [172.31.19.240:2181]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801007  3738 
slave.cpp:519] Agent resources: gpus(*):1; cpus(*):8; mem(*):14014; 
disk(*):60257; ports(*):[31000-32000]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801059  3738 
slave.cpp:527] Agent attributes: [  ]
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.801077  3738 
slave.cpp:532] Agent hostname: 
ec2-52-59-113-0.eu-central-1.compute.amazonaws.com
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.802232  3772 
state.cpp:57] Recovering state from '/mnt/mesos/meta'
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.803539  3774 
status_update_manager.cpp:200] Recovering status update manager
Aug  9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.803745  3772 
containerizer.cpp:522] 

[jira] [Updated] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend

2016-08-09 Thread Michael Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Thomas updated MESOS-6004:
--
Description: 
When scheduling more than one task on the same agent, all tasks fail, as 
containers seem to be destroyed during provisioning.

Specifically, the errors on the agent logs are:

{noformat}
 E0808 15:53:09.691315 30996 slave.cpp:3976] Container 
'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework 
c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being 
destroyed during provisioning
{noformat}

and 

{noformat}
I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of 
framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not 
register within 3mins
{noformat}

As the default provisioning method {{copy}} is being used, I assume this is due 
to the provisioning of multiple containers taking so long that the agent will 
not wait. For large images, this method is simply not performant.

The issue did not occur when only one task was scheduled.

Increasing the {{executor_registration_timeout}} parameter seemed to help a 
bit, as it allowed scheduling at least 2 tasks at the same time, but it still 
fails with more (5 in this case).



h4. Complete logs

(with GLOG_v=0, as the GLOG_v=1 output was too long)

{noformat}
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.661067 30961 
main.cpp:434] Starting Mesos agent
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.661551 30961 
slave.cpp:198] Agent started on 1)@172.31.23.17:5051
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.661578 30961 
slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://; 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos,docker" --default_role="*" 
--disk_watch_interval="1mins" --docker="docker" --docker_config="XXX" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/mnt/mesos/store/docker" --do
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: 
cker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="3mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_command_executor="false" --image_providers="docker" 
--image_provisioner_backend="copy" --initialize_driver_logging="true" 
--isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" 
--launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --master="zk://172.31.19.240:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recov
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: er="reconnect" 
--recovery_timeout="15mins" --registration_backoff_factor="1secs" 
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" 
--strict="true" --switch_user="false" --systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/mnt/mesos"
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.662147 30961 
slave.cpp:519] Agent resources: gpus(*):1; cpus(*):8; mem(*):14014; 
disk(*):60257; ports(*):[31000-32000]
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.662211 30961 
slave.cpp:527] Agent attributes: [  ]
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.662230 30961 
slave.cpp:532] Agent hostname: 
ec2-52-59-113-0.eu-central-1.compute.amazonaws.com
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.663354 31000 
state.cpp:57] Recovering state from '/mnt/mesos/meta'
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.663918 30995 
status_update_manager.cpp:200] Recovering status update manager
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.664131 30996 
containerizer.cpp:522] Recovering containerizer
Aug  8 15:48:32 ip-172-31-23-17 mesos-slave[30961]: I0808 15:48:32.664136 31000 
docker.cpp:775] Recovering 

[jira] [Commented] (MESOS-6010) Can't pull Docker images when executing mesos-execute

2016-08-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413221#comment-15413221
 ] 

Stéphane Cottin commented on MESOS-6010:


You need to configure registry credentials; use the {{--docker_config}} agent option.
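As a hedged sketch, the value passed to {{--docker_config}} has the same JSON shape as {{~/.docker/config.json}} (visible inline in the agent logs later in this thread); the helper name and all credential values below are illustrative placeholders:

```python
# Hedged sketch: build the JSON that --docker_config expects, in the
# same shape as ~/.docker/config.json (all values are placeholders).
import base64
import json

def docker_config(registry, user, password, email):
    # Docker stores credentials as base64("user:password").
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": auth, "email": email}}})
```

The resulting string can then be supplied inline or written to a file whose path is given to the flag.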

> Can't pull Docker images when executing mesos-execute
> -
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following appear:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6010) Can't pull Docker images when executing mesos-execute

2016-08-09 Thread Sunzhe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413212#comment-15413212
 ] 

Sunzhe commented on MESOS-6010:
---

Docker itself works well; I can use {{docker pull IMAGE}}.

> Can't pull Docker images when executing mesos-execute
> -
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following appear:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Created] (MESOS-6010) Can't pull Docker images when executing mesos-execute

2016-08-09 Thread Sunzhe (JIRA)
Sunzhe created MESOS-6010:
-

 Summary: Can't pull Docker images when executing mesos-execute
 Key: MESOS-6010
 URL: https://issues.apache.org/jira/browse/MESOS-6010
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker
Affects Versions: 1.0.0
Reporter: Sunzhe


The {{mesos-agent}} flags:
{code}
 GLOG_v=1 ./bin/mesos-agent.sh \
  --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
  --ip=10.100.3.3  \
  --work_dir=${MESOS_WORK_DIR} \
  
--isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux \
  --enforce_container_disk_quota \
  --containerizers=mesos \
  --image_providers=docker \
  --executor_environment_variables="{}"
{code}
And the {{mesos-execute}} flags:
{code}
 ./src/mesos-execute \
   --master=${MESOS_MASTER_IP}:5050 \
   --name=${INSTANCE_NAME} \
   --docker_image=${DOCKER_IMAGE} \
   --framework_capabilities=GPU_RESOURCES \
   --shell=false
{code}
But when running {{./src/mesos-execute}}, errors like the following appear:
{code}
I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
master@10.103.0.125:5050
Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
Received status update TASK_FAILED for task 'test'
  message: 'Failed to launch container: Failed to decode HTTP responses: No 
response decoded
HTTP/1.1 200 Connection established

HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Bearer 
realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
Date: Tue, 09 Aug 2016 08:10:32 GMT
Content-Length: 145
Strict-Transport-Security: max-age=31536000

{"errors":[{"code":"UNAUTHORIZED","message":"authentication 
required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}
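The 401 above is the first leg of the Docker registry token-auth handshake: the {{Www-Authenticate}} header tells the client where to fetch a Bearer token before retrying the pull. A hedged sketch of parsing that challenge (the function name is illustrative, and this relies on the values containing no commas, as in the header above):

```python
# Hedged sketch: split a Www-Authenticate Bearer challenge like the one
# in the 401 response above into the fields (realm, service, scope) a
# client would use to request a token before retrying the manifest pull.
def parse_bearer_challenge(header):
    scheme, _, params = header.partition(" ")
    if scheme != "Bearer":
        raise ValueError("not a Bearer challenge")
    fields = {}
    # Assumes no commas inside quoted values, which holds here.
    for part in params.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields
```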



