[jira] [Updated] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-7652:

Priority: Blocker  (was: Critical)

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael
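>
> For illustration, here is a minimal sketch (a hypothetical helper, not the 
> actual Mesos fix; the paths are made up) of the behavior docker emulates 
> here: create the missing working directory inside the rootfs before 
> chdir'ing into it.
> {code}
> #include <sys/stat.h>
> #include <unistd.h>
>
> #include <cerrno>
> #include <cstring>
> #include <iostream>
> #include <string>
>
> // Create the image's working directory inside the rootfs if the image's
> // RUN steps removed it, then chdir into it. Note: ::mkdir only creates
> // the final path component; a real fix would create parents recursively.
> bool chdirIntoWorkdir(const std::string& rootfs, const std::string& workdir)
> {
>   const std::string path = rootfs + workdir;
>
>   if (::mkdir(path.c_str(), 0755) != 0 && errno != EEXIST) {
>     std::cerr << "Failed to create '" << path << "': "
>               << std::strerror(errno) << std::endl;
>     return false;
>   }
>
>   if (::chdir(path.c_str()) != 0) {
>     std::cerr << "Failed to chdir into '" << path << "': "
>               << std::strerror(errno) << std::endl;
>     return false;
>   }
>
>   return true;
> }
>
> int main()
> {
>   // Illustrative paths only; "/tmp" stands in for a container rootfs.
>   return chdirIntoWorkdir("/tmp", "/workdir") ? 0 : 1;
> }
> {code}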



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6984) Pull out the docker image build step out of `support/docker-build.sh`.

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-6984:
--
Target Version/s:   (was: 1.4.0)

> Pull out the docker image build step out of `support/docker-build.sh`.
> --
>
> Key: MESOS-6984
> URL: https://issues.apache.org/jira/browse/MESOS-6984
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>
> The {{support/docker-build.sh}} script currently writes a {{Dockerfile}}, 
> performs a docker build, runs the image, and then deletes the image.
> The docker build step is quite expensive and often flaky. We should 
> simply pull a docker image from Dockerhub so that we can make our CI more 
> stable and efficient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123532#comment-16123532
 ] 

Kapil Arya commented on MESOS-7744:
---

[~bmahler]: Retargeting it to 1.5.0. Please revert if you see fit.

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> 
>
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Sargun Dhillon
>Assignee: Benjamin Mahler
>Priority: Critical
>  Labels: reliability
>
> We sometimes launch jobs and cancel them after ~7 seconds if we don't get a 
> TASK_STARTING back from the agent. Under certain conditions this can result in 
> Mesos losing track of the task. The interesting chunk of the logs is 
> here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned 
> task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task 
> Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 
> ‘Titus-7590548-worker-0-4476’ for executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill 
> task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling 
> status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for 
> task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued 
> task ‘Titus-7590548-worker-0-4476’ to executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707{
> {code}
> In our executor, we see that the launch message arrives after the master has 
> already received the kill update. We then send non-terminal status updates to 
> the agent, and yet it doesn't forward these to our framework. We're using a 
> custom executor which is based on the older mesos-go bindings. 
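>
> For illustration, a simplified model of the race (all names and types are 
> made up, not actual Mesos agent code): a task that was killed while still 
> queued must be dropped rather than delivered to the executor.
> {code}
> #include <iostream>
> #include <map>
> #include <set>
> #include <string>
>
> struct AgentModel
> {
>   std::map<std::string, std::string> queuedTasks; // taskId -> executorId
>   std::set<std::string> killedWhileQueued;
>
>   // A kill that races with the launch: TASK_KILLED goes out while the
>   // task is still sitting in the queue.
>   void killTask(const std::string& taskId)
>   {
>     if (queuedTasks.count(taskId) > 0) {
>       killedWhileQueued.insert(taskId);
>       std::cout << "TASK_KILLED sent for queued task " << taskId << "\n";
>     }
>   }
>
>   // Delivering queued tasks without consulting `killedWhileQueued` is
>   // exactly the behavior reported above: the task launches anyway.
>   void sendQueuedTasks()
>   {
>     for (auto it = queuedTasks.begin(); it != queuedTasks.end();) {
>       if (killedWhileQueued.count(it->first) > 0) {
>         it = queuedTasks.erase(it); // drop, already reported TASK_KILLED
>       } else {
>         std::cout << "Delivering " << it->first
>                   << " to " << it->second << "\n";
>         ++it;
>       }
>     }
>   }
> };
>
> int main()
> {
>   AgentModel agent;
>   agent.queuedTasks["Titus-7590548-worker-0-4476"] = "docker-executor";
>   agent.killTask("Titus-7590548-worker-0-4476");
>   agent.sendQueuedTasks(); // delivers nothing
> }
> {code}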



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123533#comment-16123533
 ] 

Jie Yu commented on MESOS-7652:
---

[~karya] Re-targeted this to 1.4. IMO, this is a blocker for 1.4. cc [~gilbert]

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7883) Quota heuristic check not accounting for mount volumes

2017-08-11 Thread Vincent Roy (JIRA)
Vincent Roy created MESOS-7883:
--

 Summary: Quota heuristic check not accounting for mount volumes
 Key: MESOS-7883
 URL: https://issues.apache.org/jira/browse/MESOS-7883
 Project: Mesos
  Issue Type: Bug
Reporter: Vincent Roy


This may be expected but came as a surprise to us. We are unable to create a 
quota bigger than the root disk space on slaves.

Given two clusters with the same number of slaves and the same root disk size, 
but where one also has mount volumes, here is what the disk resources look like:

{noformat}
[root@fin-fang-foom-master-1 ~]# curl -s master.mesos:5050/state | jq 
'.slaves[] .resources .disk'
28698
28699
28698
28698
28697
{noformat}

{noformat}
[root@hydra-master-1 ~]# curl -s master.mesos:5050/state | jq '.slaves[] 
.resources .disk'
50817
50817
50814
50819
50817
{noformat}

In {{fin-fang-foom}}, I was able to create a quota for {{143490mb}}, which is 
the total of available disk resources (all root disk in this case) as reported 
by Mesos. For {{hydra}}, I am only able to create a quota for {{143489mb}}. 
This is equivalent to the total of root disks available in {{hydra}}, rather 
than the total available disk reported by Mesos resources, which is {{254084mb}}.

With a modified Mesos that adds logging to {{quota_handler}}, we can see that 
only the {{disk(*)}} number increases in {{nonStaticClusterResources}} after 
every iteration. The final iteration is {{disk(*):143489}}, which is the maximum 
quota I was able to create on {{hydra}}. We expected that the quota heuristic 
check would also include resources such as {{disk(*)[MOUNT:/dcos/volume2]:7373}}.

{noformat}
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.763764 24902 
quota_handler.cpp:71] Performing capacity heuristic check for a set quota 
request
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.763783 24902 
quota_handler.cpp:87] heuristic: total quota 'disk(*):143489'

Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.763870 24902 
quota_handler.cpp:111] heuristic: nonStaticAgentResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):28698; cpus(*):4; mem(*):15023'
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.763923 24902 
quota_handler.cpp:113] heuristic: nonStaticClusterResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):28698; cpus(*):4; mem(*):15023'


Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.763989 24902 
quota_handler.cpp:111] heuristic: nonStaticAgentResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):28698; cpus(*):4; mem(*):15023'
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.764022 24902 
quota_handler.cpp:113] heuristic: nonStaticClusterResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):57396; cpus(*):8; mem(*):30046; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373'

Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.764077 24902 
quota_handler.cpp:111] heuristic: nonStaticAgentResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):28695; cpus(*):4; mem(*):15023'
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.764119 24902 
quota_handler.cpp:113] heuristic: nonStaticClusterResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):86091; cpus(*):12; mem(*):45069; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*)[MOUNT:/dcos/volume0]:7373; 
disk(*)[MOUNT:/dcos/volume1]:7373; disk(*)[MOUNT:/dcos/volume2]:7373'

Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.764225 24902 
quota_handler.cpp:111] heuristic: nonStaticAgentResources = 
'ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; 
disk(*)[MOUNT:/dcos/volume0]:7373; disk(*)[MOUNT:/dcos/volume1]:7373; 
disk(*)[MOUNT:/dcos/volume2]:7373; disk(*):28700; cpus(*):4; mem(*):15023'
Aug 11 12:54:18 hydra-master-1 mesos-master[24896]: I0811 12:54:18.764307 24902 
{noformat}

[jira] [Updated] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7652:
--
Target Version/s: 1.2.3, 1.3.2, 1.5.0  (was: 1.2.3, 1.3.2, 1.4.0)

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123531#comment-16123531
 ] 

Kapil Arya commented on MESOS-7652:
---

[~gilbert]: Retargeting it to 1.5.0. Please revert if you see fit.

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-3384) Include libsasl in Windows CMake build

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3384:
---

Assignee: John Kordich  (was: Andrew Schwartzmeyer)

> Include libsasl in Windows CMake build
> --
>
> Key: MESOS-3384
> URL: https://issues.apache.org/jira/browse/MESOS-3384
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: build, cmake, mesosphere
>
> Windows will probably require libsasl to work. This means we need to insert 
> the code to retrieve, build, and link against it for the Windows path, since 
> it isn't rebundled and distributed as part of Mesos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7643:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the
> `create` and `prepare` calls for each isolator should run serially in the 
> order in which they appear in the --isolation flag, while the `cleanup` calls 
> should be serialized in reverse order (with the exception of the filesystem 
> isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377],
> which uses a 'set' (apparently instead of a list or vector), and a 'set' is a 
> sorted container.
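>
> A small self-contained example of the effect described above (the isolator 
> names are illustrative): iterating a std::set visits elements in sorted 
> order, not in insertion order.
> {code}
> #include <iostream>
> #include <set>
> #include <string>
> #include <vector>
>
> int main()
> {
>   // Order as it might appear on the command line.
>   const std::vector<std::string> flagOrder =
>     {"filesystem/linux", "docker/runtime", "cgroups/cpu"};
>
>   // Copying into a std::set discards that order.
>   const std::set<std::string> sorted(flagOrder.begin(), flagOrder.end());
>
>   std::cout << "--isolation order:";
>   for (const auto& name : flagOrder) std::cout << " " << name;
>
>   std::cout << "\nset iteration order:";
>   for (const auto& name : sorted) std::cout << " " << name;
>
>   // Prints: cgroups/cpu docker/runtime filesystem/linux
>   std::cout << "\n";
> }
> {code}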



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123529#comment-16123529
 ] 

Kapil Arya commented on MESOS-7643:
---

[~gilbert]: I have retargeted it to 1.5.0. Please update if you still want to 
land it in 1.4.0.

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the
> `create` and `prepare` calls for each isolator should run serially in the 
> order in which they appear in the --isolation flag, while the `cleanup` calls 
> should be serialized in reverse order (with the exception of the filesystem 
> isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377],
> which uses a 'set' (apparently instead of a list or vector), and a 'set' is a 
> sorted container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7691) Support local enabled cgroups subsystems automatically.

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123547#comment-16123547
 ] 

Kapil Arya commented on MESOS-7691:
---

[~gilbert] Retargeted to 1.5.0. Please revert as you see fit.

> Support local enabled cgroups subsystems automatically.
> ---
>
> Key: MESOS-7691
> URL: https://issues.apache.org/jira/browse/MESOS-7691
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: cgroups
>
> Currently, each cgroup subsystem needs to be turned on as an isolator, e.g., 
> "cgroups/blkio". Ideally, Mesos should be able to detect all locally enabled 
> cgroup subsystems and turn them on automatically (call it auto cgroups).
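>
> A hedged sketch of what the detection could look like (illustrative code, 
> not the actual Mesos implementation): on Linux, the locally enabled 
> subsystems can be read from /proc/cgroups.
> {code}
> #include <fstream>
> #include <iostream>
> #include <sstream>
> #include <string>
> #include <vector>
>
> // Parse /proc/cgroups, whose rows are:
> //   subsys_name  hierarchy  num_cgroups  enabled
> std::vector<std::string> enabledSubsystems()
> {
>   std::vector<std::string> result;
>   std::ifstream proc("/proc/cgroups");
>   std::string line;
>
>   while (std::getline(proc, line)) {
>     if (line.empty() || line[0] == '#') continue; // skip the header
>
>     std::istringstream fields(line);
>     std::string name;
>     int hierarchy, numCgroups, enabled;
>     if (fields >> name >> hierarchy >> numCgroups >> enabled &&
>         enabled == 1) {
>       result.push_back(name); // e.g. "cpu", "memory", "blkio"
>     }
>   }
>
>   return result;
> }
>
> int main()
> {
>   for (const std::string& subsystem : enabledSubsystems()) {
>     std::cout << "cgroups/" << subsystem << "\n";
>   }
> }
> {code}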



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5656) Incomplete modelling of 3rdparty dependencies in cmake build

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123810#comment-16123810
 ] 

Andrew Schwartzmeyer commented on MESOS-5656:
-

I have this working in my current set of CMake patches.

> Incomplete modelling of 3rdparty dependencies in cmake build
> 
>
> Key: MESOS-5656
> URL: https://issues.apache.org/jira/browse/MESOS-5656
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Affects Versions: 1.0.0
>Reporter: Benjamin Bannier
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere
>
> The cmake build incompletely models dependencies on 3rdparty components. This 
> leads to incomplete and unusable build files being generated for, e.g., ninja.
> When generating a build file for ninja the build fails to start
> {code}
> % ninja
> ninja: error: 
> '3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/lib/libzookeeper_mt.a', 
> needed by 'src/slave/mesos-agent', missing and no known rule to make it
> {code}
> An identical problem exists with leveldb (apparent after working around the 
> zookeeper dep issue)
> {code}
> % ninja
> ninja: error: '3rdparty/leveldb-1.4/src/leveldb-1.4/libleveldb.a', needed by 
> 'src/slave/mesos-agent', missing and no known rule to make it
> {code}
> The problem here is that a number of targets depend on library files produced 
> implicitly via some {{ExternalProject}} library (via {{AGENT_LIBS}}), but we 
> fail to declare rules for these targets (I could imagine doing so via, e.g., 
> {{add_library}} with {{IMPORTED}}). This appears to be no problem for build 
> files generated for {{make}}, as it doesn't require rules for all dependency 
> nodes.
> It appears that one should be able to use {{ExternalProject_Add}}'s 
> {{BUILD_BYPRODUCTS}}, e.g.,
> {code}
> ExternalProject_Add(
>${ZOOKEEPER_TARGET}
>BUILD_BYPRODUCTS  ${ZOOKEEPER_LIB}/lib/libzookeeper_mt.a
>PREFIX${ZOOKEEPER_CMAKE_ROOT}
>PATCH_COMMAND ${ZOOKEEPER_PATCH_CMD}
>CONFIGURE_COMMAND ${ZOOKEEPER_CONFIG_CMD}
>...
> {code}
> to declare rules for these files, but this was only added in cmake-3.2 while 
> we at least formally require only cmake-2.8,
> {code}
> cmake_minimum_required(VERSION 2.8)
> {code}
> {{git bisect}} points to the recent 
> {{6e199cc255cbf561fac575568b0594ac2b2c14f9}} for surfacing this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7404) Ensure hierarchical roles work with old Mesos agents

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123549#comment-16123549
 ] 

Kapil Arya commented on MESOS-7404:
---

Retargeted to 1.5.0. Please revert if it needs to land in 1.4.0.

> Ensure hierarchical roles work with old Mesos agents
> 
>
> Key: MESOS-7404
> URL: https://issues.apache.org/jira/browse/MESOS-7404
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> If the Mesos master supports hierarchical roles but the agent does not, we 
> need to ensure that we avoid putting the agent into a bad state, e.g., if the 
> user creates a persistent volume.
> One approach is to use an agent capability for hierarchical roles, and 
> disallow creating persistent-volumes using a hierarchical role if the agent 
> doesn't have the capability. We could also use an agent version check, 
> although until MESOS-6975 is implemented, that will be a bit awkward.
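>
> A hedged sketch of the capability-based check described above (the 
> capability name and the types are illustrative, not the actual Mesos API):
> {code}
> #include <iostream>
> #include <set>
> #include <string>
>
> // Disallow persistent volumes under a hierarchical role (one containing
> // '/', e.g. "eng/web") on agents that do not advertise the (hypothetical)
> // HIERARCHICAL_ROLE capability.
> bool allowCreateVolume(
>     const std::string& role,
>     const std::set<std::string>& agentCapabilities)
> {
>   const bool hierarchical = role.find('/') != std::string::npos;
>   return !hierarchical ||
>          agentCapabilities.count("HIERARCHICAL_ROLE") > 0;
> }
>
> int main()
> {
>   std::cout << allowCreateVolume("eng/web", {}) << "\n";   // 0: rejected
>   std::cout << allowCreateVolume("eng/web",
>                                  {"HIERARCHICAL_ROLE"}) << "\n"; // 1
>   std::cout << allowCreateVolume("eng", {}) << "\n";       // 1: flat role
> }
> {code}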



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7776) Document `MESOS_CONTAINER_IP`

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123548#comment-16123548
 ] 

Kapil Arya commented on MESOS-7776:
---

[~avinash.mesos]: Retargeted to 1.5.0. Please revert as you see fit.

> Document `MESOS_CONTAINER_IP` 
> --
>
> Key: MESOS-7776
> URL: https://issues.apache.org/jira/browse/MESOS-7776
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>
> We introduced `MESOS_CONTAINER_IP` so that tasks launched by the 
> default-executor can learn their container IP. This was done 
> primarily to break the containers' dependency on `LIBPROCESS_IP` for 
> learning their IP addresses, which was misleading. 
> This change needs to be documented.
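>
> For illustration, a minimal sketch of how a task would consume the variable 
> (assuming only that the executor exports `MESOS_CONTAINER_IP` into the 
> task's environment):
> {code}
> #include <cstdlib>
> #include <iostream>
>
> int main()
> {
>   // Read the container IP from the environment instead of LIBPROCESS_IP.
>   const char* ip = std::getenv("MESOS_CONTAINER_IP");
>
>   if (ip == nullptr) {
>     std::cerr << "MESOS_CONTAINER_IP is not set" << std::endl;
>     return 1;
>   }
>
>   std::cout << "container IP: " << ip << std::endl;
>   return 0;
> }
> {code}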



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7404) Ensure hierarchical roles work with old Mesos agents

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7404:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Ensure hierarchical roles work with old Mesos agents
> 
>
> Key: MESOS-7404
> URL: https://issues.apache.org/jira/browse/MESOS-7404
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> If the Mesos master supports hierarchical roles but the agent does not, we 
> need to ensure that we avoid putting the agent into a bad state, e.g., if the 
> user creates a persistent volume.
> One approach is to use an agent capability for hierarchical roles, and 
> disallow creating persistent-volumes using a hierarchical role if the agent 
> doesn't have the capability. We could also use an agent version check, 
> although until MESOS-6975 is implemented, that will be a bit awkward.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7776) Document `MESOS_CONTAINER_IP`

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7776:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Document `MESOS_CONTAINER_IP` 
> --
>
> Key: MESOS-7776
> URL: https://issues.apache.org/jira/browse/MESOS-7776
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>
> We introduced `MESOS_CONTAINER_IP` so that tasks launched by the 
> default-executor can learn their container IP. This was done 
> primarily to break the containers' dependency on `LIBPROCESS_IP` for 
> learning their IP addresses, which was misleading. 
> This change needs to be documented.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-3110) Harden the CMake system-dependency-locating routines

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123813#comment-16123813
 ] 

Andrew Schwartzmeyer commented on MESOS-3110:
-

If I'm reading this right, it means using `find_library` to properly locate and 
import system dependencies. This is done in my current CMake patches, with the 
exception of SASL2, a TODO I need to resolve.

> Harden the CMake system-dependency-locating routines
> 
>
> Key: MESOS-3110
> URL: https://issues.apache.org/jira/browse/MESOS-3110
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, cmake
>
> Currently the Mesos project has two flavors of dependency: (1) the 
> dependencies we expect are already on the system (_e.g._, apr, libsvn), and 
> (2) the dependencies that are historically bundled with Mesos (_e.g._, glog).
> Dependency type (1) requires solid modules that will locate them on any 
> system: Linux, BSD, or Windows. This would come for free if we were using 
> CMake 3.0, but we're using CMake 2.8 so that Ubuntu users can install it out 
> of the box, instead of upgrading CMake first.
> This is additionally useful for dependency type (2), where we will expect to 
> have to use these routines when we support both the rebundled dependencies in 
> the `3rdparty/` folder, and system installations of those dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7652:
--
Target Version/s: 1.2.3, 1.3.2, 1.4.0  (was: 1.2.3, 1.3.2, 1.5.0)

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7744:
--
Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.5.0  (was: 1.1.3, 1.2.3, 1.3.2, 
1.4.0)

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> 
>
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Sargun Dhillon
>Assignee: Benjamin Mahler
>Priority: Critical
>  Labels: reliability
>
> We sometimes launch jobs and cancel them after ~7 seconds if we don't get a 
> TASK_STARTING back from the agent. Under certain conditions this can result in 
> Mesos losing track of the task. The interesting chunk of the logs is 
> here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned 
> task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task 
> Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 
> ‘Titus-7590548-worker-0-4476’ for executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill 
> task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling 
> status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for 
> task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued 
> task ‘Titus-7590548-worker-0-4476’ to executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707{
> {code}
> In our executor, we see that the launch message arrives after the master has 
> already received the kill update. We then send non-terminal status updates to 
> the agent, and yet it doesn't forward these to our framework. We're using a 
> custom executor which is based on the older mesos-go bindings. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-11 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7643:
--
Target Version/s: 1.4.0  (was: 1.5.0)

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the
> `create` and `prepare` calls for each isolator should run serially in the 
> order in which they appear in the --isolation flag, while the `cleanup` calls 
> should be serialized in reverse order (with the exception of the filesystem 
> isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377],
> which uses a 'set' (apparently instead of a list or vector), and a 'set' is a 
> sorted container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-11 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7643:
--
Priority: Blocker  (was: Critical)

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the
> `create` and `prepare` calls for each isolator should run serially in the 
> order in which they appear in the --isolation flag, while the `cleanup` calls 
> should be serialized in reverse order (with the exception of the filesystem 
> isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377],
> which uses a 'set' (apparently instead of a list or vector), and a 'set' is a 
> sorted container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7652:
--
Priority: Critical  (was: Blocker)

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: mesosphere
>
> Hello,
> I recently used the following docker image:
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here is the link to the Dockerfile:
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here is the source:
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN step. The docker 
> containerizer has no problem with this: if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you land in the working dir, but the universal containerizer fails with the 
> error above.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7643:
--
Priority: Critical  (was: Blocker)

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the
> `create` and `prepare` calls for each isolator should run serially in the 
> order in which they appear in the --isolation flag, while the `cleanup` calls 
> should be serialized in reverse order (with the exception of the filesystem 
> isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377],
> which uses a 'set' (apparently instead of a list or vector), and a 'set' is a 
> sorted container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7882) Mesos master rescinds all the in-flight offers from all the registered agents when a new maintenance schedule is posted for a subset of slaves

2017-08-11 Thread Sagar Sadashiv Patwardhan (JIRA)
Sagar Sadashiv Patwardhan created MESOS-7882:


 Summary: Mesos master rescinds all the in-flight offers from all 
the registered agents when a new maintenance schedule is posted for a subset of 
slaves
 Key: MESOS-7882
 URL: https://issues.apache.org/jira/browse/MESOS-7882
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 1.3.0
 Environment: Ubuntu 14.04 (trusty)
Mesos master branch.
SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded
Reporter: Sagar Sadashiv Patwardhan
Priority: Minor


We are running mesos 1.1.0 in production. We use a custom autoscaler for 
scaling our mesos cluster up and down. While scaling down the cluster, the 
autoscaler makes a POST request to the mesos master's /maintenance/schedule 
endpoint with a set of slaves to move to maintenance mode. This forces the 
mesos master to rescind all the in-flight offers from *all the slaves* in the 
cluster. If our scheduler accepts one of these offers, then we get a TASK_LOST 
status update back for that task. We also see log lines like these 
(https://gist.github.com/sagar8192/8858e7cb59a23e8e1762a27571824118) in the 
mesos master logs.

After reading the code (refs: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L6772), it 
appears that offers are getting rescinded for all the slaves. I am not sure 
what the expected behavior is here, but it makes more sense if only resources 
from slaves marked for maintenance are reclaimed.

Experiment:
To verify that this is actually happening, I checked out the master branch (sha: 
a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines 
(https://gist.github.com/sagar8192/42ca055720549c5ff3067b1e6c7c68b3). 
I built the binary and started a mesos master and 2 agent processes, and used a 
basic python framework that launches docker containers on these slaves. I 
verified that there is no existing schedule for any slaves using `curl 
10.40.19.239:5050/maintenance/status`, and posted a maintenance schedule for one 
of the slaves (https://gist.github.com/sagar8192/fb65170240dd32a53f27e6985c549df0) 
after starting the mesos framework.

Logs:
mesos-master: https://gist.github.com/sagar8192/91888419fdf8284e33ebd58351131203
mesos-slave1: https://gist.github.com/sagar8192/3a83364b1f5ffc63902a80c728647f31
mesos-slave2: https://gist.github.com/sagar8192/1b341ef2271dde11d276974a27109426
Mesos framework: 
https://gist.github.com/sagar8192/bcd4b37dba03bde0a942b5b972004e8a

I think mesos should rescind offers and inverse offers only for those slaves 
that are marked for maintenance (draining mode).
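
A hedged sketch of the suggested behavior (illustrative types, not the 
master's actual data structures): rescind only the offers whose slave appears 
in the newly posted schedule.

{code}
#include <iostream>
#include <map>
#include <set>
#include <string>

void rescindForMaintenance(
    const std::map<std::string, std::string>& offers, // offerId -> slaveId
    const std::set<std::string>& scheduledSlaves)
{
  for (const auto& offer : offers) {
    if (scheduledSlaves.count(offer.second) > 0) {
      std::cout << "rescinding " << offer.first << " (slave "
                << offer.second << " entering maintenance)" << std::endl;
    }
    // Offers from slaves outside the schedule are left untouched.
  }
}

int main()
{
  rescindForMaintenance(
      {{"offer-1", "slave-a"}, {"offer-2", "slave-b"}},
      {"slave-b"}); // only offer-2 is rescinded
}
{code}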



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7882) Mesos master rescinds all the in-flight offers from all the registered agents when a new maintenance schedule is posted for a subset of slaves

2017-08-11 Thread Sagar Sadashiv Patwardhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sadashiv Patwardhan updated MESOS-7882:
-
Description: 
We are running mesos 1.1.0 in production. We use a custom autoscaler for 
scaling our mesos cluster up and down. While scaling down the cluster, the 
autoscaler makes a POST request to the mesos master's /maintenance/schedule 
endpoint with a set of slaves to move to maintenance mode. This forces the 
mesos master to rescind all the in-flight offers from *all the slaves* in the 
cluster. If our scheduler accepts one of these offers, then we get a TASK_LOST 
status update back for that task. We also see log lines like these 
(https://gist.github.com/sagar8192/8858e7cb59a23e8e1762a27571824118) in the 
mesos master logs.

After reading the code (refs: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L6772), it 
appears that offers are getting rescinded for all the slaves. I am not sure 
what the expected behavior is here, but it makes more sense if only resources 
from slaves marked for maintenance are reclaimed.

*Experiment:*
To verify that this is actually happening, I checked out the master branch (sha: 
a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines 
(https://gist.github.com/sagar8192/42ca055720549c5ff3067b1e6c7c68b3). 
I built the binary and started a mesos master and 2 agent processes, and used a 
basic python framework that launches docker containers on these slaves. I 
verified that there is no existing schedule for any slaves using `curl 
10.40.19.239:5050/maintenance/status`, and posted a maintenance schedule for one 
of the slaves (https://gist.github.com/sagar8192/fb65170240dd32a53f27e6985c549df0) 
after starting the mesos framework.

*Logs:*
mesos-master: https://gist.github.com/sagar8192/91888419fdf8284e33ebd58351131203
mesos-slave1: https://gist.github.com/sagar8192/3a83364b1f5ffc63902a80c728647f31
mesos-slave2: https://gist.github.com/sagar8192/1b341ef2271dde11d276974a27109426
Mesos framework: 
https://gist.github.com/sagar8192/bcd4b37dba03bde0a942b5b972004e8a

I think mesos should rescind offers and inverse offers only for those slaves 
that are marked for maintenance (draining mode).

  was:
We are running mesos 1.1.0 in production. We use a custom autoscaler for 
scaling our mesos cluster up and down. While scaling down the cluster, the 
autoscaler makes a POST request to the mesos master's /maintenance/schedule 
endpoint with a set of slaves to move to maintenance mode. This forces the 
mesos master to rescind all the in-flight offers from *all the slaves* in the 
cluster. If our scheduler accepts one of these offers, then we get a TASK_LOST 
status update back for that task. We also see log lines like these 
(https://gist.github.com/sagar8192/8858e7cb59a23e8e1762a27571824118) in the 
mesos master logs.

After reading the code (refs: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L6772), it 
appears that offers are getting rescinded for all the slaves. I am not sure 
what the expected behavior is here, but it makes more sense if only resources 
from slaves marked for maintenance are reclaimed.

Experiment:
To verify that this is actually happening, I checked out the master branch (sha: 
a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines 
(https://gist.github.com/sagar8192/42ca055720549c5ff3067b1e6c7c68b3). 
I built the binary and started a mesos master and 2 agent processes, and used a 
basic python framework that launches docker containers on these slaves. I 
verified that there is no existing schedule for any slaves using `curl 
10.40.19.239:5050/maintenance/status`, and posted a maintenance schedule for one 
of the slaves (https://gist.github.com/sagar8192/fb65170240dd32a53f27e6985c549df0) 
after starting the mesos framework.

Logs:
mesos-master: https://gist.github.com/sagar8192/91888419fdf8284e33ebd58351131203
mesos-slave1: https://gist.github.com/sagar8192/3a83364b1f5ffc63902a80c728647f31
mesos-slave2: https://gist.github.com/sagar8192/1b341ef2271dde11d276974a27109426
Mesos framework: 
https://gist.github.com/sagar8192/bcd4b37dba03bde0a942b5b972004e8a

I think mesos should rescind offers and inverse offers only for those slaves 
that are marked for maintenance (draining mode).


> Mesos master rescinds all the in-flight offers from all the registered agents 
> when a new maintenance schedule is posted for a subset of slaves
> --
>
> Key: MESOS-7882
> URL: https://issues.apache.org/jira/browse/MESOS-7882
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.3.0
> Environment: Ubuntu 14.04 (trusty)
> Mesos master branch.
> SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded
>Reporter: Sagar Sadashiv Patwardhan
>

[jira] [Commented] (MESOS-5078) Document TaskStatus reasons

2017-08-11 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123313#comment-16123313
 ] 

Benno Evers commented on MESOS-5078:


Review: https://reviews.apache.org/r/61495/

> Document TaskStatus reasons
> ---
>
> Key: MESOS-5078
> URL: https://issues.apache.org/jira/browse/MESOS-5078
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Greg Mann
>Assignee: Benno Evers
>  Labels: documentation, mesosphere, newbie++
>
> We should document the possible {{reason}} values that can be found in the 
> {{TaskStatus}} message.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-3576) Audit CMake linking flags

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123976#comment-16123976
 ] 

Andrew Schwartzmeyer commented on MESOS-3576:
-

Moreover, that's with `BUILD_SHARED_LIBS=ON`; when it is set to `OFF`, the 
(bundled) libraries are linked statically as expected:

{noformat}
/usr/lib64/libapr-1.so /usr/lib64/libcurl.so 
3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib/libglog.a 
3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-build/libprotobuf.a -lpthread 
/usr/lib64/libz.so -lrt -ldl /lib64/libsvn_delta-1.so /lib64/libsvn_diff-1.so 
/lib64/libsvn_subr-1.so 
3rdparty/googletest-1.8.0/src/googletest-1.8.0-build/googlemock/libgmock.a 
3rdparty/googletest-1.8.0/src/googletest-1.8.0-build/googlemock/gtest/libgtest.a
{noformat}

> Audit CMake linking flags
> -
>
> Key: MESOS-3576
> URL: https://issues.apache.org/jira/browse/MESOS-3576
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, mesosphere
>
> If you look at the linking flags for autoconf's stout tests build:
> ```
> ./.libs/libgmock.a glog-0.3.3/.libs/libglog.a -lgflags 
> protobuf-2.5.0/src/.libs/libprotobuf.a -lpthread -ldl -lz 
> /usr/lib/x86_64-linux-gnu/libcurl-nss.so 
> /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so 
> /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so 
> /usr/lib/x86_64-linux-gnu/libapr-1.so -lrt -pthread
> ```
> you'll notice that they are much more concise than our CMake build:
> ```
> -L/usr/lib/x86_64-linux-gnu/libapr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_client-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so  
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
>  -rdynamic -lpthread -lgmock -lsvn_client-1 -lsvn_delta-1 -lsvn_diff-1 
> -lsvn_fs-1 -Wl,-Bstatic -lsvn_fs_fs-1 -lsvn_fs_util-1 -Wl,-Bdynamic 
> -lsvn_ra-1 -Wl,-Bstatic -lsvn_ra_local-1 -lsvn_ra_serf-1 -lsvn_ra_svn-1 
> -Wl,-Bdynamic -lsvn_repos-1 -lsvn_subr-1 -lsvn_wc-1 -lglog -lprotobuf -lgtest 
> -ldl -lapr-1 -lrt 
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu/libapr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_client-1.so:/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so:/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a:/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so:/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a:/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so:/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
> ```
> We need to (1) audit this so that we are confident the linking process works 
> like we want it to, and (2) make sure we don't triple link dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-3576) Audit CMake linking flags

2017-08-11 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3576:
-
Shepherd: Joseph Wu  (was: Joris Van Remoortere)

> Audit CMake linking flags
> -
>
> Key: MESOS-3576
> URL: https://issues.apache.org/jira/browse/MESOS-3576
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, mesosphere
>
> If you look at the linking flags for autoconf's stout tests build:
> ```
> ./.libs/libgmock.a glog-0.3.3/.libs/libglog.a -lgflags 
> protobuf-2.5.0/src/.libs/libprotobuf.a -lpthread -ldl -lz 
> /usr/lib/x86_64-linux-gnu/libcurl-nss.so 
> /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so 
> /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so 
> /usr/lib/x86_64-linux-gnu/libapr-1.so -lrt -pthread
> ```
> you'll notice that they are much more concise than our CMake build:
> ```
> -L/usr/lib/x86_64-linux-gnu/libapr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_client-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so  
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
>  -rdynamic -lpthread -lgmock -lsvn_client-1 -lsvn_delta-1 -lsvn_diff-1 
> -lsvn_fs-1 -Wl,-Bstatic -lsvn_fs_fs-1 -lsvn_fs_util-1 -Wl,-Bdynamic 
> -lsvn_ra-1 -Wl,-Bstatic -lsvn_ra_local-1 -lsvn_ra_serf-1 -lsvn_ra_svn-1 
> -Wl,-Bdynamic -lsvn_repos-1 -lsvn_subr-1 -lsvn_wc-1 -lglog -lprotobuf -lgtest 
> -ldl -lapr-1 -lrt 
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu/libapr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_client-1.so:/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so:/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a:/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so:/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a:/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so:/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
> ```
> We need to (1) audit this so that we are confident the linking process works 
> like we want it to, and (2) make sure we don't triple link dependencies.
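
For reference, one way to avoid repeating every {{-l}} flag per target is to declare each third-party dependency once and let CMake propagate it transitively. A minimal sketch only, not Mesos's actual build code (target names and paths here are hypothetical):

{noformat}
# Declare a bundled library once, as an imported target.
add_library(glog STATIC IMPORTED)
set_target_properties(glog PROPERTIES
  IMPORTED_LOCATION ${CMAKE_BINARY_DIR}/3rdparty/glog/lib/libglog.a)

# Collect common dependencies on a single interface target.
add_library(mesos-deps INTERFACE)
target_link_libraries(mesos-deps INTERFACE glog ${CMAKE_DL_LIBS})

# Consumers link the interface target once; CMake computes and
# de-duplicates the transitive link line instead of each target
# restating every flag.
add_executable(stout-tests tests/main.cpp)
target_link_libraries(stout-tests PRIVATE mesos-deps)
{noformat}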



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-3576) Audit CMake linking flags

2017-08-11 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123979#comment-16123979
 ] 

Benjamin Bannier commented on MESOS-3576:
-

Adding a link to MESOS-7409 for tracking. I am not sure whether there is a 
direct connection.

> Audit CMake linking flags
> -
>
> Key: MESOS-3576
> URL: https://issues.apache.org/jira/browse/MESOS-3576
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, mesosphere
>
> If you look at the linking flags for autoconf's stout tests build:
> ```
> ./.libs/libgmock.a glog-0.3.3/.libs/libglog.a -lgflags 
> protobuf-2.5.0/src/.libs/libprotobuf.a -lpthread -ldl -lz 
> /usr/lib/x86_64-linux-gnu/libcurl-nss.so 
> /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so 
> /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so 
> /usr/lib/x86_64-linux-gnu/libapr-1.so -lrt -pthread
> ```
> you'll notice that they are much more concise than our CMake build:
> ```
> -L/usr/lib/x86_64-linux-gnu/libapr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_client-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so  
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
>  -rdynamic -lpthread -lgmock -lsvn_client-1 -lsvn_delta-1 -lsvn_diff-1 
> -lsvn_fs-1 -Wl,-Bstatic -lsvn_fs_fs-1 -lsvn_fs_util-1 -Wl,-Bdynamic 
> -lsvn_ra-1 -Wl,-Bstatic -lsvn_ra_local-1 -lsvn_ra_serf-1 -lsvn_ra_svn-1 
> -Wl,-Bdynamic -lsvn_repos-1 -lsvn_subr-1 -lsvn_wc-1 -lglog -lprotobuf -lgtest 
> -ldl -lapr-1 -lrt 
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu/libapr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_client-1.so:/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so:/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a:/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so:/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a:/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so:/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
> ```
> We need to (1) audit this so that we are confident the linking process works 
> like we want it to, and (2) make sure we don't triple link dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-3576) Audit CMake linking flags

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123946#comment-16123946
 ] 

Andrew Schwartzmeyer commented on MESOS-3576:
-

Link flags for a master-branch build of `stout-tests` using CMake:
{noformat}
-L/home/andschwa/src/mesos-master/build/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib
-L/home/andschwa/src/mesos-master/build/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-lib/lib/lib
-L/home/andschwa/src/mesos-master/build/3rdparty/googletest-1.8.0/src/googletest-1.8.0-lib/lib
-L/home/andschwa/src/mesos-master/build/3rdparty/googletest-1.8.0/src/googletest-1.8.0-lib/lib/gtest
-Wl,-rpath,/home/andschwa/src/mesos-master/build/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/home/andschwa/src/mesos-master/build/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-lib/lib/lib:/home/andschwa/src/mesos-master/build/3rdparty/googletest-1.8.0/src/googletest-1.8.0-lib/lib:/home/andschwa/src/mesos-master/build/3rdparty/googletest-1.8.0/src/googletest-1.8.0-lib/lib/gtest
-lapr-1 -lcurl -lglog -lsvn_delta-1 -lsvn_diff-1 -lsvn_subr-1 -lprotobuf -ldl -lapr-1 -lrt -lpthread
-lapr-1 -lcurl -lglog -lsvn_delta-1 -lsvn_diff-1 -lsvn_subr-1 -lprotobuf -ldl -lapr-1 -lrt -lpthread
-lapr-1 -lcurl -lglog -lsvn_delta-1 -lsvn_diff-1 -lsvn_subr-1 -lprotobuf -ldl -lapr-1 -lrt -lpthread
-lapr-1 -lcurl -lglog -lsvn_delta-1 -lsvn_diff-1 -lsvn_subr-1 -lprotobuf -ldl -lapr-1 -lrt
-lgmock -lgtest -lpthread -lgmock -lgtest
{noformat}

Link flags for a build of `stout-tests` using CMake with my patch series:
{noformat}
-Wl,-rpath,/home/andschwa/src/mesos/build/3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib:/home/andschwa/src/mesos/build/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-build
 /usr/lib64/libapr-1.so /usr/lib64/libcurl.so 
3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib/libglog.so 
3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-build/libprotobuf.so -lpthread 
/usr/lib64/libz.so -lrt -ldl /lib64/libsvn_delta-1.so /lib64/libsvn_diff-1.so 
/lib64/libsvn_subr-1.so 
3rdparty/googletest-1.8.0/src/googletest-1.8.0-build/googlemock/libgmock.a 
3rdparty/googletest-1.8.0/src/googletest-1.8.0-build/googlemock/gtest/libgtest.a
{noformat}

I'll note that the remaining difference is that CMake links to shared 
libraries of glog and protobuf, whereas the Autotools build links to static 
libraries.
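
If matching the Autotools behavior were desired, the imported targets could point at the static archives instead. A sketch only, under the assumption that the bundled builds produce these archives (paths are illustrative):

{noformat}
# Prefer the bundled static archives over the shared libraries.
add_library(glog STATIC IMPORTED)
set_target_properties(glog PROPERTIES
  IMPORTED_LOCATION
    ${CMAKE_BINARY_DIR}/3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib/libglog.a)

add_library(protobuf STATIC IMPORTED)
set_target_properties(protobuf PROPERTIES
  IMPORTED_LOCATION
    ${CMAKE_BINARY_DIR}/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-build/libprotobuf.a)
{noformat}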

> Audit CMake linking flags
> -
>
> Key: MESOS-3576
> URL: https://issues.apache.org/jira/browse/MESOS-3576
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: build, mesosphere
>
> If you look at the linking flags for autoconf's stout tests build:
> ```
> ./.libs/libgmock.a glog-0.3.3/.libs/libglog.a -lgflags 
> protobuf-2.5.0/src/.libs/libprotobuf.a -lpthread -ldl -lz 
> /usr/lib/x86_64-linux-gnu/libcurl-nss.so 
> /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so 
> /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so 
> /usr/lib/x86_64-linux-gnu/libapr-1.so -lrt -pthread
> ```
> you'll notice that they are much more concise than our CMake build:
> ```
> -L/usr/lib/x86_64-linux-gnu/libapr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_client-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a  
> -L/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so  
> -L/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so  
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib
>   
> -L/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
>  -rdynamic -lpthread -lgmock -lsvn_client-1 -lsvn_delta-1 -lsvn_diff-1 
> -lsvn_fs-1 -Wl,-Bstatic -lsvn_fs_fs-1 -lsvn_fs_util-1 -Wl,-Bdynamic 
> -lsvn_ra-1 -Wl,-Bstatic -lsvn_ra_local-1 -lsvn_ra_serf-1 -lsvn_ra_svn-1 
> -Wl,-Bdynamic -lsvn_repos-1 -lsvn_subr-1 -lsvn_wc-1 -lglog -lprotobuf -lgtest 
> -ldl -lapr-1 -lrt 
> -Wl,-rpath,/usr/lib/x86_64-linux-gnu/libapr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_client-1.so:/usr/lib/x86_64-linux-gnu/libsvn_delta-1.so:/usr/lib/x86_64-linux-gnu/libsvn_diff-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs-1.so:/usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.a:/usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra-1.so:/usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.a:/usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.a:/usr/lib/x86_64-linux-gnu/libsvn_repos-1.so:/usr/lib/x86_64-linux-gnu/libsvn_subr-1.so:/usr/lib/x86_64-linux-gnu/libsvn_wc-1.so:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/gmock-1.7.0/src/gmock-1.7.0-build/gtest/lib/.libs:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/lib:/home/joris/projects/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protobuf-2.5.0-lib/lib/lib
> ```
> We need to (1) audit this so that we are confident the linking process works 
> like we want it to, and (2) make sure we don't triple link dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7884) Support containerd on Mesos.

2017-08-11 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-7884:
---

 Summary: Support containerd on Mesos.
 Key: MESOS-7884
 URL: https://issues.apache.org/jira/browse/MESOS-7884
 Project: Mesos
  Issue Type: Epic
  Components: containerization
Reporter: Gilbert Song


containerd v1.0 is very close to its formal release (currently at v1.0.0 
alpha 4). We should consider supporting containerd on Mesos, either by 
refactoring the docker containerizer or by introducing a new containerd 
containerizer. Design ideas and suggestions are definitely welcome.

https://github.com/containerd/containerd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7296) CMake 2.8.10 does not support TIMESTAMP

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124356#comment-16124356
 ] 

Andrew Schwartzmeyer commented on MESOS-7296:
-

I put this back to using {{TIMESTAMP}} in my WIP refactor because we're now 
updating to CMake 3.7.
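
For builds that must still tolerate older CMake, a version guard is one option. A minimal sketch, assuming the {{%s}} placeholder requires CMake >= 3.6 and that a POSIX {{date}} is available as the fallback:

{noformat}
if(NOT CMAKE_VERSION VERSION_LESS 3.6)
  # string(TIMESTAMP) understands %s (seconds since epoch) here.
  string(TIMESTAMP BUILD_TIME "%s" UTC)
else()
  # Older CMake: shell out to `date` instead.
  execute_process(
    COMMAND date +%s
    OUTPUT_VARIABLE BUILD_TIME
    OUTPUT_STRIP_TRAILING_WHITESPACE)
endif()
{noformat}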

> CMake 2.8.10 does not support TIMESTAMP
> ---
>
> Key: MESOS-7296
> URL: https://issues.apache.org/jira/browse/MESOS-7296
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
> Environment: Anywhere with CMake 2.x instead of 3.x (specifically 
> 2.8.10).
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Trivial
>  Labels: build
>
> The patch https://reviews.apache.org/r/57052/ moved the build time and date 
> info out of compile definitions and into a build file. While testing, an 
> existing bug was discovered where the CMake command `string(TIMESTAMP 
> BUILD_TIME "%s" UTC)` is unsupported with CMake 2.8.10. Instead of replacing 
> the variable with the time, it replaces it with "%s".
> This is not a Linux vs Windows bug. This is specifically unsupported in 
> 2.8.10. Configuring with `cmake3` does not reproduce the bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124344#comment-16124344
 ] 

Andrew Schwartzmeyer commented on MESOS-6959:
-

I attempted this. I can get it compiling with the following:

{noformat}
add_executable(mesos-containerizer
  main.cpp
  containerizer.cpp
  )
target_include_directories(mesos-containerizer PRIVATE
  ${CMAKE_SOURCE_DIR}/3rdparty/libprocess/include
  ${MESOS_PUBLIC_INCLUDE_DIR}
  ${MESOS_BIN_INCLUDE_DIR}
  ${MESOS_BIN_INCLUDE_DIR}/mesos
  ${MESOS_BIN_SRC_DIR}
  ${MESOS_SRC_DIR})
target_link_libraries(mesos-containerizer PRIVATE stout nvml)
{noformat}

There are obviously more sources that go into this binary, but this is enough 
to demonstrate the dependency on {{libprocess}}.

Snippet of errors:

{noformat}
containerizer.cpp:(.text._ZNK5mesos5slave11ContainerIO2IOcvN7process10Subprocess2IOEEv[_ZNK5mesos5slave11ContainerIO2IOcvN7process10Subprocess2IOEEv]+0x52):
 undefined reference to `process::Subprocess::FD(int, process::Subprocess::IO::FDType)'
containerizer.cpp:(.text._ZNK5mesos5slave11ContainerIO2IOcvN7process10Subprocess2IOEEv[_ZNK5mesos5slave11ContainerIO2IOcvN7process10Subprocess2IOEEv]+0x76):
 undefined reference to `process::Subprocess::PATH(std::string const&)'
src/slave/containerizer/mesos/CMakeFiles/mesos-containerizer.dir/containerizer.cpp.o:
 In function `process::terminate(process::ProcessBase const*, bool)':
containerizer.cpp:(.text._ZN7process9terminateEPKNS_11ProcessBaseEb[_ZN7process9terminateEPKNS_11ProcessBaseEb]+0x28):
 undefined reference to `process::terminate(process::UPID const&, bool)'
src/slave/containerizer/mesos/CMakeFiles/mesos-containerizer.dir/containerizer.cpp.o:
 In function `process::wait(process::ProcessBase const*, Duration const&)':
containerizer.cpp:(.text._ZN7process4waitEPKNS_11ProcessBaseERK8Duration[_ZN7process4waitEPKNS_11ProcessBaseERK8Duration]+0x27):
 undefined reference to `process::wait(process::UPID const&, Duration const&)'
src/slave/containerizer/mesos/CMakeFiles/mesos-containerizer.dir/containerizer.cpp.o:
 In function `process::metrics::Metric::push(double)':
containerizer.cpp:(.text._ZN7process7metrics6Metric4pushEd[_ZN7process7metrics6Metric4pushEd]+0x37):
 undefined reference to `process::Clock::now()'
src/slave/containerizer/mesos/CMakeFiles/mesos-containerizer.dir/containerizer.cpp.o:
 In function `process::metrics::Metric::Data::Data(std::string const&, 
Option<Duration> const&)':
containerizer.cpp:(.text._ZN7process7metrics6Metric4DataC2ERKSsRK6OptionI8DurationE[_ZN7process7metrics6Metric4DataC5ERKSsRK6OptionI8DurationE]+0x89):
 undefined reference to `process::TIME_SERIES_CAPACITY'
src/slave/containerizer/mesos/CMakeFiles/mesos-containerizer.dir/containerizer.cpp.o:
 In function `process::metrics::remove(process::metrics::Metric const&)':
containerizer.cpp:(.text._ZN7process7metrics6removeERKNS0_6MetricE[_ZN7process7metrics6removeERKNS0_6MetricE]+0x67):
 undefined reference to `process::initialize(Option<std::string> const&, 
Option<std::string> const&, Option<std::string> const&)'
containerizer.cpp:(.text._ZN7process7metrics6removeERKNS0_6MetricE[_ZN7process7metrics6removeERKNS0_6MetricE]+0xa4):
 undefined reference to 
`process::metrics::internal::MetricsProcess::remove(std::string const&)'
containerizer.cpp:(.text._ZN7process7metrics6removeERKNS0_6MetricE[_ZN7process7metrics6removeERKNS0_6MetricE]+0xc4):
 undefined reference to `process::metrics::internal::metrics'
{noformat}

So I will leave this as-is (linking to {{libmesos}}).
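
For completeness, the as-is shape is roughly the following. A sketch; using {{mesos}} as the library target name is an assumption, not necessarily the build's actual target:

{noformat}
add_executable(mesos-containerizer
  main.cpp
  containerizer.cpp)

# Linking against libmesos pulls in libprocess transitively, which is
# what satisfies the undefined references above.
target_link_libraries(mesos-containerizer PRIVATE mesos)
{noformat}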

> Separate the mesos-containerizer binary into a static binary, which only 
> depends on stout
> -
>
> Key: MESOS-6959
> URL: https://issues.apache.org/jira/browse/MESOS-6959
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Joseph Wu
>Assignee: Andrew Schwartzmeyer
>  Labels: cmake, mesosphere, microsoft
>
> The {{mesos-containerizer}} binary currently has [three 
> commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]:
> * 
> [MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp]
> * 
> [MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp]
> * 
> [NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997]
> These commands are all heavily dependent on stout, and have no need to be 
> linked to libprocess.  In fact, adding an erroneous call to 
> {{process::initialize}} (either explicitly, or by accidentally using a 
> libprocess method) will break {{mesos-containerizer}} and cause several Mesos 
> containerizer tests to fail.  (The tasks fail to launch, saying {{Failed to 
> synchronize with agent (it's probably exited)}}).
> Because this binary only depends on stout, we can separate it from the other 
> source files and make this a static binary.

[jira] [Issue Comment Deleted] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout

2017-08-11 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6959:

Comment: was deleted

(was: -We can pretty easily replace this with 
[GetProductInfo|https://msdn.microsoft.com/en-us/library/windows/desktop/ms724358(v=vs.85).aspx].-)

> Separate the mesos-containerizer binary into a static binary, which only 
> depends on stout
> -
>
> Key: MESOS-6959
> URL: https://issues.apache.org/jira/browse/MESOS-6959
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Joseph Wu
>Assignee: Andrew Schwartzmeyer
>  Labels: cmake, mesosphere, microsoft
>
> The {{mesos-containerizer}} binary currently has [three 
> commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]:
> * 
> [MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp]
> * 
> [MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp]
> * 
> [NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997]
> These commands are all heavily dependent on stout, and have no need to be 
> linked to libprocess.  In fact, adding an erroneous call to 
> {{process::initialize}} (either explicitly, or by accidentally using a 
> libprocess method) will break {{mesos-containerizer}} and cause several Mesos 
> containerizer tests to fail.  (The tasks fail to launch, saying {{Failed to 
> synchronize with agent (it's probably exited)}}).
> Because this binary only depends on stout, we can separate it from the other 
> source files and make this a static binary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7652) Docker image with universal containerizer does not work if WORKDIR is missing in the rootfs.

2017-08-11 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123541#comment-16123541
 ] 

Kapil Arya commented on MESOS-7652:
---

Thanks [~jieyu]!

> Docker image with universal containerizer does not work if WORKDIR is missing 
> in the rootfs.
> 
>
> Key: MESOS-7652
> URL: https://issues.apache.org/jira/browse/MESOS-7652
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1
>Reporter: michael beisiegel
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: mesosphere
>
> hello,
> used the following docker image recently
> quay.io/spinnaker/front50:master
> https://quay.io/repository/spinnaker/front50
> Here the link to the Dockerfile
> https://github.com/spinnaker/front50/blob/master/Dockerfile
> and here the source
> {color:blue}FROM java:8
> MAINTAINER delivery-engineer...@netflix.com
> COPY . workdir/
> WORKDIR workdir
> RUN GRADLE_USER_HOME=cache ./gradlew buildDeb -x test && \
>   dpkg -i ./front50-web/build/distributions/*.deb && \
>   cd .. && \
>   rm -rf workdir
> CMD ["/opt/front50/bin/front50"]{color}
> The image works fine with the docker containerizer, but the universal 
> containerizer shows the following in stderr.
> "Failed to chdir into current working directory '/workdir': No such file or 
> directory"
> The problem comes from the fact that the Dockerfile creates a workdir but 
> then later removes the created dir as part of a RUN. The docker containerizer 
> has no problem with it if you do
> docker run -ti --rm quay.io/spinnaker/front50:master bash
> you get into the working dir, but the universal containerizer fails with the 
> error.
> thanks for your help,
> Michael



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7691) Support local enabled cgroups subsystems automatically.

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7691:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Support local enabled cgroups subsystems automatically.
> ---
>
> Key: MESOS-7691
> URL: https://issues.apache.org/jira/browse/MESOS-7691
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: cgroups
>
> Currently, each cgroup subsystem needs to be turned on as an isolator, e.g., 
> "cgroups/blkio". Ideally, Mesos should be able to detect all locally enabled 
> cgroup subsystems and turn them on automatically (so-called "auto cgroups").



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)