[jira] [Updated] (MESOS-7863) Agent may drop pending kill task status updates.

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7863:
---
Target Version/s: 1.2.3, 1.3.2, 1.4.0

> Agent may drop pending kill task status updates.
> 
>
>                 Key: MESOS-7863
>                 URL: https://issues.apache.org/jira/browse/MESOS-7863
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Critical
>
> The agent currently assumes that when a pending task is killed, the 
> framework is still stored in the agent. This assumption can be violated in 
> two cases:
> # Another pending task was killed and we removed the framework in 
> 'Slave::run', thinking it was idle because the pending tasks were empty (a 
> task is removed from the pending tasks when its kill is processed). 
> MESOS-7783 is an instance of this.
> # The last executor terminated without tasks to send terminal updates for, or 
> the last terminated executor received its last acknowledgement. At this 
> point we remove the framework, thinking there are no pending tasks, because 
> the killed task has already been removed from pending.
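
A toy model of the bookkeeping described above, not the agent's C++ code. The names `Agent`, `Framework`, `kill_pending` and `maybe_remove_framework` are hypothetical. The sketch assumes one possible way to keep the assumption from being violated: send the terminal update at the moment the pending task is killed, while the framework is still stored, and only then check whether the framework is idle.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    pending: set = field(default_factory=set)    # tasks not yet past the launch path
    executors: set = field(default_factory=set)  # live executors

    def idle(self) -> bool:
        return not self.pending and not self.executors


class Agent:
    def __init__(self):
        self.frameworks = {}
        self.updates = []  # (framework id, task id, state) sent to the framework

    def add_pending(self, fid, task_id):
        self.frameworks.setdefault(fid, Framework()).pending.add(task_id)

    def kill_pending(self, fid, task_id):
        framework = self.frameworks.get(fid)
        if framework is None or task_id not in framework.pending:
            return
        framework.pending.discard(task_id)
        # Send the terminal update now, while the framework is still known,
        # instead of relying on a later code path to find the framework.
        self.updates.append((fid, task_id, "TASK_KILLED"))
        self.maybe_remove_framework(fid)

    def maybe_remove_framework(self, fid):
        framework = self.frameworks.get(fid)
        if framework is not None and framework.idle():
            del self.frameworks[fid]


agent = Agent()
agent.add_pending("f1", "t1")
agent.add_pending("f1", "t2")
agent.kill_pending("f1", "t1")
agent.kill_pending("f1", "t2")
```

In this model both kills produce an update even though the framework is removed as soon as the second kill leaves it idle.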



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7783:
---
Fix Version/s: 1.4.0
               1.3.2
               1.2.3

> Framework might not receive status update when a just launched task is killed 
> immediately
> -
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>
> Our Marathon team is seeing issues in their integration test suite where 
> Marathon gets stuck in an infinite loop trying to kill a just-launched task. 
> In their test, a task launch is immediately followed by a kill of that 
> task; the framework does not, for example, wait for any task status update.
> In this case the launch and kill messages arrive at the agent in the correct 
> order, but neither the launch path nor the kill path in the agent reaches the 
> point where a status update is sent to the framework. Since the framework has 
> seen no status update for the task, it re-triggers the kill, causing an 
> infinite loop.
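
The loop on the framework side can be pictured with a small sketch. This is not Marathon's code: `kill_until_terminal`, `send_kill` and `poll_status` are hypothetical names, and the retry bound is added only so that the example terminates.

```python
from typing import Callable, Optional


def kill_until_terminal(
    send_kill: Callable[[], None],
    poll_status: Callable[[], Optional[str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Re-send a kill until a status update for the task is observed.

    Returns the observed state, or None if no update ever arrived.
    Without max_attempts this loop would never end when the agent drops
    the update, which is the behaviour reported in this ticket.
    """
    for _ in range(max_attempts):
        send_kill()
        state = poll_status()
        if state is not None:
            return state
    return None


# An agent that silently drops the update: the kill is retried every time.
kills_sent = []
result = kill_until_terminal(lambda: kills_sent.append("kill"), lambda: None)
```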





[jira] [Updated] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7783:
---
Target Version/s: 1.2.3, 1.3.2, 1.4.0
   Fix Version/s: (was: 1.2.3)
                  (was: 1.3.2)
                  (was: 1.4.0)

> Framework might not receive status update when a just launched task is killed 
> immediately
> -
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>





[jira] [Updated] (MESOS-7863) Agent may drop pending kill task status updates.

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7863:
---
Sprint: Mesosphere Sprint 60

> Agent may drop pending kill task status updates.
> 
>
>                 Key: MESOS-7863
>                 URL: https://issues.apache.org/jira/browse/MESOS-7863
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Critical
>





[jira] [Assigned] (MESOS-7863) Agent may drop pending kill task status updates.

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-7863:
--

Assignee: Benjamin Mahler

> Agent may drop pending kill task status updates.
> 
>
>                 Key: MESOS-7863
>                 URL: https://issues.apache.org/jira/browse/MESOS-7863
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Critical
>





[jira] [Created] (MESOS-7863) Agent may drop pending kill task status updates.

2017-08-04 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7863:
--

             Summary: Agent may drop pending kill task status updates.
                 Key: MESOS-7863
                 URL: https://issues.apache.org/jira/browse/MESOS-7863
             Project: Mesos
          Issue Type: Bug
          Components: agent
            Reporter: Benjamin Mahler
            Priority: Critical


The agent currently assumes that when a pending task is killed, the 
framework is still stored in the agent. This assumption can be violated in 
two cases:

# Another pending task was killed and we removed the framework in 'Slave::run', 
thinking it was idle because the pending tasks were empty (a task is removed 
from the pending tasks when its kill is processed). MESOS-7783 is an instance 
of this.
# The last executor terminated without tasks to send terminal updates for, or 
the last terminated executor received its last acknowledgement. At this point 
we remove the framework, thinking there are no pending tasks, because the 
killed task has already been removed from pending.





[jira] [Assigned] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately

2017-08-04 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-7783:
--

Assignee: Benjamin Mahler

> Framework might not receive status update when a just launched task is killed 
> immediately
> -
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>





[jira] [Assigned] (MESOS-7862) Get rid of timestamp and date in generated javadoc files

2017-08-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-7862:
-

   Resolution: Fixed
     Assignee: Vinod Kone
Fix Version/s: 1.4.0

Looks like the `-notimestamp` arg gets rid of both the timestamp and meta date 
tag. Pushed the fix.

commit 13953dc4bd59f186b3a17958f9f10881e28ab6a3
Author: Vinod Kone 
Date:   Fri Aug 4 14:39:17 2017 -0700

    Added "-notimestamp" argument to javadoc.

    This causes javadoc to not generate timestamp and date meta tags in
    the generated Java docs. This reduces the size of the diff everytime
    the website publish bot generates the website.
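
To confirm that regenerated pages no longer churn, a small check along these lines could be run over the generated HTML. It is a sketch under assumptions: the helper name `has_javadoc_timestamp` is made up, the sample strings are invented for illustration, and the two patterns reflect the generation comment and `<meta name="date">` tag that javadoc commonly emits, which may differ between JDK versions.

```python
import re

# Patterns assumed from typical javadoc output; adjust for the JDK in use.
GENERATED_COMMENT = re.compile(r"<!--\s*Generated by javadoc.*?-->", re.DOTALL)
DATE_META = re.compile(r'<meta\s+name="date"[^>]*>', re.IGNORECASE)


def has_javadoc_timestamp(html: str) -> bool:
    """Return True if the page still carries a dated generation comment
    or a date meta tag, either of which changes on every rebuild."""
    comment = GENERATED_COMMENT.search(html)
    dated_comment = comment is not None and " on " in comment.group(0)
    return dated_comment or DATE_META.search(html) is not None


# Invented sample pages, with and without the volatile lines.
with_stamp = (
    "<!-- Generated by javadoc (1.8.0_131) on Fri Aug 04 14:00:00 PDT 2017 -->\n"
    '<meta name="date" content="2017-08-04">'
)
without_stamp = "<!-- Generated by javadoc -->\n<title>Protos.WeightInfoOrBuilder</title>"
```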


> Get rid of timestamp and date in generated javadoc files
> 
>
>                 Key: MESOS-7862
>                 URL: https://issues.apache.org/jira/browse/MESOS-7862
>             Project: Mesos
>          Issue Type: Improvement
>          Components: project website
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>              Labels: newbie
>             Fix For: 1.4.0
>
>
> Having a timestamp in the generated doc files makes the diff huge every time 
> the CI bot regenerates the website. 
> See:
> https://github.com/apache/mesos-site/commit/45df3a33f91145e3be27c37f3a732c940919ff06#diff-f51f8cc1f105932d260e610ba29b0e92
> {code}
> -
>  +
>   Protos.WeightInfoOrBuilder
>  -
>  +
> {code}
> Currently, huge diffs break git mirroring from "git-wip/mesos-site" to 
> "git/mesos-site" in ASF INFRA. See related ticket.
> It looks like we can use "-notimestamp" to get rid of the "Generated..." 
> line. It is not yet clear how to get rid of the meta tag.





[jira] [Commented] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately

2017-08-04 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114988#comment-16114988
 ] 

Benjamin Mahler commented on MESOS-7783:


The bug occurs as follows:

(1) Two (or more) tasks arrive at the agent, but do not yet reach 
{{Slave::_run}}.
(2) Kill task messages arrive at the agent and are processed.
(3) The first task to reach {{Slave::_run}} will cause the framework to be 
removed, since the pending tasks / executors are now empty (see 
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L1841-L1845]).
(4) The remaining tasks to reach {{Slave::_run}} encounter the framework as 
removed and are dropped without a status update (see 
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L1788-L1794]).
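
The four steps can be replayed in a toy model. This is a simplified sketch, not the code in `slave.cpp`: the class and method names are hypothetical and only the pending-task bookkeeping is modelled. Whether the first task itself receives an update is not stated above; the sketch assumes it does.

```python
class ToyAgent:
    """Models only the pending-task bookkeeping relevant to steps (1)-(4)."""

    def __init__(self):
        self.pending = {}   # framework id -> set of pending task ids
        self.updates = []   # (task id, state) pairs sent to the framework

    def run_task(self, fid, task_id):
        # Step (1): the task is recorded as pending; _run happens later.
        self.pending.setdefault(fid, set()).add(task_id)

    def kill_task(self, fid, task_id):
        # Step (2): the kill only removes the task from the pending set.
        self.pending.get(fid, set()).discard(task_id)

    def _run(self, fid, task_id):
        if fid not in self.pending:
            # Step (4): framework already removed, task dropped silently.
            return
        if task_id not in self.pending[fid]:
            # The task was killed while pending (assumed to be reported).
            self.updates.append((task_id, "TASK_KILLED"))
            # Step (3): nothing is pending any more, so the framework goes.
            if not self.pending[fid]:
                del self.pending[fid]


agent = ToyAgent()
for task in ("t1", "t2"):
    agent.run_task("f1", task)
for task in ("t1", "t2"):
    agent.kill_task("f1", task)
for task in ("t1", "t2"):
    agent._run("f1", task)
```

After the replay only `t1` has an update; `t2` reached `_run` after the framework was removed and was dropped.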

> Framework might not receive status update when a just launched task is killed 
> immediately
> -
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>





[jira] [Updated] (MESOS-7862) Get rid of timestamp and date in generated javadoc files

2017-08-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7862:
--
Labels: newbie  (was: )

> Get rid of timestamp and date in generated javadoc files
> 
>
>                 Key: MESOS-7862
>                 URL: https://issues.apache.org/jira/browse/MESOS-7862
>             Project: Mesos
>          Issue Type: Improvement
>          Components: project website
>            Reporter: Vinod Kone
>              Labels: newbie
>





[jira] [Created] (MESOS-7862) Get rid of timestamp and date in generated javadoc files

2017-08-04 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-7862:
-

             Summary: Get rid of timestamp and date in generated javadoc files
                 Key: MESOS-7862
                 URL: https://issues.apache.org/jira/browse/MESOS-7862
             Project: Mesos
          Issue Type: Improvement
          Components: project website
            Reporter: Vinod Kone


Having a timestamp in the generated doc files makes the diff huge every time 
the CI bot regenerates the website. 

See:
https://github.com/apache/mesos-site/commit/45df3a33f91145e3be27c37f3a732c940919ff06#diff-f51f8cc1f105932d260e610ba29b0e92

{code}
-
 +
  Protos.WeightInfoOrBuilder
 -
 +
{code}

Currently, huge diffs break git mirroring from "git-wip/mesos-site" to 
"git/mesos-site" in ASF INFRA. See related ticket.

It looks like we can use "-notimestamp" to get rid of the "Generated..." 
line. It is not yet clear how to get rid of the meta tag.





[jira] [Updated] (MESOS-7861) Health and readiness check output inaccessible with default executor

2017-08-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7861:
---
Labels: check default-executor health-check mesosphere  (was: default-executor mesosphere)

> Health and readiness check output inaccessible with default executor
> 
>
>                 Key: MESOS-7861
>                 URL: https://issues.apache.org/jira/browse/MESOS-7861
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 1.3.0
>            Reporter: Michael Browning
>            Priority: Minor
>              Labels: check, default-executor, health-check, mesosphere
>
> With the default executor, health and readiness checks are run in their own 
> nested containers, whose sandboxes are cleaned up after they terminate. This 
> makes access to stdout/stderr of the check command effectively impossible. 
> Although the exit code of the command being run is reported in a task status, 
> it is often necessary to see the command's actual output when debugging a 
> framework issue, so the ability to access this output via the executor logs 
> would be helpful.
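
What the request amounts to can be sketched outside Mesos: run the check command, keep its stdout/stderr alongside the exit code, and write all three to a log. The function name `run_check` and the logger name are hypothetical; the default executor's actual check machinery is C++ and is not shown here.

```python
import logging
import subprocess
import sys

log = logging.getLogger("executor.checks")  # hypothetical logger name


def run_check(argv, timeout=10.0):
    """Run a check command and log its output together with its exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    log.info(
        "check %r exited with %d; stdout=%r stderr=%r",
        argv, proc.returncode, proc.stdout, proc.stderr,
    )
    return proc.returncode, proc.stdout, proc.stderr


# A failing check whose reason is only visible in its output.
code, out, err = run_check(
    [sys.executable, "-c", "import sys; print('db unreachable'); sys.exit(1)"]
)
```

The exit code alone says only that the check failed; the captured output carries the reason.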





[jira] [Updated] (MESOS-7861) Health and readiness check output inaccessible with default executor

2017-08-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7861:
--
Affects Version/s: 1.3.0
 Target Version/s: 1.5.0
           Labels: default-executor mesosphere  (was: )

> Health and readiness check output inaccessible with default executor
> 
>
>                 Key: MESOS-7861
>                 URL: https://issues.apache.org/jira/browse/MESOS-7861
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 1.3.0
>            Reporter: Michael Browning
>            Priority: Minor
>              Labels: default-executor, mesosphere
>





[jira] [Created] (MESOS-7861) Health and readiness check output inaccessible with default executor

2017-08-04 Thread Michael Browning (JIRA)
Michael Browning created MESOS-7861:
---

             Summary: Health and readiness check output inaccessible with default executor
                 Key: MESOS-7861
                 URL: https://issues.apache.org/jira/browse/MESOS-7861
             Project: Mesos
          Issue Type: Bug
          Components: executor
            Reporter: Michael Browning
            Priority: Minor


With the default executor, health and readiness checks are run in their own 
nested containers, whose sandboxes are cleaned up after they terminate. This 
makes access to stdout/stderr of the check command effectively impossible. 
Although the exit code of the command being run is reported in a task status, 
it is often necessary to see the command's actual output when debugging a 
framework issue, so the ability to access this output via the executor logs 
would be helpful.





[jira] [Commented] (MESOS-7215) Race condition on re-registration of non-partition-aware frameworks

2017-08-04 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114745#comment-16114745
 ] 

Yan Xu commented on MESOS-7215:
---

As communicated over Slack, this is being worked on and a patch will be ready 
soon.

> Race condition on re-registration of non-partition-aware frameworks
> ---
>
>                 Key: MESOS-7215
>                 URL: https://issues.apache.org/jira/browse/MESOS-7215
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Yan Xu
>            Assignee: Megha Sharma
>            Priority: Critical
>
> Prior to the partition-awareness work in MESOS-5344, when an agent 
> reregistered after having been removed, the master only sent 
> ShutdownFrameworkMessages to the agent for frameworks that it knew had been 
> torn down. 
> With the new logic in MESOS-5344, Mesos now sends 
> {{ShutdownFrameworkMessages}} to the agent for all non-partition-aware 
> frameworks (including the ones that are still registered).
> This is problematic. Offers from this agent can still go to the same 
> framework, which can then launch new tasks. The agent then receives tasks of 
> the same framework and ignores them because it thinks the framework is 
> shutting down. The framework is of course not shutting down, so from the 
> master's and the scheduler's perspective the task stays in STAGING 
> until the next agent reregistration, which could happen much later.
> This also makes the semantics of `ShutdownFrameworkMessage` ambiguous: the 
> agent assumes the framework is going away (and acts accordingly) when 
> it is not. 
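
The difference between the two behaviours can be written down as a small selection function. This is an illustrative model only; the function and parameter names are hypothetical and do not correspond to the master's code, and every framework in the example is assumed to be non-partition-aware.

```python
def frameworks_to_shut_down(agent_frameworks, torn_down, only_torn_down=True):
    """Pick the frameworks a re-registering agent is told to shut down.

    agent_frameworks: ids of non-partition-aware frameworks on the agent.
    torn_down: framework ids the master knows have been torn down.
    only_torn_down: True models the behaviour before MESOS-5344; False
    models the behaviour described here, where frameworks that are still
    registered are shut down as well.
    """
    if only_torn_down:
        return sorted(f for f in agent_frameworks if f in torn_down)
    return sorted(agent_frameworks)


on_agent = {"registered-fw", "removed-fw"}
before = frameworks_to_shut_down(on_agent, torn_down={"removed-fw"})
after = frameworks_to_shut_down(on_agent, torn_down={"removed-fw"},
                                only_torn_down=False)
```

In the second case the agent is told to shut down `registered-fw`, which can still receive offers for that agent.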





[jira] [Commented] (MESOS-7853) Support shared PID namespace.

2017-08-04 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114702#comment-16114702
 ] 

Gilbert Song commented on MESOS-7853:
-

[~jpe...@apache.org], this use case can be handled with one more level of 
nested container. Here is our conversation, in case others have a similar 
concern:
https://mesos.slack.com/archives/C1YN31DPA/p1501862379420730


> Support shared PID namespace.
> -
>
>                 Key: MESOS-7853
>                 URL: https://issues.apache.org/jira/browse/MESOS-7853
>             Project: Mesos
>          Issue Type: Task
>          Components: containerization
>            Reporter: Gilbert Song
>            Assignee: Qian Zhang
>              Labels: containerizer, mesosphere, namespaces
>
> Currently, with the 'namespaces/pid' isolator enabled, each container has 
> its own pid namespace. This does not meet the needs of some scenarios. 
> For example, under the same executor container, one task may need to reach 
> another task, which requires the two to share a pid namespace.
> We should make the container pid namespace configurable. Users can 
> choose whether a container shares its parent's pid namespace.
> User facing API:
> {noformat}
> message LinuxInfo {
>   ..
>   // True if it shares the pid namespace with its parent. If the
>   // container is a top level container, it means share the pid
>   // namespace with the agent. If the container is a nested
>   // container, it means share the pid namespace with its parent
>   // container. This field will be ignored if 'namespaces/pid'
>   // isolator is not enabled.
>   optional bool share_pid_namespace = 4;
> }
> {noformat}
> A new agent flag:
> --disallow_top_level_pid_ns_sharing (defaults to false)
> This addresses a security concern from the operator's perspective: while 
> some nested containers may share the pid namespace of their parents, with 
> the flag set top level containers never share the agent's pid namespace.
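
One reading of these semantics, written as a small decision function. It is a sketch, not the containerizer's logic: the function name and return values are hypothetical, and rejecting a top-level sharing request when `--disallow_top_level_pid_ns_sharing` is set is an assumption, since the proposal does not say whether such a request fails or is ignored.

```python
def resolve_pid_namespace(share_pid_namespace: bool, is_nested: bool,
                          isolator_enabled: bool = True,
                          disallow_top_level_sharing: bool = False) -> str:
    """Return which pid namespace a container ends up in.

    Results: "ignored" (field has no effect), "own", "parent-container"
    or "agent".
    """
    if not isolator_enabled:
        # The field is ignored without the 'namespaces/pid' isolator.
        return "ignored"
    if not share_pid_namespace:
        return "own"
    if is_nested:
        return "parent-container"
    if disallow_top_level_sharing:
        # Assumed behaviour: refuse rather than silently fall back.
        raise ValueError("top level containers may not share the agent's pid namespace")
    return "agent"


nested_shared = resolve_pid_namespace(True, is_nested=True)
top_level_shared = resolve_pid_namespace(True, is_nested=False)
top_level_private = resolve_pid_namespace(False, is_nested=False)
```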





[jira] [Updated] (MESOS-7858) Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer

2017-08-04 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7858:
--
Affects Version/s: 1.2.1

> Launching a nested container with namespace/pid isolation, with glibc < 2.25, 
> may deadlock the LinuxLauncher and MesosContainerizer
> ---
>
> Key: MESOS-7858
> URL: https://issues.apache.org/jira/browse/MESOS-7858
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.1, 1.3.0
>Reporter: Joseph Wu
>  Labels: health-check, mesosphere
>
> This bug in glibc (fixed in glibc 2.25) will sometimes cause a child process 
> of a {{fork}} to {{assert}} incorrectly, if the parent enters a new pid 
> namespace before forking: 
> https://sourceware.org/bugzilla/show_bug.cgi?id=15392
> https://sourceware.org/bugzilla/show_bug.cgi?id=21386
> The LinuxLauncher code happens to do this when launching nested containers:
> * The MesosContainerizer process launches a subprocess, with a customized 
> {{ns::clone}} function as an argument.  The thread then basically waits for 
> the launch to succeed and return a child PID: 
> https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
> * A separate thread in the Mesos agent forks and then waits for the 
> grandchild to report a PID: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
> * The child of the fork first enters the namespaces (including a pid 
> namespace) and then forks a grandchild.  The child then calls {{waitpid}} on 
> the grandchild: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
> * Due to the glibc bug, the grandchild sometimes never returns from the 
> {{fork}} here: 
> https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540
> According to the glibc bug, we can work around this by:
> {quote}
> The obvious solution is just to use clone() after setns() and never use 
> fork() - and one can certainly patch both programs to do so. Nevertheless it 
> would be nice to see if fork() also worked after setns(), especially since 
> there is no inherent reason for it not to.
> {quote}
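
The process hierarchy in the bullets above (parent waits for a PID, child forks a grandchild and calls waitpid on it) can be sketched as follows. This only illustrates the structure: it does not enter any namespace, does not reproduce the glibc bug, and does not apply the clone() workaround. The function name `launch_grandchild` is hypothetical, and the example requires a Unix system.

```python
import os
import struct


def launch_grandchild() -> int:
    """Fork a child that forks a grandchild, and return the grandchild's pid."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: in the real launcher this is where the namespaces are entered.
        os.close(read_fd)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(write_fd)
            os._exit(0)  # Grandchild: would run the container's command.
        # Report the grandchild's pid to the parent, then wait for it.
        os.write(write_fd, struct.pack("i", grandchild))
        os.close(write_fd)
        os.waitpid(grandchild, 0)
        os._exit(0)
    # Parent: block until the child reports the grandchild's pid.
    os.close(write_fd)
    data = os.read(read_fd, struct.calcsize("i"))
    os.close(read_fd)
    os.waitpid(child, 0)
    return struct.unpack("i", data)[0]


pid = launch_grandchild()
```

If the grandchild never returned from its fork, as in the bug described above, the child would stay blocked in waitpid.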





[jira] [Created] (MESOS-7860) Simplify website development workflow

2017-08-04 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-7860:
-

             Summary: Simplify website development workflow
                 Key: MESOS-7860
                 URL: https://issues.apache.org/jira/browse/MESOS-7860
             Project: Mesos
          Issue Type: Improvement
            Reporter: Vinod Kone


Right now, the website development workflow requires one to use `docker` (see 
site/README.md). 

Since we use `bundler` as a level of abstraction already, we could consider 
removing `docker` as an abstraction on top. 

Things to figure out: How to get non-bundler-managed dependencies (e.g., ruby, 
ruby-devel, doxygen) installed in a platform agnostic way.

Another option to simplify would be to still use `docker`, but consolidate the 
website development workflow with the website publishing workflow.





[jira] [Updated] (MESOS-7860) Simplify website development workflow

2017-08-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7860:
--
Component/s: project website

> Simplify website development workflow
> -
>
>                 Key: MESOS-7860
>                 URL: https://issues.apache.org/jira/browse/MESOS-7860
>             Project: Mesos
>          Issue Type: Improvement
>          Components: project website
>            Reporter: Vinod Kone
>





[jira] [Updated] (MESOS-7859) Running `bundle install` as non-root user causes `rake` to be not found

2017-08-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7859:
--
Component/s: project website

> Running `bundle install` as non-root user causes `rake` to be not found
> ---
>
>                 Key: MESOS-7859
>                 URL: https://issues.apache.org/jira/browse/MESOS-7859
>             Project: Mesos
>          Issue Type: Bug
>          Components: project website
>            Reporter: Vinod Kone
>
> Found this issue while trying to automate website publishing.
> {code}
> /mesos/site /mesos
> Fetching gem metadata from https://rubygems.org/
> Fetching version metadata from https://rubygems.org/..
> Fetching dependency metadata from https://rubygems.org/.
> Rubygems 2.0.14.1 is not threadsafe, so your gems will be installed one at a 
> time. Upgrade to Rubygems 2.1.0 or higher to enable parallel gem installation.
> Fetching rake 12.0.0
> Installing rake 12.0.0
> Fetching i18n 0.6.11
> Installing i18n 0.6.11
> Fetching multi_json 1.12.1
> Installing multi_json 1.12.1
> Fetching addressable 2.3.8
> Installing addressable 2.3.8
> Using bundler 1.15.3
> Fetching chunky_png 1.3.8
> Installing chunky_png 1.3.8
> Fetching coffee-script-source 1.12.2
> Installing coffee-script-source 1.12.2
> Fetching sass 3.4.24
> Installing sass 3.4.24
> Fetching rb-fsevent 0.10.2
> Installing rb-fsevent 0.10.2
> Fetching ffi 1.9.18
> Installing ffi 1.9.18 with native extensions
> Fetching eventmachine 1.2.3
> Installing eventmachine 1.2.3 with native extensions
> Fetching http_parser.rb 0.6.0
> Installing http_parser.rb 0.6.0 with native extensions
> Fetching temple 0.8.0
> Installing temple 0.8.0
> Fetching tilt 1.3.7
> Installing tilt 1.3.7
> Fetching hike 1.2.3
> Installing hike 1.2.3
> Fetching htmlentities 4.3.4
> Installing htmlentities 4.3.4
> Fetching kramdown 1.14.0
> Installing kramdown 1.14.0
> Fetching libv8 3.16.14.19 (x86_64-linux)
> Installing libv8 3.16.14.19 (x86_64-linux)
> Fetching rack 1.6.8
> Installing rack 1.6.8
> Fetching thor 0.19.4
> Installing thor 0.19.4
> Fetching thread_safe 0.3.6
> Installing thread_safe 0.3.6
> Fetching rdiscount 2.1.7
> Installing rdiscount 2.1.7 with native extensions
> Fetching ref 2.0.0
> Installing ref 2.0.0
> Fetching activesupport 3.2.22.5
> Installing activesupport 3.2.22.5
> Fetching execjs 1.4.1
> Installing execjs 1.4.1
> Fetching compass-core 1.0.3
> Installing compass-core 1.0.3
> Fetching compass-import-once 1.0.5
> Installing compass-import-once 1.0.5
> Fetching rb-inotify 0.9.10
> Installing rb-inotify 0.9.10
> Fetching rb-kqueue 0.2.5
> Installing rb-kqueue 0.2.5
> Fetching em-websocket 0.5.1
> Installing em-websocket 0.5.1
> Fetching haml 5.0.1
> Installing haml 5.0.1
> Fetching rack-test 0.6.3
> Installing rack-test 0.6.3
> Fetching sprockets 2.12.4
> Installing sprockets 2.12.4
> Fetching rack-livereload 0.3.16
> Installing rack-livereload 0.3.16
> Fetching rouge 0.3.10
> Installing rouge 0.3.10
> Fetching tzinfo 1.2.3
> Installing tzinfo 1.2.3
> Fetching therubyracer 0.12.3
> Installing therubyracer 0.12.3 with native extensions
> Fetching coffee-script 2.2.0
> Installing coffee-script 2.2.0
> Fetching uglifier 2.1.2
> Installing uglifier 2.1.2
> Fetching compass 1.0.3
> Installing compass 1.0.3
> Fetching listen 1.3.1
> Installing listen 1.3.1
> Fetching sprockets-helpers 1.1.0
> Installing sprockets-helpers 1.1.0
> Fetching sprockets-sass 1.1.0
> Installing sprockets-sass 1.1.0
> Fetching middleman-core 3.2.0
> Installing middleman-core 3.2.0
> Fetching middleman-sprockets 3.3.3
> Installing middleman-sprockets 3.3.3
> Fetching middleman-blog 3.5.1
> Installing middleman-blog 3.5.1
> Fetching middleman-livereload 3.1.0
> Installing middleman-livereload 3.1.0
> Fetching middleman-syntax 1.2.1
> Installing middleman-syntax 1.2.1
> Fetching middleman 3.2.0
> Installing middleman 3.2.0
> Bundle complete! 8 Gemfile dependencies, 49 gems now installed.
> Use `bundle info [gemname]` to see where a bundled gem is installed.
> Post-install message from compass:
> Compass is charityware. If you love it, please donate on our behalf at 
> http://umdf.org/compass Thanks!
> Post-install message from middleman-core:
> NOTICE: Middleman v3.2.x and greater no longer support Ruby 1.8
> bundler: command not found: rake
> Install missing gem executables with `bundle install`
> bundler: command not found: rake
> Install missing gem executables with `bundle install`
> {code}





[jira] [Commented] (MESOS-7853) Support shared PID namespace.

2017-08-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114492#comment-16114492
 ] 

James Peach commented on MESOS-7853:


I don't think this is the right approach. A very common use case for this would 
be for the executor to be in PID namespace "A" and for all the nested 
containers to be in PID namespace "B" together. How can you support this with 
this protobuf definition? The only scenario this can support is joining the 
executor namespace, which is undesirable in many cases, since the executor is 
in a different security domain.

> Support shared PID namespace.
> -
>
> Key: MESOS-7853
> URL: https://issues.apache.org/jira/browse/MESOS-7853
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Qian Zhang
>  Labels: containerizer, mesosphere, namespaces
>
> Currently, with the 'namespaces/pid' isolator enabled, each container will 
> have its own pid namespace. This does not meet the needs of some scenarios. 
> For example, under the same executor container, one task may want to reach 
> another task, which requires the two to share the same pid namespace.
> We should make the container pid namespace configurable. Users can 
> choose whether a container shares its parent's pid namespace.
> User facing API:
> {noformat}
> message LinuxInfo {
>   ..
>   // True if it shares the pid namespace with its parent. If the
>   // container is a top level container, it means share the pid
>   // namespace with the agent. If the container is a nested
>   // container, it means share the pid namespace with its parent
>   // container. This field will be ignored if 'namespaces/pid'
>   // isolator is not enabled.
>   optional bool share_pid_namespace = 4;
> }
> {noformat}
> A new agent flag:
> --disallow_top_level_pid_ns_sharing (defaults to false)
> This addresses a security concern from the operator's perspective: some 
> nested containers can still share the pid namespace of their parents, while 
> top level containers never share the pid namespace of the agent.
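
The rules in the description above can be written down as a small decision function. This is only a sketch under stated assumptions: `PidNamespace`, `Request` and `resolve` are hypothetical names, not Mesos types, and rejecting a disallowed top level request (rather than silently giving the container its own namespace) is an assumption the description does not settle.

```cpp
#include <cassert>

// Hypothetical names throughout; none of these are Mesos types.
enum class PidNamespace
{
  OWN,            // The container gets its own pid namespace.
  SHARED_PARENT,  // The container joins its parent's pid namespace
                  // (the agent's, for a top level container).
  UNCHANGED,      // 'namespaces/pid' is disabled; the field is ignored.
  REJECTED        // Sharing was requested but the operator disallows it.
};

struct Request
{
  bool isolatorEnabled;          // Is the 'namespaces/pid' isolator enabled?
  bool topLevel;                 // Is this a top level container?
  bool sharePidNamespace;        // Value of 'share_pid_namespace'.
  bool disallowTopLevelSharing;  // Value of the proposed agent flag.
};

// Applies the rules in the order the description states them.
inline PidNamespace resolve(const Request& r)
{
  if (!r.isolatorEnabled) {
    return PidNamespace::UNCHANGED;
  }

  if (!r.sharePidNamespace) {
    return PidNamespace::OWN;
  }

  // Assumption: a disallowed request is rejected, not downgraded.
  if (r.topLevel && r.disallowTopLevelSharing) {
    return PidNamespace::REJECTED;
  }

  return PidNamespace::SHARED_PARENT;
}

// Usage: a nested container may still share while top level sharing
// is disallowed.
inline void checkPidNamespaceRules()
{
  assert(resolve({true, false, true, true}) == PidNamespace::SHARED_PARENT);
  assert(resolve({true, true, true, true}) == PidNamespace::REJECTED);
}
```

Note that every outcome here refers to the parent's namespace only, which is the limitation raised in the comment above: a group of sibling containers sharing a namespace distinct from the executor's cannot be expressed with a single boolean.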





[jira] [Updated] (MESOS-5886) FUTURE_DISPATCH may react on irrelevant dispatch.

2017-08-04 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik updated MESOS-5886:
-
Shepherd: Michael Park

> FUTURE_DISPATCH may react on irrelevant dispatch.
> -
>
> Key: MESOS-5886
> URL: https://issues.apache.org/jira/browse/MESOS-5886
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.2, 1.2.1, 1.3.0, 1.4.0
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: mesosphere, tech-debt, tech-debt-test
>
> [{{FUTURE_DISPATCH}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L50]
>  uses 
> [{{DispatchMatcher}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L350]
>  to figure out whether a processed {{DispatchEvent}} is the one the user is 
> waiting for. However, comparing {{std::type_info}} of function pointers is 
> not enough: different class methods with the same signature will be matched. 
> Here is a test that demonstrates this:
> {noformat}
> class DispatchProcess : public Process<DispatchProcess>
> {
> public:
>   MOCK_METHOD0(func0, void());
>   MOCK_METHOD1(func1, bool(bool));
>   MOCK_METHOD1(func1_same_but_different, bool(bool));
>   MOCK_METHOD1(func2, Future<bool>(bool));
>   MOCK_METHOD1(func3, int(int));
>   MOCK_METHOD2(func4, Future<bool>(bool, int));
> };
> {noformat}
> {noformat}
> TEST(ProcessTest, DispatchMatch)
> {
>   DispatchProcess process;
>   PID<DispatchProcess> pid = spawn(&process);
>   Future<Nothing> future = FUTURE_DISPATCH(
>       pid,
>       &DispatchProcess::func1_same_but_different);
>   EXPECT_CALL(process, func1(_))
>     .WillOnce(ReturnArg<0>());
>   dispatch(pid, &DispatchProcess::func1, true);
>   AWAIT_READY(future);
>   terminate(pid);
>   wait(pid);
> }
> {noformat}
> The test passes:
> {noformat}
> [ RUN  ] ProcessTest.DispatchMatch
> [   OK ] ProcessTest.DispatchMatch (1 ms)
> {noformat}
> This change was introduced in https://reviews.apache.org/r/28052/.
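
The underlying language behaviour can be reproduced without libprocess or gmock. The sketch below uses a hypothetical `Worker` struct and two hypothetical helpers, `sameType` and `sameMethod`; it only illustrates why a `std::type_info` comparison cannot distinguish two methods with the same signature, and is not a description of how `DispatchMatcher` is or should be implemented.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for a process class; not a libprocess type.
struct Worker
{
  bool func1(bool b) { return b; }
  bool func1_same_but_different(bool b) { return !b; }
  int func3(int i) { return i; }
};

// Compares only the static types of two member function pointers,
// which is all the information std::type_info carries.
template <typename M1, typename M2>
bool sameType(M1, M2)
{
  return typeid(M1) == typeid(M2);
}

// Compares the pointer values themselves. This is only possible when
// both pointers have the same type.
template <typename M>
bool sameMethod(M a, M b)
{
  return a == b;
}

// Usage: the type comparison cannot tell the two methods apart,
// while the pointer comparison can.
inline void checkMethodMatching()
{
  assert(sameType(&Worker::func1, &Worker::func1_same_but_different));
  assert(!sameMethod(&Worker::func1, &Worker::func1_same_but_different));
}
```

Both `&Worker::func1` and `&Worker::func1_same_but_different` have the type `bool (Worker::*)(bool)`, so their `std::type_info` objects compare equal even though the pointers refer to different methods.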





[jira] [Commented] (MESOS-4941) Support update existing quota.

2017-08-04 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114105#comment-16114105
 ] 

Alexander Rukletsov commented on MESOS-4941:


It got deprioritized on my backlog and I never came back to Zhitao's reviews 
after doing the first round. [~bmahler], please figure out with [~zhitao] 
whether he wants to continue working on this and revive his patches.

> Support update existing quota.
> --
>
> Key: MESOS-4941
> URL: https://issues.apache.org/jira/browse/MESOS-4941
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>  Labels: Quota, mesosphere, multitenancy, tech-debt
>
> We want to support updating an existing quota without a delete-and-recreate 
> cycle. This avoids the risk of starvation from losing the quota between the 
> delete and the recreate, and also makes the interface friendlier.
> Design doc:
> https://docs.google.com/document/d/1c8fJY9_N0W04FtUQ_b_kZM6S0eePU7eYVyfUP14dSys





[jira] [Commented] (MESOS-7541) Cannot compile without pre-compiled headers on Windows

2017-08-04 Thread Piotr Wera (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114101#comment-16114101
 ] 

Piotr Wera commented on MESOS-7541:
---

I have the same issue.
AF_INET6 comes from #include  ?
Environment:
Windows 7, NMake build.
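
For reference, a minimal sketch of a translation unit that can use `AF_INET6` without relying on a precompiled header. It assumes the usual platform headers: `<winsock2.h>` on Windows and `<sys/socket.h>` on POSIX systems. `isIPv6Family` is a hypothetical helper, not part of stout, and whether this matches the include that went missing in `stout/ip.hpp` is not established here.

```cpp
#include <cassert>

#ifdef _WIN32
// On Windows the address family constants come in through Winsock.
#include <winsock2.h>
#include <ws2tcpip.h>
#else
// On POSIX systems they are declared by <sys/socket.h>.
#include <sys/socket.h>
#endif

// Hypothetical helper, not part of stout: compiles only if AF_INET6
// is declared by the headers included above.
inline bool isIPv6Family(int family)
{
  return family == AF_INET6;
}

// Usage: the IPv4 and IPv6 families are distinct constants.
inline void checkAddressFamilies()
{
  assert(isIPv6Family(AF_INET6));
  assert(!isIPv6Family(AF_INET));
}
```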

> Cannot compile without pre-compiled headers on Windows
> --
>
> Key: MESOS-7541
> URL: https://issues.apache.org/jira/browse/MESOS-7541
> Project: Mesos
>  Issue Type: Bug
> Environment: Windows 10 with  -DENABLE_PRECOMPILED_HEADERS=0
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>  Labels: cmake, windows
>
> Looks like we messed up an include at some point:
> {noformat}
> "C:\Users\andschwa\src\mesos\build\src\tests\mesos-tests.vcxproj" (default 
> target) (1) ->
> "C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj" (default target) 
> (4) ->
> (ClCompile target) ->
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(104): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(138): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(151): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(151): error 
> C2131: expression did not evaluate to a constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(151): error 
> C2051: case expression not constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(164): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(164): error 
> C2131: expression did not evaluate to a constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(164): error 
> C2051: case expression not constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(233): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(233): error 
> C2131: expression did not evaluate to a constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(234): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(246): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(246): error 
> C2512: 'Try': no appropriate default constructor available 
> (compiling source file C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(233): error 
> C2051: case expression not constant (compiling source file 
> C:\Users\andschwa\src\mesos\src\zookeeper\group.cpp) 
> [C:\Users\andschwa\src\mesos\build\src\mesos-1.4.0.vcxproj]
>   C:\Users\andschwa\src\mesos\3rdparty\stout\include\stout/ip.hpp(293): error 
> C2065: 'AF_INET6': undeclared identifier (compiling source file 
> C:\Users\an

[jira] [Commented] (MESOS-2153) Add support for systemd journal for logging

2017-08-04 Thread Kevin Cox (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114073#comment-16114073
 ] 

Kevin Cox commented on MESOS-2153:
--

Can I express recent demand for this feature?

> Add support for systemd journal for logging
> ---
>
> Key: MESOS-2153
> URL: https://issues.apache.org/jira/browse/MESOS-2153
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> We should be able to redirect master and slave logs to the systemd journal 
> on systems where it's available.


