[jira] [Commented] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-25 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100056#comment-16100056
 ] 

Joris Van Remoortere commented on MESOS-7813:
-

[~y123456yz] take a look at this comment and the surrounding code in the 
systemd cgroup code base:
https://github.com/systemd/systemd/blob/52b1478414067eb9381b413408f920da7f162c6f/src/core/cgroup.c#L1345-L1348

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-3009) Reproduce systemd cgroup behavior

2017-07-25 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100050#comment-16100050
 ] 

Joris Van Remoortere commented on MESOS-3009:
-

from: [http://man7.org/linux/man-pages/man5/systemd.resource-control.5.html]
{quote}Turns on delegation of further resource control partitioning to
   processes of the unit. For unprivileged services (i.e. those
   using the User= setting), this allows processes to create a
   subhierarchy beneath its control group path. For privileged
   services and scopes, this ensures the processes will have all
   control group controllers enabled.{quote}

Systemd has started implementing the Linux kernel's goal of making the cgroup 
file hierarchy read-only. Sometimes it rebalances the cgroup hierarchy. If 
there are settings (files) in there that it did not initiate, it may delete 
them when a rebalancing event occurs. One way to prevent this is to notify 
systemd that you want to control the subhierarchy for your specific systemd 
unit.
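For example, an agent's unit file might opt in like this (a minimal sketch, 
not a verbatim configuration; the {{ExecStart}} path and flags are 
illustrative):
{code}
# Tell systemd this unit manages its own cgroup subhierarchy, so that a
# rebalancing event will not delete the cgroups created underneath it.
[Service]
Delegate=yes
ExecStart=/usr/bin/mesos-agent --work_dir=/var/lib/mesos
{code}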

> Reproduce systemd cgroup behavior 
> --
>
> Key: MESOS-3009
> URL: https://issues.apache.org/jira/browse/MESOS-3009
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> It has been noticed before that systemd reorganizes cgroup hierarchy created 
> by mesos slave. Because of this mesos is no longer able to find the cgroup, 
> and there is also a chance of undoing the isolation that mesos slave puts in 
> place. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094733#comment-16094733
 ] 

Joris Van Remoortere commented on MESOS-7813:
-

[~y123456yz] here is an example of the systemd configuration in DC/OS
https://github.com/dcos/dcos/blob/18c76a2b4b24aab0c4107bae9c7191a68e6de174/packages/mesos/extra/dcos-mesos-slave.service

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-19 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094144#comment-16094144
 ] 

Joris Van Remoortere commented on MESOS-7813:
-

[~y123456yz]
Check out the {{Delegate}} flag in systemd.
Here is an explanation of the problem:
https://issues.apache.org/jira/browse/MESOS-3425

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6828) Consider ways for frameworks to ignore offers with an Unavailability

2017-03-30 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949915#comment-15949915
 ] 

Joris Van Remoortere commented on MESOS-6828:
-

Based on some offline discussion I want to suggest that the least dangerous 
solution (in my opinion) is to have frameworks prefer offers with the longest 
availability by default.

Aurora is a good example of a framework that collects offers and has the 
ability to express a preference while iterating the offers to match a task to 
launch.
Preferring offers with no unavailability (or the unavailability farthest in 
the future) will naturally steer new tasks away from machines that will be 
entering maintenance.
A benefit of this approach is that the agents in the schedule will still be 
used if there is demand pressure for resources by the framework.
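A sketch of what that preference could look like when iterating collected 
offers (illustrative only, not Aurora's actual matching code; it assumes the 
offers are {{mesos::Offer}} protobufs):
{code}
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

#include <mesos/mesos.hpp>

// Order offers from most to least preferred: offers without an
// Unavailability first, then offers whose unavailability starts
// farthest in the future.
void preferLongestAvailability(std::vector<mesos::Offer>* offers)
{
  auto start = [](const mesos::Offer& offer) -> int64_t {
    return offer.has_unavailability()
      ? offer.unavailability().start().nanoseconds()
      : std::numeric_limits<int64_t>::max();
  };

  std::sort(offers->begin(), offers->end(),
            [&start](const mesos::Offer& a, const mesos::Offer& b) {
              return start(a) > start(b);
            });
}
{code}
Matching a task against the first suitable offer in that order keeps new work 
off agents that are about to enter maintenance for as long as other capacity 
is available.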

> Consider ways for frameworks to ignore offers with an Unavailability
> 
>
> Key: MESOS-6828
> URL: https://issues.apache.org/jira/browse/MESOS-6828
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joris Van Remoortere
>Assignee: Artem Harutyunyan
>  Labels: maintenance
>
> Due to the opt-in nature of maintenance primitives in Mesos, there is a 
> deficiency for cluster administrators when frameworks have not opted in.
> An example case:
> - Cluster with reasonable churn (tasks terminate naturally)
> - Operator specifies maintenance schedule
> Ideally *even* in a world where none of the frameworks had opted in to 
> maintenance primitives the operator would have some way of preventing 
> frameworks from scheduling further work on agents in the schedule. The 
> natural termination of the tasks in the cluster would allow the nodes to 
> drain gracefully and the operator to then perform maintenance.
> 2 options that have been discussed so far:
> # Provide a capability for frameworks to automatically filter offers with an 
> {{Unavailability}} set.
> #* Pro: Finer grained control. Allows other frameworks to keep scheduling 
> short lived tasks that can complete before the Unavailability.
> #* Con: All frameworks have to be updated. Consider making this an 
> environment variable to the scheduler driver for legacy frameworks.
> # Provide a flag on the master to filter all offers with an 
> {{Unavailability}} set.
> #* Pro: Immediately actionable / usable.
> #* Con: Coarse grained. Some frameworks may suffer efficiency.
> #* Con: *Dangerous*: planning out a multi-day maintenance schedule for an 
> entire cluster will prevent any frameworks from scheduling further work, 
> potentially stalling the cluster.
> Action Items: Provide further context for each option and consider others. We 
> need to ensure we have something immediately consumable by users to fill the 
> gap until maintenance primitives are the norm. We also need to ensure we 
> prevent dangerous scenarios like the Con listed for option #2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6484) Memory leak in `Future::after()`

2017-01-05 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6484:

Shepherd: Joris Van Remoortere
  Sprint: Mesosphere Sprint 48

> Memory leak in `Future::after()`
> ---
>
> Key: MESOS-6484
> URL: https://issues.apache.org/jira/browse/MESOS-6484
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.1.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: libprocess, mesosphere
> Fix For: 1.2.0
>
>
> The problem arises when one tries to associate an {{after()}} call to copied 
> futures. The following test case is enough to reproduce the issue:
> {code}
> TEST(FutureTest, After3)
> {
>   auto policy = std::make_shared<int>(0);
>   {
>     auto generator = []() {
>       return Future<Nothing>();
>     };
>     Future<Nothing> future = generator()
>       .after(Milliseconds(1),
>              [policy](const Future<Nothing>&) {
>                return Nothing();
>              });
>     AWAIT_READY(future);
>   }
>   EXPECT_EQ(1, policy.use_count());
> }
> {code}
> In the test, one would expect that there is only one active reference to 
> {{policy}}, hence the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
> However, if {{after()}} is triggered more than once, each extra call adds one 
> undeleted reference to {{policy}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6828) Consider ways for frameworks to ignore offers with an Unavailability

2016-12-22 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771179#comment-15771179
 ] 

Joris Van Remoortere commented on MESOS-6828:
-

An updated proposal to improve flexibility while still being easily consumable:
# Allow operators to specify a separate start time for when offers should stop 
being sent, prior to the actual maintenance window.
# Add an opt-in capability for frameworks to be able to see offers during the 
period described in point #1.

By controlling the time period during which offers are not sent out, we are 
able to stagger them based on the maintenance schedule and prevent the stalling 
scenario described in the ticket description.

> Consider ways for frameworks to ignore offers with an Unavailability
> 
>
> Key: MESOS-6828
> URL: https://issues.apache.org/jira/browse/MESOS-6828
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joris Van Remoortere
>Assignee: Artem Harutyunyan
>  Labels: maintenance
>
> Due to the opt-in nature of maintenance primitives in Mesos, there is a 
> deficiency for cluster administrators when frameworks have not opted in.
> An example case:
> - Cluster with reasonable churn (tasks terminate naturally)
> - Operator specifies maintenance schedule
> Ideally *even* in a world where none of the frameworks had opted in to 
> maintenance primitives the operator would have some way of preventing 
> frameworks from scheduling further work on agents in the schedule. The 
> natural termination of the tasks in the cluster would allow the nodes to 
> drain gracefully and the operator to then perform maintenance.
> 2 options that have been discussed so far:
> # Provide a capability for frameworks to automatically filter offers with an 
> {{Unavailability}} set.
> #* Pro: Finer grained control. Allows other frameworks to keep scheduling 
> short lived tasks that can complete before the Unavailability.
> #* Con: All frameworks have to be updated. Consider making this an 
> environment variable to the scheduler driver for legacy frameworks.
> # Provide a flag on the master to filter all offers with an 
> {{Unavailability}} set.
> #* Pro: Immediately actionable / usable.
> #* Con: Coarse grained. Some frameworks may suffer efficiency.
> #* Con: *Dangerous*: planning out a multi-day maintenance schedule for an 
> entire cluster will prevent any frameworks from scheduling further work, 
> potentially stalling the cluster.
> Action Items: Provide further context for each option and consider others. We 
> need to ensure we have something immediately consumable by users to fill the 
> gap until maintenance primitives are the norm. We also need to ensure we 
> prevent dangerous scenarios like the Con listed for option #2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6828) Consider ways for frameworks to ignore offers with an Unavailability

2016-12-21 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-6828:
---

 Summary: Consider ways for frameworks to ignore offers with an 
Unavailability
 Key: MESOS-6828
 URL: https://issues.apache.org/jira/browse/MESOS-6828
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Artem Harutyunyan


Due to the opt-in nature of maintenance primitives in Mesos, there is a 
deficiency for cluster administrators when frameworks have not opted in.

An example case:
- Cluster with reasonable churn (tasks terminate naturally)
- Operator specifies maintenance schedule

Ideally *even* in a world where none of the frameworks had opted in to 
maintenance primitives the operator would have some way of preventing 
frameworks from scheduling further work on agents in the schedule. The natural 
termination of the tasks in the cluster would allow the nodes to drain 
gracefully and the operator to then perform maintenance.

2 options that have been discussed so far:
# Provide a capability for frameworks to automatically filter offers with an 
{{Unavailability}} set.
#* Pro: Finer grained control. Allows other frameworks to keep scheduling short 
lived tasks that can complete before the Unavailability.
#* Con: All frameworks have to be updated. Consider making this an environment 
variable to the scheduler driver for legacy frameworks.
# Provide a flag on the master to filter all offers with an {{Unavailability}} 
set.
#* Pro: Immediately actionable / usable.
#* Con: Coarse grained. Some frameworks may suffer efficiency.
#* Con: *Dangerous*: planning out a multi-day maintenance schedule for an 
entire cluster will prevent any frameworks from scheduling further work, 
potentially stalling the cluster.

Action Items: Provide further context for each option and consider others. We 
need to ensure we have something immediately consumable by users to fill the 
gap until maintenance primitives are the norm. We also need to ensure we 
prevent dangerous scenarios like the Con listed for option #2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6815) Enable glog stack traces when we call things like `ABORT` on Windows

2016-12-19 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6815:

Priority: Critical  (was: Major)

> Enable glog stack traces when we call things like `ABORT` on Windows
> 
>
> Key: MESOS-6815
> URL: https://issues.apache.org/jira/browse/MESOS-6815
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>Priority: Critical
>  Labels: microsoft, windows-mvp
>
> Currently in the Windows builds, if we call `ABORT` (etc.) we will simply 
> bail out, with no stack traces.
> This is highly undesirable. Stack traces are important for operating clusters 
> in production. We should work to enable this behavior, including possibly 
> working with glog to add this support if they currently they do not natively 
> support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4638) versioning preprocessor macros

2016-11-07 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4638:

Fix Version/s: 0.28.3

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 0.28.3, 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-11-07 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6502:

Fix Version/s: 0.28.3

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-11-07 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645209#comment-15645209
 ] 

Joris Van Remoortere commented on MESOS-6502:
-

{{0.28.3}}
{code}
commit b0dd63ea35b4338dc365da7db6c79eb9731e8e8b
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4638) versioning preprocessor macros

2016-11-07 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645204#comment-15645204
 ] 

Joris Van Remoortere commented on MESOS-4638:
-

{{0.28.3}}
{code}
commit 6408c54e0327ab864d4e193814ee69bcd24985df
Author: Zhitao Li 
Date:   Wed Aug 17 09:34:27 2016 -0700

Introduce MESOS_{MAJOR|MINOR|PATCH}_VERSION_NUM macros.

This makes version based conditional compiling much easier for
module writers.

Review: https://reviews.apache.org/r/50992/
{code}
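For module writers, usage looks something like the following sketch (assuming, 
per the commit above, that the {{_NUM}} macros expand to plain integers):
{code}
#include <mesos/version.hpp>

// Compile against the 1.1+ API when available; otherwise fall back.
#if (MESOS_MAJOR_VERSION_NUM > 1) || \
    (MESOS_MAJOR_VERSION_NUM == 1 && MESOS_MINOR_VERSION_NUM >= 1)
  // ... code using the newer Mesos API ...
#else
  // ... code for older releases ...
#endif
{code}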

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 0.28.3, 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-11-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6457:

Target Version/s: 1.0.2, 1.1.0  (was: 1.1.0)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if 
> for example it starts/stops passing a health check once it got into the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-11-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6457:

Target Version/s: 1.1.0  (was: 1.2.0)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if 
> for example it starts/stops passing a health check once it got into the 
> {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616903#comment-15616903
 ] 

Joris Van Remoortere edited comment on MESOS-6502 at 10/28/16 11:16 PM:


{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}
{{1.0.2}}
{code}
commit 9b8c54282c5337e28d99bc0025661131bde2246f
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}


was (Author: jvanremoortere):
{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616903#comment-15616903
 ] 

Joris Van Remoortere commented on MESOS-6502:
-

{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4638) versioning preprocessor macros

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616900#comment-15616900
 ] 

Joris Van Remoortere commented on MESOS-4638:
-

{{1.0.2}}:
{code}
commit 5668d4ff2655f120ca3d66c509efa40e24d5faf3
Author: Zhitao Li 
Date:   Wed Aug 17 09:34:27 2016 -0700

Introduce MESOS_{MAJOR|MINOR|PATCH}_VERSION_NUM macros.

This makes version based conditional compiling much easier for
module writers.

Review: https://reviews.apache.org/r/50992/
{code}

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4638) versioning preprocessor macros

2016-10-28 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4638:

Fix Version/s: 1.0.2

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6502:

Summary: _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in 
libmesos java binding.  (was: MESOS_{MAJOR,MINOR,PATCH}_VERSION incorrect in 
libmesos java binding)

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6407) Move DEFAULT_v1_xxx macros to the v1 namespace.

2016-10-20 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592835#comment-15592835
 ] 

Joris Van Remoortere commented on MESOS-6407:
-

{code}
commit e9da9b3bc41aa81c25d36901e52ff1e941fa09e6
Author: Joris Van Remoortere 
Date:   Mon Oct 17 23:15:21 2016 -0700

Split mesos test helpers into 'internal' and 'v1' namespaces.

Review: https://reviews.apache.org/r/52976

commit 2373819dc3e3f8b251526db962eecde23de1545b
Author: Joris Van Remoortere 
Date:   Tue Oct 18 20:54:41 2016 -0700

Removed unused tests helper macro 'DEFAULT_CONTAINER_ID'.

Review: https://reviews.apache.org/r/53014

commit 78d4ec406f7bee61eb5097bca91bf143d2f43f82
Author: Joris Van Remoortere 
Date:   Tue Oct 18 15:33:09 2016 -0700

Removed extra 'evolve' implementation from 'api_tests.cpp'.

Review: https://reviews.apache.org/r/53013

commit 7831f1fbace2ae868dd7dc80f4ddca459b9ffe19
Author: Joris Van Remoortere 
Date:   Tue Oct 18 16:18:25 2016 -0700

Fixed usage of 'evolve' in master http endpoints.

Review: https://reviews.apache.org/r/53012
{code}

> Move DEFAULT_v1_xxx macros to the v1 namespace.
> ---
>
> Key: MESOS-6407
> URL: https://issues.apache.org/jira/browse/MESOS-6407
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 1.2.0
>
>
> We should clean up the existing {{DEFAULT_v1_*}} macros and bring it under 
> the {{v1}} namespace e.g., {{v1::DEFAULT_FRAMEWORK_INFO}}. This is necessary 
> for doing a larger cleanup i.e., we would like to introduce {{createXXX}} for 
> the {{v1}} API and would not like to add {{createV1XXX}} functions eventually.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6343) Documentation Error: Default Executor does not implicitly construct resources

2016-10-07 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-6343:
---

 Summary: Documentation Error: Default Executor does not implicitly 
construct resources
 Key: MESOS-6343
 URL: https://issues.apache.org/jira/browse/MESOS-6343
 Project: Mesos
  Issue Type: Documentation
Reporter: Joris Van Remoortere
Priority: Blocker


https://github.com/apache/mesos/blob/d16f53d5a9e15d1d9533739a8c052bc546ec3262/include/mesos/v1/mesos.proto#L544-L546

This probably got carried forward from early design discussions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6315) `killtree` can accidentally kill containerizer / executor

2016-10-05 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-6315:
---

 Summary: `killtree` can accidentally kill containerizer / executor
 Key: MESOS-6315
 URL: https://issues.apache.org/jira/browse/MESOS-6315
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Joris Van Remoortere


The implementation of killtree is buggy. [~jieyu] has some ideas.

ltrace of mesos-local:
{code}
[pid 19501] [0x7f89d77a61ab] libmesos-1.1.0.so->kill(29985, SIGKILL) = 0
[pid 19501] [0x7f89d77a61ab] libmesos-1.1.0.so->kill(31349, SIGKILL <unfinished ...>
[pid 31359] [0x] +++ killed by SIGKILL +++
[pid 31358] [0x] +++ killed by SIGKILL +++
[pid 31357] [0x] +++ killed by SIGKILL +++
[pid 31356] [0x] +++ killed by SIGKILL +++
[pid 31354] [0x] +++ killed by SIGKILL +++
[pid 31353] [0x] +++ killed by SIGKILL +++
[pid 31351] [0x] +++ killed by SIGKILL +++
[pid 31350] [0x] +++ killed by SIGKILL +++
[pid 19501] [0x7f89d77a61ab] <... kill resumed> ) = 0
[pid 19501] [0x7f89d77a61dd] libmesos-1.1.0.so->kill(29985, SIGCONT <unfinished ...>
[pid 29985] [0x] +++ killed by SIGKILL +++
[pid 19493] [0x7f89d64ceda0] --- SIGCHLD (Child exited) ---
[pid 31352] [0x] +++ killed by SIGKILL +++
[pid 31349] [0x] +++ killed by SIGKILL +++
[pid 19501] [0x7f89d77a61dd] <... kill resumed> ) = 0
[pid 19501] [0x7f89d77a61dd] libmesos-1.1.0.so->kill(31349, SIGCONT) = -1
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6315) `killtree` can accidentally kill containerizer / executor

2016-10-05 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549602#comment-15549602
 ] 

Joris Van Remoortere commented on MESOS-6315:
-

Since {{killtree}} is only used in the POSIX containerizer, this is not a 
blocker.

> `killtree` can accidentally kill containerizer / executor
> -
>
> Key: MESOS-6315
> URL: https://issues.apache.org/jira/browse/MESOS-6315
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>
> The implementation of killtree is buggy. [~jieyu] has some ideas.
> ltrace of mesos-local:
> {code}
> [pid 19501] [0x7f89d77a61ab] libmesos-1.1.0.so->kill(29985, SIGKILL) = 0
> [pid 19501] [0x7f89d77a61ab] libmesos-1.1.0.so->kill(31349, SIGKILL <unfinished ...>
> [pid 31359] [0x] +++ killed by SIGKILL +++
> [pid 31358] [0x] +++ killed by SIGKILL +++
> [pid 31357] [0x] +++ killed by SIGKILL +++
> [pid 31356] [0x] +++ killed by SIGKILL +++
> [pid 31354] [0x] +++ killed by SIGKILL +++
> [pid 31353] [0x] +++ killed by SIGKILL +++
> [pid 31351] [0x] +++ killed by SIGKILL +++
> [pid 31350] [0x] +++ killed by SIGKILL +++
> [pid 19501] [0x7f89d77a61ab] <... kill resumed> ) = 0
> [pid 19501] [0x7f89d77a61dd] libmesos-1.1.0.so->kill(29985, SIGCONT <unfinished ...>
> [pid 29985] [0x] +++ killed by SIGKILL +++
> [pid 19493] [0x7f89d64ceda0] --- SIGCHLD (Child exited) ---
> [pid 31352] [0x] +++ killed by SIGKILL +++
> [pid 31349] [0x] +++ killed by SIGKILL +++
> [pid 19501] [0x7f89d77a61dd] <... kill resumed> ) = 0
> [pid 19501] [0x7f89d77a61dd] libmesos-1.1.0.so->kill(31349, SIGCONT) = -1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6264) Investigate the high memory usage of the default executor.

2016-10-05 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549111#comment-15549111
 ] 

Joris Van Remoortere edited comment on MESOS-6264 at 10/5/16 3:45 PM:
--

cc [~vinodkone][~jieyu]
The bulk of this comes from loading in {{libmesos.so}}.
We do this because the autoconf build treats libmesos as a dynamic dependency.
Since we load libmesos dynamically, there is no chance for the linker to strip 
unused code. This means that all of the code in libmesos, regardless of use, 
gets loaded into resident memory.
In contrast the cmake build generates a static library for {{libmesos.a}}. This 
is then used to build the {{mesos-executor}} binary without a dynamic 
dependency on libmesos. The benefit of this approach is that the linker is able 
to strip out all unused code. In an optimized build this is {{~10MB}}.

Some approaches for the quick win are:
# Consider using the cmake build. This only needs to be modified slightly to 
strip symbols from the final executor binary {{-s}}.
# Modify the autoconf build to build a {{libmesos.a}} so that we can statically 
link it into the {{mesos-executor}} binary and allow the linker to strip 
unused code.

Regardless of the above approach, {{libmesos}} would still be by far the 
largest contributor of the {{RSS}}. This is for 2 reasons:
# Much of our code is structured such that the linker can't determine if it is 
unused. We would need to adjust our patterns such that the unused code analyzer 
can do a better job.
# Much of our code is {{inlined}} or written such that it can't be optimized. 2 
examples are:
## 
https://github.com/apache/mesos/blob/9beb8eae6408249cdb3e2f16ba68b31a00d3452c/3rdparty/libprocess/include/process/mime.hpp#L35-L154
This code could be moved to a {{.cpp}} file and should be a {{static const 
std::unordered_map<std::string, std::string>}} that we {{insert(begin(), 
end())}} into {{types}}. This would reduce the size of libmesos by {{~20KB}}!
## 
https://github.com/apache/mesos/blob/9beb8eae6408249cdb3e2f16ba68b31a00d3452c/3rdparty/libprocess/include/process/http.hpp#L453-L517
This code and its sibling {{struct Request}} have auto-generated {{inlined}} 
destructors. These are very expensive. Just declaring the destructor and then 
defining it as default in the {{.cpp}} can remove another {{~20KB}} each from 
libmesos. There are plenty of other opportunities like this scattered through 
the codebase. It's work to find them and the returns are small for each, but 
they end up adding up to much of the {{9MB}} left over.


was (Author: jvanremoortere):
cc [~vinodkone][~jieyu]
The bulk of this comes from loading in {{libmesos.so}}.
We do this because the autoconf build treats libmesos as a dynamic dependency.
Since we load libmesos dynamically, there is no chance for the linker to strip 
unused code. This means that all of the code in libmesos, regardless of use, 
gets loaded into resident memory.
In contrast the cmake build generates a static library for {{libmesos.a}}. This 
is then used to build the {{mesos-executor}} binary without a dynamic 
dependency on libmesos. The benefit of this approach is that the linker is able 
to strip out all unused code. In an optimized build this is {{~10MB}}.

Some approaches for the quick win are:
# Consider using the cmake build. This only needs to be modified slightly to 
strip symbols from the final executor binary {{-s}}.
# Modify the autoconf build to build a {{libmesos.a}} so that we can statically 
link it into the {{mesos-executor}} binary and allow the linker to strip 
unused code.

Regardless of the above approach, {{libmesos}} would still be by far the 
largest contributor of the {{RSS}}. This is for 2 reasons:
# Much of our code is structured such that the linker can't determine if it is 
unused. We would need to adjust our patterns such that the unused code analyzer 
can do a better job.
# Much of our code is {{inlined}} or written such that it can't be optimized. 2 
examples are:
## 
https://github.com/apache/mesos/blob/9beb8eae6408249cdb3e2f16ba68b31a00d3452c/3rdparty/libprocess/include/process/mime.hpp#L35-L154
This code could be moved to a {{.cpp}} file and should be a {{static const 
std::unordered_map<std::string, std::string>}} that we {{insert(begin(), 
end())}} into {{types}}. This would reduce the size of libmesos by {{~20KB}}!
## 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/http.hpp#L453-L517
This code and its sibling {{struct Request}} have auto-generated {{inlined}} 
destructors. These are very expensive. Just declaring the destructor and then 
defining it as default in the {{.cpp}} can remove another {{~20KB}} each from 
libmesos. There are plenty of other opportunities like this scattered through 
the codebase. It's work to find them and the returns are small for each, but 
they end up adding up to much of the {{9MB}} left over.

> Investigate the high memory usage of the default 

[jira] [Commented] (MESOS-6264) Investigate the high memory usage of the default executor.

2016-10-05 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549111#comment-15549111
 ] 

Joris Van Remoortere commented on MESOS-6264:
-

cc [~vinodkone][~jieyu]
The bulk of this comes from loading in {{libmesos.so}}.
We do this because the autoconf build treats libmesos as a dynamic dependency.
Since we load libmesos dynamically, there is no chance for the linker to strip 
unused code. This means that all of the code in libmesos, regardless of use, 
gets loaded into resident memory.
In contrast the cmake build generates a static library for {{libmesos.a}}. This 
is then used to build the {{mesos-executor}} binary without a dynamic 
dependency on libmesos. The benefit of this approach is that the linker is able 
to strip out all unused code. In an optimized build this is {{~10MB}}.

Some approaches for the quick win are:
# Consider using the cmake build. This only needs to be modified slightly to 
strip symbols from the final executor binary {{-s}}.
# Modify the autoconf build to build a {{libmesos.a}} so that we can statically 
link it into the {{mesos-executor}} binary and allow the linker to strip 
unused code.

Regardless of the above approach, {{libmesos}} would still be by far the 
largest contributor of the {{RSS}}. This is for 2 reasons:
# Much of our code is structured such that the linker can't determine if it is 
unused. We would need to adjust our patterns such that the unused code analyzer 
can do a better job.
# Much of our code is {{inlined}} or written such that it can't be optimized. 2 
examples are:
## 
https://github.com/apache/mesos/blob/9beb8eae6408249cdb3e2f16ba68b31a00d3452c/3rdparty/libprocess/include/process/mime.hpp#L35-L154
This code could be moved to a {{.cpp}} file and should be a {{static const 
std::unordered_map<std::string, std::string>}} that we {{insert(begin(), 
end())}} into {{types}}. This would reduce the size of libmesos by {{~20KB}}!
## 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/http.hpp#L453-L517
This code and its sibling {{struct Request}} have auto-generated {{inlined}} 
destructors. These are very expensive. Just declaring the destructor and then 
defining it as default in the {{.cpp}} can remove another {{~20KB}} each from 
libmesos. There are plenty of other opportunities like this scattered through 
the codebase. It's work to find them and the returns are small for each, but 
they end up adding up to much of the {{9MB}} left over.
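
For the destructor point, the general pattern is sketched below (illustrative 
only, not the actual libprocess change):
{code}
#include <string>

// http.hpp: declare the destructor so the compiler does not emit an
// inlined, auto-generated one at every site where a Response is destroyed.
struct Response
{
  ~Response(); // Defined out-of-line in http.cpp.

  std::string body;
  // ... other members with non-trivial destructors ...
};

// http.cpp: the destruction code is now emitted exactly once, here.
Response::~Response() = default;
{code}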

> Investigate the high memory usage of the default executor.
> --
>
> Key: MESOS-6264
> URL: https://issues.apache.org/jira/browse/MESOS-6264
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
> Attachments: pmap_output_for_the_default_executor.txt
>
>
> It seems that a default executor with two sleep tasks is using ~32 MB on 
> average, which can sometimes lead to it being killed for some tests like 
> {{SlaveRecoveryTest/0.ROOT_CGROUPS_ReconnectDefaultExecutor}} on our internal 
> CI. Attached is the {{pmap}} output for the default executor. Please note that 
> the command executor memory usage is also pretty high (~26 MB).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6247) Enable Framework to set weight

2016-10-05 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548652#comment-15548652
 ] 

Joris Van Remoortere commented on MESOS-6247:
-

[~klaus1982] Do you mean they cannot share reserved resources with each other?

If they are in the same role they are supposed to be co-operative. At that 
point, why does the weight matter? They should both be yielding all unused 
resources to each other.

If we add support for weights now, it will make it *even* harder to move people 
into the hierarchical role world described by benm. It seems like having the 
frameworks co-operate (as they should per the contract of sharing a role) is 
the right temporary solution for you.

> Enable Framework to set weight
> --
>
> Key: MESOS-6247
> URL: https://issues.apache.org/jira/browse/MESOS-6247
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
> Environment: all
>Reporter: Klaus Ma
>Priority: Critical
>
> We'd like to enable framework's weight when it register. So the framework can 
> share resources based on weight within the same role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6249) On Mesos master failover the reregistered callback is not triggered

2016-10-05 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548632#comment-15548632
 ] 

Joris Van Remoortere commented on MESOS-6249:
-

[~markusjura] It seems like you are hitting some logic around 
https://issues.apache.org/jira/browse/MESOS-786
You can see the comment here:
https://github.com/apache/mesos/blob/b70a22bad22e5e8668f9af62c575902dec7b0125/src/master/master.cpp#L2813-L2820

Pinging [~bmahler], who wrote the comment, and [~anandmazumdar] for reference.

> On Mesos master failover the reregistered callback is not triggered
> ---
>
> Key: MESOS-6249
> URL: https://issues.apache.org/jira/browse/MESOS-6249
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 0.28.0, 0.28.1, 1.0.1
> Environment: OS X 10.11.6
>Reporter: Markus Jura
>
> On a Mesos master failover the reregistered callback of the Java API is not 
> triggered. Only the registration callback is triggered which makes it hard 
> for a framework to distinguish between these scenarios.
> This behaviour has been tested with the ConductR framework, both with the 
> Java API version 0.28.0, 0.28.1 and 1.0.1. Below you find the logs from the 
> master that got re-elected and from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master 
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is 
> master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the 
> registry (0B) in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 
> 12us; attempting to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 
> 'registry' in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered 
> registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the 
> Registry (118B) ; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'conductr' at 
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr 
> with checkpointing disabled and capabilities [  ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 
> 38us; attempting to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 
> 'registry' in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 
> 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 
> (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent 
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8; 
> mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9; 
> mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed 
> resources  to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at 
> slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework 
> conductr (conductr) at 
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: 
> (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get 
> '/mesos/json.info_87' in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master 
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at 
> master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient 
> [sourceThread=conductr-akka.actor.default-dispatcher-2, 
> 

[jira] [Commented] (MESOS-6311) Consider supporting implicit reconciliation per agent

2016-10-04 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546209#comment-15546209
 ] 

Joris Van Remoortere commented on MESOS-6311:
-

cc [~anandmazumdar] [~neilconway] [~vinodkone]

> Consider supporting implicit reconciliation per agent
> -
>
> Key: MESOS-6311
> URL: https://issues.apache.org/jira/browse/MESOS-6311
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Joris Van Remoortere
>
> Currently Mesos only supports:
> - total implicit reconciliation
> - explicit reconciliation per task
> Since agents can slowly rejoin the master after a master failover, it is hard 
> to have a low time bound on implicit reconciliation for tasks.
> Performing the current implicit reconciliation is expensive on big clusters, 
> so it should not be done every N seconds.
> If we could perform implicit reconciliation for a particular agent, then it 
> would be cheap enough to run after we notice that particular agent rejoining 
> the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6311) Consider supporting implicit reconciliation per agent

2016-10-04 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-6311:
---

 Summary: Consider supporting implicit reconciliation per agent
 Key: MESOS-6311
 URL: https://issues.apache.org/jira/browse/MESOS-6311
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Joris Van Remoortere


Currently Mesos only supports:
- total implicit reconciliation
- explicit reconciliation per task

Since agents can slowly rejoin the master after a master failover, it is hard to 
have a low time bound on implicit reconciliation for tasks.
Performing the current implicit reconciliation is expensive on big clusters, so 
it should not be done every N seconds.
If we could perform implicit reconciliation for a particular agent, then it 
would be cheap enough to run after we notice that particular agent rejoining 
the cluster.
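
For reference, the two existing modes look like this through the scheduler 
driver (a sketch; {{driver}} is assumed to be a connected 
{{MesosSchedulerDriver}}, and the task ID is hypothetical):
{code}
#include <vector>

#include <mesos/scheduler.hpp>

void reconcile(mesos::MesosSchedulerDriver* driver)
{
  // (1) Total implicit reconciliation: an empty list of statuses asks the
  //     master for the latest state of *all* of the framework's tasks.
  driver->reconcileTasks({});

  // (2) Explicit reconciliation per task: one TaskStatus per task of
  //     interest.
  mesos::TaskStatus status;
  status.mutable_task_id()->set_value("my-task-id"); // Hypothetical ID.
  driver->reconcileTasks({status});
}
{code}
A per-agent mode would sit between these two in both cost and granularity.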



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4948) Move maintenance tests to use the new scheduler library interface.

2016-09-30 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536930#comment-15536930
 ] 

Joris Van Remoortere commented on MESOS-4948:
-

[~ipronin] [~anandmazumdar] will shepherd for you as he introduced the 
abstraction.

> Move maintenance tests to use the new scheduler library interface.
> --
>
> Key: MESOS-4948
> URL: https://issues.apache.org/jira/browse/MESOS-4948
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: Ubuntu 14.04, using gcc, with libevent and SSL enabled 
> (on ASF CI)
>Reporter: Greg Mann
>Assignee: Ilya Pronin
>  Labels: flaky-test, maintenance, mesosphere, newbie
>
> We need to move the existing maintenance tests to use the new scheduler 
> interface. We have already moved 1 test, 
> {{MasterMaintenanceTest.PendingUnavailabilityTest}}, to use the new interface. 
> It would be good to move the other 2 remaining tests to the new interface, 
> since the old one can lead to failures around the stack object being 
> referenced after it has already been destroyed. Detailed log from an ASF CI 
> build failure:
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffers
> I0315 04:16:50.786032  2681 leveldb.cpp:174] Opened db in 125.361171ms
> I0315 04:16:50.836374  2681 leveldb.cpp:181] Compacted db in 50.254411ms
> I0315 04:16:50.836470  2681 leveldb.cpp:196] Created db iterator in 25917ns
> I0315 04:16:50.836488  2681 leveldb.cpp:202] Seeked to beginning of db in 
> 3291ns
> I0315 04:16:50.836498  2681 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 253ns
> I0315 04:16:50.836549  2681 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0315 04:16:50.837474  2702 recover.cpp:447] Starting replica recovery
> I0315 04:16:50.837565  2681 cluster.cpp:183] Creating default 'local' 
> authorizer
> I0315 04:16:50.838191  2702 recover.cpp:473] Replica is in EMPTY status
> I0315 04:16:50.839532  2704 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (4784)@172.17.0.4:39845
> I0315 04:16:50.839754  2705 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0315 04:16:50.841893  2704 recover.cpp:564] Updating replica status to 
> STARTING
> I0315 04:16:50.842566  2703 master.cpp:376] Master 
> c326bc68-2581-48d4-9dc4-0d6f270bdda1 (01fcd642f65f) started on 
> 172.17.0.4:39845
> I0315 04:16:50.842644  2703 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_http="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/DE2Uaw/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.29.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/DE2Uaw/master" --zk_session_timeout="10secs"
> I0315 04:16:50.843168  2703 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I0315 04:16:50.843227  2703 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0315 04:16:50.843302  2703 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/DE2Uaw/credentials'
> I0315 04:16:50.843737  2703 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0315 04:16:50.843969  2703 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0315 04:16:50.844177  2703 master.cpp:571] Authorization enabled
> I0315 04:16:50.844360  2708 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0315 04:16:50.844430  2708 whitelist_watcher.cpp:77] No whitelist given
> I0315 04:16:50.848227  2703 master.cpp:1806] The newly elected leader is 
> master@172.17.0.4:39845 with id c326bc68-2581-48d4-9dc4-0d6f270bdda1
> I0315 04:16:50.848269  2703 master.cpp:1819] Elected as the leading master!
> I0315 04:16:50.848292  2703 master.cpp:1508] Recovering from registrar
> I0315 04:16:50.848563  2703 registrar.cpp:307] Recovering registrar
> I0315 04:16:50.876277  2711 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 34.178445ms
> I0315 04:16:50.876365  2711 replica.cpp:320] Persisted replica status to 

[jira] [Updated] (MESOS-4948) Move maintenance tests to use the new scheduler library interface.

2016-09-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4948:

Shepherd: Anand Mazumdar

> Move maintenance tests to use the new scheduler library interface.
> --
>
> Key: MESOS-4948
> URL: https://issues.apache.org/jira/browse/MESOS-4948
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: Ubuntu 14.04, using gcc, with libevent and SSL enabled 
> (on ASF CI)
>Reporter: Greg Mann
>Assignee: Ilya Pronin
>  Labels: flaky-test, maintenance, mesosphere, newbie
>
> We need to move the existing maintenance tests to use the new scheduler 
> interface. We have already moved 1 test 
> {{MasterMaintenanceTest.PendingUnavailabilityTest}} to use the new interface. 
> It would be good to move the other 2 remaining tests to the new interface, 
> since the old interface can lead to failures around the stack object being 
> referenced after it has already been destroyed. A detailed log from an ASF 
> CI build failure follows.
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffers
> I0315 04:16:50.786032  2681 leveldb.cpp:174] Opened db in 125.361171ms
> I0315 04:16:50.836374  2681 leveldb.cpp:181] Compacted db in 50.254411ms
> I0315 04:16:50.836470  2681 leveldb.cpp:196] Created db iterator in 25917ns
> I0315 04:16:50.836488  2681 leveldb.cpp:202] Seeked to beginning of db in 
> 3291ns
> I0315 04:16:50.836498  2681 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 253ns
> I0315 04:16:50.836549  2681 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0315 04:16:50.837474  2702 recover.cpp:447] Starting replica recovery
> I0315 04:16:50.837565  2681 cluster.cpp:183] Creating default 'local' 
> authorizer
> I0315 04:16:50.838191  2702 recover.cpp:473] Replica is in EMPTY status
> I0315 04:16:50.839532  2704 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (4784)@172.17.0.4:39845
> I0315 04:16:50.839754  2705 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0315 04:16:50.841893  2704 recover.cpp:564] Updating replica status to 
> STARTING
> I0315 04:16:50.842566  2703 master.cpp:376] Master 
> c326bc68-2581-48d4-9dc4-0d6f270bdda1 (01fcd642f65f) started on 
> 172.17.0.4:39845
> I0315 04:16:50.842644  2703 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_http="true" 
> --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/DE2Uaw/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.29.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/DE2Uaw/master" --zk_session_timeout="10secs"
> I0315 04:16:50.843168  2703 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I0315 04:16:50.843227  2703 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0315 04:16:50.843302  2703 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/DE2Uaw/credentials'
> I0315 04:16:50.843737  2703 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0315 04:16:50.843969  2703 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0315 04:16:50.844177  2703 master.cpp:571] Authorization enabled
> I0315 04:16:50.844360  2708 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0315 04:16:50.844430  2708 whitelist_watcher.cpp:77] No whitelist given
> I0315 04:16:50.848227  2703 master.cpp:1806] The newly elected leader is 
> master@172.17.0.4:39845 with id c326bc68-2581-48d4-9dc4-0d6f270bdda1
> I0315 04:16:50.848269  2703 master.cpp:1819] Elected as the leading master!
> I0315 04:16:50.848292  2703 master.cpp:1508] Recovering from registrar
> I0315 04:16:50.848563  2703 registrar.cpp:307] Recovering registrar
> I0315 04:16:50.876277  2711 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 34.178445ms
> I0315 04:16:50.876365  2711 replica.cpp:320] Persisted replica status to 
> STARTING
> I0315 04:16:50.876776  2711 recover.cpp:473] Replica is in STARTING status
> I0315 

[jira] [Updated] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-09-23 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6237:

Summary: Slave Sandbox inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6  (was: Agent Sandbox 
inaccessible when using IPv6 address in patch from 
https://github.com/lava/mesos/tree/bennoe/ipv6)

> Slave Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible. The 
> problem seems to be that there are no brackets around the IP, so it tries to 
> access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051
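> For illustration only, the kind of bracketing the fix needs (helper name 
> and placement are hypothetical, not from the patch):
> {code}
> #include <string>
>
> // Wraps bare IPv6 addresses in brackets before appending the port, so
> // "2001:41d0:1000:ab9::" + 5051 becomes "[2001:41d0:1000:ab9::]:5051".
> std::string endpoint(const std::string& ip, int port)
> {
>   const bool ipv6 = ip.find(':') != std::string::npos;
>   return (ipv6 ? "[" + ip + "]" : ip) + ":" + std::to_string(port);
> }
> {code}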



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6122) Mesos slave throws systemd errors even when passed a flag to disable systemd

2016-09-06 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467888#comment-15467888
 ] 

Joris Van Remoortere commented on MESOS-6122:
-

These are just logging statements. The rest of the mesos code will execute just 
the same.
All this patch will do is remove those logging lines. I appreciate that this 
may be all you want; I just want to be sure there are no other issues :-)

> Mesos slave throws systemd errors even when passed a flag to disable systemd
> 
>
> Key: MESOS-6122
> URL: https://issues.apache.org/jira/browse/MESOS-6122
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
>Reporter: Gennady Feldman
>Assignee: Jie Yu
> Fix For: 1.1.0, 1.0.2
>
>
> Seems like the code in slave/main.cpp is logically in the wrong order:
> #ifdef __linux__
>   // Initialize systemd if it exists.
> if (systemd::exists() && flags.systemd_enable_support) {
> Lines 339-341: 
> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L341
> The flags check should come first, before the systemd::exists() check runs. 
> Currently systemd::exists() always runs and there's no way to disable that 
> check from running in mesos-slave.
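> For illustration, the reordering suggested above, so the flag 
> short-circuits the check:
> {code}
> #ifdef __linux__
>   // Consult the operator-supplied flag first so that systemd::exists()
>   // never runs (and never logs) when systemd support is disabled.
>   if (flags.systemd_enable_support && systemd::exists()) {
>     // ... initialize systemd support ...
>   }
> #endif // __linux__
> {code}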



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6122) Mesos slave throws systemd errors even when passed a flag to disable systemd

2016-09-06 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467807#comment-15467807
 ] 

Joris Van Remoortere commented on MESOS-6122:
-

The point of the code is to check if systemd exists. It should never error out, 
just return {{true}} / {{false}}.
Can you please provide the error that you are encountering?

> Mesos slave throws systemd errors even when passed a flag to disable systemd
> 
>
> Key: MESOS-6122
> URL: https://issues.apache.org/jira/browse/MESOS-6122
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
>Reporter: Gennady Feldman
>Assignee: Jie Yu
> Fix For: 1.1.0, 1.0.2
>
>
> Seems like the code in slave/main.cpp is logically in the wrong order:
> #ifdef __linux__
>   // Initialize systemd if it exists.
> if (systemd::exists() && flags.systemd_enable_support) {
> Lines 339-341: 
> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L341
> The flags check should come first, before the systemd::exists() check runs. 
> Currently systemd::exists() always runs and there's no way to disable that 
> check from running in mesos-slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6122) Mesos slave throws systemd errors even when passed a flag to disable systemd

2016-09-06 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467768#comment-15467768
 ] 

Joris Van Remoortere commented on MESOS-6122:
-

[~jieyu] This change looks ok.
[~gena01] Can you please provide logs for the errors you ran into? I don't 
understand how the logical order evaluation here is a {{bug}} unless you are 
running into an error during the {{exists}} check. If so, can you please augment 
this ticket with that information? At this point all we are doing is masking 
that problem.
Otherwise this is purely an optimization.

> Mesos slave throws systemd errors even when passed a flag to disable systemd
> 
>
> Key: MESOS-6122
> URL: https://issues.apache.org/jira/browse/MESOS-6122
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
>Reporter: Gennady Feldman
>Assignee: Jie Yu
> Fix For: 1.1.0, 1.0.2
>
>
> Seems like the code in slave/main.cpp is logically in the wrong order:
> #ifdef __linux__
>   // Initialize systemd if it exists.
> if (systemd::exists() && flags.systemd_enable_support) {
> Lines 339-341: 
> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L341
> The flags check should come first, before the systemd::exists() check runs. 
> Currently systemd::exists() always runs and there's no way to disable that 
> check from running in mesos-slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1474) Provide cluster maintenance primitives for operators.

2016-08-24 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435880#comment-15435880
 ] 

Joris Van Remoortere commented on MESOS-1474:
-

To help clarify: The new offers have an explicit unavailability in them that 
indicates how long the agent will still be up. New tasks scheduled there should 
be able to complete prior to that time point.
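For illustration, a sketch of a scheduler consuming that field (the 
{{Unavailability}} message in mesos.proto carries a required {{start}} 
TimeInfo and an optional {{duration}}; the filtering policy below is made up):
{code}
for (const Offer& offer : offers) {
  if (offer.has_unavailability()) {
    // Nanoseconds since the epoch at which the agent becomes unavailable.
    const int64_t start = offer.unavailability().start().nanoseconds();

    // Only launch tasks expected to finish before `start`; otherwise
    // decline and let the allocator offer resources elsewhere.
  }
}
{code}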

> Provide cluster maintenance primitives for operators.
> -
>
> Key: MESOS-1474
> URL: https://issues.apache.org/jira/browse/MESOS-1474
> Project: Mesos
>  Issue Type: Epic
>  Components: framework, master, slave
>Reporter: Benjamin Mahler
>Assignee: Artem Harutyunyan
>  Labels: mesosphere, twitter
>
> Sometimes operators need to perform maintenance on a mesos cluster; we define 
> maintenance here as anything that requires the tasks to be drained on the 
> slave(s). Most mesos upgrades can be done without affecting running tasks, 
> but there are situations where maintenance is task-affecting:
> * Host maintenance (e.g. hardware repair, kernel upgrades).
> * Non-recoverable slave upgrades (e.g. adjusting slave attributes).
> * etc
> In order to ensure operators don’t violate frameworks’ SLAs, schedulers need 
> to be aware of planned unavailability events.
> Maintenance awareness allows schedulers to avoid churn for long running tasks 
> by placing them on machines not undergoing maintenance. If all resources are 
> planned for maintenance, then the scheduler will prefer machines scheduled 
> for maintenance least imminently.
> Maintenance awareness is also crucial when a scheduler uses [persistent 
> disk|https://issues.apache.org/jira/browse/MESOS-1554] resources, to ensure 
> that the scheduler is aware of the expected duration of unavailability for a 
> persistent disk resource (e.g. using 3 1TB replicas, don’t need to replicate 
> 1TB over the network when only 1 of the 3 replicas is going to be unavailable 
> for a reboot (< 1 hour)).
> There are a few primitives of interest here:
> * Provide a way for operators to [fully shutdown a 
> slave|https://issues.apache.org/jira/browse/MESOS-1475] (killing all tasks 
> underneath it). Colloquially known as a "hard drain".
> * Provide a way for operators to mark specific slaves as scheduled for 
> maintenance. This will inform the scheduler about the scheduled 
> unavailability of the resources.
> * Provide a way for frameworks to be notified when resources are requested to 
> be relinquished. This gives the framework a chance to proactively move a task 
> before it may be forcibly killed by an operator. It also allows the automation of 
> operations like: "please drain these slaves within 1 hour."
> See the [design 
> doc|https://docs.google.com/a/twitter.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit#]
>  for the latest details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks

2016-08-04 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408562#comment-15408562
 ] 

Joris Van Remoortere commented on MESOS-4694:
-

{code}
commit e859d3ae8d8ff7349327b9e6a89edd6f98d2b7a1
Author: Dario Rexin 
Date:   Thu Aug 4 17:12:10 2016 -0400

Removed frameworks with suppressed offers from DRFSorter.

This patch removes frameworks with suppressed offers from the sorter to
reduce time spent in sorting. The allocations will remain in the sorter,
so no data is lost and the numbers are still correct. When a framework
revives offers, it will be re-added to the sorter.

Review: https://reviews.apache.org/r/43666/
{code}

> DRFAllocator takes very long to allocate resources with a large number of 
> frameworks
> 
>
> Key: MESOS-4694
> URL: https://issues.apache.org/jira/browse/MESOS-4694
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Affects Versions: 0.26.0, 0.27.0, 0.27.1, 0.27.2, 0.28.0, 0.28.1
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>
> With a growing number of connected frameworks, the allocation time grows to 
> very high numbers. The addition of quota in 0.27 had an additional impact on 
> these numbers. Running `mesos-tests.sh --benchmark 
> --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` gives us 
> the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.921202secs to make 200 offers
> round 1 allocate took 2.85045secs to make 200 offers
> round 2 allocate took 2.823768secs to make 200 offers
> {noformat}
> Increasing the number of frameworks to 2000:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 28.209454secs to make 2000 offers
> round 1 allocate took 28.469419secs to make 2000 offers
> round 2 allocate took 28.138086secs to make 2000 offers
> {noformat}
> I was able to reduce this time by a substantial amount. After applying the 
> patches:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 1.016226secs to make 2000 offers
> round 1 allocate took 1.102729secs to make 2000 offers
> round 2 allocate took 1.102624secs to make 2000 offers
> {noformat}
> And with 2000 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 12.563203secs to make 2000 offers
> round 1 allocate took 12.437517secs to make 2000 offers
> round 2 allocate took 12.470708secs to make 2000 offers
> {noformat}
> The patches do 3 things to improve the performance of the allocator.
> 1) The total values in the DRFSorter will be pre-calculated per resource type.
> 2) In the allocate method, when no resources are available to allocate, we 
> break out of the innermost loop to prevent looping over a large number of 
> frameworks when we have nothing to allocate.
> 3) When a framework suppresses offers, we remove it from the sorter instead 
> of just calling continue in the allocation loop - this greatly improves 
> performance in the sorter and prevents looping over frameworks that don't 
> need resources (see the sketch below).
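> A minimal sketch of 3), with hypothetical names (the actual change is in 
> review https://reviews.apache.org/r/43666/):
> {code}
> // Sketch only: a deactivated sorter client keeps its allocations (so
> // fair shares stay correct) but is skipped by subsequent allocation
> // rounds, instead of being iterated over and skipped with `continue`.
> void suppressOffers(const FrameworkID& frameworkId)
> {
>   sorter->deactivate(frameworkId.value());
> }
>
> // On revive, the framework is re-added, sorted by its existing share.
> void reviveOffers(const FrameworkID& frameworkId)
> {
>   sorter->activate(frameworkId.value());
> }
> {code}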
> Assuming that most of the frameworks behave nicely and suppress offers when 
> they have nothing to schedule, it is fair to assume that point 3) has the 
> biggest impact on the performance. If we suppress offers for 90% of the 
> frameworks in the benchmark test, we see the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 200 slaves and 2000 frameworks
> round 0 allocate took 11626us to make 200 offers
> round 1 allocate took 22890us to make 200 offers
> round 2 allocate took 21346us to make 200 offers
> {noformat}
> And for 

[jira] [Comment Edited] (MESOS-5983) Number of libprocess worker threads is not configurable for log-rotation module.

2016-08-03 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406594#comment-15406594
 ] 

Joris Van Remoortere edited comment on MESOS-5983 at 8/3/16 8:55 PM:
-

https://reviews.apache.org/r/50766/


was (Author: jvanremoortere):
https://github.com/dcos/dcos/pull/483
depends on
https://reviews.apache.org/r/50766/

> Number of libprocess worker threads is not configurable for log-rotation 
> module.
> 
>
> Key: MESOS-5983
> URL: https://issues.apache.org/jira/browse/MESOS-5983
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5983) Number of libprocess worker threads is not configurable for log-rotation module.

2016-08-03 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406594#comment-15406594
 ] 

Joris Van Remoortere commented on MESOS-5983:
-

https://github.com/dcos/dcos/pull/483
depends on
https://reviews.apache.org/r/50766/

> Number of libprocess worker threads is not configurable for log-rotation 
> module.
> 
>
> Key: MESOS-5983
> URL: https://issues.apache.org/jira/browse/MESOS-5983
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5983) Number of libprocess worker threads is not configurable for log-rotation module.

2016-08-03 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5983:

Description: (was: https://github.com/dcos/dcos/pull/483)

> Number of libprocess worker threads is not configurable for log-rotation 
> module.
> 
>
> Key: MESOS-5983
> URL: https://issues.apache.org/jira/browse/MESOS-5983
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5983) Number of libprocess worker threads is not configurable for log-rotation module.

2016-08-03 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5983:
---

 Summary: Number of libprocess worker threads is not configurable 
for log-rotation module.
 Key: MESOS-5983
 URL: https://issues.apache.org/jira/browse/MESOS-5983
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Affects Versions: 1.0.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5943) Incremental http parsing of URLs leads to decoder error

2016-08-02 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404647#comment-15404647
 ] 

Joris Van Remoortere edited comment on MESOS-5943 at 8/2/16 9:05 PM:
-

{code}
commit 2776a09cbcd836080241a5ad8c1e003984e5a146
Author: Joris Van Remoortere 
Date:   Sat Jul 30 12:58:28 2016 -0700

Libprocess: Fixed decoder to support incremental URL parsing.

Review: https://reviews.apache.org/r/50634
{code}


was (Author: jvanremoortere):
{code}
commit f291d5023e9f2e471c11d4f20590901d9bfc1de4
Author: Joris Van Remoortere 
Date:   Mon Aug 1 17:14:37 2016 -0700

Libprocess: Removed old http_parser code.

We remove the code that supported the `HTTP_PARSER_VERSION_MAJOR` < 2
path.

Review: https://reviews.apache.org/r/50683

commit 2776a09cbcd836080241a5ad8c1e003984e5a146
Author: Joris Van Remoortere 
Date:   Sat Jul 30 12:58:28 2016 -0700

Libprocess: Fixed decoder to support incremental URL parsing.

Review: https://reviews.apache.org/r/50634
{code}

> Incremental http parsing of URLs leads to decoder error
> ---
>
> Key: MESOS-5943
> URL: https://issues.apache.org/jira/browse/MESOS-5943
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, scheduler driver
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.1, 0.27.4
>
>
> When requests arrive at the decoder in pieces (e.g. {{mes}} followed by a 
> separate chunk of {{os.apache.org}}) the http parser is not able to handle 
> this case if the split is within the URL component.
> This causes the decoder to error out, and can lead to connection invalidation.
> The scheduler driver is susceptible to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5970) Remove HTTP_PARSER_VERSION_MAJOR < 2 code in decoder.

2016-08-02 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5970:
---

 Summary: Remove HTTP_PARSER_VERSION_MAJOR < 2 code in decoder.
 Key: MESOS-5970
 URL: https://issues.apache.org/jira/browse/MESOS-5970
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
 Fix For: 1.0.1, 1.1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5943) Incremental http parsing of URLs leads to decoder error

2016-08-02 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404647#comment-15404647
 ] 

Joris Van Remoortere edited comment on MESOS-5943 at 8/2/16 9:03 PM:
-

{code}
commit f291d5023e9f2e471c11d4f20590901d9bfc1de4
Author: Joris Van Remoortere 
Date:   Mon Aug 1 17:14:37 2016 -0700

Libprocess: Removed old http_parser code.

We remove the code that supported the `HTTP_PARSER_VERSION_MAJOR` < 2
path.

Review: https://reviews.apache.org/r/50683

commit 2776a09cbcd836080241a5ad8c1e003984e5a146
Author: Joris Van Remoortere 
Date:   Sat Jul 30 12:58:28 2016 -0700

Libprocess: Fixed decoder to support incremental URL parsing.

Review: https://reviews.apache.org/r/50634
{code}


was (Author: jvanremoortere):
{code}
commit f291d5023e9f2e471c11d4f20590901d9bfc1de4
Author: Joris Van Remoortere 
Date:   Mon Aug 1 17:14:37 2016 -0700

Libprocess: Removed old http_parser code.

We remove the code that supported the `HTTP_PARSER_VERSION_MAJOR` < 2
path.

Review: https://reviews.apache.org/r/50683

commit 2776a09cbcd836080241a5ad8c1e003984e5a146
Author: Joris Van Remoortere 
Date:   Sat Jul 30 12:58:28 2016 -0700

Libprocess: Fixed decoder to support incremental URL parsing.

Review: https://reviews.apache.org/r/50634
{code}

> Incremental http parsing of URLs leads to decoder error
> ---
>
> Key: MESOS-5943
> URL: https://issues.apache.org/jira/browse/MESOS-5943
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, scheduler driver
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.1, 0.27.4
>
>
> When requests arrive at the decoder in pieces (e.g. {{mes}} followed by a 
> separate chunk of {{os.apache.org}}) the http parser is not able to handle 
> this case if the split is within the URL component.
> This causes the decoder to error out, and can lead to connection invalidation.
> The scheduler driver is susceptible to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5943) Incremental http parsing of URLs leads to decoder error

2016-08-02 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5943:

Fix Version/s: 0.27.4
   0.28.3

> Incremental http parsing of URLs leads to decoder error
> ---
>
> Key: MESOS-5943
> URL: https://issues.apache.org/jira/browse/MESOS-5943
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, scheduler driver
>Affects Versions: 1.0.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.1, 0.27.4
>
>
> When requests arrive at the decoder in pieces (e.g. {{mes}} followed by a 
> separate chunk of {{os.apache.org}}) the http parser is not able to handle 
> this case if the split is within the URL component.
> This causes the decoder to error out, and can lead to connection invalidation.
> The scheduler driver is susceptible to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5944) Remove `O_SYNC` from StatusUpdateManager logs

2016-07-30 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5944:
---

 Summary: Remove `O_SYNC` from StatusUpdateManager logs
 Key: MESOS-5944
 URL: https://issues.apache.org/jira/browse/MESOS-5944
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
 Fix For: 1.1.0


Currently the {{StatusUpdateManager}} uses {{O_SYNC}} to flush status updates 
to disk. 

We don't need to use {{O_SYNC}} because we only read this file if the host did 
not crash. {{os::write}} success implies the kernel will have flushed our data 
to the page cache. This is sufficient for the recovery scenarios we use this 
data for.
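A rough sketch of the flag difference (plain POSIX {{open}}; the file name is 
made up and this is not the actual patch):
{code}
#include <fcntl.h>

// With O_SYNC every write() blocks until the data reaches stable storage.
// Without it, a successful write() still guarantees the data is in the
// kernel page cache, which survives an agent crash (though not a host
// crash); that is all the recovery path described above relies on.
int openUpdatesLog(bool sync)
{
  int flags = O_WRONLY | O_APPEND | O_CREAT | (sync ? O_SYNC : 0);
  return ::open("/tmp/status.updates", flags, 0644);
}
{code}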



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5943) Incremental http parsing of URLs leads to decoder error

2016-07-30 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5943:
---

 Summary: Incremental http parsing of URLs leads to decoder error
 Key: MESOS-5943
 URL: https://issues.apache.org/jira/browse/MESOS-5943
 Project: Mesos
  Issue Type: Bug
  Components: libprocess, scheduler driver
Affects Versions: 1.0.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Blocker
 Fix For: 1.0.1


When requests arrive at the decoder in pieces (e.g. {{mes}} followed by a 
separate chunk of {{os.apache.org}}) the http parser is not able to handle this 
case if the split is within the URL component.

This causes the decoder to error out, and can lead to connection invalidation.

The scheduler driver is susceptible to this.
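For illustration, a sketch against the joyent {{http_parser}} C API (which 
the libprocess decoder wraps) showing why the URL callback has to accumulate 
across chunks; the flow here is an assumption, not the decoder code:
{code}
#include <string>

#include "http_parser.h"

static std::string url;

// With split input this callback fires once per chunk ("/mes", then "os"),
// so the caller must append rather than assume it sees the whole URL.
static int on_url(http_parser* parser, const char* at, size_t length)
{
  url.append(at, length);
  return 0;
}

int main()
{
  http_parser parser;
  http_parser_init(&parser, HTTP_REQUEST);

  http_parser_settings settings = {};
  settings.on_url = on_url;

  const std::string chunk1 = "GET /mes";
  const std::string chunk2 = "os HTTP/1.1\r\nHost: example.org\r\n\r\n";

  http_parser_execute(&parser, &settings, chunk1.data(), chunk1.size());
  http_parser_execute(&parser, &settings, chunk2.data(), chunk2.size());

  // url is now "/mesos".
  return 0;
}
{code}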



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5425) Consider using IntervalSet for Port range resource math

2016-06-13 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5425:

Assignee: Yanyan Hu

> Consider using IntervalSet for Port range resource math
> ---
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joseph Wu
>Assignee: Yanyan Hu
>  Labels: mesosphere
> Attachments: graycol.gif
>
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing 
> [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp]
>  in [Port range resource 
> math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].
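> For flavor, a small {{IntervalSet}} usage sketch (written from memory of 
> the stout API, not taken from the ticket):
> {code}
> #include <stout/interval.hpp>
>
> IntervalSet<uint16_t> ports;
>
> // Claim ports [31000, 32000], then carve out [31500, 31600).
> ports += (Bound<uint16_t>::closed(31000), Bound<uint16_t>::closed(32000));
> ports -= (Bound<uint16_t>::closed(31500), Bound<uint16_t>::open(31600));
>
> bool held = ports.contains(31499);   // true
> bool freed = ports.contains(31550);  // false
> {code}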



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320739#comment-15320739
 ] 

Joris Van Remoortere commented on MESOS-5545:
-

[~fan.du] I would like to; however, this is currently not high enough on my 
priority list. I'm passionate about this subject, which is why I've brought it 
up before :-)

We should see in the community meeting if there is some consensus on a timeline.

If the automation aspect is what is most important to you, then I would focus 
on a good interface between Mesos and the modules / tools you want to build to 
source the information.
We likely won't get much traction dragging specific strategies into the Mesos 
project. Rather, we should take the approach of ensuring the interfaces / 
primitives work well for a variety of strategies and tools.

> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master have no topology information about the 
> cluster, for example, rack topology. Meanwhile, lots of data center 
> applications have a rack awareness feature to provide data locality, fault 
> tolerance and intelligent task placement. This ticket investigates how to 
> add rack awareness to the Mesos resource topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-07 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318745#comment-15318745
 ] 

Joris Van Remoortere edited comment on MESOS-5545 at 6/7/16 3:58 PM:
-

Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
- Get a sense of timeline.
- Find a shepherd.
2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although a very fun and interesting project, we haven't gotten enough 
interest to follow through as of yet. I would focus the most on getting this 
prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on partitioning 
general fault domain awareness (Mesos) from assigning of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is more resilient 
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it. Have we thought out all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
life-time of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )




was (Author: jvanremoortere):
Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
  A. Get a sense of timeline.
  B. Find a shepherd.
2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although a very fun and interesting project, we haven't gotten enough 
interest to follow through as of yet. I would focus the most on getting this 
prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on partitioning 
general fault domain awareness (Mesos) from assigning of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is more resilient 
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it. Have we thought out all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
life-time of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )



> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master have no topology information about the 
> cluster, for example, rack topology. Meanwhile, lots of data center 
> applications have a rack awareness feature to provide data locality, 

[jira] [Commented] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-07 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318745#comment-15318745
 ] 

Joris Van Remoortere commented on MESOS-5545:
-

Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
  A. Get a sense of timeline.
  B. Find a shepherd.
2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although a very fun and interesting project, we haven't gotten enough 
interest to follow through as of yet. I would focus the most on getting this 
prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on partitioning 
general fault domain awareness (Mesos) from assigning of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is more resilient 
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it. Have we thought out all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
life-time of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )



> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master have no topology information about the 
> cluster, for example, rack topology. Meanwhile, lots of data center 
> applications have a rack awareness feature to provide data locality, fault 
> tolerance and intelligent task placement. This ticket investigates how to 
> add rack awareness to the Mesos resource topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources

2016-06-07 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318745#comment-15318745
 ] 

Joris Van Remoortere edited comment on MESOS-5545 at 6/7/16 3:58 PM:
-

Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
- Get a sense of timeline.
- Find a shepherd.

2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although a very fun and interesting project, we haven't gotten enough 
interest to follow through as of yet. I would focus the most on getting this 
prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on partitioning 
general fault domain awareness (Mesos) from assigning of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is more resilient 
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it. Have we thought out all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
life-time of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )




was (Author: jvanremoortere):
Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual 
conversation. It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
- Get a sense of timeline.
- Find a shepherd.
2. Iterate on the design with the shepherd and a working group.
3. Validate the design with a large user base. This is critical for a component 
change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although a very fun and interesting project, we haven't gotten enough 
interest to follow through as of yet. I would focus the most on getting this 
prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some 
implementation / configuration bias (LLDP). I would work on partitioning 
general fault domain awareness (Mesos) from assigning of the attributes 
(Operator / automation).
- Take a step back and consider what other information we may want to associate 
with fault domains in the future. Is there a structure that is more resilient 
to augmentation in the future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take 
based upon it. Have we thought out all the actions, and whether they would 
require changes to Mesos?
- You should clarify whether these attributes are expected to change over the 
life-time of an agent. For example, currently we don't allow resources or IPs 
to change. If this were also true for fault domain attributes, it would 
simplify the implementation. If you feel that dynamic attributes are necessary, 
then I would urge you to make that a phase 2 project and first work with the 
community to agree on a common pattern for updating any attributes on the 
agent, and how to surface consequential changes to both tasks and frameworks. 
(You may see why I suggest static to begin with ;-) )



> Add rack awareness support for Mesos resources
> --
>
> Key: MESOS-5545
> URL: https://issues.apache.org/jira/browse/MESOS-5545
> Project: Mesos
>  Issue Type: Story
>  Components: hadoop, master
>Reporter: Fan Du
> Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master have no topology information about the 
> cluster, for example, rack topology. Meanwhile, lots of data center 
> applications have a rack awareness feature to provide data locality, fault 

[jira] [Commented] (MESOS-5445) Allow libprocess/stout to build without first doing `make` in 3rdparty.

2016-06-06 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316681#comment-15316681
 ] 

Joris Van Remoortere commented on MESOS-5445:
-

[~tillt] Great! Go for it :-)

> Allow libprocess/stout to build without first doing `make` in 3rdparty.
> ---
>
> Key: MESOS-5445
> URL: https://issues.apache.org/jira/browse/MESOS-5445
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> After the 3rdparty reorg, libprocess/stout are unable to build their 
> dependencies, so one has to run `make` in 3rdparty/ before building 
> libprocess/stout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5420) Implement os::exists for processes

2016-05-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5420:

Sprint: Mesosphere Sprint 36

> Implement os::exists for processes
> --
>
> Key: MESOS-5420
> URL: https://issues.apache.org/jira/browse/MESOS-5420
> Project: Mesos
>  Issue Type: Improvement
> Environment: Windows
>Reporter: Daniel Pravat
>Assignee: Daniel Pravat
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> os::exists returns true if the process identified by the parameter is still 
> running, or was running and we are able to get information about it, such as 
> the exit code. On Windows, after obtaining a handle to the process it is 
> possible to perform those operations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3624) Port slave/containerizer/mesos/launch.cpp to Windows

2016-05-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3624:

Sprint: Mesosphere Sprint 36

> Port slave/containerizer/mesos/launch.cpp to Windows
> 
>
> Key: MESOS-3624
> URL: https://issues.apache.org/jira/browse/MESOS-3624
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
> Fix For: 1.0.0
>
>
> Important subset of the dependency tree follows:
> slave/containerizer/mesos/launch.cpp: os, protobuf, launch
> launch: subcommand
> subcommand: flags
> flags.hpp: os.hpp, path.hpp, fetch.hpp



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3639) Implement stout/os/windows/killtree.hpp

2016-05-30 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307067#comment-15307067
 ] 

Joris Van Remoortere commented on MESOS-3639:
-

{code}
commit 563c9ff5b539dc2d4ce1ba987dec925045cef5b8
Author: Daniel Pravat 
Date:   Mon May 30 18:02:24 2016 -0700

Windows: Enabled `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` on job objects.

Review: https://reviews.apache.org/r/47442/
{code}

> Implement stout/os/windows/killtree.hpp
> ---
>
> Key: MESOS-3639
> URL: https://issues.apache.org/jira/browse/MESOS-3639
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
>  Labels: mesosphere, windows
> Fix For: 0.29.0
>
>
> killtree() is implemented using Windows Job Objects. The processes created by 
> the executor are associated with a job object using `create_job`. killtree() 
> simply terminates the job object. 
> Helper functions:
> The `create_job` function creates a job object whose name is derived from the 
> `pid` and associates the `pid` process with the job object. Every process 
> started by a process which is part of the job object also becomes part of 
> the job object. The job name should match the name used in `kill_job`. The 
> jobs should be created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, and the 
> caller decides how to handle the returned handle. 
> The `kill_job` function assumes the process identified by `pid` is associated 
> with a job object whose name is derived from it. Every process started by a 
> process which is part of the job object also becomes part of the job object. 
> Destroying the job will close all such processes.
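> A rough sketch of those helpers against the Win32 job-object APIs (error 
> handling omitted; names follow the description above, not the actual patch):
> {code}
> #include <windows.h>
>
> #include <string>
>
> // Creates a named job object that kills all member processes when its
> // last handle closes, and assigns `pid` to it.
> HANDLE create_job(DWORD pid, const std::wstring& name)
> {
>   HANDLE job = ::CreateJobObjectW(nullptr, name.c_str());
>
>   JOBOBJECT_EXTENDED_LIMIT_INFORMATION info = {};
>   info.BasicLimitInformation.LimitFlags =
>     JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
>   ::SetInformationJobObject(
>       job, JobObjectExtendedLimitInformation, &info, sizeof(info));
>
>   HANDLE process =
>     ::OpenProcess(PROCESS_SET_QUOTA | PROCESS_TERMINATE, FALSE, pid);
>   ::AssignProcessToJobObject(job, process);
>   ::CloseHandle(process);
>
>   return job;
> }
>
> // Terminates every process in the job: the "killtree".
> void kill_job(HANDLE job)
> {
>   ::TerminateJobObject(job, 1);
>   ::CloseHandle(job);
> }
> {code}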



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5417) define WSTRINGIFY behaviour on Windows

2016-05-30 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307063#comment-15307063
 ] 

Joris Van Remoortere commented on MESOS-5417:
-

{code}
commit ad3e161ac19ac32f5493e8b31bdef7b579c87177
Author: Daniel Pravat 
Date:   Mon May 30 17:48:47 2016 -0700

Windows: Added logging for `WSTRINGIFY` calls.

The return codes in Windows are not standardized. The function returns
an empty string and logs a warning.

Review: https://reviews.apache.org/r/47473/
{code}

> define WSTRINGIFY behaviour on Windows
> --
>
> Key: MESOS-5417
> URL: https://issues.apache.org/jira/browse/MESOS-5417
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Daniel Pravat
>Assignee: Daniel Pravat
>Priority: Minor
>  Labels: windows
>
> Identify the proper behaviour of WSTRINGIFY to improve the logging.  
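> For context, a sketch of the POSIX behaviour {{WSTRINGIFY}} relies on, where 
> the wait() status macros are standardized (unlike Windows exit codes); the 
> strings are illustrative, not stout's exact output:
> {code}
> #include <string>
>
> #include <sys/wait.h>
>
> std::string wstringify(int status)
> {
>   if (WIFEXITED(status)) {
>     return "exited with status " + std::to_string(WEXITSTATUS(status));
>   }
>   if (WIFSIGNALED(status)) {
>     return "terminated by signal " + std::to_string(WTERMSIG(status));
>   }
>   return "unknown wait status";
> }
> {code}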



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5375) Implement stout/os/windows/kill.hpp

2016-05-16 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5375:

Story Points: 5

> Implement stout/os/windows/kill.hpp
> ---
>
> Key: MESOS-5375
> URL: https://issues.apache.org/jira/browse/MESOS-5375
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Daniel Pravat
>Assignee: Daniel Pravat
>  Labels: mesosphere, windows
> Fix For: 0.29.0
>
>
> Implement equivalent functionality on Windows 
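> For illustration, the most direct Windows analogue of kill(pid, SIGKILL); a 
> sketch only, since a real implementation must also map other signals:
> {code}
> #include <windows.h>
>
> // Forcibly terminates the process `pid`; roughly SIGKILL semantics.
> bool kill_process(DWORD pid)
> {
>   HANDLE process = ::OpenProcess(PROCESS_TERMINATE, FALSE, pid);
>   if (process == nullptr) {
>     return false;
>   }
>
>   BOOL ok = ::TerminateProcess(process, 1);
>   ::CloseHandle(process);
>   return ok != FALSE;
> }
> {code}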



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3639) Implement stout/os/windows/killtree.hpp

2016-05-16 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284805#comment-15284805
 ] 

Joris Van Remoortere edited comment on MESOS-3639 at 5/16/16 4:38 PM:
--

https://reviews.apache.org/r/47169/


was (Author: jvanremoortere):
{code}
commit 769701ce36f639224a4b6763e234d153d58b297e
Author: Daniel Pravat 
Date:   Mon May 16 12:20:37 2016 -0400

Windows: Stout: Implemented `killtree` using NT job objects.

Review: https://reviews.apache.org/r/47169/
{code}

> Implement stout/os/windows/killtree.hpp
> ---
>
> Key: MESOS-3639
> URL: https://issues.apache.org/jira/browse/MESOS-3639
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
>  Labels: mesosphere, windows
> Fix For: 0.29.0
>
>
> killtree() is implemented using Windows Job Objects. The processes created by 
> the executor are associated with a job object using `create_job`. killtree() 
> simply terminates the job object. 
> Helper functions:
> The `create_job` function creates a job object whose name is derived from the 
> `pid` and associates the `pid` process with the job object. Every process 
> started by a process which is part of the job object also becomes part of 
> the job object. The job name should match the name used in `kill_job`.
> The `kill_job` function assumes the process identified by `pid` is associated 
> with a job object whose name is derived from it. Every process started by a 
> process which is part of the job object also becomes part of the job object. 
> Destroying the job will close all such processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5371) Implement `fcntl.hpp`

2016-05-13 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282908#comment-15282908
 ] 

Joris Van Remoortere commented on MESOS-5371:
-

{code}
commit 4c6162d5e3535f4611e869e143c91454033dca2d
Author: Alex Clemmer 
Date:   Fri May 13 13:25:57 2016 -0400

Windows: Added stub implementations of `fcntl.hpp` functions.

This commit introduces temporary versions of 2 important functions:
`os::nonblock` and `os::cloexec`. We put them here in a placeholder
commit so that reviewers can make progress on subprocess. In the
immediate term, the plan is to figure out on a callsite-by-callsite
basis how to work around the functionality of `os::cloexec`. When we
collect more data, we will be in a better position to offer a way
forward here.

Review: https://reviews.apache.org/r/46392/
{code}

> Implement `fcntl.hpp`
> -
>
> Key: MESOS-5371
> URL: https://issues.apache.org/jira/browse/MESOS-5371
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, stout, windows-mvp
>
> `fcntl.hpp` has a bunch of functions that will never work on Windows. We will 
> need to work around them, either by working around specific call sites of 
> functions like `os::cloexec`, or by implementing something that keeps track 
> of which file descriptors are cloexec, and which aren't.
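> For reference, the POSIX behaviour being worked around here is small, which 
> is what makes the Windows gap awkward; a sketch of the POSIX path:
> {code}
> #include <fcntl.h>
>
> // Marks `fd` close-on-exec so child processes do not inherit it.
> bool cloexec(int fd)
> {
>   int flags = ::fcntl(fd, F_GETFD);
>   if (flags == -1) {
>     return false;
>   }
>
>   return ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
> }
> {code}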



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.

2016-05-13 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5379:

Assignee: (was: Joris Van Remoortere)

> Authentication documentation for libprocess endpoints can be misleading.
> 
>
> Key: MESOS-5379
> URL: https://issues.apache.org/jira/browse/MESOS-5379
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, libprocess
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Priority: Blocker
>  Labels: mesosphere, tech-debt
> Fix For: 0.29.0
>
>
> Libprocess exposes a number of endpoints (at least: {{/logging}}, 
> {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some 
> realm these endpoints require authentication, and they don't if not.
> To generate endpoint help we currently use the helper function 
> {{AUTHENTICATION}}, which injects the following into the help string:
> {code}
> This endpoint requires authentication iff HTTP authentication is enabled.
> {code}
> Here {{iff}} documents a stronger coupling between required authentication 
> and enabled authentication than may be true for the above libprocess 
> endpoints -- it is, e.g., true when these endpoints are exposed through mesos 
> masters/agents, but possibly not when exposed through other executables.
> It seems that for libprocess endpoints a weaker formulation, e.g.,
> {code}
> This endpoint supports authentication. If HTTP authentication is enabled, 
> this endpoint may require authentication.
> {code}
> might make the generated help strings more reusable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.

2016-05-13 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere reassigned MESOS-5379:
---

Assignee: Joris Van Remoortere

> Authentication documentation for libprocess endpoints can be misleading.
> 
>
> Key: MESOS-5379
> URL: https://issues.apache.org/jira/browse/MESOS-5379
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, libprocess
>Affects Versions: 0.29.0
>Reporter: Benjamin Bannier
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere, tech-debt
> Fix For: 0.29.0
>
>
> Libprocess exposes a number of endpoints (at least: {{/logging}}, 
> {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some 
> realm these endpoints require authentication, and they don't if not.
> To generate endpoint help we currently use the helper function 
> {{AUTHENTICATION}}, which injects the following into the help string:
> {code}
> This endpoint requires authentication iff HTTP authentication is enabled.
> {code}
> Here {{iff}} documents a stronger coupling between required authentication 
> and enabled authentication than may be true for the above libprocess 
> endpoints -- it is, e.g., true when these endpoints are exposed through mesos 
> masters/agents, but possibly not when exposed through other executables.
> It seems that for libprocess endpoints a weaker formulation, e.g.,
> {code}
> This endpoint supports authentication. If HTTP authentication is enabled, 
> this endpoint may require authentication.
> {code}
> might make the generated help strings more reusable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5356) Add Windows support for StopWatch

2016-05-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5356:
---

 Summary: Add Windows support for StopWatch
 Key: MESOS-5356
 URL: https://issues.apache.org/jira/browse/MESOS-5356
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Alex Clemmer
 Fix For: 0.29.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3643) Implement stout/os/windows/shell.hpp

2016-05-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275744#comment-15275744
 ] 

Joris Van Remoortere commented on MESOS-3643:
-

{code}
commit fc4f9d25f75dc0ca87732c8b0ee868a5713f1d0f
Author: Alex Clemmer 
Date:   Sun May 8 17:00:05 2016 -0400

Windows: Fixed shell constants, marked `os::shell` as deleted.

Review: https://reviews.apache.org/r/46393/
{code}

> Implement stout/os/windows/shell.hpp
> 
>
> Key: MESOS-3643
> URL: https://issues.apache.org/jira/browse/MESOS-3643
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3656) Port process/socket.hpp to Windows

2016-05-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275742#comment-15275742
 ] 

Joris Van Remoortere commented on MESOS-3656:
-

{code}
commit cd879244d42ade1f63d228694e5681ea254a9902
Author: Alex Clemmer 
Date:   Sun May 8 13:32:09 2016 -0700

Windows: Libprocess: Winsock class to handle WSAStartup/WSACleanup.

Review: https://reviews.apache.org/r/46344/
{code}
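
The commit above wraps Winsock initialization in an RAII class. A rough sketch 
of that pattern, assuming nothing about the committed class beyond what the 
commit message states (the shape below is illustrative, not the actual code):
{code}
#include <winsock2.h>

// RAII wrapper: WSAStartup on construction, WSACleanup on destruction,
// so Winsock is initialized exactly as long as the owning object lives.
class Winsock
{
public:
  Winsock()
  {
    WSADATA data;
    initialized = (WSAStartup(MAKEWORD(2, 2), &data) == 0);
  }

  ~Winsock()
  {
    if (initialized) {
      WSACleanup();
    }
  }

private:
  bool initialized;
};
{code}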

> Port process/socket.hpp to Windows
> --
>
> Key: MESOS-3656
> URL: https://issues.apache.org/jira/browse/MESOS-3656
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5296) Split Resource and Inverse offer protobufs for V1 API

2016-04-27 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-5296:
---

 Summary: Split Resource and Inverse offer protobufs for V1 API
 Key: MESOS-5296
 URL: https://issues.apache.org/jira/browse/MESOS-5296
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
 Fix For: 0.29.0


The protobufs for the V1 API regarding inverse offers initially re-used the 
existing offer / rescind / accept / decline messages for regular offers.
We should split these out to be more explicit, and provide the ability to 
augment the messages with particulars specific to either resource or inverse offers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5044) Temporary directories created by environment->mkdtemp cleanup can be problematic.

2016-03-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere reassigned MESOS-5044:
---

Assignee: Joris Van Remoortere

> Temporary directories created by environment->mkdtemp cleanup can be 
> problematic.
> -
>
> Key: MESOS-5044
> URL: https://issues.apache.org/jira/browse/MESOS-5044
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Gilbert Song
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Currently in the Mesos tests, the temporary directories created by 
> `environment->mkdtemp()` are not cleaned up until the end of the test suite, which 
> can be problematic. For instance, if we have many tests in a test suite, each 
> of those tests performing large disk reads/writes in its temp dir, 
> this may lead to an out-of-disk issue on some resource-limited machines. 
> We should have the temp dirs created by `environment->mkdtemp` cleaned up 
> during each test teardown. Currently we only clean up the sandbox for each 
> test.
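
A minimal sketch of the proposed teardown-time cleanup, assuming the test 
{{Environment}} tracks the directories it hands out (member and method names 
here are hypothetical, not the actual test harness API):
{code}
Try<std::string> Environment::mkdtemp()
{
  Try<std::string> directory = os::mkdtemp();
  if (directory.isSome()) {
    directories.push_back(directory.get()); // Remember for per-test cleanup.
  }
  return directory;
}

// Invoked from each test's teardown rather than at the end of the suite.
void Environment::removeTemporaryDirectories()
{
  foreach (const std::string& directory, directories) {
    os::rmdir(directory); // Recursive removal by default.
  }
  directories.clear();
}
{code}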



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5044) Temporary directories created by environment->mkdtemp cleanup can be problematic.

2016-03-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-5044:

   Sprint: Mesosphere Sprint 32
 Story Points: 1
   Labels: mesosphere  (was: )
Fix Version/s: 0.29.0

> Temporary directories created by environment->mkdtemp cleanup can be 
> problematic.
> -
>
> Key: MESOS-5044
> URL: https://issues.apache.org/jira/browse/MESOS-5044
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Gilbert Song
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Currently in the Mesos tests, the temporary directories created by 
> `environment->mkdtemp()` are not cleaned up until the end of the test suite, which 
> can be problematic. For instance, if we have many tests in a test suite, each 
> of those tests performing large disk reads/writes in its temp dir, 
> this may lead to an out-of-disk issue on some resource-limited machines. 
> We should have the temp dirs created by `environment->mkdtemp` cleaned up 
> during each test teardown. Currently we only clean up the sandbox for each 
> test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4353) Limit the number of processes created by libprocess

2016-03-30 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4353:

Assignee: Maged Michael  (was: Qian Zhang)

> Limit the number of processes created by libprocess
> ---
>
> Key: MESOS-4353
> URL: https://issues.apache.org/jira/browse/MESOS-4353
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Qian Zhang
>Assignee: Maged Michael
>  Labels: libprocess, mesosphere
> Fix For: 0.29.0
>
>
> Currently libprocess will create {{max(8, number of CPU cores)}} worker threads 
> during initialization, see 
> https://github.com/apache/mesos/blob/0.26.0/3rdparty/libprocess/src/process.cpp#L2146
>  for details. This is fine for a typical machine that does not have many cores 
> (e.g., 16, 32), but on a powerful machine with a large number of 
> cores (e.g., an IBM Power machine may have 192 cores) this creates far more 
> worker threads than necessary.
> And since libprocess is widely used in Mesos (master, agent, scheduler, 
> executor), this may also cause performance issues. For example, when a user 
> creates a Docker container via Mesos on an agent running on a 
> powerful machine with 192 cores, the DockerContainerizer on the agent will 
> create a dedicated executor for the container, and there will be 192 worker 
> threads in that executor. If the user creates 1000 Docker containers on that 
> machine, there will be 1000 executors, i.e., 1000 * 192 worker threads, 
> which is a large number and may thrash the OS.
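
For illustration, the sizing logic at issue reduces to something like the 
sketch below ({{workerThreads}} and the cap are hypothetical, not the 
libprocess API); the point is that {{max(8, cores)}} alone is unbounded as 
core counts grow.
{code}
#include <algorithm>
#include <thread>

// max(8, hardware threads), clamped by a configurable cap.
size_t workerThreads(size_t cap)
{
  const size_t cores = std::thread::hardware_concurrency();
  return std::min(cap, std::max<size_t>(8, cores));
}

// On a 192-core machine, workerThreads(32) yields 32 threads per
// libprocess-linked binary instead of 192.
{code}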



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4576) Introduce a stout helper for "which"

2016-03-27 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4576:

Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29, Mesosphere Sprint 30, Mesosphere Sprint 31)

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>Assignee: Disha Singh
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through `PATH` and return the first entry which os::exists(...).
>   if (path.isSome()) {
>     foreach (const string& dir, strings::tokenize(path.get(), ":")) {
>       if (os::exists(path::join(dir, command))) {
>         return path::join(dir, command);
>       }
>     }
>   }
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks

2016-03-24 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210399#comment-15210399
 ] 

Joris Van Remoortere commented on MESOS-4694:
-

{code}
commit 6a8738f89b01ac3ddd70c418c49f350e17fa
Author: Dario Rexin 
Date:   Thu Mar 24 14:10:31 2016 +0100

Allocator Performance: Exited early to avoid needless computation.

Review: https://reviews.apache.org/r/43668/
{code}
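
The early exit referenced by this commit amounts to the following 
self-contained sketch (illustrative names, not the committed diff): once an 
agent has nothing allocatable left, stop iterating the remaining frameworks 
instead of evaluating each of them against an empty agent.
{code}
#include <map>
#include <string>
#include <vector>

// Toy allocator loop: `available` holds allocatable CPUs per agent.
void allocate(
    const std::vector<std::string>& agents,
    const std::vector<std::string>& frameworks,
    std::map<std::string, double>& available)
{
  for (const std::string& agent : agents) {
    for (size_t i = 0; i < frameworks.size(); i++) {
      if (available[agent] <= 0.0) {
        break; // The early exit: skip the remaining frameworks.
      }
      available[agent] -= 1.0; // Stand-in for carving out an offer.
    }
  }
}
{code}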

> DRFAllocator takes very long to allocate resources with a large number of 
> frameworks
> 
>
> Key: MESOS-4694
> URL: https://issues.apache.org/jira/browse/MESOS-4694
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Affects Versions: 0.26.0, 0.27.0, 0.27.1, 0.28.0, 0.27.2, 0.28.1
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>
> With a growing number of connected frameworks, the allocation time grows to 
> very high numbers. The addition of quota in 0.27 had an additional impact on 
> these numbers. Running `mesos-tests.sh --benchmark 
> --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` gives us 
> the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.921202secs to make 200 offers
> round 1 allocate took 2.85045secs to make 200 offers
> round 2 allocate took 2.823768secs to make 200 offers
> {noformat}
> Increasing the number of frameworks to 2000:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 28.209454secs to make 2000 offers
> round 1 allocate took 28.469419secs to make 2000 offers
> round 2 allocate took 28.138086secs to make 2000 offers
> {noformat}
> I was able to reduce this time by a substantial amount. After applying the 
> patches:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 1.016226secs to make 2000 offers
> round 1 allocate took 1.102729secs to make 2000 offers
> round 2 allocate took 1.102624secs to make 2000 offers
> {noformat}
> And with 2000 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 12.563203secs to make 2000 offers
> round 1 allocate took 12.437517secs to make 2000 offers
> round 2 allocate took 12.470708secs to make 2000 offers
> {noformat}
> The patches do 3 things to improve the performance of the allocator.
> 1) The total values in the DRFSorter will be pre-calculated per resource type.
> 2) In the allocate method, when no resources are available to allocate, we 
> break out of the innermost loop to prevent looping over a large number of 
> frameworks when we have nothing to allocate.
> 3) When a framework suppresses offers, we remove it from the sorter instead 
> of just calling continue in the allocation loop - this greatly improves 
> performance in the sorter and prevents looping over frameworks that don't 
> need resources.
> Assuming that most of the frameworks behave nicely and suppress offers when 
> they have nothing to schedule, it is fair to assume that point 3) has the 
> biggest impact on performance. If we suppress offers for 90% of the 
> frameworks in the benchmark test, we see the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 200 slaves and 2000 frameworks
> round 0 allocate took 11626us to make 200 offers
> round 1 allocate took 22890us to make 200 offers
> round 2 allocate took 21346us to make 200 offers
> {noformat}
> And for 200 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 

[jira] [Updated] (MESOS-4694) DRFAllocator takes very long to allocate resources with a large number of frameworks

2016-03-24 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4694:

Affects Version/s: 0.28.1
   0.28.0
   0.27.2

> DRFAllocator takes very long to allocate resources with a large number of 
> frameworks
> 
>
> Key: MESOS-4694
> URL: https://issues.apache.org/jira/browse/MESOS-4694
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Affects Versions: 0.26.0, 0.27.0, 0.27.1, 0.28.0, 0.27.2, 0.28.1
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>
> With a growing number of connected frameworks, the allocation time grows to 
> very high numbers. The addition of quota in 0.27 had an additional impact on 
> these numbers. Running `mesos-tests.sh --benchmark 
> --gtest_filter=HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` gives us 
> the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.921202secs to make 200 offers
> round 1 allocate took 2.85045secs to make 200 offers
> round 2 allocate took 2.823768secs to make 200 offers
> {noformat}
> Increasing the number of frameworks to 2000:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 28.209454secs to make 2000 offers
> round 1 allocate took 28.469419secs to make 2000 offers
> round 2 allocate took 28.138086secs to make 2000 offers
> {noformat}
> I was able to reduce this time by a substantial amount. After applying the 
> patches:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 1.016226secs to make 2000 offers
> round 1 allocate took 1.102729secs to make 2000 offers
> round 2 allocate took 1.102624secs to make 2000 offers
> {noformat}
> And with 2000 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 12.563203secs to make 2000 offers
> round 1 allocate took 12.437517secs to make 2000 offers
> round 2 allocate took 12.470708secs to make 2000 offers
> {noformat}
> The patches do 3 things to improve the performance of the allocator.
> 1) The total values in the DRFSorter will be pre-calculated per resource type.
> 2) In the allocate method, when no resources are available to allocate, we 
> break out of the innermost loop to prevent looping over a large number of 
> frameworks when we have nothing to allocate.
> 3) When a framework suppresses offers, we remove it from the sorter instead 
> of just calling continue in the allocation loop - this greatly improves 
> performance in the sorter and prevents looping over frameworks that don't 
> need resources.
> Assuming that most of the frameworks behave nicely and suppress offers when 
> they have nothing to schedule, it is fair to assume that point 3) has the 
> biggest impact on performance. If we suppress offers for 90% of the 
> frameworks in the benchmark test, we see the following numbers:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 200 slaves and 2000 frameworks
> round 0 allocate took 11626us to make 200 offers
> round 1 allocate took 22890us to make 200 offers
> round 2 allocate took 21346us to make 200 offers
> {noformat}
> And for 200 frameworks:
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HierarchicalAllocator_BENCHMARK_Test
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 2000 frameworks
> round 0 allocate took 1.11178secs to make 2000 offers
> round 1 allocate took 1.062649secs to make 2000 offers
> round 2 allocate took 1.080181secs to make 2000 offers
> {noformat}
> Review requests:
> 

[jira] [Commented] (MESOS-3656) Port process/socket.hpp to Windows

2016-03-24 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210024#comment-15210024
 ] 

Joris Van Remoortere commented on MESOS-3656:
-

{code}
commit 4e19c3e6f09eaa2793f4717e414429e0e6335e0f
Author: Daniel Pravat 
Date:   Thu Mar 24 09:33:05 2016 +0100

Windows: [2/2] Lifted socket API into Stout.

Review: https://reviews.apache.org/r/44139/

commit 6f8544cf5e2748a58ac979e6d12336b2dccbf1fb
Author: Daniel Pravat 
Date:   Thu Mar 24 09:32:57 2016 +0100

Windows: [1/2] Lifted socket API into Stout.

Review: https://reviews.apache.org/r/44138/
{code}

> Port process/socket.hpp to Windows
> --
>
> Key: MESOS-3656
> URL: https://issues.apache.org/jira/browse/MESOS-3656
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4827) Destroy Docker container crashes Mesos slave

2016-03-21 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4827:

Affects Version/s: 0.26.0
   0.27.0
   0.28.0

> Destroy Docker container crashes Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Zhenzhong Shi
>Priority: Blocker
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4809) Allow parallel execution of tests

2016-03-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4809:

Assignee: Benjamin Bannier

> Allow parallel execution of tests
> -
>
> Key: MESOS-4809
> URL: https://issues.apache.org/jira/browse/MESOS-4809
> Project: Mesos
>  Issue Type: Epic
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> We should allow parallel execution of tests. There are two flavors to this:
> (a) tests are run in parallel in the same process, or
> (b) tests are run in parallel with separate processes (e.g., with 
> gtest-parallel).
> While (a) likely has overall better performance, it depends on tests being 
> independent of global state (e.g., the current directory, among others). On the 
> other hand, even (b) improves execution time, and has much weaker 
> requirements.
> This epic tracks efforts to fix tests to allow scenario (b) above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4807) IOTest.BufferedRead writes to the current directory

2016-03-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4807:

Labels: mesosphere newbie parallel-tests  (was: newbie parallel-tests)

> IOTest.BufferedRead writes to the current directory
> ---
>
> Key: MESOS-4807
> URL: https://issues.apache.org/jira/browse/MESOS-4807
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Reporter: Benjamin Bannier
>Assignee: Yong Tang
>Priority: Minor
>  Labels: mesosphere, newbie, parallel-tests
> Fix For: 0.29.0
>
>
> libprocess's {{IOTest.BufferedRead}} writes to the current directory. This is 
> bad for a number of reasons, e.g.,
> * should the test fail, data might be leaked to random locations,
> * the test cannot be executed from a read-only directory, or
> * executing the same test in parallel would race on the existence of the 
> created file, and show bogus behavior.
> The test should probably be executed from a temporary directory, e.g., via 
> stout's {{TemporaryDirectoryTest}} fixture.
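
A minimal sketch of the suggested fix, assuming stout's 
{{TemporaryDirectoryTest}} fixture behaves as described (chdir into a fresh 
temporary directory in {{SetUp()}}, cleanup in {{TearDown()}}); the original 
test body is elided:
{code}
#include <gtest/gtest.h>

#include <stout/gtest.hpp>
#include <stout/os.hpp>
#include <stout/tests/utils.hpp>

// Deriving from the fixture makes the current directory a per-test
// scratch directory, so relative paths never touch the build tree.
class IOTest : public TemporaryDirectoryTest {};

TEST_F(IOTest, BufferedRead)
{
  // "file" now lands inside the per-test temporary directory and is
  // removed in TearDown(), even if the test fails.
  ASSERT_SOME(os::write("file", "data"));
  // ... original test body unchanged ...
}
{code}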



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4831:

Sprint: Mesosphere Sprint 30

> Master sometimes sends two inverse offers after the agent goes into 
> maintenance.
> 
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Blocker
>  Labels: maintenance, mesosphere
> Fix For: 0.28.0
>
>
> Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}
> https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull
> {code}
> I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
> allocate!
> I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
> slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
> I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
> fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> {code}
> The ideal expected workflow for this test is something like:
> - The framework receives offers from master.
> - The framework updates its maintenance schedule.
> - The current offer is rescinded.
> - A new offer is received from the master with unavailability set.
> - After the agent goes for maintenance, an inverse offer is sent.
> For some reason, in the logs we see that the master is sending 2 inverse 
> offers. The test seems to pass as we just check for the initial inverse offer 
> being present. This can also be reproduced by a modified version of the 
> original test.
> {code}
> // Test ensures that an offer will have an `unavailability` set if the
> // slave is scheduled to go down for maintenance.
> TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
> {
>   Try<PID<Master>> master = StartMaster();
>   ASSERT_SOME(master);
>   MockExecutor exec(DEFAULT_EXECUTOR_ID);
>   Try<PID<Slave>> slave = StartSlave();
>   ASSERT_SOME(slave);
>   auto scheduler = std::make_shared<MockV1HTTPScheduler>();
>   EXPECT_CALL(*scheduler, heartbeat(_))
>     .WillRepeatedly(Return()); // Ignore heartbeats.
>   Future<Nothing> connected;
>   EXPECT_CALL(*scheduler, connected(_))
>     .WillOnce(FutureSatisfy(&connected))
>     .WillRepeatedly(Return()); // Ignore future invocations.
>   scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, scheduler);
>   AWAIT_READY(connected);
>   Future<Event::Subscribed> subscribed;
>   EXPECT_CALL(*scheduler, subscribed(_, _))
>     .WillOnce(FutureArg<1>(&subscribed));
>   Future<Event::Offers> normalOffers;
>   Future<Event::Offers> unavailabilityOffers;
>   Future<Event::Offers> inverseOffers;
>   EXPECT_CALL(*scheduler, offers(_, _))
>     .WillOnce(FutureArg<1>(&normalOffers))
>     .WillOnce(FutureArg<1>(&unavailabilityOffers))
>     .WillOnce(FutureArg<1>(&inverseOffers));
>   // The original offers should be rescinded when the unavailability is
>   // changed.
>   Future<Nothing> offerRescinded;
>   EXPECT_CALL(*scheduler, rescind(_, _))
>     .WillOnce(FutureSatisfy(&offerRescinded));
>   {
> Call call;
> call.set_type(Call::SUBSCRIBE);
> Call::Subscribe* subscribe = call.mutable_subscribe();
> subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);
> mesos.send(call);
>   }
>   AWAIT_READY(subscribed);
>   v1::FrameworkID frameworkId(subscribed->framework_id());
>   AWAIT_READY(normalOffers);
>   EXPECT_NE(0, normalOffers->offers().size());
>   // Regular offers shouldn't have unavailability.
>   foreach (const v1::Offer& offer, normalOffers->offers()) {
> EXPECT_FALSE(offer.has_unavailability());
>   }
>   // Schedule this slave for maintenance.
>   MachineID machine;
>   machine.set_hostname(maintenanceHostname);
>   machine.set_ip(stringify(slave.get().address.ip));
>   const Time start = Clock::now() + Seconds(60);
>   const Duration duration = Seconds(120);
>   const Unavailability unavailability = createUnavailability(start, duration);
>   // Post a valid schedule with one machine.
>   maintenance::Schedule schedule = createSchedule(
>   {createWindow({machine}, unavailability)});
>   // We have a few seconds between the first set of offers and the
>   // next allocation of offers. This should be enough time to perform
>   // a maintenance schedule update. This update will also trigger the
>   // rescinding of offers from the scheduled slave.
>   Future<Response> response = process::http::post(
>   master.get(),
> 

[jira] [Commented] (MESOS-4827) Destroy Docker container crashes Mesos slave

2016-03-09 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188564#comment-15188564
 ] 

Joris Van Remoortere commented on MESOS-4827:
-

No. That is why it is marked as a blocker.
It does seem like #1 and #3 may be separate issues though. #3 is what is causing 
the widespread task failures.

> Destroy Docker container crashes Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
>Priority: Blocker
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4827) Destroy Docker container from Marathon kills Mesos slave

2016-03-08 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4827:

Priority: Blocker  (was: Major)

> Destroy Docker container from Marathon kills Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
>Priority: Blocker
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4827) Destroy Docker container crashes Mesos slave

2016-03-08 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185017#comment-15185017
 ] 

Joris Van Remoortere commented on MESOS-4827:
-

At first glance this looks like it is happening because the directory structure 
in which we want to write the sentinel file is not fully constructed.
We need to:
- Investigate (and fix) how this can happen.
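
In the meantime, a hedged sketch of the defensive fix under discussion, using 
stout primitives (the helper name and call site are hypothetical): create the 
sentinel file's parent directory before writing, rather than assuming the run 
directory tree is already fully constructed.
{code}
Try<Nothing> writeSentinel(const std::string& path, const std::string& data)
{
  // os::mkdir is recursive by default; creating the parent first removes
  // the dependency on the directory tree existing at this point.
  Try<Nothing> mkdir = os::mkdir(Path(path).dirname());
  if (mkdir.isError()) {
    return Error("Failed to create parent directory: " + mkdir.error());
  }

  return os::write(path, data);
}
{code}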

> Destroy Docker container crashes Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
>Priority: Blocker
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4827) Destroy Docker container crashes Mesos slave

2016-03-08 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4827:

Summary: Destroy Docker container crashes Mesos slave  (was: Destroy Docker 
container from Marathon kills Mesos slave)

> Destroy Docker container crashes Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
>Priority: Blocker
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4827) Destroy Docker container from Marathon kills Mesos slave

2016-03-08 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4827:

Fix Version/s: 0.29.0

> Destroy Docker container from Marathon kills Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
> Fix For: 0.29.0
>
>
> The details of this issue were originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4838) Update unavailable in batch to avoid several allocate(slaveId) call

2016-03-04 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180507#comment-15180507
 ] 

Joris Van Remoortere commented on MESOS-4838:
-

[~klaus1982] I'm not sure why we need to do this.
1. Are you seeing performance issues with the {{allocate(slaveId)}} calls 
generated by the maintenance schedule?
2. If this is the case, why wouldn't the general batching proposal for the 
allocator cover this case? Why do we need to implement batching in specific API 
entry points?
3. If this is being suggested because a maintenance schedule tends to update 
many agents simultaneously, then would it make more sense to consider calling 
the batch {{allocate()}} function in the allocator after updating all the agent 
availabilities?

If you are interested in considering some improvements around maintenance, 
let's set up a working group. I know others are also interested in this 
feature, and I know [~kaysoky] would love to help guide these discussions.
We should discuss these kinds of larger changes and ideas in terms of their 
operational and development consequences before posting patches. (Though if you 
just want to try it out to understand the performance implications or what code 
would need to be touched that's totally fine; we just may decide to go in a 
very different direction).

> Update unavailable in batch to avoid several allocate(slaveId) call
> ---
>
> Key: MESOS-4838
> URL: https://issues.apache.org/jira/browse/MESOS-4838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> In "/machine/schedule", every machine in the master will trigger an 
> {{allocate(slaveId)}}, which increases the master's workload. The 
> proposal of this JIRA is to update unavailability in batch to avoid several 
> {{allocate(slaveId)}} calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-02 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4831:

Shepherd: Joris Van Remoortere

> Master sometimes sends two inverse offers after the agent goes into 
> maintenance.
> 
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>  Labels: maintenance, mesosphere
>
> Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}
> https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull
> {code}
> I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
> allocate!
> I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
> slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
> I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
> fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> {code}
> The ideal expected workflow for this test is something like:
> - The framework receives offers from master.
> - The framework updates its maintenance schedule.
> - The current offer is rescinded.
> - A new offer is received from the master with unavailability set.
> - After the agent goes for maintenance, an inverse offer is sent.
> For some reason, in the logs we see that the master is sending 2 inverse 
> offers. The test seems to pass as we just check for the initial inverse offer 
> being present. This can also be reproduced by a modified version of the 
> original test.
> {code}
> // Test ensures that an offer will have an `unavailability` set if the
> // slave is scheduled to go down for maintenance.
> TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
> {
>   Try<PID<Master>> master = StartMaster();
>   ASSERT_SOME(master);
>   MockExecutor exec(DEFAULT_EXECUTOR_ID);
>   Try<PID<Slave>> slave = StartSlave();
>   ASSERT_SOME(slave);
>   auto scheduler = std::make_shared<MockV1HTTPScheduler>();
>   EXPECT_CALL(*scheduler, heartbeat(_))
>     .WillRepeatedly(Return()); // Ignore heartbeats.
>   Future<Nothing> connected;
>   EXPECT_CALL(*scheduler, connected(_))
>     .WillOnce(FutureSatisfy(&connected))
>     .WillRepeatedly(Return()); // Ignore future invocations.
>   scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, scheduler);
>   AWAIT_READY(connected);
>   Future<Event::Subscribed> subscribed;
>   EXPECT_CALL(*scheduler, subscribed(_, _))
>     .WillOnce(FutureArg<1>(&subscribed));
>   Future<Event::Offers> normalOffers;
>   Future<Event::Offers> unavailabilityOffers;
>   Future<Event::Offers> inverseOffers;
>   EXPECT_CALL(*scheduler, offers(_, _))
>     .WillOnce(FutureArg<1>(&normalOffers))
>     .WillOnce(FutureArg<1>(&unavailabilityOffers))
>     .WillOnce(FutureArg<1>(&inverseOffers));
>   // The original offers should be rescinded when the unavailability is
>   // changed.
>   Future<Nothing> offerRescinded;
>   EXPECT_CALL(*scheduler, rescind(_, _))
>     .WillOnce(FutureSatisfy(&offerRescinded));
>   {
> Call call;
> call.set_type(Call::SUBSCRIBE);
> Call::Subscribe* subscribe = call.mutable_subscribe();
> subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);
> mesos.send(call);
>   }
>   AWAIT_READY(subscribed);
>   v1::FrameworkID frameworkId(subscribed->framework_id());
>   AWAIT_READY(normalOffers);
>   EXPECT_NE(0, normalOffers->offers().size());
>   // Regular offers shouldn't have unavailability.
>   foreach (const v1::Offer& offer, normalOffers->offers()) {
> EXPECT_FALSE(offer.has_unavailability());
>   }
>   // Schedule this slave for maintenance.
>   MachineID machine;
>   machine.set_hostname(maintenanceHostname);
>   machine.set_ip(stringify(slave.get().address.ip));
>   const Time start = Clock::now() + Seconds(60);
>   const Duration duration = Seconds(120);
>   const Unavailability unavailability = createUnavailability(start, duration);
>   // Post a valid schedule with one machine.
>   maintenance::Schedule schedule = createSchedule(
>   {createWindow({machine}, unavailability)});
>   // We have a few seconds between the first set of offers and the
>   // next allocation of offers. This should be enough time to perform
>   // a maintenance schedule update. This update will also trigger the
>   // rescinding of offers from the scheduled slave.
>   Future<Response> response = process::http::post(
>   master.get(),
>   "maintenance/schedule",
>   headers,
>   

[jira] [Commented] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-02 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175259#comment-15175259
 ] 

Joris Van Remoortere commented on MESOS-4831:
-

Yep!

> Master sometimes sends two inverse offers after the agent goes into 
> maintenance.
> 
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>  Labels: maintenance, mesosphere
>
> Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}
> https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull
> {code}
> I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
> allocate!
> I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
> slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
> I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
> fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> {code}
> The ideal expected workflow for this test is something like:
> - The framework receives offers from master.
> - The framework updates its maintenance schedule.
> - The current offer is rescinded.
> - A new offer is received from the master with unavailability set.
> - After the agent goes for maintenance, an inverse offer is sent.
> For some reason, in the logs we see that the master is sending 2 inverse 
> offers. The test seems to pass as we just check for the initial inverse offer 
> being present. This can also be reproduced by a modified version of the 
> original test.
> {code}
> // Test ensures that an offer will have an `unavailability` set if the
> // slave is scheduled to go down for maintenance.
> TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
> {
>   Try<PID<Master>> master = StartMaster();
>   ASSERT_SOME(master);
>   MockExecutor exec(DEFAULT_EXECUTOR_ID);
>   Try<PID<Slave>> slave = StartSlave();
>   ASSERT_SOME(slave);
>   auto scheduler = std::make_shared<MockV1HTTPScheduler>();
>   EXPECT_CALL(*scheduler, heartbeat(_))
>     .WillRepeatedly(Return()); // Ignore heartbeats.
>   Future<Nothing> connected;
>   EXPECT_CALL(*scheduler, connected(_))
>     .WillOnce(FutureSatisfy(&connected))
>     .WillRepeatedly(Return()); // Ignore future invocations.
>   scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, scheduler);
>   AWAIT_READY(connected);
>   Future<Event::Subscribed> subscribed;
>   EXPECT_CALL(*scheduler, subscribed(_, _))
>     .WillOnce(FutureArg<1>(&subscribed));
>   Future<Event::Offers> normalOffers;
>   Future<Event::Offers> unavailabilityOffers;
>   Future<Event::Offers> inverseOffers;
>   EXPECT_CALL(*scheduler, offers(_, _))
>     .WillOnce(FutureArg<1>(&normalOffers))
>     .WillOnce(FutureArg<1>(&unavailabilityOffers))
>     .WillOnce(FutureArg<1>(&inverseOffers));
>   // The original offers should be rescinded when the unavailability is
>   // changed.
>   Future<Nothing> offerRescinded;
>   EXPECT_CALL(*scheduler, rescind(_, _))
>     .WillOnce(FutureSatisfy(&offerRescinded));
>   {
> Call call;
> call.set_type(Call::SUBSCRIBE);
> Call::Subscribe* subscribe = call.mutable_subscribe();
> subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);
> mesos.send(call);
>   }
>   AWAIT_READY(subscribed);
>   v1::FrameworkID frameworkId(subscribed->framework_id());
>   AWAIT_READY(normalOffers);
>   EXPECT_NE(0, normalOffers->offers().size());
>   // Regular offers shouldn't have unavailability.
>   foreach (const v1::Offer& offer, normalOffers->offers()) {
> EXPECT_FALSE(offer.has_unavailability());
>   }
>   // Schedule this slave for maintenance.
>   MachineID machine;
>   machine.set_hostname(maintenanceHostname);
>   machine.set_ip(stringify(slave.get().address.ip));
>   const Time start = Clock::now() + Seconds(60);
>   const Duration duration = Seconds(120);
>   const Unavailability unavailability = createUnavailability(start, duration);
>   // Post a valid schedule with one machine.
>   maintenance::Schedule schedule = createSchedule(
>   {createWindow({machine}, unavailability)});
>   // We have a few seconds between the first set of offers and the
>   // next allocation of offers. This should be enough time to perform
>   // a maintenance schedule update. This update will also trigger the
>   // rescinding of offers from the scheduled slave.
>   Future<Response> response = process::http::post(
>   master.get(),
>   "maintenance/schedule",
>   headers,
>  

[jira] [Updated] (MESOS-4691) Add a HierarchicalAllocator benchmark with reservation labels.

2016-03-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4691:

Shepherd: Joris Van Remoortere  (was: Michael Park)

> Add a HierarchicalAllocator benchmark with reservation labels.
> --
>
> Key: MESOS-4691
> URL: https://issues.apache.org/jira/browse/MESOS-4691
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> With {{Labels}} being part of the {{ReservationInfo}}, we should ensure that 
> we don't observe a significant performance degradation in the allocator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4415) Implement stout/os/windows/rmdir.hpp

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174573#comment-15174573
 ] 

Joris Van Remoortere commented on MESOS-4415:
-

https://reviews.apache.org/r/43907/
https://reviews.apache.org/r/43908/

> Implement stout/os/windows/rmdir.hpp
> 
>
> Key: MESOS-4415
> URL: https://issues.apache.org/jira/browse/MESOS-4415
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Joris Van Remoortere
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
> Fix For: 0.27.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174440#comment-15174440
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 10:42 PM:
--

https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/
https://reviews.apache.org/r/40938/
https://reviews.apache.org/r/40939/


was (Author: jvanremoortere):
https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174457#comment-15174457
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 10:42 PM:
--

{code}
commit 9f1b115a67a1625a4807c2a7d4e1a41bca1af2a6
Author: Daniel Pravat 
Date:   Tue Mar 1 14:18:41 2016 -0800

Stout: Marked `os::su` as deleted on Windows.

Review: https://reviews.apache.org/r/40939/

commit a1f731746657b1cbcf136ddb2bf154ca3da271fc
Author: Daniel Pravat 
Date:   Tue Mar 1 14:16:08 2016 -0800

Stout: Marked `os::chroot` as deleted on Windows.

Review: https://reviews.apache.org/r/40938/

commit a1a9cd5939d25f82214a5c533bde96a3493f81f3
Author: Alex Clemmer 
Date:   Tue Mar 1 13:35:13 2016 -0800

Windows: Stout: Removed user based functions.

Review: https://reviews.apache.org/r/43905/

commit b9de8c6a06f0d0246ea38ab5586de1d0b2478c38
Author: Alex Clemmer 
Date:   Tue Mar 1 13:33:37 2016 -0800

Windows: Removed `user` launcher flag, preventing `su`.

`su` does not exist on Windows. Unfortunately, the launcher also depends
on it. In this commit, we remove Windows support for the launcher flag
`user`, which controls whether we use `su` in the launcher. This
allows us to divest ourselves of `su` altogether on Windows.

Review: https://reviews.apache.org/r/43905/
{code}


was (Author: jvanremoortere):
{code}
commit a1a9cd5939d25f82214a5c533bde96a3493f81f3
Author: Alex Clemmer 
Date:   Tue Mar 1 13:35:13 2016 -0800

Windows: Stout: Removed user based functions.

Review: https://reviews.apache.org/r/43905/

commit b9de8c6a06f0d0246ea38ab5586de1d0b2478c38
Author: Alex Clemmer 
Date:   Tue Mar 1 13:33:37 2016 -0800

Windows: Removed `user` launcher flag, preventing `su`.

`su` does not exist on Windows. Unfortunately, the launcher also depends
on it. In this commit, we remove Windows support for the launcher flag
`user`, which controls whether we use `su` in the launcher. This
allows us to divest ourselves of `su` altogether on Windows.

Review: https://reviews.apache.org/r/43905/
{code}

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4833) Poor allocator performance with labeled resources and/or persistent volumes

2016-03-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4833:

Priority: Blocker  (was: Critical)

> Poor allocator performance with labeled resources and/or persistent volumes
> ---
>
> Key: MESOS-4833
> URL: https://issues.apache.org/jira/browse/MESOS-4833
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere, resources
> Fix For: 0.28.0
>
>
> When the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} 
> benchmark from https://reviews.apache.org/r/43686/ is modified to use distinct labels 
> between different slaves, performance regresses from ~2 seconds to ~3 
> minutes. The culprit seems to be the way in which the allocator merges 
> resources together; reserved resource labels (or persistent volume IDs) 
> inhibit merging, which causes performance to be much worse.
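
A toy model of the merge predicate (field names illustrative, not the Mesos 
{{Resource}} protobuf): two entries only collapse into one when all metadata 
matches, so per-slave-distinct reservation labels force one entry per slave 
and linear growth in the collections the allocator repeatedly copies and walks.
{code}
#include <string>

struct Entry
{
  std::string name;    // e.g. "cpus"
  std::string role;    // e.g. "ads"
  std::string labels;  // Flattened reservation labels.
  double scalar;
};

// Merging is only legal when everything but the scalar matches;
// distinct labels therefore inhibit merging entirely.
bool addable(const Entry& left, const Entry& right)
{
  return left.name == right.name &&
         left.role == right.role &&
         left.labels == right.labels;
}
{code}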



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174440#comment-15174440
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 9:31 PM:
-

https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/


was (Author: jvanremoortere):
https://reviews.apache.org/r/43904/

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3525) Figure out how to enforce 64-bit builds on Windows.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174156#comment-15174156
 ] 

Joris Van Remoortere commented on MESOS-3525:
-

https://reviews.apache.org/r/43692/
https://reviews.apache.org/r/43693/
https://reviews.apache.org/r/43694/
https://reviews.apache.org/r/43695/
https://reviews.apache.org/r/43689/

> Figure out how to enforce 64-bit builds on Windows.
> ---
>
> Key: MESOS-3525
> URL: https://issues.apache.org/jira/browse/MESOS-3525
> Project: Mesos
>  Issue Type: Task
>  Components: build
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: build, cmake, mesosphere
> Fix For: 0.28.0
>
>
> We need to make sure people don't try to compile Mesos on 32-bit 
> architectures. We don't want a Windows repeat of something like this:
> https://issues.apache.org/jira/browse/MESOS-267



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173239#comment-15173239
 ] 

Joris Van Remoortere commented on MESOS-4825:
-

I can shepherd this.
I don't think we should reject if there is a version mismatch. That would 
prevent us from doing rolling upgrades.
We just want to update the version to the current one the agent is running, so 
that the {{/slaves}} endpoint reports it correctly, and any logic that is 
dependent on the slave's version works correctly.

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173208#comment-15173208
 ] 

Joris Van Remoortere edited comment on MESOS-4825 at 3/1/16 4:18 AM:
-

[~klaus1982] Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467


was (Author: jvanremoortere):
[~klaus1982]Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173208#comment-15173208
 ] 

Joris Van Remoortere commented on MESOS-4825:
-

[~klaus1982] Not all re-register paths construct a {{new Slave()}}:
https://github.com/apache/mesos/blob/0fd95ccc54e4d144c3eb60e98bf77d53b6bdab63/src/master/master.cpp#L4405-L4467

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4825) Master's slave reregister logic does not update version field

2016-02-29 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4825:
---

 Summary: Master's slave reregister logic does not update version 
field
 Key: MESOS-4825
 URL: https://issues.apache.org/jira/browse/MESOS-4825
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Joris Van Remoortere
Priority: Blocker
 Fix For: 0.28.0


The master's logic for reregistering a slave does not update the version field 
if the slave re-registers with a new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2016-02-26 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170238#comment-15170238
 ] 

Joris Van Remoortere commented on MESOS-3271:
-

{code}
commit 16aa038949741f4dc6bf43423dc0340f869605ce
Author: Alexander Rojas 
Date:   Fri Feb 26 17:17:50 2016 -0800

Removed race condition from libevent based poll implementation.

Under certain circumstances, the future returned by poll is discarded
right after the event is triggered. This causes the event callback to be
called before the discard callback, which results in an abort signal
being raised by the libevent library.

Review: https://reviews.apache.org/r/43799/
{code}

> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 0x2, fd: 21, flags: 0x80)
> *** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
> using GNU date ***
> PC: @ 0x7f6ba512d0d5 (unknown)
> *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
> 12195; stack trace: ***
> @ 0x7f6ba54c4cb0 (unknown)
> @ 0x7f6ba512d0d5 (unknown)
> @ 0x7f6ba513083b (unknown)
> @ 0x7f6ba448e1ba (unknown)
> @ 0x7f6ba448e52b (unknown)
> @ 0x7f6ba447dcc9 (unknown)
> @   0x4c4033 process::internal::run<>()
> @ 0x7f6ba72642ab process::Future<>::discard()
> @ 0x7f6ba72643be process::internal::discard<>()
> @ 0x7f6ba7262298 
> _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
> @   0x4c4033 process::internal::run<>()
> @   0x6fa0cb process::Future<>::discard()
> @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
> @ 0x7f6ba728fb11 process::ProcessManager::resume()
> @ 0x7f6ba728fe0f process::internal::schedule()
> @ 0x7f6ba5c9d490 (unknown)
> @ 0x7f6ba54bce9a start_thread
> @ 0x7f6ba51ea38d (unknown)
> + /bin/true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4711) Race condition in libevent poll implementation causes crash

2016-02-26 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170239#comment-15170239
 ] 

Joris Van Remoortere commented on MESOS-4711:
-

{code}
commit 16aa038949741f4dc6bf43423dc0340f869605ce
Author: Alexander Rojas 
Date:   Fri Feb 26 17:17:50 2016 -0800

Removed race condition from libevent based poll implementation.

Under certain circumstances, the future returned by poll is discarded
right after the event is triggered. This causes the event callback to be
called before the discard callback, which results in an abort signal
being raised by the libevent library.

Review: https://reviews.apache.org/r/43799/
{code}

> Race condition in libevent poll implementation causes crash
> ---
>
> Key: MESOS-4711
> URL: https://issues.apache.org/jira/browse/MESOS-4711
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.28.0
> Environment: CentOS 6.7 running in VirtualBox
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere
> Fix For: 0.28.0, 0.27.2
>
>
> The issue first arose in MESOS-3271, but can be reproduced every time by 
> using the mentioned environment and running:
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery" 
> --gtest_repeat=1000
> {noformat}
> The problem can be traced back to 
> [{{libevent_poll.cpp}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp].
>  If the event is triggered and the future associated with the event is 
> discarded, the situation arises in which 
> [{{pollCallback()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L33]
>  starts executing just early enough to finish before 
> [{{pollDiscard()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L53]
>  executes. If that happens, {{pollCallback()}} deletes the poll object and 
> {{pollDiscard()}} is left with a dangling pointer, which crashes when it 
> executes the line {{event_active(ev, EV_READ, 0);}}.
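
To make the ordering hazard concrete, here is an editorial sketch of one general pattern for closing this kind of race (shared ownership plus a once-only flag); it illustrates the idea, not the actual fix in https://reviews.apache.org/r/43799/:

{code}
// Illustrative sketch only, not the Mesos fix: both the callback and the
// discard path share ownership of the poll state, and an atomic flag
// guarantees exactly one of them completes the future, so neither side
// can observe a dangling pointer.
#include <atomic>
#include <memory>

struct Poll
{
  std::atomic<bool> completed{false};
};

void pollCallback(const std::shared_ptr<Poll>& poll)
{
  if (!poll->completed.exchange(true)) {
    // ... satisfy the future with the poll result ...
  }
  // The state is freed only when the last shared_ptr reference drops.
}

void pollDiscard(const std::shared_ptr<Poll>& poll)
{
  if (!poll->completed.exchange(true)) {
    // ... discard the future ...
  }
}
{code}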



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2016-02-26 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3271:

Comment: was deleted

(was: {code}
commit 2297a3cf8db2b88860bc839cf934894b1d09dbbc
Author: Alexander Rojas 
Date:   Fri Feb 26 14:38:05 2016 -0800

Removed race condition from libevent based poll implementation.

Under certain circumstances, the future returned by poll is discarded
right after the event is triggered. This causes the event callback to be
called before the discard callback, which results in an abort signal
being raised by the libevent library.

Review: https://reviews.apache.org/r/43799/
{code})

> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 0x2, fd: 21, flags: 0x80)
> *** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
> using GNU date ***
> PC: @ 0x7f6ba512d0d5 (unknown)
> *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
> 12195; stack trace: ***
> @ 0x7f6ba54c4cb0 (unknown)
> @ 0x7f6ba512d0d5 (unknown)
> @ 0x7f6ba513083b (unknown)
> @ 0x7f6ba448e1ba (unknown)
> @ 0x7f6ba448e52b (unknown)
> @ 0x7f6ba447dcc9 (unknown)
> @   0x4c4033 process::internal::run<>()
> @ 0x7f6ba72642ab process::Future<>::discard()
> @ 0x7f6ba72643be process::internal::discard<>()
> @ 0x7f6ba7262298 
> _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
> @   0x4c4033 process::internal::run<>()
> @   0x6fa0cb process::Future<>::discard()
> @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
> @ 0x7f6ba728fb11 process::ProcessManager::resume()
> @ 0x7f6ba728fe0f process::internal::schedule()
> @ 0x7f6ba5c9d490 (unknown)
> @ 0x7f6ba54bce9a start_thread
> @ 0x7f6ba51ea38d (unknown)
> + /bin/true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

