[jira] [Created] (MESOS-6224) mesos-docker-executor port collision with container(network:host)

2016-09-21 Thread Wei Wei (JIRA)
Wei Wei created MESOS-6224:
--

 Summary: mesos-docker-executor port collision with 
container(network:host)
 Key: MESOS-6224
 URL: https://issues.apache.org/jira/browse/MESOS-6224
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 1.0.1
 Environment: ubuntu 14.04 LTS
mesos 1.0.1
Reporter: Wei Wei


We implemented a scheduler that launches a batch of data-processing workers in 
Docker containers. We use network: host and assign a randomly picked port to 
each worker, and we ran into a port collision:
the port we chose for the container was already taken by the mesos-docker-executor.

mesos-agent resource config:
{"name":"ports","type":"RANGES","ranges":{"range": 
[{"begin":1,"end":32000}]}}]

root@xs35:lsof -i:31981
COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
mesos-doc 2835 root    8u  IPv4 433237650      0t0  TCP 10.34.38.30:31981 (LISTEN)

The port was taken by mesos-docker-executor, yet the framework is still offered 
port resource 31981.

Is there a way to set the mesos-docker-executor's port range?
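
A minimal C++ sketch of the overlap described above. The `PortRange` type and the `offered` helper are hypothetical names used only for illustration, not Mesos APIs. The sketch shows just one thing: a listening port of 31981 lies inside an advertised range of 1-32000, so the agent can still offer it to a framework.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive port range, mirroring {"begin":..., "end":...}.
struct PortRange {
  uint32_t begin;
  uint32_t end;
};

// Returns true if `port` falls inside any of the advertised ranges.
bool offered(const std::vector<PortRange>& ranges, uint32_t port) {
  for (const PortRange& range : ranges) {
    if (port >= range.begin && port <= range.end) {
      return true;
    }
  }
  return false;
}

// Usage: the reported configuration and the port seen in the lsof output.
void example() {
  const std::vector<PortRange> advertised = {PortRange{1, 32000}};
  assert(offered(advertised, 31981));
}
```

Under this model, any port the executor binds inside the advertised range can also show up in an offer, which matches the collision reported here.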



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6225) mesos-docker-executor port collision with container(network:host)

2016-09-21 Thread Wei Wei (JIRA)
Wei Wei created MESOS-6225:
--

 Summary: mesos-docker-executor port collision with 
container(network:host)
 Key: MESOS-6225
 URL: https://issues.apache.org/jira/browse/MESOS-6225
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 1.0.1
 Environment: ubuntu 14.04 LTS
mesos 1.0.1
Reporter: Wei Wei


We implemented a scheduler that launches a batch of data-processing workers in 
Docker containers. We use network: host and assign a randomly picked port to 
each worker, and we ran into a port collision:
the port we chose for the container was already taken by the mesos-docker-executor.

mesos-agent resource config:
{"name":"ports","type":"RANGES","ranges":{"range": 
[{"begin":1,"end":32000}]}}]

root@xs35:lsof -i:31981
COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
mesos-doc 2835 root    8u  IPv4 433237650      0t0  TCP 10.34.38.30:31981 (LISTEN)

The port was taken by mesos-docker-executor, yet the framework is still offered 
port resource 31981.

Is there a way to set the mesos-docker-executor's port range?





[jira] [Commented] (MESOS-6156) Make the `network/cni` isolator nesting aware

2016-09-21 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511777#comment-15511777
 ] 

Jie Yu commented on MESOS-6156:
---

commit 2a8de6255494eed2c435ef2b80dc846e1c1b5e90
Author: Avinash sridharan 
Date:   Wed Sep 21 17:16:37 2016 -0700

Modified the `network/cni` isolator to be nesting aware.

The network file setup in the `network/cni` isolator is now nesting
aware. Since the children share the network and UTS namespace with the
parent, the network files need to be created only for the parent
container. For the child containers, the network files will be simply
a bind mount of the parent's network files.

Review: https://reviews.apache.org/r/51857/

> Make the `network/cni` isolator nesting aware
> -
>
> Key: MESOS-6156
> URL: https://issues.apache.org/jira/browse/MESOS-6156
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.1.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> In pods, child containers share the network and UTS namespace with the parent 
> containers. This implies that during `prepare` and `isolate` the 
> `network/cni` isolator needs to be aware of the parent-child relationship 
> between containers to make the following decisions:
> a) During `prepare` a container should be allocated a new network namespace 
> and UTS namespace only if the container is a top level container.
> b) During `isolate` the network files (/etc/hosts, /etc/hostname, 
> /etc/resolv.conf) should be created only for top level containers. The 
> network files for child containers will just be symlinks to the parent 
> container's network files.





[jira] [Issue Comment Deleted] (MESOS-6219) Improve auto-detection of predefined resource types.

2016-09-21 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6219:
--
Comment: was deleted

(was: One way to address this may be to have a special resource type:

{code}
message Value {
  enum Type {
    SCALAR = 0;
    RANGES = 1;
    SET = 2;
    TEXT = 3;
+   AUTO = 4;
  }
{code}


- We only do auto-detection for predefined resources and they are identified by 
names and other fields so the {{type}} is already custom defined. i.e., it 
doesn't make sense to have {{cpus}} with {{type=RANGES}}. When the parser sees 
{{type=AUTO}} for {{cpus}}, it auto-detects the value and then assigns the type.
- For custom resources we don't support auto-detection anyways so {{type=AUTO}} 
would be invalid and the parser would bail.)

> Improve auto-detection of predefined resource types.
> 
>
> Key: MESOS-6219
> URL: https://issues.apache.org/jira/browse/MESOS-6219
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>
> Mesos agent currently auto-detects certain predefined resource types when 
> they are not specified. These include cpus, mem, (root) disk, gpus and also 
> MESOS-6062 is adding auto-detection for mount disks when the size is 
> specified as a special value "0". 
> Due to the limitation of the {{Resources}} abstraction the caller of 
> {{Resources::parse()}} can't tell if a resource is intentionally specified 
> with an empty value or unspecified. The current resource auto-detection in 
> {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag 
> to check if a resource is specified, this is very fragile, e.g., it would 
> think {{gpus:0}} is specified (and not auto-detected) if there is a mount 
> disk with its root being {{/biggpush}}.
> It would be nice if we can have the user explicitly specify the intention to 
> have value of a standard resource auto-detected (at least with the JSON 
> input).





[jira] [Created] (MESOS-6223) Allow agents to re-register post a host reboot

2016-09-21 Thread Megha (JIRA)
Megha created MESOS-6223:


 Summary: Allow agents to re-register post a host reboot
 Key: MESOS-6223
 URL: https://issues.apache.org/jira/browse/MESOS-6223
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Reporter: Megha


Agent doesn’t recover its state post a host reboot, it registers with the master 
and gets a new SlaveID. With partition awareness, the agents are now allowed to 
re-register after they have been marked Unreachable. The executors are anyway 
terminated on the agent when it reboots so there is no harm in letting the 
agent keep its SlaveID, re-register with the master and reconcile the lost 
executors. This is a pre-requisite for supporting persistent/restartable tasks 
in mesos (https://issues.apache.org/jira/browse/MESOS-3545).





[jira] [Updated] (MESOS-6223) Allow agents to re-register post a host reboot

2016-09-21 Thread Megha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megha updated MESOS-6223:
-
Description: Agent doesn’t recover its state post a host reboot, it 
registers with the master and gets a new SlaveID. With partition awareness, the 
agents are now allowed to re-register after they have been marked Unreachable. 
The executors are anyway terminated on the agent when it reboots so there is no 
harm in letting the agent keep its SlaveID, re-register with the master and 
reconcile the lost executors. This is a pre-requisite for supporting 
persistent/restartable tasks in mesos (MESOS-3545).  (was: Agent doesn’t recover 
its state post a host reboot, it registers with the master and gets a new 
SlaveID. With partition awareness, the agents are now allowed to re-register 
after they have been marked Unreachable. The executors are anyway terminated on 
the agent when it reboots so there is no harm in letting the agent keep its 
SlaveID, re-register with the master and reconcile the lost executors. This is 
a pre-requisite for supporting persistent/restartable tasks in mesos 
(https://issues.apache.org/jira/browse/MESOS-3545).)

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Megha
>
> Agent doesn’t recover its state post a host reboot, it registers with the 
> master and gets a new SlaveID. With partition awareness, the agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are anyway terminated on the agent when it reboots so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a pre-requisite for supporting 
> persistent/restartable tasks in mesos (MESOS-3545).





[jira] [Created] (MESOS-6222) Improve organization of methods related to Resources

2016-09-21 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6222:
-

 Summary: Improve organization of methods related to Resources
 Key: MESOS-6222
 URL: https://issues.apache.org/jira/browse/MESOS-6222
 Project: Mesos
  Issue Type: Improvement
  Components: c++ api
Reporter: Yan Xu


Currently the {{Resources}} class is used to dump everything loosely related to 
the protobuf {{Resource}} as member methods.

As a result {{Resources}} has a large number of utility methods that are not 
related to the {{Resources}} abstraction.

Examples:
{code:title=This returns a protobuf Resource and not Resources.}
static Try<Resource> parse(
  const std::string& name,
  const std::string& value,
  const std::string& role);
{code}

{code:title=This only looks at the protobuf too.}
static bool isPersistentVolume(const Resource& resource);
{code}

This makes it hard to name and distinguish similar methods which work at 
different abstraction levels (see the {{parse(text, role)}} function below).

It would be way simpler to have them as namespaced free-standing functions.

{code:title=}
namespace mesos {

// Methods for the protobuf `Resource`.
namespace resource {
  Try<Resource> parse(
  const std::string& name,
  const std::string& value,
  const std::string& role);

  bool isPersistentVolume(const Resource& resource);

  // Now I can add a `parse` method for multiple resources but at the
  // `Resource` level.
  Try parse(
  const std::string& text,
  const std::string& defaultRole = "*");
}

// Methods for the `Resources` abstraction.
namespace resources {
   ...
}
}
{code}

Static member methods of Resources are still fine if they directly pertain to 
the {{Resources}} abstraction itself and we can use the private members/methods 
when useful.

{code:title=e.g., Resources::flatten() uses the internal validation-free `add` method as opposed to `+=` to accumulate resource objects}
Try<Resources> flatten(
  const std::string& role,
  const Option<Resource::ReservationInfo>& reservation = None()) const;
{code}







[jira] [Commented] (MESOS-6221) Ability to post maintenance/schedule with better granularity

2016-09-21 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511324#comment-15511324
 ] 

Joseph Wu commented on MESOS-6221:
--

The race is definitely a possibility.  The current implementation of 
maintenance primitives is an MVP.  We're waiting for some community 
adoption/feedback (particularly framework support) before hardening the feature 
further.  

As part of the MVP, we decided it would be logically simpler to assume only one 
operator does any maintenance, including changing the schedule.  (Note that 
there are TODOs in the codebase about having multiple schedules.  That would be 
one way of isolating two operators.)

> Ability to post maintenance/schedule with better granularity
> 
>
> Key: MESOS-6221
> URL: https://issues.apache.org/jira/browse/MESOS-6221
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Huadong Liu
>
> Currently the maintenance schedule update is at cluster granularity: "To 
> update the maintenance schedule, the operator should first read the current 
> schedule, make any necessary changes, and then post the modified schedule." 
> http://mesos.apache.org/documentation/latest/maintenance/
> In contrast, the machine/down and up endpoints operate at host granularity. 
> One or a set of hosts can be moved to DOWN mode or UP mode once the schedule 
> exists.
> Requiring a GET of the current schedule before POSTing an updated schedule may 
> create races if machine/up and maintenance/schedule updates happen on 
> different hosts/processes, for example:
> 1. mesos master has host A in maintenance down mode.
> 2. process p1 tries to UP host A.
> 3. process p2 tries to get the current schedule and then append host B to the 
> schedule.
> 4. mesos master may end up having A and B in maintenance DRAIN mode although 
> the desired result is to have B in DRAIN mode only.
> I cannot find a document to explain why the maintenance schedule has to be 
> updated at the cluster granularity. Although the problem can be resolved by 
> external synchronization, having the ability to update maintenance schedule 
> at host granularity seems a better choice.





[jira] [Created] (MESOS-6221) Ability to post maintenance/schedule with better granularity

2016-09-21 Thread Huadong Liu (JIRA)
Huadong Liu created MESOS-6221:
--

 Summary: Ability to post maintenance/schedule with better 
granularity
 Key: MESOS-6221
 URL: https://issues.apache.org/jira/browse/MESOS-6221
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: Huadong Liu


Currently the maintenance schedule update is at cluster granularity: "To update 
the maintenance schedule, the operator should first read the current schedule, 
make any necessary changes, and then post the modified schedule." 
http://mesos.apache.org/documentation/latest/maintenance/

In contrast, the machine/down and up endpoints operate at host granularity. One 
or a set of hosts can be moved to DOWN mode or UP mode once the schedule exists.

Requiring a GET of the current schedule before POSTing an updated schedule may 
create races if machine/up and maintenance/schedule updates happen on different 
hosts/processes, for example:

1. mesos master has host A in maintenance down mode.
2. process p1 tries to UP host A.
3. process p2 tries to get the current schedule and then append host B to the 
schedule.
4. mesos master may end up having A and B in maintenance DRAIN mode although the 
desired result is to have B in DRAIN mode only.

I cannot find a document to explain why the maintenance schedule has to be 
updated at the cluster granularity. Although the problem can be resolved by 
external synchronization, having the ability to update the maintenance schedule 
at host granularity seems a better choice.
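
The lost update described above can be sketched as a small, single-threaded C++ simulation. Everything here is a hypothetical in-memory model, not the Mesos HTTP API: `Master`, `getSchedule`, `postSchedule`, and `machineUp` are illustrative names, and the sketch assumes that bringing a machine up drops it from the schedule and that p2's read happens before p1's UP takes effect.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: the set of hosts currently in the maintenance schedule.
using Schedule = std::set<std::string>;

struct Master {
  Schedule schedule;
};

// Models GET maintenance/schedule: the caller receives a copy.
Schedule getSchedule(const Master& master) { return master.schedule; }

// Models POST maintenance/schedule: the whole schedule is replaced.
void postSchedule(Master& master, const Schedule& updated) {
  master.schedule = updated;
}

// Models machine/up; this sketch assumes it removes the host from the schedule.
void machineUp(Master& master, const std::string& host) {
  master.schedule.erase(host);
}

// Replays one unlucky interleaving of p1 and p2.
Schedule raceOutcome() {
  Master master;
  master.schedule.insert("A");           // 1. host A is in maintenance.
  Schedule stale = getSchedule(master);  // 3a. p2 reads the current schedule.
  machineUp(master, "A");                // 2. p1 brings host A up.
  stale.insert("B");                     // 3b. p2 appends host B to its copy.
  postSchedule(master, stale);           // 3c. p2 posts the stale copy.
  return master.schedule;                // 4. A is back, alongside B.
}

void example() {
  const Schedule expected = {"A", "B"};
  assert(raceOutcome() == expected);
}
```

In this model a per-host update would not carry p2's stale view of A back to the master, which is the motivation for host-granularity updates.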







[jira] [Updated] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.

2016-09-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6220:
--
Shepherd: Vinod Kone

> HTTP handler failures should result in 500 response rather than 503 response.
> -
>
> Key: MESOS-6220
> URL: https://issues.apache.org/jira/browse/MESOS-6220
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Minor
> Fix For: 1.1.0
>
>
> Currently, when an HTTP handler fails, libprocess will send a {{503 Service 
> Unavailable}} (see 
> [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]).
> However, more appropriate would be to send a {{500 Internal Server Error}} 
> given the documented behavior of these statuses:
> *500 Internal Server Error*
> A generic error message, given when an unexpected condition was encountered 
> and no more specific message is suitable.
> *503 Service Unavailable*
> The server is currently unavailable (because it is overloaded or down for 
> maintenance). Generally, this is a temporary state.
> From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error





[jira] [Commented] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.

2016-09-21 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511270#comment-15511270
 ] 

Benjamin Mahler commented on MESOS-6220:


Added as an API change in the CHANGELOG:

{noformat}
commit a640eb25a9ebc4b2409c86df5cc71bd625427001
Author: Benjamin Mahler 
Date:   Wed Sep 21 14:36:49 2016 -0700

Added MESOS-6220 as an API change to the 1.1.0 CHANGELOG.
{noformat}

> HTTP handler failures should result in 500 response rather than 503 response.
> -
>
> Key: MESOS-6220
> URL: https://issues.apache.org/jira/browse/MESOS-6220
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Minor
> Fix For: 1.1.0
>
>
> Currently, when an HTTP handler fails, libprocess will send a {{503 Service 
> Unavailable}} (see 
> [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]).
> However, more appropriate would be to send a {{500 Internal Server Error}} 
> given the documented behavior of these statuses:
> *500 Internal Server Error*
> A generic error message, given when an unexpected condition was encountered 
> and no more specific message is suitable.
> *503 Service Unavailable*
> The server is currently unavailable (because it is overloaded or down for 
> maintenance). Generally, this is a temporary state.
> From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error





[jira] [Assigned] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.

2016-09-21 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6220:
--

Assignee: Benjamin Mahler

> HTTP handler failures should result in 500 response rather than 503 response.
> -
>
> Key: MESOS-6220
> URL: https://issues.apache.org/jira/browse/MESOS-6220
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Minor
>
> Currently, when an HTTP handler fails, libprocess will send a {{503 Service 
> Unavailable}} (see 
> [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]).
> However, more appropriate would be to send a {{500 Internal Server Error}} 
> given the documented behavior of these statuses:
> *500 Internal Server Error*
> A generic error message, given when an unexpected condition was encountered 
> and no more specific message is suitable.
> *503 Service Unavailable*
> The server is currently unavailable (because it is overloaded or down for 
> maintenance). Generally, this is a temporary state.
> From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error





[jira] [Created] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.

2016-09-21 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6220:
--

 Summary: HTTP handler failures should result in 500 response 
rather than 503 response.
 Key: MESOS-6220
 URL: https://issues.apache.org/jira/browse/MESOS-6220
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Mahler
Priority: Minor


Currently, when an HTTP handler fails, libprocess will send a {{503 Service 
Unavailable}} (see 
[here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]).

However, more appropriate would be to send a {{500 Internal Server Error}} 
given the documented behavior of these statuses:

*500 Internal Server Error*
A generic error message, given when an unexpected condition was encountered and 
no more specific message is suitable.

*503 Service Unavailable*
The server is currently unavailable (because it is overloaded or down for 
maintenance). Generally, this is a temporary state.

From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error





[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.

2016-09-21 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6219:
--
Description: 
Mesos agent currently auto-detects certain predefined resource types when they 
are not specified. These include cpus, mem, (root) disk, gpus and also 
MESOS-6062 is adding auto-detection for mount disks when the size is specified 
as a special value "0". 

Due to the limitation of the {{Resources}} abstraction the caller of 
{{Resources::parse()}} can't tell if a resource is intentionally specified with 
an empty value or unspecified. The current resource auto-detection in 
{{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to 
check if a resource is specified, this is very fragile, e.g., it would think 
{{gpus:0}} is specified (and not auto-detected) if there is a mount disk with 
its root being {{/biggpush}}.

It would be nice if we can have the user explicitly specify the intention to 
have value of a standard resource auto-detected (at least with the JSON input).

  was:
Mesos agent currently auto-detects certain predefined resource types when they 
are not specified. These include cpus, mem, (root) disk, gpus and also 
MESOS-6062 is adding auto-detection for mount disks when the size is specified 
as a special value "0". 

Due to the limitation of the {{Resources}} abstraction the caller of 
{{Resources::parse()}} can't tell if a resource is intentionally specified with 
an empty value or unspecified. The current resource auto-detection in 
{{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to 
check if resource is specified, this is very fragile, e.g., it would think 
{{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with 
its root being {{/biggpush}}.

It would be nice if we can have the user explicitly specify the intention to 
have value of a standard resource auto-detected (at least with the JSON input).


> Improve auto-detection of predefined resource types.
> 
>
> Key: MESOS-6219
> URL: https://issues.apache.org/jira/browse/MESOS-6219
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>
> Mesos agent currently auto-detects certain predefined resource types when 
> they are not specified. These include cpus, mem, (root) disk, gpus and also 
> MESOS-6062 is adding auto-detection for mount disks when the size is 
> specified as a special value "0". 
> Due to the limitation of the {{Resources}} abstraction the caller of 
> {{Resources::parse()}} can't tell if a resource is intentionally specified 
> with an empty value or unspecified. The current resource auto-detection in 
> {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag 
> to check if a resource is specified, this is very fragile, e.g., it would 
> think {{gpus:0}} is specified (and not auto-detected) if there is a mount 
> disk with its root being {{/biggpush}}.
> It would be nice if we can have the user explicitly specify the intention to 
> have value of a standard resource auto-detected (at least with the JSON 
> input).





[jira] [Commented] (MESOS-6219) Improve auto-detection of predefined resource types.

2016-09-21 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511178#comment-15511178
 ] 

Yan Xu commented on MESOS-6219:
---

One way to address this may be to have a special resource type:

{code}
message Value {
  enum Type {
    SCALAR = 0;
    RANGES = 1;
    SET = 2;
    TEXT = 3;
+   AUTO = 4;
  }
{code}


- We only do auto-detection for predefined resources and they are identified by 
names and other fields so the {{type}} is already custom defined. i.e., it 
doesn't make sense to have {{cpus}} with {{type=RANGES}}. When the parser sees 
{{type=AUTO}} for {{cpus}}, it auto-detects the value and then assigns the type.
- For custom resources we don't support auto-detection anyways so {{type=AUTO}} 
would be invalid and the parser would bail.
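
The two rules above could be sketched as a small decision function in C++. This is only an illustration of the proposal: `Type::AUTO` does not exist in Mesos today, and `Action`, `isPredefined`, and `resolve` are hypothetical names. The list of predefined names is taken from the issue description (cpus, mem, disk, gpus).

```cpp
#include <cassert>
#include <set>
#include <string>

// Mirrors the proposed enum; AUTO is the suggested addition.
enum class Type { SCALAR, RANGES, SET, TEXT, AUTO };

// What a parser following the proposal would do with one resource entry.
enum class Action { USE_AS_GIVEN, AUTO_DETECT, REJECT };

// Predefined resource names, as listed in the issue description.
bool isPredefined(const std::string& name) {
  static const std::set<std::string> names = {"cpus", "mem", "disk", "gpus"};
  return names.count(name) > 0;
}

Action resolve(const std::string& name, Type type) {
  if (type != Type::AUTO) {
    return Action::USE_AS_GIVEN;  // Explicit value: nothing to detect.
  }
  // AUTO is only meaningful for predefined resources; otherwise bail.
  return isPredefined(name) ? Action::AUTO_DETECT : Action::REJECT;
}

void example() {
  assert(resolve("cpus", Type::AUTO) == Action::AUTO_DETECT);
  assert(resolve("licenses", Type::AUTO) == Action::REJECT);
}
```

With an explicit marker like this, the parser would no longer have to guess from the flag text whether a value was omitted on purpose.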

> Improve auto-detection of predefined resource types.
> 
>
> Key: MESOS-6219
> URL: https://issues.apache.org/jira/browse/MESOS-6219
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>
> Mesos agent currently auto-detects certain predefined resource types when 
> they are not specified. These include cpus, mem, (root) disk, gpus and also 
> MESOS-6062 is adding auto-detection for mount disks when the size is 
> specified as a special value "0". 
> Due to the limitation of the {{Resources}} abstraction the caller of 
> {{Resources::parse()}} can't tell if a resource is intentionally specified 
> with an empty value or unspecified. The current resource auto-detection in 
> {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag 
> to check if resource is specified, this is very fragile, e.g., it would think 
> {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with 
> its root being {{/biggpush}}.
> It would be nice if we can have the user explicitly specify the intention to 
> have value of a standard resource auto-detected (at least with the JSON 
> input).





[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.

2016-09-21 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6219:
--
Description: 
Mesos agent currently auto-detects certain predefined resource types when they 
are not specified. These include cpus, mem, (root) disk, gpus and also 
MESOS-6062 is adding auto-detection for mount disks when the size is specified 
as a special value "0". 

Due to the limitation of the {{Resources}} abstraction the caller of 
{{Resources::parse()}} can't tell if a resource is intentionally specified with 
an empty value or unspecified. The current resource auto-detection in 
{{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to 
check if resource is specified, this is very fragile, e.g., it would think 
{{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with 
its root being {{/biggpush}}.

It would be nice if we can have the user explicitly specify the intention to 
have value of a standard resource auto-detected (at least with the JSON input).

  was:
Mesos agent currently auto-detects certain standard resources when they are not 
specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is 
adding auto-detection for mount disks when the size is specified as a special 
value "0". 

Due to the limitation of the {{Resources}} abstraction the caller of 
{{Resources::parse()}} can't tell if a resource is intentionally specified with 
an empty value or unspecified. The current resource auto-detection in 
{{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to 
check if resource is specified, this is very fragile, e.g., it would think 
{{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with 
its root being {{/biggpush}}.

It would be nice if we can have the user explicitly specify the intention to 
have value of a standard resource auto-detected (at least with the JSON input).


> Improve auto-detection of predefined resource types.
> 
>
> Key: MESOS-6219
> URL: https://issues.apache.org/jira/browse/MESOS-6219
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>
> Mesos agent currently auto-detects certain predefined resource types when 
> they are not specified. These include cpus, mem, (root) disk, gpus and also 
> MESOS-6062 is adding auto-detection for mount disks when the size is 
> specified as a special value "0". 
> Due to the limitation of the {{Resources}} abstraction the caller of 
> {{Resources::parse()}} can't tell if a resource is intentionally specified 
> with an empty value or unspecified. The current resource auto-detection in 
> {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag 
> to check if resource is specified, this is very fragile, e.g., it would think 
> {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with 
> its root being {{/biggpush}}.
> It would be nice if we can have the user explicitly specify the intention to 
> have value of a standard resource auto-detected (at least with the JSON 
> input).





[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.

2016-09-21 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6219:
--
Summary: Improve auto-detection of predefined resource types.  (was: 
Improve auto-detection of built-in resources)

> Improve auto-detection of predefined resource types.
> 
>
> Key: MESOS-6219
> URL: https://issues.apache.org/jira/browse/MESOS-6219
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Yan Xu
>
> Mesos agent currently auto-detects certain standard resources when they are 
> not specified. These include cpus, mem, (root) disk and gpus; MESOS-6062 is 
> also adding auto-detection for mount disks when the size is specified as the 
> special value "0". 
> Due to a limitation of the {{Resources}} abstraction, the caller of 
> {{Resources::parse()}} can't tell whether a resource is intentionally 
> specified with an empty value or is unspecified. The current resource 
> auto-detection in {{Containerizer::resources()}} resorts to scanning the 
> {{--resources}} flag to check whether a resource is specified. This is very 
> fragile: e.g., it would think {{gpus:0}} is specified (and not auto-detected) 
> if there is a mount disk with its root being {{/biggpush}}.
> It would be nice if the user could explicitly specify the intention to have 
> the value of a standard resource auto-detected (at least with the JSON 
> input).





[jira] [Created] (MESOS-6219) Improve auto-detection of built-in resources

2016-09-21 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6219:
-

 Summary: Improve auto-detection of built-in resources
 Key: MESOS-6219
 URL: https://issues.apache.org/jira/browse/MESOS-6219
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Yan Xu


Mesos agent currently auto-detects certain standard resources when they are not 
specified. These include cpus, mem, (root) disk and gpus; MESOS-6062 is also 
adding auto-detection for mount disks when the size is specified as the special 
value "0". 

Due to a limitation of the {{Resources}} abstraction, the caller of 
{{Resources::parse()}} can't tell whether a resource is intentionally specified 
with an empty value or is unspecified. The current resource auto-detection in 
{{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to 
check whether a resource is specified. This is very fragile: e.g., it would 
think {{gpus:0}} is specified (and not auto-detected) if there is a mount disk 
with its root being {{/biggpush}}.

It would be nice if the user could explicitly specify the intention to have 
the value of a standard resource auto-detected (at least with the JSON input).
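
To illustrate the fragility described above, here is a minimal sketch that 
contrasts a raw substring scan with an exact comparison against already-parsed 
resource names. The function names are hypothetical and this is not the code in 
{{Containerizer::resources()}}; it only assumes that the flag value is 
available as a string and that resource names can be extracted separately.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Fragile check: looks for the resource name anywhere in the raw flag value,
// so a path such as "/biggpush" counts as a mention of "gpus".
bool mentionsSubstring(const std::string& rawFlag, const std::string& name) {
  return rawFlag.find(name) != std::string::npos;
}

// Sturdier check: compares the name only against resource names that were
// extracted by a real parser (represented here by a plain vector).
bool hasResourceNamed(const std::vector<std::string>& parsedNames,
                      const std::string& name) {
  return std::find(parsedNames.begin(), parsedNames.end(), name) !=
         parsedNames.end();
}
```

With a flag value that only defines a mount disk rooted at {{/biggpush}}, the 
first function reports {{gpus}} as specified while the second does not.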





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510671#comment-15510671
 ] 

haosdent commented on MESOS-5070:
-

Hi, [~benjaminhindman] Thanks for your comment! 

{quote}
(1) This implementation will never enter a pid namespace properly and there's 
no check that someone isn't passing in a pid namespace ... bug?
{quote}

I saw that we didn't support entering a pid namespace in {{setns}} before. And 
I think {{mnt}} and {{net}} should be enough for health checks, although 
entering all namespaces would be better. 

{quote}
(2) This should not live in src/health-check/health_checker.cpp
{quote} 

Yes, alexr told me we should add {{Subprocess::ChildHook::SETNS}} like 
[Subprocess::ChildHook::SUPERVISOR | 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/subprocess.cpp#L98]
 cc [~alexr] Please correct me if I misunderstood. 

> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510662#comment-15510662
 ] 

Charles Allen commented on MESOS-6213:
--

I'm really curious what changed in my build env that allowed this to pass :-/

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
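
For reference, the compiler message points at 
{{std::atomic_compare_exchange_strong()}} as the replacement. A minimal sketch 
of a 64-bit compare-and-swap built on it follows; the wrapper name is 
hypothetical and this is not the change made in protobuf, just an illustration 
of the suggested standard call.

```cpp
#include <atomic>
#include <cstdint>

// Stores newValue and returns true only if *value still equals oldValue.
// The default memory order (sequentially consistent) is used, which is at
// least as strong as a full barrier.
bool compareAndSwap64(std::atomic<std::int64_t>* value,
                      std::int64_t oldValue,
                      std::int64_t newValue) {
  // On failure the call writes the observed value into oldValue, which is a
  // local copy here and is simply discarded.
  return std::atomic_compare_exchange_strong(value, &oldValue, newValue);
}
```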





[jira] [Assigned] (MESOS-6218) Avoided to concat cgroup internally in subsystems.

2016-09-21 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6218:
---

Assignee: haosdent

> Avoided to concat cgroup internally in subsystems.
> --
>
> Key: MESOS-6218
> URL: https://issues.apache.org/jira/browse/MESOS-6218
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>
> We currently use {{path::join(flags.cgroups_root, containerId.value())}} to 
> build the cgroup internally in subsystems. We should avoid this and pass the 
> cgroup to the subsystems directly.





[jira] [Updated] (MESOS-6218) Avoided to concat cgroup internally in subsystems.

2016-09-21 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6218:

Description: We currently use {{path::join(flags.cgroups_root, 
containerId.value())}} to build the cgroup internally in subsystems. We should 
avoid this and pass the cgroup to the subsystems directly.
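
A rough sketch of the proposed direction, under the assumption that the caller 
computes the cgroup once and hands it to each subsystem. All names below 
({{joinPath}}, {{cgroupFor}}, {{MemorySubsystemSketch}}) are hypothetical 
stand-ins, not the stout or Mesos implementations.

```cpp
#include <string>

// Minimal stand-in for a path join; not the stout implementation.
std::string joinPath(const std::string& a, const std::string& b) {
  if (a.empty()) return b;
  if (b.empty()) return a;
  const bool aSlash = a.back() == '/';
  const bool bSlash = b.front() == '/';
  if (aSlash && bSlash) return a + b.substr(1);
  if (aSlash || bSlash) return a + b;
  return a + "/" + b;
}

// The caller builds the cgroup a single time from the root and container ID.
std::string cgroupFor(const std::string& cgroupsRoot,
                      const std::string& containerId) {
  return joinPath(cgroupsRoot, containerId);
}

// Hypothetical subsystem that receives the cgroup instead of deriving it.
class MemorySubsystemSketch {
public:
  // Only consumes the cgroup; it knows nothing about the root or the ID.
  std::string limitFile(const std::string& cgroup) const {
    return joinPath(cgroup, "memory.limit_in_bytes");
  }
};
```

Keeping the join in one place means a change to the cgroup layout does not have 
to be repeated in every subsystem.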

> Avoided to concat cgroup internally in subsystems.
> --
>
> Key: MESOS-6218
> URL: https://issues.apache.org/jira/browse/MESOS-6218
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>
> We currently use {{path::join(flags.cgroups_root, containerId.value())}} to 
> build the cgroup internally in subsystems. We should avoid this and pass the 
> cgroup to the subsystems directly.





[jira] [Updated] (MESOS-6218) Avoided to concat cgroup internally in subsystems.

2016-09-21 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6218:

Issue Type: Improvement  (was: Task)

> Avoided to concat cgroup internally in subsystems.
> --
>
> Key: MESOS-6218
> URL: https://issues.apache.org/jira/browse/MESOS-6218
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>






[jira] [Created] (MESOS-6218) Avoided to concat cgroup internally in subsystems.

2016-09-21 Thread haosdent (JIRA)
haosdent created MESOS-6218:
---

 Summary: Avoided to concat cgroup internally in subsystems.
 Key: MESOS-6218
 URL: https://issues.apache.org/jira/browse/MESOS-6218
 Project: Mesos
  Issue Type: Task
Reporter: haosdent








[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510527#comment-15510527
 ] 

Jie Yu commented on MESOS-5070:
---

commit 3c730dc42e35d13dbeaf69ce391766d4ba7ba962
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:43 2016 -0700

Converted watchdog option into childhook in Mesos [2/2].

Review: https://reviews.apache.org/r/52121/

commit 6d58c241716244d29a932440eabed31dccb639cf
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:39 2016 -0700

Converted watchdog option into a childhook in libprocess [1/2].

Review: https://reviews.apache.org/r/52120/

commit f4be028f5eb8c04a2c13a58863896ad4f571d541
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:36 2016 -0700

Renamed Hook to ParentHook in Mesos [2/2].

Review: https://reviews.apache.org/r/52018/

commit 6a4a4f1a29301c1bac74042138f6bb3428adc9f3
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:33 2016 -0700

Renamed Hook to parent Hook in libprocess [1/2].

Review: https://reviews.apache.org/r/52017/

commit 1db3bbb1692406f6395a21624d2041f220eca744
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:30 2016 -0700

Replaces Hook::None() by {} in Mesos [2/2].

Review: https://reviews.apache.org/r/52016/

commit 2af7e5ebae976735d263f45e540c886111d27982
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:26 2016 -0700

Used {} instead of Hook::None() in libprocess [1/2].

Review: https://reviews.apache.org/r/52015/

commit 059f47bfe82c2c589cefbf0bcab131697cb0d9f9
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:19 2016 -0700

Used ChildHooks in Mesos [2/2].

We now use the new ChildHooks instead of explicit options such
as setsid.

Review: https://reviews.apache.org/r/45492/

commit 5ce0e46aeb083de1af09d53364ac7260441e9e94
Author: Joerg Schad 
Date:   Wed Sep 21 09:50:15 2016 -0700

Refactored subprocess options [1/2].

Previously the subprocess interface supported several options for the
child process such as setsid. In order to make the interface more
flexible we refactored such options into a vector of ChildHooks.
In order not to allow arbitrary code inside a ChildHook it has to be
constructed via pre-defined factory methods.

Review: https://reviews.apache.org/r/45491/
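
The last commit message describes hooks that can only be built through 
pre-defined factory methods. A minimal sketch of that pattern is shown below, 
assuming a POSIX system. The class {{ChildHookSketch}}, its factories and the 
{{id()}} accessor are hypothetical and are not the libprocess API; the 
identifier only illustrates the idea raised in this thread of letting another 
platform provide its own version of a hook.

```cpp
#include <functional>
#include <string>
#include <utility>

#include <unistd.h>

// Hypothetical stand-in for a child hook; not the libprocess class.
class ChildHookSketch {
public:
  // Pre-defined factory: start a new session in the child.
  static ChildHookSketch Setsid() {
    return ChildHookSketch("setsid", [] { return ::setsid() != -1; });
  }

  // Pre-defined factory: change the child's working directory.
  static ChildHookSketch Chdir(const std::string& directory) {
    return ChildHookSketch(
        "chdir", [directory] { return ::chdir(directory.c_str()) == 0; });
  }

  // Stable identifier that a platform-specific backend could dispatch on.
  const std::string& id() const { return id_; }

  // Runs the hook; returns true on success.
  bool run() const { return action_(); }

private:
  // Private constructor: callers cannot wrap arbitrary code in a hook.
  ChildHookSketch(std::string id, std::function<bool()> action)
    : id_(std::move(id)), action_(std::move(action)) {}

  std::string id_;
  std::function<bool()> action_;
};
```

A caller would then pass something like a vector of such hooks instead of one 
explicit parameter per option.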


> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510522#comment-15510522
 ] 

Jie Yu commented on MESOS-5070:
---

That's still tricky because ns::enter is in src/linux/ns.hpp. So we need to 
pull ns functions to stout to make it work.

> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510520#comment-15510520
 ] 

Jie Yu commented on MESOS-5070:
---

I think we should hide parent_hooks and child_hooks in the subprocess interface 
on Windows platforms (i.e., #ifdef) so that the Windows code doesn't 
accidentally touch them, expecting them to work.

In the long term, I agree with [~js84] that, since we don't allow arbitrary 
child hooks, we may be able to support them on Windows. We just need to convert 
child hooks to the proper 'CreateProcess' parameters.

> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Comment Edited] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510435#comment-15510435
 ] 

Joerg Schad edited comment on MESOS-5070 at 9/21/16 4:51 PM:
-

The current state is that they only work on linux (windows also ignores the 
explicit options right now and as you pointed out does not care about 
parentHooks).

If we are talking about adding Windows support in the future, childHooks 
have one advantage over parentHooks in my opinion:
While parentHooks can be arbitrary functions, childHooks are constrained (via 
the factory methods) to a small set of predefined hooks to which we could add 
some form of id and enable windows to implement its own version of it.

Maybe [~jieyu] can add more details here.



was (Author: js84):
The current state is that they only work on linux (windows also ignores the 
explicit options right now and as you pointed out does not care about 
parentHooks).

If we are talking about adding Windows support in the future, childHooks 
have one advantage over parentHooks in my opinion:
While parentHooks can be arbitrary functions, childHooks are constrained (via 
the factory methods) to a small set of predefined hooks -to which we could add 
some form of id and enable windows to implement its own version of it-.

Maybe [~jieyu] can add more details here.


> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Updated] (MESOS-6217) PAGE_SIZE was not declared in PPC64LE

2016-09-21 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6217:

Story Points: 1
  Labels: cgroups ppc64  (was: )

> PAGE_SIZE was not declared in PPC64LE
> -
>
> Key: MESOS-6217
> URL: https://issues.apache.org/jira/browse/MESOS-6217
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups, ppc64
>
> When compiling Mesos on PPC64LE, we get this error:
> {code}
> ../../src/slave/containerizer/mesos/isolators/gpu/isolator.cpp  -fPIC -DPIC 
> -o 
> slave/containerizer/mesos/isolators/gpu/.libs/libmesos_no_3rdparty_la-isolator.o
> ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp: 
> In member function 'virtual process::Future 
> mesos::internal::slave::MemorySubsystem::update(const mesos::ContainerID&, 
> const mesos::Resources&)':
> ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp:230:55:
>  error: 'PAGE_SIZE' was not declared in this scope
>Bytes initialLimit(static_cast(LONG_MAX / PAGE_SIZE * 
> PAGE_SIZE));
>^
> {code}





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510447#comment-15510447
 ] 

Benjamin Hindman commented on MESOS-5070:
-

[~alexr], [~jieyu], [~haosdent], regarding 
{{mesos::internal::health::cloneWithSetns}} function:

(1) This implementation will never enter a pid namespace properly and there's 
no check that someone isn't passing in a pid namespace ... bug?
(2) This should not live in {{src/health-check/health_checker.cpp}} as it's a 
generic function that others probably want to reuse. In fact, it's very 
reminiscent of the {{ns::enter}} function we recently wrote for the nested 
containerization stuff that was later replaced with {{ns::clone}}, and I'd 
rather us reintroduce a generic {{ns::enter}} that lots of people can use 
rather than implement one-offs throughout the code base.

> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510435#comment-15510435
 ] 

Joerg Schad commented on MESOS-5070:


The current state is that they only work on linux (windows also ignores the 
explicit options right now and as you pointed out does not care about 
parentHooks).

If we are talking about adding Windows support in the future, childHooks 
have one advantage over parentHooks in my opinion:
While parentHooks can be arbitrary functions, childHooks are constrained (via 
the factory methods) to a small set of predefined hooks -to which we could add 
some form of id and enable windows to implement its own version of it-.

Maybe [~jieyu] can add more details here.


> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Created] (MESOS-6217) PAGE_SIZE was not declared in PPC64LE

2016-09-21 Thread haosdent (JIRA)
haosdent created MESOS-6217:
---

 Summary: PAGE_SIZE was not declared in PPC64LE
 Key: MESOS-6217
 URL: https://issues.apache.org/jira/browse/MESOS-6217
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent
Assignee: haosdent


When compiling Mesos on PPC64LE, we get this error:

{code}
../../src/slave/containerizer/mesos/isolators/gpu/isolator.cpp  -fPIC -DPIC -o 
slave/containerizer/mesos/isolators/gpu/.libs/libmesos_no_3rdparty_la-isolator.o
../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp: In 
member function 'virtual process::Future 
mesos::internal::slave::MemorySubsystem::update(const mesos::ContainerID&, 
const mesos::Resources&)':
../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp:230:55:
 error: 'PAGE_SIZE' was not declared in this scope
   Bytes initialLimit(static_cast(LONG_MAX / PAGE_SIZE * PAGE_SIZE));
   ^
{code}
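
One portable way to avoid relying on a compile-time {{PAGE_SIZE}} macro, which 
platform headers are not guaranteed to define, is to query the page size at run 
time. The sketch below uses POSIX {{sysconf(_SC_PAGESIZE)}}; the helper names 
are hypothetical and this is not necessarily the fix that will land for this 
ticket.

```cpp
#include <climits>

#include <unistd.h>

// Page size queried at run time instead of via a compile-time macro.
long pageSize() {
  return sysconf(_SC_PAGESIZE);
}

// Largest multiple of the page size that does not exceed LONG_MAX, mirroring
// the LONG_MAX / PAGE_SIZE * PAGE_SIZE expression in the error above.
long pageAlignedMax() {
  const long size = pageSize();
  return LONG_MAX / size * size;
}
```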







[jira] [Updated] (MESOS-6063) Track recovered and prepared subsystems for a container

2016-09-21 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6063:

Description: Currently, when we restart the Mesos agent with different cgroups 
subsystems, existing containers fail to recover on the newly added 
subsystems. In this case, we ignore the failures and continue to perform 
{{usage}}, {{status}} and {{cleanup}} on them.  It would be better to track the 
recovered and prepared subsystems for a container, and then skip {{update}}, 
{{wait}}, {{usage}} and {{status}} on untracked subsystems.  (was: Currently, when we restart 
Mesos Agent with different cgroups subsystems, the exist containers would 
recover failed on newly added subsystems. In this case, we ignore them and 
continue to perform `usage`, `status` and `cleanup` on them.  It would be 
better that we track recovered and prepared subsystems for a container. Then 
ignore perform `update`, `wait`, `usage`, `status` on them.)
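
A small sketch of the tracking idea, assuming each container records the names 
of the subsystems that were successfully recovered or prepared for it. 
{{ContainerSketch}} and {{applicableSubsystems}} are hypothetical names, not 
the isolator's actual data structures.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical per-container record.
struct ContainerSketch {
  // Subsystems that were successfully recovered or prepared for the container.
  std::set<std::string> subsystems;
};

// Returns, in load order, the subsystems on which an operation such as
// update or usage should run: loaded by the agent AND tracked by the container.
std::vector<std::string> applicableSubsystems(
    const ContainerSketch& container,
    const std::vector<std::string>& loaded) {
  std::vector<std::string> result;
  for (const std::string& name : loaded) {
    if (container.subsystems.count(name) > 0) {
      result.push_back(name);
    }
  }
  return result;
}
```

A subsystem added after the container was launched is then simply skipped 
instead of producing an error that has to be ignored.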

> Track recovered and prepared subsystems for a container
> ---
>
> Key: MESOS-6063
> URL: https://issues.apache.org/jira/browse/MESOS-6063
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
>
> Currently, when we restart the Mesos agent with different cgroups subsystems, 
> existing containers fail to recover on the newly added subsystems. In this 
> case, we ignore the failures and continue to perform {{usage}}, {{status}} and 
> {{cleanup}} on them.  It would be better to track the recovered and prepared 
> subsystems for a container, and then skip {{update}}, {{wait}}, {{usage}} and 
> {{status}} on untracked subsystems.





[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.

2016-09-21 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510413#comment-15510413
 ] 

Benjamin Hindman commented on MESOS-5070:
-

[~joerg84]: was just looking over some of the reviews for this. Was the 
assumption that adding child hooks would only be for Linux? Even the current 
parent hooks don't work for Windows IIUC and it's unlikely that we'll EVER be 
able to do child hooks on Windows ...

> Introduce more flexible subprocess interface for child options.
> ---
>
> Key: MESOS-5070
> URL: https://issues.apache.org/jira/browse/MESOS-5070
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: tech-debt
>
> We introduced a number of parameters to the subprocess interface with 
> MESOS-5049.
> Adding all options explicitly to the subprocess interface makes it 
> inflexible. 
> We should investigate a flexible option that still prevents arbitrary code 
> from being executed.





[jira] [Commented] (MESOS-2564) Kill superfluous forward declaration comments.

2016-09-21 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510409#comment-15510409
 ] 

Alexander Rukletsov commented on MESOS-2564:


https://reviews.apache.org/r/32608/

> Kill superfluous forward declaration comments.
> --
>
> Key: MESOS-2564
> URL: https://issues.apache.org/jira/browse/MESOS-2564
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: easyfix, newbie
>
> We often prepend forward declarations with a comment, which is pretty 
> useless, e.g.: 
> {code}
> // Forward declarations.
> class LogStorageProcess;
> {code}
> or
> {code}
> // Forward declarations.
> namespace registry {
> class Slaves;
> }
> class Authorizer;
> class WhitelistWatcher;
> {code}
> This JIRA aims to clean up such comments.





[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510107#comment-15510107
 ] 

Qian Zhang commented on MESOS-6215:
---

RR: https://reviews.apache.org/r/52118/

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching containers with 
> some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509895#comment-15509895
 ] 

Qian Zhang commented on MESOS-6215:
---

That is because we wrongly cut the {{.wh.}} from {{.wh..wh..opq}} with 
{{substr()}} in Mesos provisioner's code:
https://github.com/apache/mesos/blob/1.0.1/src/slave/containerizer/mesos/provisioner/provisioner.cpp#L359:L360
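
To make the failure mode concrete: stripping the first four characters from 
{{.wh..wh..opq}} leaves {{.wh..opq}}, which is the name in the error message. 
A minimal sketch of a classification that checks the opaque marker before the 
generic prefix follows. The names {{WhiteoutKind}}, {{Whiteout}} and 
{{classify}} are hypothetical; this is not the provisioner code nor the patch 
under review.

```cpp
#include <string>

enum class WhiteoutKind { None, Path, Opaque };

struct Whiteout {
  WhiteoutKind kind;
  std::string target;  // Sibling name to remove for Path; empty otherwise.
};

// Classifies a directory entry by its base name.
Whiteout classify(const std::string& basename) {
  static const std::string PREFIX = ".wh.";
  static const std::string OPAQUE_MARKER = ".wh..wh..opq";

  // The opaque marker also starts with ".wh.", so it must be checked first.
  if (basename == OPAQUE_MARKER) {
    return {WhiteoutKind::Opaque, ""};
  }

  // A regular whiteout hides the sibling named after the prefix.
  if (basename.size() > PREFIX.size() &&
      basename.compare(0, PREFIX.size(), PREFIX) == 0) {
    return {WhiteoutKind::Path, basename.substr(PREFIX.size())};
  }

  return {WhiteoutKind::None, ""};
}
```

For an Opaque result the caller would remove all siblings in that directory 
rather than a single file.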

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching containers with 
> some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509466#comment-15509466
 ] 

Stéphane Cottin commented on MESOS-6215:


From the example on top: lists/partial/.wh..opq': No such file or directory

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching containers with 
> some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Comment Edited] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509466#comment-15509466
 ] 

Stéphane Cottin edited comment on MESOS-6215 at 9/21/16 10:11 AM:
--

From the example on top: {{lists/partial/.wh..opq': No such file or 
directory}}


was (Author: kaalh):
From the example on top: lists/partial/.wh..opq': No such file or directory

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching containers with 
> some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Commented] (MESOS-6209) Containers that use the Mesos containerizer but don't want to provision a container image fail to validate.

2016-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509423#comment-15509423
 ] 

Jan Schlicht commented on MESOS-6209:
-

Looks like JIRA created this ticket twice. MESOS-6208 is the canonical ticket.

> Containers that use the Mesos containerizer but don't want to provision a 
> container image fail to validate.
> ---
>
> Key: MESOS-6209
> URL: https://issues.apache.org/jira/browse/MESOS-6209
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Mesos HEAD, change was introduced with 
> e65f580bf0cbea64cedf521cf169b9b4c9f85454
>Reporter: Jan Schlicht
>
> Tasks using features like volumes or CNI in their containers have to define 
> these in {{TaskInfo.container}}. When these tasks don't want/need to 
> provision a container image, neither {{ContainerInfo.docker}} nor 
> {{ContainerInfo.mesos}} will be set. Nevertheless, the container type in 
> {{ContainerInfo.type}} needs to be set, because it is a required field.
> In that case, the recently introduced validation rules in 
> {{master/validation.cpp}} ({{validateContainerInfo}}) will fail, which isn't 
> expected.
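
The situation described above can be sketched in C++. This is only an illustration under stated assumptions: the {{ContainerType}} enum and the {{ContainerInfo}} struct below are simplified, hypothetical stand-ins for the protobuf messages, and the relaxed rule (reject only a type/field mismatch, accept a type with neither {{docker}} nor {{mesos}} set) is one possible behaviour, not necessarily the fix that was applied in {{master/validation.cpp}}.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the protobuf messages; the real
// ContainerInfo carries more fields (volumes, network infos, ...).
enum class ContainerType { DOCKER, MESOS };

struct ContainerInfo {
  ContainerType type = ContainerType::MESOS;  // Required field.
  bool hasDocker = false;  // Whether ContainerInfo.docker is set.
  bool hasMesos = false;   // Whether ContainerInfo.mesos is set.
};

// Returns an empty string if `info` is acceptable, an error message otherwise.
// Assumed rule: only a mismatch between the type and the set field is an
// error; a type without any image-related field is allowed.
std::string validateContainerInfo(const ContainerInfo& info) {
  if (info.type == ContainerType::DOCKER && info.hasMesos) {
    return "ContainerInfo.mesos set for a DOCKER container";
  }
  if (info.type == ContainerType::MESOS && info.hasDocker) {
    return "ContainerInfo.docker set for a MESOS container";
  }
  return "";
}
```

With this rule, a task that only needs volumes or CNI sets {{ContainerInfo.type}} and leaves both image fields unset, and validation passes.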





[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509242#comment-15509242
 ] 

Qian Zhang commented on MESOS-6215:
---

Thanks [~kaalh], but I think the filename in Docker images is also 
{{.wh..wh..opq}}; please check the code below for details:
https://github.com/docker/docker/blob/v1.12.1/pkg/archive/whiteouts.go#L23
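
The naming discussed here can be sketched in C++. The two constants mirror the ones in the Docker source file linked above ({{.wh.}} as the whiteout prefix and {{.wh..wh..opq}} as the opaque marker); {{EntryKind}}, {{classify}} and {{whiteoutTarget}} are hypothetical names, not Mesos APIs. The sketch also illustrates one plausible explanation, an assumption rather than a confirmed diagnosis, for the {{.wh..opq}} in the error message: code that only knows the plain prefix strips {{.wh.}} from the opaque marker and then looks for a sibling with the remaining name.

```cpp
#include <string>

// Constants as defined in Docker's pkg/archive/whiteouts.go (v1.12.1).
const std::string kWhiteoutPrefix = ".wh.";
const std::string kWhiteoutOpaqueDir = ".wh..wh..opq";

// Hypothetical classification of a layer entry by its file name.
enum class EntryKind { Regular, Whiteout, OpaqueWhiteout };

EntryKind classify(const std::string& name) {
  // The opaque marker must be checked first: it also starts with ".wh.".
  if (name == kWhiteoutOpaqueDir) {
    return EntryKind::OpaqueWhiteout;
  }
  if (name.compare(0, kWhiteoutPrefix.size(), kWhiteoutPrefix) == 0) {
    return EntryKind::Whiteout;
  }
  return EntryKind::Regular;
}

// Name of the sibling that a plain whiteout entry hides.
// Precondition: `name` starts with the whiteout prefix.
std::string whiteoutTarget(const std::string& name) {
  return name.substr(kWhiteoutPrefix.size());
}
```

Applying {{whiteoutTarget}} to {{.wh..wh..opq}} without classifying it first yields {{.wh..opq}}, the name that appears in the {{TASK_FAILED}} message.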

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching a container 
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv

2016-09-21 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6216:

Description: 
{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled 
socket is created. It in turn calls {{openssl::initialize}} which calls a 
function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to 
set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, 
any calling of functions like {{os::getenv}} (or via {{os::environment}}) 
concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
performs unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess 
runtime with {{process::initialize}} from a {{main}} function, see e.g., 
{{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that 
we should move the setup of libprocess' SSL environment variables to a similar 
spot.

  was:
{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled 
socket is created. It in turn calls {{openssl::initialize}} which calls a 
function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to 
set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, 
any code calling functions like {{os::getenv}} (via or {{os::environment}}) 
concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
perform unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess 
runtime with {{process::initialize}} from a {{main}} function, see e.g., 
{{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that 
we should move the setup of libprocess' SSL environment variables to a similar 
spot.


> LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
> --
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially 
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which 
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} 
> is used to set up SSL-related libprocess environment variables 
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, 
> any calling of functions like {{os::getenv}} (or via {{os::environment}}) 
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the 
> libprocess runtime with {{process::initialize}} from a {{main}} function, see 
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It 
> appears that we should move the setup of libprocess' SSL environment 
> variables to a similar spot.





[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv

2016-09-21 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6216:

Summary: LibeventSSLSocketImpl::create is not safe to call concurrently 
with os::getenv  (was: LibeventSSLSocketImpl::create is not thread-safe, but 
used as if it were)

> LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
> --
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially 
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which 
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} 
> is used to set up SSL-related libprocess environment variables 
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe, just like the {{::setenv}} it wraps, 
> any code calling functions like {{os::getenv}} (or via {{os::environment}}) 
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the 
> libprocess runtime with {{process::initialize}} from a {{main}} function, see 
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It 
> appears that we should move the setup of libprocess' SSL environment 
> variables to a similar spot.





[jira] [Comment Edited] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509208#comment-15509208
 ] 

Jan Schlicht edited comment on MESOS-6213 at 9/21/16 8:33 AM:
--

Still fails here. Mesos uses protobuf generated classes that use functions that 
have been marked as deprecated beginning with macOS Sierra. Normally this would 
result in a warning, but Mesos is compiled with {{-Werror}}, hence the build 
will fail.


was (Author: nfnt):
Still fails here. Mesos uses protobuf generated classes that use functions that 
have been marked as deprecated beginning with macOS Sierra. Normally this would 
result in a warning, but Mesos is compiled with {{-Werror}}, hence the build 
will fail. This has been [reported for Protocol 
Buffers|https://github.com/google/protobuf/issues/74].

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509208#comment-15509208
 ] 

Jan Schlicht commented on MESOS-6213:
-

Still fails here. Mesos uses protobuf generated classes that use functions that 
have been marked as deprecated beginning with macOS Sierra. Normally this would 
result in a warning, but Mesos is compiled with {{-Werror}}, hence the build 
will fail. This has been [reported for Protocol 
Buffers|https://github.com/google/protobuf/issues/74].

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not thread-safe, but used as if it were

2016-09-21 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6216:

Attachment: build.log

Attached a trimmed build log showing a likely related issue, 
https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/2658/changes.

This build was configured with {{--enable-libevent --enable-ssl}} on centos-7.

> LibeventSSLSocketImpl::create is not thread-safe, but used as if it were
> 
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially 
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which 
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} 
> is used to set up SSL-related libprocess environment variables 
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe, just like the {{::setenv}} it wraps, 
> any code calling functions like {{os::getenv}} (or via {{os::environment}}) 
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the 
> libprocess runtime with {{process::initialize}} from a {{main}} function, see 
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It 
> appears that we should move the setup of libprocess' SSL environment 
> variables to a similar spot.





[jira] [Created] (MESOS-6216) LibeventSSLSocketImpl::create is not thread-safe, but used as if it were

2016-09-21 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6216:
---

 Summary: LibeventSSLSocketImpl::create is not thread-safe, but 
used as if it were
 Key: MESOS-6216
 URL: https://issues.apache.org/jira/browse/MESOS-6216
 Project: Mesos
  Issue Type: Bug
  Components: security
Reporter: Benjamin Bannier


{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled 
socket is created. It in turn calls {{openssl::initialize}} which calls a 
function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to 
set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe, just like the {{::setenv}} it wraps, 
any code calling functions like {{os::getenv}} (or via {{os::environment}}) 
concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} 
performs unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess 
runtime with {{process::initialize}} from a {{main}} function, see e.g., 
{{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that 
we should move the setup of libprocess' SSL environment variables to a similar 
spot.
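
The ordering proposed above (finish all environment setup before any other thread can read the environment) can be sketched in C++ as follows. This is a minimal illustration, not libprocess code: {{StartupEnvironment}} and its methods are hypothetical, and {{LIBPROCESS_SSL_ENABLED}} in the usage note is used only as an example of the {{LIBPROCESS_SSL_*}} family. The guard refuses writes after {{freeze()}}, which is assumed to be called on the main thread right before worker threads are started.

```cpp
#include <stdlib.h>  // POSIX setenv/getenv

#include <atomic>
#include <string>

// Hypothetical guard, not a libprocess API: writes to the environment are
// only accepted until freeze() is called, which is meant to happen on the
// main thread right before any worker threads are started.
class StartupEnvironment {
public:
  // Sets `name` to `value` unless the environment has already been frozen.
  static bool set(const std::string& name, const std::string& value) {
    if (frozen().load()) {
      return false;  // Too late: other threads may be calling getenv.
    }
    return ::setenv(name.c_str(), value.c_str(), 1) == 0;
  }

  // Marks the end of the single-threaded setup phase.
  static void freeze() { frozen().store(true); }

  // Reads do not race as long as no thread modifies the environment anymore.
  static std::string get(const std::string& name, const std::string& fallback) {
    const char* value = ::getenv(name.c_str());
    return value != nullptr ? std::string(value) : fallback;
  }

private:
  static std::atomic<bool>& frozen() {
    static std::atomic<bool> flag(false);
    return flag;
  }
};
```

A {{main}} function would call {{StartupEnvironment::set("LIBPROCESS_SSL_ENABLED", "true")}} and similar during setup, then {{freeze()}}, and only afterwards start the runtime; later {{set}} calls return {{false}} instead of racing with readers.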





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509154#comment-15509154
 ] 

Jan Schlicht commented on MESOS-6213:
-

I ran into the same problem after updating to macOS Sierra. The log indicates 
that this is due to some deprecated functions. I wouldn't expect that a reboot 
would solve it, but will try it out. For now my workaround was compiling with 
{{-Wno-deprecated-declarations}}.
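
For reference, the replacement named in the compiler message can be sketched as below. This is an illustration only, not a patch for the bundled protobuf: {{compareAndSwap64}} is a hypothetical helper, and unlike {{OSAtomicCompareAndSwap64Barrier}} it requires the variable to be declared as {{std::atomic<std::int64_t>}} rather than a plain {{int64_t}}, so it is not a drop-in substitution.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a 64-bit compare-and-swap with a full barrier.
// Returns true if `target` held `oldValue` and was updated to `newValue`.
inline bool compareAndSwap64(
    std::atomic<std::int64_t>& target,
    std::int64_t oldValue,
    std::int64_t newValue) {
  // Without an explicit memory order the operation is sequentially
  // consistent, which is at least as strong as a barrier variant.
  return std::atomic_compare_exchange_strong(&target, &oldValue, newValue);
}
```

The call succeeds only when the current value equals {{oldValue}}; otherwise the stored value is left unchanged and the function returns {{false}}.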

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Updated] (MESOS-6213) Build failure on macOS Sierra

2016-09-21 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-6213:

Summary: Build failure on macOS Sierra  (was: Build failure on OSX)

> Build failure on macOS Sierra
> -
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509136#comment-15509136
 ] 

Stéphane Cottin commented on MESOS-6215:


With Docker images the filename seems to be {{.wh..opq}}, not {{.wh..wh..opq}} 
as in the OCI image spec; anyway, the purpose is the same.

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching a container 
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Updated] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stéphane Cottin updated MESOS-6215:
---
Attachment: whiteout.diff

Temporary workaround for anyone blocked by this issue.

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching a container 
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Commented] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.

2016-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509074#comment-15509074
 ] 

Qian Zhang commented on MESOS-6002:
---

[~philwinder] and [~kaalh], I think what you reported is a different issue than 
what [~gilbert] reported in this ticket. I have created another ticket for your 
issue, please check https://issues.apache.org/jira/browse/MESOS-6215 for more 
details. 

> The whiteout file cannot be removed correctly using aufs backend.
> -
>
> Key: MESOS-6002
> URL: https://issues.apache.org/jira/browse/MESOS-6002
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any os with aufs module
>Reporter: Gilbert Song
>Assignee: Qian Zhang
>  Labels: aufs, backend, containerizer
> Attachments: whiteout.diff
>
>
> The whiteout file is not removed correctly when using the aufs backend in 
> the unified containerizer. This can be verified by running this unit test 
> with the aufs backend manually specified.
> {noformat}
> [20:11:24] :   [Step 10/10] [ RUN  ] 
> ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_Whiteout
> [20:11:24]W:   [Step 10/10] I0805 20:11:24.986734 24295 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.001153 24295 leveldb.cpp:174] 
> Opened db in 14.308627ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003731 24295 leveldb.cpp:181] 
> Compacted db in 2.558329ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003749 24295 leveldb.cpp:196] 
> Created db iterator in 3086ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003754 24295 leveldb.cpp:202] 
> Seeked to beginning of db in 595ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003758 24295 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 314ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003769 24295 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004086 24315 recover.cpp:451] 
> Starting replica recovery
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004251 24312 recover.cpp:477] 
> Replica is in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004546 24314 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5640)@172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004607 24312 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004762 24313 recover.cpp:568] 
> Updating replica status to STARTING
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004776 24314 master.cpp:375] 
> Master 21665992-d47e-402f-a00c-6f8fab613019 (ip-172-30-2-105.mesosphere.io) 
> started on 172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004787 24314 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/0z753P/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/0z753P/master" --zk_session_timeout="10secs"
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004920 24314 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004930 24314 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004935 24314 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004942 24314 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/0z753P/credentials'
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005018 24314 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005101 24314 http.cpp:883] Using 
> default 'basic' 

[jira] [Updated] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-6215:
--
Shepherd: Jie Yu

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name 
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
> that directory should be removed. Currently the Mesos provisioner does not 
> handle such opaque whiteout entries, which causes launching a container 
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
> --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
> master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to remove whiteout file 
> '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
>  No such file or directory'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout





[jira] [Created] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner

2016-09-21 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-6215:
-

 Summary: Add support for opaque whiteout (.wh..wh..opq) in 
provisioner
 Key: MESOS-6215
 URL: https://issues.apache.org/jira/browse/MESOS-6215
 Project: Mesos
  Issue Type: Bug
Reporter: Qian Zhang
Assignee: Qian Zhang


In a Docker image, there can be an opaque whiteout entry (a file with the name 
{{.wh..wh..opq}}) under a directory, which indicates that all siblings under 
that directory should be removed. Currently the Mesos provisioner does not 
handle such opaque whiteout entries, which causes launching a container with 
some Docker images to fail, e.g.:
{code}
$ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test 
--docker_image=rabbitmq --command="sleep 100"
I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at 
master@192.168.122.171:5050
Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
Received status update TASK_FAILED for task 'test'
  message: 'Failed to launch container: Failed to remove whiteout file 
'/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq':
 No such file or directory'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}

See the OCI image spec for more details about opaque whiteouts:
https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout
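
For illustration only, here is a minimal Python sketch of how a copy-style backend could honor both plain and opaque whiteouts while merging an extracted layer into a rootfs. It is not the Mesos provisioner code: the function names {{apply_layer}} and {{_remove}} and the directory layout are hypothetical, and the sketch ignores ownership, special files and symlinked directories.

```python
import os
import shutil

WHITEOUT_PREFIX = ".wh."
OPAQUE_WHITEOUT = ".wh..wh..opq"


def _remove(path):
    """Remove a file, symlink, or directory tree if it exists (hypothetical helper)."""
    if os.path.islink(path) or os.path.isfile(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


def apply_layer(layer_dir, rootfs_dir):
    """Merge one extracted layer into rootfs_dir, honoring whiteouts (sketch)."""
    # os.walk is top-down, so a directory is handled before its children.
    for current, _dirs, files in os.walk(layer_dir):
        rel = os.path.relpath(current, layer_dir)
        target = os.path.normpath(os.path.join(rootfs_dir, rel))

        # Make sure the target is a real directory, replacing a file or
        # symlink that a lower layer may have left at the same path.
        if os.path.islink(target) or not os.path.isdir(target):
            _remove(target)
            os.makedirs(target)

        # Opaque whiteout: drop everything lower layers put in this
        # directory before copying this layer's own entries into it.
        if OPAQUE_WHITEOUT in files:
            for entry in os.listdir(target):
                _remove(os.path.join(target, entry))

        for name in files:
            if name == OPAQUE_WHITEOUT:
                continue  # marker only; never copied into the rootfs
            if name.startswith(WHITEOUT_PREFIX):
                # Plain whiteout: ".wh.foo" hides the sibling "foo".
                _remove(os.path.join(target, name[len(WHITEOUT_PREFIX):]))
            else:
                dst = os.path.join(target, name)
                _remove(dst)  # the upper layer wins over any lower entry
                shutil.copy2(os.path.join(current, name), dst,
                             follow_symlinks=False)
```

Layers would be applied from the lowest to the highest, e.g. {{apply_layer(lower, rootfs)}} followed by {{apply_layer(upper, rootfs)}}. Note that the path in the failure above ends in {{.wh..opq}}, which is what remains when the {{.wh.}} prefix is stripped from {{.wh..wh..opq}}; that would be consistent with the opaque marker being treated as a plain whiteout, but this is only an inference from the log.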





[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-21 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508943#comment-15508943
 ] 

haosdent commented on MESOS-6180:
-

Awesome! Thanks a lot for your help!

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
>  Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log, 
> RoleTest.ImplicitRoleRegister.txt, 
> flaky-containerizer-pid-namespace-backward.txt, 
> flaky-containerizer-pid-namespace-forward.txt
>
>
> Following the merging of a large patch chain, it was noticed on our internal 
> CI that several tests had become flaky, with a similar pattern in the 
> failures: the tests fail early when a future times out. Often, this occurs 
> when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple 
> of these.


