[jira] [Created] (MESOS-6224) mesos-docker-executor port collision with container(network:host)
Wei Wei created MESOS-6224: -- Summary: mesos-docker-executor port collision with container(network:host) Key: MESOS-6224 URL: https://issues.apache.org/jira/browse/MESOS-6224 Project: Mesos Issue Type: Bug Components: containerization Affects Versions: 1.0.1 Environment: ubuntu 14.04 LTS mesos 1.0.1 Reporter: Wei Wei

We implemented a scheduler that launches a batch of data-processing workers in Docker containers. The containers use network: host, and each worker is assigned a randomly picked port. We ran into a port collision: the port we chose for a container was already taken by the mesos-docker-executor.

mesos-agent resource config:
[{"name":"ports","type":"RANGES","ranges":{"range": [{"begin":1,"end":32000}]}}]

root@xs35: lsof -i:31981
COMMAND    PID   USER  FD  TYPE  DEVICE     SIZE/OFF  NODE  NAME
mesos-doc  2835  root  8u  IPv4  433237650  0t0       TCP   10.34.38.30:31981 (LISTEN)

The port was taken by mesos-docker-executor, yet the framework was still offered port resource 31981. Is there a way to set the mesos-docker-executor's port range?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6225) mesos-docker-executor port collision with container(network:host)
Wei Wei created MESOS-6225: -- Summary: mesos-docker-executor port collision with container(network:host) Key: MESOS-6225 URL: https://issues.apache.org/jira/browse/MESOS-6225 Project: Mesos Issue Type: Bug Components: containerization Affects Versions: 1.0.1 Environment: ubuntu 14.04 LTS mesos 1.0.1 Reporter: Wei Wei

We implemented a scheduler that launches a batch of data-processing workers in Docker containers. The containers use network: host, and each worker is assigned a randomly picked port. We ran into a port collision: the port we chose for a container was already taken by the mesos-docker-executor.

mesos-agent resource config:
[{"name":"ports","type":"RANGES","ranges":{"range": [{"begin":1,"end":32000}]}}]

root@xs35: lsof -i:31981
COMMAND    PID   USER  FD  TYPE  DEVICE     SIZE/OFF  NODE  NAME
mesos-doc  2835  root  8u  IPv4  433237650  0t0       TCP   10.34.38.30:31981 (LISTEN)

The port was taken by mesos-docker-executor, yet the framework was still offered port resource 31981. Is there a way to set the mesos-docker-executor's port range?
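Until the executor's listening port can be constrained, a worker could probe a candidate port on the agent host before using it. The following is only a sketch of that workaround, not a Mesos feature: `PortRange`, `portIsFree`, and `pickFreePort` are hypothetical names, the probe uses plain POSIX sockets, and it takes the first free port rather than a random one. The probe is also racy, because another process can bind the port between the check and the worker's own bind.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one entry of an offered "ports" RANGES resource.
struct PortRange {
  uint16_t begin;
  uint16_t end;  // Inclusive.
};

// Returns true if a TCP socket can currently be bound to `port` on all
// interfaces. Must run on the agent host; the answer is only a snapshot.
bool portIsFree(uint16_t port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(port);

  const bool bound =
    ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
  ::close(fd);
  return bound;
}

// Returns the first port in `ranges` accepted by `isFree`, or -1 if none is.
// The predicate is injectable so the selection logic can be tested offline.
int pickFreePort(
    const std::vector<PortRange>& ranges,
    const std::function<bool(uint16_t)>& isFree = portIsFree) {
  for (const PortRange& range : ranges) {
    for (uint32_t port = range.begin; port <= range.end; ++port) {
      if (isFree(static_cast<uint16_t>(port))) {
        return static_cast<int>(port);
      }
    }
  }
  return -1;
}
```

With the offer from the report, `pickFreePort({{1, 32000}})` would skip 31981 while the executor is listening on it.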
[jira] [Commented] (MESOS-6156) Make the `network/cni` isolator nesting aware
[ https://issues.apache.org/jira/browse/MESOS-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511777#comment-15511777 ] Jie Yu commented on MESOS-6156: --- commit 2a8de6255494eed2c435ef2b80dc846e1c1b5e90 Author: Avinash sridharan Date: Wed Sep 21 17:16:37 2016 -0700 Modified the `network/cni` isolator to be nesting aware. The network file setup in the `network/cni` isolator is now nesting aware. Since the children share the network and UTS namespace with the parent, the network files need to be created only for the parent container. For the child containers, the network files will simply be a bind mount of the parent's network files. Review: https://reviews.apache.org/r/51857/ > Make the `network/cni` isolator nesting aware > - > > Key: MESOS-6156 > URL: https://issues.apache.org/jira/browse/MESOS-6156 > Project: Mesos > Issue Type: Task > Components: containerization > Affects Versions: 1.1.0 > Reporter: Avinash Sridharan > Assignee: Avinash Sridharan > Labels: mesosphere > Fix For: 1.1.0 > > Original Estimate: 96h > Remaining Estimate: 96h > > In pods, child containers share the network and UTS namespace with the parent > containers. This implies that during `prepare` and `isolate` the > `network/cni` isolator needs to be aware of the parent-child relationship > between containers to make the following decisions: > a) During `prepare` a container should be allocated a new network namespace > and UTS namespace only if the container is a top level container. > b) During `isolate` the network files (/etc/hosts, /etc/hostname, > /etc/resolv.conf) should be created only for top level containers. The > network files for child containers will just be symlinks to the parent > container's network files.
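The two decisions described above can be sketched as follows. This is an illustration only, not the isolator's code: `ContainerId`, `namespacesToCreate`, `networkFilesAction`, and the flag constants are hypothetical stand-ins, and the child case follows the commit message (bind mount) rather than the issue text (symlinks).

```cpp
#include <string>

// Illustrative flag bits. A real isolator would use CLONE_NEWNET and
// CLONE_NEWUTS from <sched.h>.
constexpr unsigned kNewNet = 1u << 0;
constexpr unsigned kNewUts = 1u << 1;

// Hypothetical container identifier: a nested container names its parent.
struct ContainerId {
  std::string value;
  std::string parent;  // Empty for a top-level container.
};

bool isTopLevel(const ContainerId& id) { return id.parent.empty(); }

// `prepare`: only a top-level container is given new network and UTS
// namespaces; a child shares the namespaces of its parent.
unsigned namespacesToCreate(const ContainerId& id) {
  return isTopLevel(id) ? (kNewNet | kNewUts) : 0u;
}

enum class NetworkFiles { CREATE, BIND_MOUNT_FROM_PARENT };

// `isolate`: /etc/hosts, /etc/hostname and /etc/resolv.conf are created only
// for a top-level container; a child reuses the parent's files.
NetworkFiles networkFilesAction(const ContainerId& id) {
  return isTopLevel(id) ? NetworkFiles::CREATE
                        : NetworkFiles::BIND_MOUNT_FROM_PARENT;
}
```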
[jira] [Issue Comment Deleted] (MESOS-6219) Improve auto-detection of predefined resource types.
[ https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6219: -- Comment: was deleted (was: One way to address this may be to have a special resource type: {code} message Value { enum Type { SCALAR = 0; RANGES = 1; SET = 2; TEXT = 3; +AUTO = 4; } {code} - We only do auto-detection for predefined resources and they are identified by names and other fields so the {{type}} is already custom defined. i.e., it doesn't make sense to have {{cpus}} with {{type=RANGES}}. When the parser sees {{type=AUTO}} for {{cpus}}, it auto-detects the value and then assigns the type. - For custom resources we don't support auto-detection anyways so {{type=AUTO}} would be invalid and the parser would bail.) > Improve auto-detection of predefined resource types. > > > Key: MESOS-6219 > URL: https://issues.apache.org/jira/browse/MESOS-6219 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu > > Mesos agent currently auto-detects certain predefined resource types when > they are not specified. These include cpus, mem, (root) disk, gpus and also > MESOS-6062 is adding auto-detection for mount disks when the size is > specified as a special value "0". > Due to the limitation of the {{Resources}} abstraction the caller of > {{Resources::parse()}} can't tell if a resource is intentionally specified > with an empty value or unspecified. The current resource auto-detection in > {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag > to check if a resource is specified, this is very fragile, e.g., it would > think {{gpus:0}} is specified (and not auto-detected) if there is a mount > disk with its root being {{/biggpush}}. > It would be nice if we can have the user explicitly specify the intention to > have value of a standard resource auto-detected (at least with the JSON > input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6223) Allow agents to re-register post a host reboot
Megha created MESOS-6223: Summary: Allow agents to re-register post a host reboot Key: MESOS-6223 URL: https://issues.apache.org/jira/browse/MESOS-6223 Project: Mesos Issue Type: Improvement Components: slave Reporter: Megha The agent doesn't recover its state after a host reboot; it registers with the master and gets a new SlaveID. With partition awareness, agents are now allowed to re-register after they have been marked Unreachable. The executors are terminated on the agent when it reboots anyway, so there is no harm in letting the agent keep its SlaveID, re-register with the master and reconcile the lost executors. This is a prerequisite for supporting persistent/restartable tasks in Mesos (https://issues.apache.org/jira/browse/MESOS-3545).
[jira] [Updated] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megha updated MESOS-6223: - Description: The agent doesn't recover its state after a host reboot; it registers with the master and gets a new SlaveID. With partition awareness, agents are now allowed to re-register after they have been marked Unreachable. The executors are terminated on the agent when it reboots anyway, so there is no harm in letting the agent keep its SlaveID, re-register with the master and reconcile the lost executors. This is a prerequisite for supporting persistent/restartable tasks in Mesos (MESOS-3545). (was: The agent doesn't recover its state after a host reboot; it registers with the master and gets a new SlaveID. With partition awareness, agents are now allowed to re-register after they have been marked Unreachable. The executors are terminated on the agent when it reboots anyway, so there is no harm in letting the agent keep its SlaveID, re-register with the master and reconcile the lost executors. This is a prerequisite for supporting persistent/restartable tasks in Mesos (https://issues.apache.org/jira/browse/MESOS-3545).) > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: slave > Reporter: Megha > > The agent doesn't recover its state after a host reboot; it registers with the > master and gets a new SlaveID. With partition awareness, agents are now > allowed to re-register after they have been marked Unreachable. The executors > are terminated on the agent when it reboots anyway, so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a prerequisite for supporting > persistent/restartable tasks in Mesos (MESOS-3545).
[jira] [Created] (MESOS-6222) Improve organization of methods related to Resources
Yan Xu created MESOS-6222: - Summary: Improve organization of methods related to Resources Key: MESOS-6222 URL: https://issues.apache.org/jira/browse/MESOS-6222 Project: Mesos Issue Type: Improvement Components: c++ api Reporter: Yan Xu

Currently the {{Resources}} class is used to dump everything loosely related to the protobuf {{Resource}} as member methods. As a result {{Resources}} has a large number of utility methods that are not related to the {{Resources}} abstraction. Examples:

{code:title=This returns a protobuf Resource and not Resources.}
static Try<Resource> parse(
    const std::string& name,
    const std::string& value,
    const std::string& role);
{code}

{code:title=This only looks at the protobuf too.}
static bool isPersistentVolume(const Resource& resource);
{code}

This makes it hard to name and distinguish similar methods which work at different abstraction levels (see the {{parse(text, role)}} function below). It would be way simpler to have them as namespaced free-standing functions.

{code:title=}
namespace mesos {

// Methods for the protobuf `Resource`.
namespace resource {

Try<Resource> parse(
    const std::string& name,
    const std::string& value,
    const std::string& role);

bool isPersistentVolume(const Resource& resource);

// Now I can add a `parse` method for multiple resources but at the `Resource` level.
Try parse(
    const std::string& text,
    const std::string& defaultRole = "*");

}

// Methods for the `Resources` abstraction.
namespace resources {
...
}

}
{code}

Static member methods of Resources are still fine if they directly pertain to the {{Resources}} abstraction itself and we can use the private members/methods when useful.

{code:title=e.g., Resources::flatten() uses the internal validation-free `add` method as opposed to `+=` to accumulate resource objects}
Try flatten(
    const std::string& role,
    const Option& reservation = None()) const;
{code}
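To make the proposal concrete, here is a small compilable sketch of two `parse` overloads living side by side as free functions in one namespace. Everything in it is an assumption for illustration: `sketch::Resource` is a plain struct standing in for the protobuf message, error handling via `Try` is left out, and the text format handled is only the simple `name:value;name:value` form.

```cpp
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Stand-in for the protobuf `Resource`; not the real Mesos message.
struct Resource {
  std::string name;
  std::string value;
  std::string role;
};

// Free functions that operate on a single `Resource`.
namespace resource {

// Builds one resource from already-separated fields.
inline Resource parse(
    const std::string& name,
    const std::string& value,
    const std::string& role) {
  return Resource{name, value, role};
}

// Parses "name:value;name:value" into individual resources. Entries without
// a ':' separator are skipped in this sketch instead of reported as errors.
inline std::vector<Resource> parse(
    const std::string& text,
    const std::string& defaultRole = "*") {
  std::vector<Resource> result;
  std::istringstream stream(text);
  std::string entry;
  while (std::getline(stream, entry, ';')) {
    const std::string::size_type colon = entry.find(':');
    if (colon == std::string::npos) {
      continue;
    }
    result.push_back(
        parse(entry.substr(0, colon), entry.substr(colon + 1), defaultRole));
  }
  return result;
}

}  // namespace resource
}  // namespace sketch
```

Both overloads share one name and are told apart by their argument count, so `sketch::resource::parse("cpus:4;mem:1024")` yields two resources while the three-argument call yields one.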
[jira] [Commented] (MESOS-6221) Ability to post maintenance/schedule with better granularity
[ https://issues.apache.org/jira/browse/MESOS-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511324#comment-15511324 ] Joseph Wu commented on MESOS-6221: -- The race is definitely a possibility. The current implementation of maintenance primitives is an MVP. We're waiting for some community adoption/feedback (particularly framework support) before hardening the feature further. As part of the MVP, we decided it would be logically simpler to assume only one operator does any maintenance, including changing the schedule. (Note that there are TODOs in the codebase about having multiple schedules. That would be one way of isolating two operators.) > Ability to post maintenance/schedule with better granularity > > > Key: MESOS-6221 > URL: https://issues.apache.org/jira/browse/MESOS-6221 > Project: Mesos > Issue Type: Improvement > Components: HTTP API > Reporter: Huadong Liu > > Currently the maintenance schedule update is at cluster granularity: "To > update the maintenance schedule, the operator should first read the current > schedule, make any necessary changes, and then post the modified schedule." > http://mesos.apache.org/documentation/latest/maintenance/ > In contrast, the machine/down and up endpoints operate at host granularity. > One or a set of hosts can be moved to DOWN mode or UP mode once the schedule > exists. > Requiring a GET of the current schedule before POSTing an updated schedule may > create races if machine/up and the maintenance/schedule update happen on > different hosts/processes, for example: > 1. mesos master has host A in maintenance DOWN mode. > 2. process p1 tries to UP host A. > 3. process p2 tries to get the current schedule and then append host B to the > schedule. > 4. mesos master may end up with both A and B in maintenance DRAIN mode although > the desired result is to have only B in DRAIN mode. > I cannot find a document that explains why the maintenance schedule has to be > updated at cluster granularity. Although the problem can be resolved by > external synchronization, having the ability to update the maintenance schedule > at host granularity seems a better choice.
[jira] [Created] (MESOS-6221) Ability to post maintenance/schedule with better granularity
Huadong Liu created MESOS-6221: -- Summary: Ability to post maintenance/schedule with better granularity Key: MESOS-6221 URL: https://issues.apache.org/jira/browse/MESOS-6221 Project: Mesos Issue Type: Improvement Components: HTTP API Reporter: Huadong Liu Currently the maintenance schedule update is at cluster granularity: "To update the maintenance schedule, the operator should first read the current schedule, make any necessary changes, and then post the modified schedule." http://mesos.apache.org/documentation/latest/maintenance/ In contrast, the machine/down and up endpoints operate at host granularity. One or a set of hosts can be moved to DOWN mode or UP mode once the schedule exists. Requiring a GET of the current schedule before POSTing an updated schedule may create races if machine/up and the maintenance/schedule update happen on different hosts/processes, for example: 1. mesos master has host A in maintenance DOWN mode. 2. process p1 tries to UP host A. 3. process p2 tries to get the current schedule and then append host B to the schedule. 4. mesos master may end up with both A and B in maintenance DRAIN mode although the desired result is to have only B in DRAIN mode. I cannot find a document that explains why the maintenance schedule has to be updated at cluster granularity. Although the problem can be resolved by external synchronization, having the ability to update the maintenance schedule at host granularity seems a better choice.
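The interleaving in steps 1 to 4 is a classic lost update on a read-modify-write cycle. The toy model below reproduces it in a single thread. It is not the Mesos HTTP API: `Master` and its methods are hypothetical, it assumes (as the scenario implies) that bringing a host UP removes it from the schedule, and `addToSchedule` is an invented host-granularity operation of the kind the issue asks for.

```cpp
#include <set>
#include <string>

// Toy model of the master's maintenance schedule as a set of host names.
class Master {
public:
  // Models GET maintenance/schedule: a snapshot of the whole schedule.
  std::set<std::string> getSchedule() const { return schedule_; }

  // Models POST maintenance/schedule: replaces the whole schedule.
  void postSchedule(const std::set<std::string>& schedule) {
    schedule_ = schedule;
  }

  // Models POST machine/up for one host.
  void machineUp(const std::string& host) { schedule_.erase(host); }

  // Hypothetical host-granularity update; no prior read is needed.
  void addToSchedule(const std::string& host) { schedule_.insert(host); }

private:
  std::set<std::string> schedule_;
};

// p2 reads, p1 brings A up, p2 posts its stale copy plus B.
std::set<std::string> racyAppend(Master& master) {
  master.postSchedule(std::set<std::string>{"A"});
  std::set<std::string> stale = master.getSchedule();  // p2: sees {A}
  master.machineUp("A");                               // p1: schedule is {}
  stale.insert("B");
  master.postSchedule(stale);                          // p2: writes {A, B}
  return master.getSchedule();
}

// Same operations, but p2 uses the host-granularity update.
std::set<std::string> hostGranularAppend(Master& master) {
  master.postSchedule(std::set<std::string>{"A"});
  master.machineUp("A");      // p1
  master.addToSchedule("B");  // p2
  return master.getSchedule();
}
```

In the racy variant host A is scheduled again even though p1 brought it up; in the host-granularity variant only B remains.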
[jira] [Updated] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.
[ https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6220: -- Shepherd: Vinod Kone > HTTP handler failures should result in 500 response rather than 503 response. > - > > Key: MESOS-6220 > URL: https://issues.apache.org/jira/browse/MESOS-6220 > Project: Mesos > Issue Type: Improvement > Components: libprocess > Reporter: Benjamin Mahler > Assignee: Benjamin Mahler > Priority: Minor > Fix For: 1.1.0 > > > Currently, when an HTTP handler fails, libprocess will send a {{503 Service > Unavailable}} (see > [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]). > However, it would be more appropriate to send a {{500 Internal Server Error}} > given the documented behavior of these statuses: > *500 Internal Server Error* > A generic error message, given when an unexpected condition was encountered > and no more specific message is suitable. > *503 Service Unavailable* > The server is currently unavailable (because it is overloaded or down for > maintenance). Generally, this is a temporary state. > From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error
[jira] [Commented] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.
[ https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511270#comment-15511270 ] Benjamin Mahler commented on MESOS-6220: Added as an API change in the CHANGELOG:

{noformat}
commit a640eb25a9ebc4b2409c86df5cc71bd625427001
Author: Benjamin Mahler
Date: Wed Sep 21 14:36:49 2016 -0700

    Added MESOS-6220 as an API change to the 1.1.0 CHANGELOG.
{noformat}

> HTTP handler failures should result in 500 response rather than 503 response. > - > > Key: MESOS-6220 > URL: https://issues.apache.org/jira/browse/MESOS-6220 > Project: Mesos > Issue Type: Improvement > Components: libprocess > Reporter: Benjamin Mahler > Assignee: Benjamin Mahler > Priority: Minor > Fix For: 1.1.0 > > > Currently, when an HTTP handler fails, libprocess will send a {{503 Service > Unavailable}} (see > [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]). > However, it would be more appropriate to send a {{500 Internal Server Error}} > given the documented behavior of these statuses: > *500 Internal Server Error* > A generic error message, given when an unexpected condition was encountered > and no more specific message is suitable. > *503 Service Unavailable* > The server is currently unavailable (because it is overloaded or down for > maintenance). Generally, this is a temporary state. > From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error
[jira] [Assigned] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.
[ https://issues.apache.org/jira/browse/MESOS-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-6220: -- Assignee: Benjamin Mahler > HTTP handler failures should result in 500 response rather than 503 response. > - > > Key: MESOS-6220 > URL: https://issues.apache.org/jira/browse/MESOS-6220 > Project: Mesos > Issue Type: Improvement > Components: libprocess > Reporter: Benjamin Mahler > Assignee: Benjamin Mahler > Priority: Minor > > Currently, when an HTTP handler fails, libprocess will send a {{503 Service > Unavailable}} (see > [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]). > However, it would be more appropriate to send a {{500 Internal Server Error}} > given the documented behavior of these statuses: > *500 Internal Server Error* > A generic error message, given when an unexpected condition was encountered > and no more specific message is suitable. > *503 Service Unavailable* > The server is currently unavailable (because it is overloaded or down for > maintenance). Generally, this is a temporary state. > From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error
[jira] [Created] (MESOS-6220) HTTP handler failures should result in 500 response rather than 503 response.
Benjamin Mahler created MESOS-6220: -- Summary: HTTP handler failures should result in 500 response rather than 503 response. Key: MESOS-6220 URL: https://issues.apache.org/jira/browse/MESOS-6220 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Benjamin Mahler Priority: Minor Currently, when an HTTP handler fails, libprocess will send a {{503 Service Unavailable}} (see [here|https://github.com/apache/mesos/blob/8322b403cbaa1ace61733b61d4325ec6ee808ffd/3rdparty/libprocess/src/process.cpp#L1232-L1234]). However, it would be more appropriate to send a {{500 Internal Server Error}} given the documented behavior of these statuses: *500 Internal Server Error* A generic error message, given when an unexpected condition was encountered and no more specific message is suitable. *503 Service Unavailable* The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state. From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error
[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.
[ https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6219: -- Description: Mesos agent currently auto-detects certain predefined resource types when they are not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is adding auto-detection for mount disks when the size is specified as a special value "0". Due to the limitation of the {{Resources}} abstraction the caller of {{Resources::parse()}} can't tell if a resource is intentionally specified with an empty value or unspecified. The current resource auto-detection in {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to check if a resource is specified, this is very fragile, e.g., it would think {{gpus:0}} is specified (and not auto-detected) if there is a mount disk with its root being {{/biggpush}}. It would be nice if we can have the user explicitly specify the intention to have value of a standard resource auto-detected (at least with the JSON input). was: Mesos agent currently auto-detects certain predefined resource types when they are not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is adding auto-detection for mount disks when the size is specified as a special value "0". Due to the limitation of the {{Resources}} abstraction the caller of {{Resources::parse()}} can't tell if a resource is intentionally specified with an empty value or unspecified. The current resource auto-detection in {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to check if resource is specified, this is very fragile, e.g., it would think {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with its root being {{/biggpush}}. It would be nice if we can have the user explicitly specify the intention to have value of a standard resource auto-detected (at least with the JSON input). 
> Improve auto-detection of predefined resource types. > > > Key: MESOS-6219 > URL: https://issues.apache.org/jira/browse/MESOS-6219 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu > > Mesos agent currently auto-detects certain predefined resource types when > they are not specified. These include cpus, mem, (root) disk, gpus and also > MESOS-6062 is adding auto-detection for mount disks when the size is > specified as a special value "0". > Due to the limitation of the {{Resources}} abstraction the caller of > {{Resources::parse()}} can't tell if a resource is intentionally specified > with an empty value or unspecified. The current resource auto-detection in > {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag > to check if a resource is specified, this is very fragile, e.g., it would > think {{gpus:0}} is specified (and not auto-detected) if there is a mount > disk with its root being {{/biggpush}}. > It would be nice if we can have the user explicitly specify the intention to > have value of a standard resource auto-detected (at least with the JSON > input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
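The fragility is easy to reproduce if the check is a plain substring search over the flag text, which is what the description implies. In the sketch below the function names and the example flag value are hypothetical; the only point is that `/biggpush` contains the characters `gpus`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical --resources value that declares only a mount disk.
const char* const kExampleFlag =
    R"([{"name":"disk","type":"SCALAR","scalar":{"value":1024},)"
    R"("disk":{"source":{"type":"MOUNT","mount":{"root":"/biggpush"}}}}])";

// Naive check: does the raw flag text contain `name` anywhere?
bool mentionedInFlag(const std::string& flag, const std::string& name) {
  return flag.find(name) != std::string::npos;
}

// Name-aware check over the names of the resources that were actually parsed.
bool specifiedByName(
    const std::vector<std::string>& parsedNames,
    const std::string& name) {
  return std::find(parsedNames.begin(), parsedNames.end(), name) !=
         parsedNames.end();
}
```

`mentionedInFlag(kExampleFlag, "gpus")` is true although no `gpus` resource is declared, while the name-aware check over the parsed names (`disk` only) correctly reports that `gpus` is absent.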
[jira] [Commented] (MESOS-6219) Improve auto-detection of predefined resource types.
[ https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511178#comment-15511178 ] Yan Xu commented on MESOS-6219: --- One way to address this may be to have a special resource type: {code} message Value { enum Type { SCALAR = 0; RANGES = 1; SET = 2; TEXT = 3; +AUTO = 4; } {code} - We only do auto-detection for predefined resources and they are identified by names and other fields so the {{type}} is already custom defined. i.e., it doesn't make sense to have {{cpus}} with {{type=RANGES}}. When the parser sees {{type=AUTO}} for {{cpus}}, it auto-detects the value and then assigns the type. - For custom resources we don't support auto-detection anyways so {{type=AUTO}} would be invalid and the parser would bail. > Improve auto-detection of predefined resource types. > > > Key: MESOS-6219 > URL: https://issues.apache.org/jira/browse/MESOS-6219 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu > > Mesos agent currently auto-detects certain predefined resource types when > they are not specified. These include cpus, mem, (root) disk, gpus and also > MESOS-6062 is adding auto-detection for mount disks when the size is > specified as a special value "0". > Due to the limitation of the {{Resources}} abstraction the caller of > {{Resources::parse()}} can't tell if a resource is intentionally specified > with an empty value or unspecified. The current resource auto-detection in > {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag > to check if resource is specified, this is very fragile, e.g., it would think > {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with > its root being {{/biggpush}}. > It would be nice if we can have the user explicitly specify the intention to > have value of a standard resource auto-detected (at least with the JSON > input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
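A rough sketch of how a parser might treat the proposed `AUTO` type is shown below. None of this exists in Mesos: `Type` mirrors the enum from the comment, while `Resolved`, `predefinedType`, `resolve`, and the list of predefined names (taken from the issue description, all assumed to be scalars) are illustrative assumptions.

```cpp
#include <string>

// Mirrors the enum from the proposal, including the new AUTO value.
enum class Type { SCALAR, RANGES, SET, TEXT, AUTO };

// Outcome of resolving a declared type for a named resource.
struct Resolved {
  bool ok;          // False means the parser should bail.
  Type type;        // Meaningful only when `ok` is true.
  bool autoDetect;  // True means the agent should detect the value itself.
};

// Looks up the fixed type of a predefined resource (illustrative subset).
bool predefinedType(const std::string& name, Type* type) {
  if (name == "cpus" || name == "mem" || name == "disk" || name == "gpus") {
    *type = Type::SCALAR;
    return true;
  }
  return false;
}

// AUTO is accepted only for predefined resources, where the name already
// determines the real type; for custom resources it is an error.
Resolved resolve(const std::string& name, Type declared) {
  if (declared != Type::AUTO) {
    return Resolved{true, declared, false};
  }

  Type fixed = Type::TEXT;
  if (!predefinedType(name, &fixed)) {
    return Resolved{false, Type::AUTO, false};
  }
  return Resolved{true, fixed, true};
}
```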
[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.
[ https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6219: -- Description: Mesos agent currently auto-detects certain predefined resource types when they are not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is adding auto-detection for mount disks when the size is specified as a special value "0". Due to the limitation of the {{Resources}} abstraction the caller of {{Resources::parse()}} can't tell if a resource is intentionally specified with an empty value or unspecified. The current resource auto-detection in {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to check if resource is specified, this is very fragile, e.g., it would think {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with its root being {{/biggpush}}. It would be nice if we can have the user explicitly specify the intention to have value of a standard resource auto-detected (at least with the JSON input). was: Mesos agent currently auto-detects certain standard resources when they are not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is adding auto-detection for mount disks when the size is specified as a special value "0". Due to the limitation of the {{Resources}} abstraction the caller of {{Resources::parse()}} can't tell if a resource is intentionally specified with an empty value or unspecified. The current resource auto-detection in {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to check if resource is specified, this is very fragile, e.g., it would think {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with its root being {{/biggpush}}. It would be nice if we can have the user explicitly specify the intention to have value of a standard resource auto-detected (at least with the JSON input). > Improve auto-detection of predefined resource types. 
> > > Key: MESOS-6219 > URL: https://issues.apache.org/jira/browse/MESOS-6219 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu > > Mesos agent currently auto-detects certain predefined resource types when > they are not specified. These include cpus, mem, (root) disk, gpus and also > MESOS-6062 is adding auto-detection for mount disks when the size is > specified as a special value "0". > Due to the limitation of the {{Resources}} abstraction the caller of > {{Resources::parse()}} can't tell if a resource is intentionally specified > with an empty value or unspecified. The current resource auto-detection in > {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag > to check if resource is specified, this is very fragile, e.g., it would think > {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with > its root being {{/biggpush}}. > It would be nice if we can have the user explicitly specify the intention to > have value of a standard resource auto-detected (at least with the JSON > input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6219) Improve auto-detection of predefined resource types.
[ https://issues.apache.org/jira/browse/MESOS-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6219: -- Summary: Improve auto-detection of predefined resource types. (was: Improve auto-detection of built-in resources) > Improve auto-detection of predefined resource types. > > > Key: MESOS-6219 > URL: https://issues.apache.org/jira/browse/MESOS-6219 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Yan Xu > > Mesos agent currently auto-detects certain standard resources when they are > not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 > is adding auto-detection for mount disks when the size is specified as a > special value "0". > Due to the limitation of the {{Resources}} abstraction the caller of > {{Resources::parse()}} can't tell if a resource is intentionally specified > with an empty value or unspecified. The current resource auto-detection in > {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag > to check if resource is specified, this is very fragile, e.g., it would think > {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with > its root being {{/biggpush}}. > It would be nice if we can have the user explicitly specify the intention to > have value of a standard resource auto-detected (at least with the JSON > input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6219) Improve auto-detection of built-in resources
Yan Xu created MESOS-6219: - Summary: Improve auto-detection of built-in resources Key: MESOS-6219 URL: https://issues.apache.org/jira/browse/MESOS-6219 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Yan Xu Mesos agent currently auto-detects certain standard resources when they are not specified. These include cpus, mem, (root) disk, gpus and also MESOS-6062 is adding auto-detection for mount disks when the size is specified as a special value "0". Due to the limitation of the {{Resources}} abstraction the caller of {{Resources::parse()}} can't tell if a resource is intentionally specified with an empty value or unspecified. The current resource auto-detection in {{Containerizer::resources()}} resorts to scanning the {{--resources}} flag to check if resource is specified, this is very fragile, e.g., it would think {{gpus:0}} is specified (ant not auto-detected) if there is a mount disk with its root being {{/biggpush}}. It would be nice if we can have the user explicitly specify the intention to have value of a standard resource auto-detected (at least with the JSON input). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510671#comment-15510671 ] haosdent commented on MESOS-5070: - Hi, [~benjaminhindman] Thanks for your comment! {quote} (1) This implementation will never enter a pid namespace properly and there's no check that someone isn't passing in a pid namespace ... bug? {quote} I saw that we didn't support entering a pid namespace in {{setns}} before. I think {{mnt}} and {{net}} should be enough for health checks, although entering all namespaces would be better. {quote} (2) This should not live in src/health-check/health_checker.cpp {quote} Yes, alexr told me we should add {{Subprocess::ChildHook::SETNS}} like [Subprocess::ChildHook::SUPERVISOR | https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/subprocess.cpp#L98] cc [~alexr] Please correct me if I have misunderstood. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510662#comment-15510662 ] Charles Allen commented on MESOS-6213: -- I'm really curious what changed in my build env that allowed this to pass :-/ > Build failure on macOS Sierra > - > > Key: MESOS-6213 > URL: https://issues.apache.org/jira/browse/MESOS-6213 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: Charles Allen > > Building on OSX is giving the following error. > {code} > In file included from > ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184: > ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: > error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first > deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() > from instead [-Werror,-Wdeprecated-declarations] > if (OSAtomicCompareAndSwap64Barrier( > ^ > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: > note: > 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated > here > boolOSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t > __newValue, > ^ > {code} > Protobuf is not listed as a component so I just set it as {{build}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6218) Avoided to concat cgroup internally in subsystems.
[ https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-6218: --- Assignee: haosdent > Avoided to concat cgroup internally in subsystems. > -- > > Key: MESOS-6218 > URL: https://issues.apache.org/jira/browse/MESOS-6218 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > > Currently we use {{path::join(flags.cgroups_root, containerId.value())}} to > concatenate the cgroup path internally in subsystems; we should avoid this and pass > the cgroup to the subsystems directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6218) Avoided to concat cgroup internally in subsystems.
[ https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6218: Description: Currently we use {{path::join(flags.cgroups_root, containerId.value())}} to concatenate the cgroup path internally in subsystems; we should avoid this and pass the cgroup to the subsystems directly. > Avoided to concat cgroup internally in subsystems. > -- > > Key: MESOS-6218 > URL: https://issues.apache.org/jira/browse/MESOS-6218 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent > > Currently we use {{path::join(flags.cgroups_root, containerId.value())}} to > concatenate the cgroup path internally in subsystems; we should avoid this and pass > the cgroup to the subsystems directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6218) Avoided to concat cgroup internally in subsystems.
[ https://issues.apache.org/jira/browse/MESOS-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6218: Issue Type: Improvement (was: Task) > Avoided to concat cgroup internally in subsystems. > -- > > Key: MESOS-6218 > URL: https://issues.apache.org/jira/browse/MESOS-6218 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6218) Avoided to concat cgroup internally in subsystems.
haosdent created MESOS-6218: --- Summary: Avoided to concat cgroup internally in subsystems. Key: MESOS-6218 URL: https://issues.apache.org/jira/browse/MESOS-6218 Project: Mesos Issue Type: Task Reporter: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510527#comment-15510527 ] Jie Yu commented on MESOS-5070: ---
commit 3c730dc42e35d13dbeaf69ce391766d4ba7ba962 Author: Joerg Schad Date: Wed Sep 21 09:50:43 2016 -0700 Converted watchdog option into childhook in Mesos [2/2]. Review: https://reviews.apache.org/r/52121/
commit 6d58c241716244d29a932440eabed31dccb639cf Author: Joerg Schad Date: Wed Sep 21 09:50:39 2016 -0700 Converted watchdog option into a childhook in libprocess [1/2]. Review: https://reviews.apache.org/r/52120/
commit f4be028f5eb8c04a2c13a58863896ad4f571d541 Author: Joerg Schad Date: Wed Sep 21 09:50:36 2016 -0700 Renamed Hook to ParentHook in Mesos [2/2]. Review: https://reviews.apache.org/r/52018/
commit 6a4a4f1a29301c1bac74042138f6bb3428adc9f3 Author: Joerg Schad Date: Wed Sep 21 09:50:33 2016 -0700 Renamed Hook to parent Hook in libprocess [1/2]. Review: https://reviews.apache.org/r/52017/
commit 1db3bbb1692406f6395a21624d2041f220eca744 Author: Joerg Schad Date: Wed Sep 21 09:50:30 2016 -0700 Replaces Hook::None() by {} in Mesos [2/2]. Review: https://reviews.apache.org/r/52016/
commit 2af7e5ebae976735d263f45e540c886111d27982 Author: Joerg Schad Date: Wed Sep 21 09:50:26 2016 -0700 Used {} instead of Hook::None() in libprocess [1/2]. Review: https://reviews.apache.org/r/52015/
commit 059f47bfe82c2c589cefbf0bcab131697cb0d9f9 Author: Joerg Schad Date: Wed Sep 21 09:50:19 2016 -0700 Used ChildHooks in Mesos [2/2]. We now use the new ChildHooks instead of explicit options such as setsid. Review: https://reviews.apache.org/r/45492/
commit 5ce0e46aeb083de1af09d53364ac7260441e9e94 Author: Joerg Schad Date: Wed Sep 21 09:50:15 2016 -0700 Refactored subprocess options [1/2]. Previously the subprocess interface supported a several options for the child process such as setsid. In order to make the interface more flexible we refactored such options into a vector of ChildHooks. In order not to allow arbitrary code inside a ChildHook it has to be constructed via pre-defined factory methods. Review: https://reviews.apache.org/r/45491/
> Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
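The idea in the last commit message, a hook type that can only be built through pre-defined factory methods, can be sketched roughly as follows. This is a simplified, hypothetical stand-in written for illustration; the class layout, the method names {{Chdir}} and {{Setsid}}, and the bool return type are assumptions and are not the libprocess interface.

```cpp
#include <unistd.h>

#include <functional>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for a child hook: a callable meant to
// run in the child process between fork() and exec().
class ChildHook
{
public:
  // Pre-defined factory methods are the only way to obtain a hook.
  static ChildHook Chdir(const std::string& directory)
  {
    return ChildHook([directory]() {
      return ::chdir(directory.c_str()) == 0;
    });
  }

  static ChildHook Setsid()
  {
    return ChildHook([]() { return ::setsid() != -1; });
  }

  // Runs the hook; returns true on success.
  bool operator()() const { return setup(); }

private:
  // Private constructor: callers cannot wrap arbitrary code in a hook.
  explicit ChildHook(std::function<bool()> _setup)
    : setup(std::move(_setup)) {}

  std::function<bool()> setup;
};

// Outside the class, a hook cannot be constructed from an arbitrary function.
static_assert(
    !std::is_constructible<ChildHook, std::function<bool()>>::value,
    "hooks can only be created through the factory methods");
```

A caller would then pass something like a vector containing ChildHook::Chdir("/tmp") and ChildHook::Setsid(), and the set of possible child-side actions stays limited to what the factories expose.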
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510522#comment-15510522 ] Jie Yu commented on MESOS-5070: --- That's still tricky because ns::enter is in src/linux/ns.hpp. So we need to pull ns functions to stout to make it work. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510520#comment-15510520 ] Jie Yu commented on MESOS-5070: --- I think we should hide parent_hooks and child_hooks in the subprocess interface on Windows platforms (i.e., with #ifdef) so that Windows code doesn't accidentally touch them, expecting them to work. In the long term, I agree with [~js84]: since we don't allow arbitrary child hooks, we may be able to support them on Windows. We would just need to convert child hooks to the proper 'CreateProcess' parameters. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510435#comment-15510435 ] Joerg Schad edited comment on MESOS-5070 at 9/21/16 4:51 PM: - The current state is that they only work on linux (windows also ignores the explicit options right now and as you pointed out does not care about parentHooks). If we are are talking about adding windows support in the future childHooks have one advantage over parentHooks in my opinion: While parentHooks can be arbitrary functions, childHooks are constrained (via the factory methods) to a small set of predefined hooks to which we could add some form of id and enable windows to implement its own version of it. Maybe [~jieyu] can add more details here. was (Author: js84): The current state is that they only work on linux (windows also ignores the explicit options right now and as you pointed out does not care about parentHooks). If we are are talking about adding windows support in the future childHooks have one advantage over parentHooks in my opinion: While parentHooks can be arbitrary functions, childHooks are constrained (via the factory methods) to a small set of predefined hooks -to which we could add some form of id and enable windows to implement its own version of it-. Maybe [~jieyu] can add more details here. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6217) PAGE_SIZE was not declared in PPC64LE
[ https://issues.apache.org/jira/browse/MESOS-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6217: Story Points: 1 Labels: cgroups ppc64 (was: ) > PAGE_SIZE was not declared in PPC64LE > - > > Key: MESOS-6217 > URL: https://issues.apache.org/jira/browse/MESOS-6217 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent > Labels: cgroups, ppc64 > > When compile Mesos in PPC64LE, get this error > {code} > ../../src/slave/containerizer/mesos/isolators/gpu/isolator.cpp -fPIC -DPIC > -o > slave/containerizer/mesos/isolators/gpu/.libs/libmesos_no_3rdparty_la-isolator.o > ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp: > In member function 'virtual process::Future > mesos::internal::slave::MemorySubsystem::update(const mesos::ContainerID&, > const mesos::Resources&)': > ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp:230:55: > error: 'PAGE_SIZE' was not declared in this scope >Bytes initialLimit(static_cast(LONG_MAX / PAGE_SIZE * > PAGE_SIZE)); >^ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510447#comment-15510447 ] Benjamin Hindman commented on MESOS-5070: - [~alexr], [~jieyu], [~haosdent], regarding {{mesos::internal::health::cloneWithSetns}} function: (1) This implementation will never enter a pid namespace properly and there's no check that someone isn't passing in a pid namespace ... bug? (2) This should not live in {{src/health-check/health_checker.cpp}} as it's a generic function that others probably want to reuse. In fact, it's very reminiscent to the {{ns::enter}} function we recently wrote for the nested containerization stuff that was later replaced with {{ns::clone}}, and I'd rather us reintroduce a generic {{ns::enter}} that lots of people can use rather than implement one-offs throughout the code base. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510435#comment-15510435 ] Joerg Schad commented on MESOS-5070: The current state is that they only work on Linux (Windows also ignores the explicit options right now and, as you pointed out, does not care about parentHooks). If we are talking about adding Windows support in the future, childHooks have one advantage over parentHooks in my opinion: while parentHooks can be arbitrary functions, childHooks are constrained (via the factory methods) to a small set of predefined hooks -to which we could add some form of id and enable windows to implement its own version of it-. Maybe [~jieyu] can add more details here. > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6217) PAGE_SIZE was not declared in PPC64LE
haosdent created MESOS-6217: --- Summary: PAGE_SIZE was not declared in PPC64LE Key: MESOS-6217 URL: https://issues.apache.org/jira/browse/MESOS-6217 Project: Mesos Issue Type: Bug Reporter: haosdent Assignee: haosdent When compiling Mesos on PPC64LE, we get this error: {code} ../../src/slave/containerizer/mesos/isolators/gpu/isolator.cpp -fPIC -DPIC -o slave/containerizer/mesos/isolators/gpu/.libs/libmesos_no_3rdparty_la-isolator.o ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp: In member function 'virtual process::Future mesos::internal::slave::MemorySubsystem::update(const mesos::ContainerID&, const mesos::Resources&)': ../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp:230:55: error: 'PAGE_SIZE' was not declared in this scope Bytes initialLimit(static_cast(LONG_MAX / PAGE_SIZE * PAGE_SIZE)); ^ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
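One portable way to avoid depending on the {{PAGE_SIZE}} macro is to query the page size at run time with POSIX {{sysconf(_SC_PAGESIZE)}}. The sketch below shows that approach; the helper names {{pageSize}} and {{largestPageAlignedLimit}} are hypothetical, and this is not necessarily the fix that was adopted in Mesos.

```cpp
#include <unistd.h>

#include <climits>
#include <cstdint>

// Returns the system page size in bytes, queried at run time instead of
// relying on a compile-time PAGE_SIZE macro. For brevity this sketch does
// not handle the -1 error return of sysconf().
long pageSize()
{
  return ::sysconf(_SC_PAGESIZE);
}

// Largest multiple of the page size that does not exceed LONG_MAX,
// mirroring the `LONG_MAX / PAGE_SIZE * PAGE_SIZE` expression in the error.
uint64_t largestPageAlignedLimit()
{
  const long size = pageSize();
  return static_cast<uint64_t>(LONG_MAX / size * size);
}
```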
[jira] [Updated] (MESOS-6063) Track recovered and prepared subsystems for a container
[ https://issues.apache.org/jira/browse/MESOS-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6063: Description: Currently, when we restart the Mesos agent with different cgroups subsystems, existing containers fail to recover on the newly added subsystems. In this case, we ignore the failures and continue to perform {{usage}}, {{status}} and {{cleanup}} on them. It would be better to track the recovered and prepared subsystems for each container, and then skip {{update}}, {{wait}}, {{usage}} and {{status}} for subsystems that were not recovered or prepared. (was: Currently, when we restart Mesos Agent with different cgroups subsystems, the exist containers would recover failed on newly added subsystems. In this case, we ignore them and continue to perform `usage`, `status` and `cleanup` on them. It would be better that we track recovered and prepared subsystems for a container. Then ignore perform `update`, `wait`, `usage`, `status` on them.) > Track recovered and prepared subsystems for a container > --- > > Key: MESOS-6063 > URL: https://issues.apache.org/jira/browse/MESOS-6063 > Project: Mesos > Issue Type: Improvement > Components: cgroups >Reporter: haosdent >Assignee: haosdent > Labels: cgroups > > Currently, when we restart the Mesos agent with different cgroups subsystems, > existing containers fail to recover on the newly added subsystems. In this > case, we ignore the failures and continue to perform {{usage}}, {{status}} and > {{cleanup}} on them. It would be better to track the recovered and prepared > subsystems for each container, and then skip {{update}}, {{wait}}, > {{usage}} and {{status}} for subsystems that were not recovered or prepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
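The bookkeeping proposed in MESOS-6063 could look roughly like the sketch below: remember, per container, which subsystems were recovered or prepared, and consult that set before operating on a subsystem. The class name {{SubsystemTracker}} and the use of plain strings for container IDs and subsystem names are assumptions for illustration, not the isolator's actual data structures.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical bookkeeping: for each container, the set of cgroups
// subsystems that were successfully recovered or prepared.
class SubsystemTracker
{
public:
  // Records that `subsystem` was recovered or prepared for the container.
  void add(const std::string& containerId, const std::string& subsystem)
  {
    subsystems[containerId].insert(subsystem);
  }

  // Operations such as update/usage/status would consult this first and
  // skip subsystems that were never recovered or prepared.
  bool contains(
      const std::string& containerId,
      const std::string& subsystem) const
  {
    auto it = subsystems.find(containerId);
    return it != subsystems.end() && it->second.count(subsystem) > 0;
  }

private:
  std::map<std::string, std::set<std::string>> subsystems;
};
```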
[jira] [Commented] (MESOS-5070) Introduce more flexible subprocess interface for child options.
[ https://issues.apache.org/jira/browse/MESOS-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510413#comment-15510413 ] Benjamin Hindman commented on MESOS-5070: - [~joerg84]: was just looking over some of the reviews for this. Was the assumption that adding child hooks would only be for Linux? Even the current parent hooks don't work for Windows IIUC and it's unlikely that we'll EVER be able to do child hooks on Windows ... > Introduce more flexible subprocess interface for child options. > --- > > Key: MESOS-5070 > URL: https://issues.apache.org/jira/browse/MESOS-5070 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: tech-debt > > We introduced a number of parameters to the subprocess interface with > MESOS-5049. > Adding all options explicitly to the subprocess interface makes it > inflexible. > We should investigate a flexible options, which still prevents arbitrary code > to be executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2564) Kill superfluous forward declaration comments.
[ https://issues.apache.org/jira/browse/MESOS-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510409#comment-15510409 ] Alexander Rukletsov commented on MESOS-2564: https://reviews.apache.org/r/32608/ > Kill superfluous forward declaration comments. > -- > > Key: MESOS-2564 > URL: https://issues.apache.org/jira/browse/MESOS-2564 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Priority: Minor > Labels: easyfix, newbie > > We often prepend forward declarations with a comment, which is pretty > useless, e.g.: > {code} > // Forward declarations. > class LogStorageProcess; > {code} > or > {code} > // Forward declarations. > namespace registry { > class Slaves; > } > class Authorizer; > class WhitelistWatcher; > {code} > This JIRA aims to clean up such comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510107#comment-15510107 ] Qian Zhang commented on MESOS-6215: --- RR: https://reviews.apache.org/r/52118/ > Add support for opaque whiteout (.wh..wh..opq) in provisioner > - > > Key: MESOS-6215 > URL: https://issues.apache.org/jira/browse/MESOS-6215 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Attachments: whiteout.diff > > > In a Docker image, there can be a opaque whiteout entry (a file with the name > {{.wh..wh..opq}}) under a directory which indicates all siblings under that > directory should be removed. But currently Mesos provisioner does not support > to handle such opaque whiteout entry which will cause launching container > with some Docker images fails, e.g.: > {code} > $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test > --docker_image=rabbitmq --command="sleep 100" > I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0 > I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at > master@192.168.122.171:5050 > Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e- > Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to remove whiteout file > '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': > No such file or directory' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} > Check OCI image spec for more details about opaque whiteout: > https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509895#comment-15509895 ] Qian Zhang commented on MESOS-6215: --- That is because we wrongly strip the {{.wh.}} prefix from {{.wh..wh..opq}} with {{substr()}} in the Mesos provisioner's code: https://github.com/apache/mesos/blob/1.0.1/src/slave/containerizer/mesos/provisioner/provisioner.cpp#L359:L360 > Add support for opaque whiteout (.wh..wh..opq) in provisioner > - > > Key: MESOS-6215 > URL: https://issues.apache.org/jira/browse/MESOS-6215 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Attachments: whiteout.diff > > > In a Docker image, there can be a opaque whiteout entry (a file with the name > {{.wh..wh..opq}}) under a directory which indicates all siblings under that > directory should be removed. But currently Mesos provisioner does not support > to handle such opaque whiteout entry which will cause launching container > with some Docker images fails, e.g.: > {code} > $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test > --docker_image=rabbitmq --command="sleep 100" > I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0 > I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at > master@192.168.122.171:5050 > Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e- > Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to remove whiteout file > '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': > No such file or directory' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} > Check OCI image spec for more details about opaque whiteout: > https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
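The effect described in the comment above can be reproduced with a small sketch. It assumes only the whiteout prefix {{.wh.}} and the opaque marker {{.wh..wh..opq}} named in the issue; the function names are hypothetical and this is not the provisioner's actual code.

```cpp
#include <string>

const std::string WHITEOUT_PREFIX = ".wh.";
const std::string OPAQUE_WHITEOUT = ".wh..wh..opq";

// Naive handling: strip the whiteout prefix from every entry that starts
// with it and treat the remainder as the name of the file to remove.
std::string naiveTarget(const std::string& basename)
{
  return basename.substr(WHITEOUT_PREFIX.size());
}

// Returns true if the entry is the opaque whiteout marker, which names no
// sibling file and so has to be recognized before any prefix stripping.
bool isOpaqueWhiteout(const std::string& basename)
{
  return basename == OPAQUE_WHITEOUT;
}
```

Applied to {{.wh..wh..opq}}, the naive stripping yields {{.wh..opq}}, which is the nonexistent file named in the TASK_FAILED message.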
[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509466#comment-15509466 ] Stéphane Cottin commented on MESOS-6215: >From the example on top : lists/partial/.wh..opq': No such file or directory > Add support for opaque whiteout (.wh..wh..opq) in provisioner > - > > Key: MESOS-6215 > URL: https://issues.apache.org/jira/browse/MESOS-6215 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Attachments: whiteout.diff > > > In a Docker image, there can be a opaque whiteout entry (a file with the name > {{.wh..wh..opq}}) under a directory which indicates all siblings under that > directory should be removed. But currently Mesos provisioner does not support > to handle such opaque whiteout entry which will cause launching container > with some Docker images fails, e.g.: > {code} > $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test > --docker_image=rabbitmq --command="sleep 100" > I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0 > I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at > master@192.168.122.171:5050 > Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e- > Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to remove whiteout file > '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': > No such file or directory' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} > Check OCI image spec for more details about opaque whiteout: > https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509466#comment-15509466 ] Stéphane Cottin edited comment on MESOS-6215 at 9/21/16 10:11 AM: -- From the example on top: {{lists/partial/.wh..opq': No such file or directory}} was (Author: kaalh): From the example on top: lists/partial/.wh..opq': No such file or directory > Add support for opaque whiteout (.wh..wh..opq) in provisioner > - > > Key: MESOS-6215 > URL: https://issues.apache.org/jira/browse/MESOS-6215 > Project: Mesos > Issue Type: Bug >Reporter: Qian Zhang >Assignee: Qian Zhang > Attachments: whiteout.diff > > > In a Docker image, there can be a opaque whiteout entry (a file with the name > {{.wh..wh..opq}}) under a directory which indicates all siblings under that > directory should be removed. But currently Mesos provisioner does not support > to handle such opaque whiteout entry which will cause launching container > with some Docker images fails, e.g.: > {code} > $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test > --docker_image=rabbitmq --command="sleep 100" > I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0 > I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at > master@192.168.122.171:5050 > Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e- > Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to remove whiteout file > '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': > No such file or directory' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} > Check OCI image spec for more details about opaque whiteout: > https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6209) Containers that use the Mesos containerizer but don't want to provision a container image fail to validate.
[ https://issues.apache.org/jira/browse/MESOS-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509423#comment-15509423 ] Jan Schlicht commented on MESOS-6209: - Looks like JIRA created this ticket twice. MESOS-6208 is the canonical ticket. > Containers that use the Mesos containerizer but don't want to provision a > container image fail to validate. > --- > > Key: MESOS-6209 > URL: https://issues.apache.org/jira/browse/MESOS-6209 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Mesos HEAD, change was introduced with > e65f580bf0cbea64cedf521cf169b9b4c9f85454 >Reporter: Jan Schlicht > > Tasks using features like volumes or CNI in their containers, have to define > these in {{TaskInfo.container}}. When these tasks don't want/need to > provision a container image, neither {{ContainerInfo.docker}} nor > {{ContainerInfo.mesos}} will be set. Nevertheless, the container type in > {{ContainerInfo.type}} needs to be set, because it is a required field. > In that case, the recently introduced validation rules in > {{master/validation.cpp}} ({{validateContainerInfo}} will fail, which isn't > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509242#comment-15509242 ]

Qian Zhang commented on MESOS-6215:
-----------------------------------

Thanks [~kaalh], but I think the filename in a Docker image is also {{.wh..wh..opq}}; please check the code below for details:
https://github.com/docker/docker/blob/v1.12.1/pkg/archive/whiteouts.go#L23

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -------------------------------------------------------------
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
> Issue Type: Bug
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under that
> directory should be removed. Currently the Mesos provisioner does not handle
> such opaque whiteout entries, which causes launching a container
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
> message: 'Failed to launch container: Failed to remove whiteout file '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': No such file or directory'
> source: SOURCE_AGENT
> reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
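The naming question in this comment can be made concrete with a small sketch. It assumes the constants defined in the linked Docker `whiteouts.go` (`.wh.` as the whiteout prefix, `.wh..wh.` as the meta prefix, and `.wh..wh..opq` as the opaque marker). The helper names `classify_entry` and `naive_target` are hypothetical and are not Mesos or Docker APIs. The second helper illustrates one plausible reading of the error in the description: code that treats every `.wh.`-prefixed name as a plain whiteout would strip the prefix from `.wh..wh..opq` and then look for a file called `.wh..opq`.

```python
import os

# Constants as defined in Docker's pkg/archive/whiteouts.go (v1.12.1).
WHITEOUT_PREFIX = ".wh."
WHITEOUT_META_PREFIX = WHITEOUT_PREFIX + WHITEOUT_PREFIX  # ".wh..wh."
WHITEOUT_OPAQUE_DIR = WHITEOUT_META_PREFIX + ".opq"       # ".wh..wh..opq"


def classify_entry(path):
    """Classify a layer entry as 'opaque', 'whiteout', or 'regular'."""
    name = os.path.basename(path)
    # The opaque marker must be checked first: it also starts with ".wh.".
    if name == WHITEOUT_OPAQUE_DIR:
        return "opaque"
    if name.startswith(WHITEOUT_PREFIX):
        return "whiteout"
    return "regular"


def naive_target(path):
    """Hypothetical: strip the plain whiteout prefix without an opaque check."""
    directory, name = os.path.split(path)
    return os.path.join(directory, name[len(WHITEOUT_PREFIX):])
```

Calling `naive_target` on a path ending in `.wh..wh..opq` yields a path ending in `.wh..opq`, which matches the file name in the `Failed to remove whiteout file` message quoted above.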
[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
[ https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-6216:
------------------------------------
Description:
{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, any calling of functions like {{os::getenv}} (or via {{os::environment}}) concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} performs unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess runtime with {{process::initialize}} from a {{main}} function, see e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that we should move the setup of libprocess' SSL environment variables to a similar spot.

was:
{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, any code calling functions like {{os::getenv}} (via or {{os::environment}}) concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} perform unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess runtime with {{process::initialize}} from a {{main}} function, see e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that we should move the setup of libprocess' SSL environment variables to a similar spot.

> LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
> ------------------------------------------------------------------------------
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
> Issue Type: Bug
> Components: security
> Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}}
> is used to set up SSL-related libprocess environment variables
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps,
> any calling of functions like {{os::getenv}} (or via {{os::environment}})
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}}
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the
> libprocess runtime with {{process::initialize}} from a {{main}} function, see
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It
> appears that we should move the setup of libprocess' SSL environment
> variables to a similar spot.
[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
[ https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-6216:
------------------------------------
Summary: LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv (was: LibeventSSLSocketImpl::create is not thread-safe, but used as if it were)

> LibeventSSLSocketImpl::create is not safe to call concurrently with os::getenv
> ------------------------------------------------------------------------------
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
> Issue Type: Bug
> Components: security
> Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}}
> is used to set up SSL-related libprocess environment variables
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps,
> any code calling functions like {{os::getenv}} (or via {{os::environment}})
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}}
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the
> libprocess runtime with {{process::initialize}} from a {{main}} function, see
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It
> appears that we should move the setup of libprocess' SSL environment
> variables to a similar spot.
[jira] [Comment Edited] (MESOS-6213) Build failure on macOS Sierra
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509208#comment-15509208 ]

Jan Schlicht edited comment on MESOS-6213 at 9/21/16 8:33 AM:
--------------------------------------------------------------

Still fails here. Mesos uses protobuf generated classes that use functions that have been marked as deprecated beginning with macOS Sierra. Normally this would result in a warning, but Mesos is compiled with {{-Werror}}, hence the build will fail.

was (Author: nfnt):
Still fails here. Mesos uses protobuf generated classes that use functions that have been marked as deprecated beginning with macOS Sierra. Normally this would result in a warning, but Mesos is compiled with {{-Werror}}, hence the build will fail. This has been [reported for Protocol Buffers|https://github.com/google/protobuf/issues/74].

> Build failure on macOS Sierra
> -----------------------------
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
> Issue Type: Bug
> Components: build
> Reporter: Charles Allen
>
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
>     if (OSAtomicCompareAndSwap64Barrier(
>         ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool    OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
>         ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509208#comment-15509208 ]

Jan Schlicht commented on MESOS-6213:
-------------------------------------

Still fails here. Mesos uses protobuf generated classes that use functions that have been marked as deprecated beginning with macOS Sierra. Normally this would result in a warning, but Mesos is compiled with {{-Werror}}, hence the build will fail. This has been [reported for Protocol Buffers|https://github.com/google/protobuf/issues/74].

> Build failure on macOS Sierra
> -----------------------------
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
> Issue Type: Bug
> Components: build
> Reporter: Charles Allen
>
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
>     if (OSAtomicCompareAndSwap64Barrier(
>         ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool    OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
>         ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
[jira] [Updated] (MESOS-6216) LibeventSSLSocketImpl::create is not thread-safe, but used as if it were
[ https://issues.apache.org/jira/browse/MESOS-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-6216:
------------------------------------
Attachment: build.log

Attached a trimmed build log showing a likely related issue, https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/2658/changes. This build was configured with {{--enable-libevent --enable-ssl}} on centos-7.

> LibeventSSLSocketImpl::create is not thread-safe, but used as if it were
> ------------------------------------------------------------------------
>
> Key: MESOS-6216
> URL: https://issues.apache.org/jira/browse/MESOS-6216
> Project: Mesos
> Issue Type: Bug
> Components: security
> Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> {{LibeventSSLSocketImpl::create}} is called whenever a potentially
> ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which
> calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}}
> is used to set up SSL-related libprocess environment variables
> {{LIBPROCESS_SSL_*}}.
> Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps,
> any code calling functions like {{os::getenv}} (or via {{os::environment}})
> concurrently with the first invocation of {{LibeventSSLSocketImpl::create}}
> performs unsynchronized r/w access to the same data structure in the runtime.
> We usually perform most setup of the environment before we start the
> libprocess runtime with {{process::initialize}} from a {{main}} function, see
> e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It
> appears that we should move the setup of libprocess' SSL environment
> variables to a similar spot.
[jira] [Created] (MESOS-6216) LibeventSSLSocketImpl::create is not thread-safe, but used as if it were
Benjamin Bannier created MESOS-6216:
---------------------------------------

Summary: LibeventSSLSocketImpl::create is not thread-safe, but used as if it were
Key: MESOS-6216
URL: https://issues.apache.org/jira/browse/MESOS-6216
Project: Mesos
Issue Type: Bug
Components: security
Reporter: Benjamin Bannier

{{LibeventSSLSocketImpl::create}} is called whenever a potentially ssl-enabled socket is created. It in turn calls {{openssl::initialize}} which calls a function {{reinitialize}} using {{os::setenv}}. Here {{os::setenv}} is used to set up SSL-related libprocess environment variables {{LIBPROCESS_SSL_*}}.

Since {{os::setenv}} is not thread-safe just like the {{::setenv}} it wraps, any code calling functions like {{os::getenv}} (or via {{os::environment}}) concurrently with the first invocation of {{LibeventSSLSocketImpl::create}} performs unsynchronized r/w access to the same data structure in the runtime.

We usually perform most setup of the environment before we start the libprocess runtime with {{process::initialize}} from a {{main}} function, see e.g., {{src/slave/main.cpp}} or {{src/master/main.cpp}} and others. It appears that we should move the setup of libprocess' SSL environment variables to a similar spot.
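The ordering suggested in the description, writing the environment once while the process is still single-threaded and only then starting concurrent readers, can be sketched in a language-neutral way. The Python below only illustrates that pattern and is not libprocess code: `configure_ssl_environment`, `run_workers`, and the `LIBPROCESS_SSL_ENABLED` value used here are assumptions made for the example.

```python
import os
import threading


def configure_ssl_environment(settings):
    """Write all SSL-related variables while the process is single-threaded."""
    for key, value in settings.items():
        os.environ[key] = value


def run_workers(count, key):
    """Start threads that only read the environment; return what they saw."""
    seen = []
    lock = threading.Lock()

    def worker():
        value = os.environ.get(key)
        with lock:
            seen.append(value)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen


def main():
    # Step 1: all writes happen here, before any other thread exists.
    configure_ssl_environment({"LIBPROCESS_SSL_ENABLED": "true"})
    # Step 2: only now start concurrent readers; nothing writes from here on.
    return run_workers(4, "LIBPROCESS_SSL_ENABLED")
```

The point of the split is that no write to the environment can overlap with a read. In the Mesos case the analogous step would be performing the SSL environment setup from `main` before `process::initialize`, as the description proposes.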
[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509154#comment-15509154 ]

Jan Schlicht commented on MESOS-6213:
-------------------------------------

I ran into the same problem after updating to macOS Sierra. The log indicates that this is due to some deprecated functions. I wouldn't expect that a reboot would solve it, but will try it out. For now my workaround was compiling with {{-Wno-deprecated-declarations}}.

> Build failure on macOS Sierra
> -----------------------------
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
> Issue Type: Bug
> Components: build
> Reporter: Charles Allen
>
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
>     if (OSAtomicCompareAndSwap64Barrier(
>         ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool    OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
>         ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
[jira] [Updated] (MESOS-6213) Build failure on macOS Sierra
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Schlicht updated MESOS-6213:
--------------------------------
Summary: Build failure on macOS Sierra (was: Build failure on OSX)

> Build failure on macOS Sierra
> -----------------------------
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
> Issue Type: Bug
> Components: build
> Reporter: Charles Allen
>
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
>     if (OSAtomicCompareAndSwap64Barrier(
>         ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool    OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
>         ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
[jira] [Commented] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509136#comment-15509136 ]

Stéphane Cottin commented on MESOS-6215:
----------------------------------------

With Docker images the filename seems to be {{.wh..opq}}, not {{.wh..wh..opq}} as in the OCI image spec. Either way, the purpose is the same.

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -------------------------------------------------------------
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
> Issue Type: Bug
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under that
> directory should be removed. Currently the Mesos provisioner does not handle
> such opaque whiteout entries, which causes launching a container
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
> message: 'Failed to launch container: Failed to remove whiteout file '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': No such file or directory'
> source: SOURCE_AGENT
> reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout
[jira] [Updated] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stéphane Cottin updated MESOS-6215:
-----------------------------------
Attachment: whiteout.diff

Temporary workaround for anyone blocked by this issue.

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -------------------------------------------------------------
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
> Issue Type: Bug
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Attachments: whiteout.diff
>
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under that
> directory should be removed. Currently the Mesos provisioner does not handle
> such opaque whiteout entries, which causes launching a container
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
> message: 'Failed to launch container: Failed to remove whiteout file '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': No such file or directory'
> source: SOURCE_AGENT
> reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout
[jira] [Commented] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.
[ https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509074#comment-15509074 ] Qian Zhang commented on MESOS-6002: --- [~philwinder] and [~kaalh], I think what you reported is a different issue than what [~gilbert] reported in this ticket. I have created another ticket for your issue, please check https://issues.apache.org/jira/browse/MESOS-6215 for more details. > The whiteout file cannot be removed correctly using aufs backend. > - > > Key: MESOS-6002 > URL: https://issues.apache.org/jira/browse/MESOS-6002 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 14, Ubuntu 12 > Or any os with aufs module >Reporter: Gilbert Song >Assignee: Qian Zhang > Labels: aufs, backend, containerizer > Attachments: whiteout.diff > > > The whiteout file is not removed correctly when using the aufs backend in > unified containerizer. It can be verified by this unit test with the aufs > manually specified. 
> {noformat} > [20:11:24] : [Step 10/10] [ RUN ] > ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_Whiteout > [20:11:24]W: [Step 10/10] I0805 20:11:24.986734 24295 cluster.cpp:155] > Creating default 'local' authorizer > [20:11:25]W: [Step 10/10] I0805 20:11:25.001153 24295 leveldb.cpp:174] > Opened db in 14.308627ms > [20:11:25]W: [Step 10/10] I0805 20:11:25.003731 24295 leveldb.cpp:181] > Compacted db in 2.558329ms > [20:11:25]W: [Step 10/10] I0805 20:11:25.003749 24295 leveldb.cpp:196] > Created db iterator in 3086ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003754 24295 leveldb.cpp:202] > Seeked to beginning of db in 595ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003758 24295 leveldb.cpp:271] > Iterated through 0 keys in the db in 314ns > [20:11:25]W: [Step 10/10] I0805 20:11:25.003769 24295 replica.cpp:776] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [20:11:25]W: [Step 10/10] I0805 20:11:25.004086 24315 recover.cpp:451] > Starting replica recovery > [20:11:25]W: [Step 10/10] I0805 20:11:25.004251 24312 recover.cpp:477] > Replica is in EMPTY status > [20:11:25]W: [Step 10/10] I0805 20:11:25.004546 24314 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > __req_res__(5640)@172.30.2.105:36006 > [20:11:25]W: [Step 10/10] I0805 20:11:25.004607 24312 recover.cpp:197] > Received a recover response from a replica in EMPTY status > [20:11:25]W: [Step 10/10] I0805 20:11:25.004762 24313 recover.cpp:568] > Updating replica status to STARTING > [20:11:25]W: [Step 10/10] I0805 20:11:25.004776 24314 master.cpp:375] > Master 21665992-d47e-402f-a00c-6f8fab613019 (ip-172-30-2-105.mesosphere.io) > started on 172.30.2.105:36006 > [20:11:25]W: [Step 10/10] I0805 20:11:25.004787 24314 master.cpp:377] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > 
--authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/0z753P/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" > --registry_strict="true" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/0z753P/master" --zk_session_timeout="10secs" > [20:11:25]W: [Step 10/10] I0805 20:11:25.004920 24314 master.cpp:427] > Master only allowing authenticated frameworks to register > [20:11:25]W: [Step 10/10] I0805 20:11:25.004930 24314 master.cpp:441] > Master only allowing authenticated agents to register > [20:11:25]W: [Step 10/10] I0805 20:11:25.004935 24314 master.cpp:454] > Master only allowing authenticated HTTP frameworks to register > [20:11:25]W: [Step 10/10] I0805 20:11:25.004942 24314 credentials.hpp:37] > Loading credentials for authentication from '/tmp/0z753P/credentials' > [20:11:25]W: [Step 10/10] I0805 20:11:25.005018 24314 master.cpp:499] Using > default 'crammd5' authenticator > [20:11:25]W: [Step 10/10] I0805 20:11:25.005101 24314 http.cpp:883] Using > default 'basic'
[jira] [Updated] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
[ https://issues.apache.org/jira/browse/MESOS-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang updated MESOS-6215:
------------------------------
Shepherd: Jie Yu

> Add support for opaque whiteout (.wh..wh..opq) in provisioner
> -------------------------------------------------------------
>
> Key: MESOS-6215
> URL: https://issues.apache.org/jira/browse/MESOS-6215
> Project: Mesos
> Issue Type: Bug
> Reporter: Qian Zhang
> Assignee: Qian Zhang
>
> In a Docker image, there can be an opaque whiteout entry (a file with the name
> {{.wh..wh..opq}}) under a directory, which indicates that all siblings under that
> directory should be removed. Currently the Mesos provisioner does not handle
> such opaque whiteout entries, which causes launching a container
> with some Docker images to fail, e.g.:
> {code}
> $ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test --docker_image=rabbitmq --command="sleep 100"
> I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
> I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at master@192.168.122.171:5050
> Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
> Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
> Received status update TASK_FAILED for task 'test'
> message: 'Failed to launch container: Failed to remove whiteout file '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': No such file or directory'
> source: SOURCE_AGENT
> reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Check OCI image spec for more details about opaque whiteout:
> https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout
[jira] [Created] (MESOS-6215) Add support for opaque whiteout (.wh..wh..opq) in provisioner
Qian Zhang created MESOS-6215:
---------------------------------

Summary: Add support for opaque whiteout (.wh..wh..opq) in provisioner
Key: MESOS-6215
URL: https://issues.apache.org/jira/browse/MESOS-6215
Project: Mesos
Issue Type: Bug
Reporter: Qian Zhang
Assignee: Qian Zhang

In a Docker image, there can be an opaque whiteout entry (a file with the name {{.wh..wh..opq}}) under a directory, which indicates that all siblings under that directory should be removed. Currently the Mesos provisioner does not handle such opaque whiteout entries, which causes launching a container with some Docker images to fail, e.g.:
{code}
$ sudo src/mesos-execute --master=192.168.122.171:5050 --name=test --docker_image=rabbitmq --command="sleep 100"
I0921 09:22:05.167716 15522 scheduler.cpp:176] Version: 1.1.0
I0921 09:22:05.172436 15541 scheduler.cpp:465] New master detected at master@192.168.122.171:5050
Subscribed with ID 7ab88509-c068-46b3-b8be-4817e5170a7e-
Submitted task 'test' to agent '7ab88509-c068-46b3-b8be-4817e5170a7e-S0'
Received status update TASK_FAILED for task 'test'
message: 'Failed to launch container: Failed to remove whiteout file '/opt/mesos/provisioner/containers/2c4ed860-6256-4fa7-899b-9989d856dab7/backends/copy/rootfses/62e38280-1fd5-4fa7-8707-b19bdc24ae96/var/lib/apt/lists/partial/.wh..opq': No such file or directory'
source: SOURCE_AGENT
reason: REASON_CONTAINER_LAUNCH_FAILED
{code}
Check OCI image spec for more details about opaque whiteout:
https://github.com/opencontainers/image-spec/blob/master/layer.md#opaque-whiteout
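The behaviour requested in this ticket can be sketched as follows. This is a minimal illustration, assuming a copy-style backend in which each layer has been unpacked into its own directory and is applied on top of an existing rootfs directory. `apply_opaque_whiteouts` is a hypothetical helper and not the Mesos provisioner's implementation. It would run before the layer's own files are copied over the rootfs, so that only content from lower layers is removed.

```python
import os
import shutil

# Name of the opaque whiteout marker, as given in the ticket and the OCI spec.
OPAQUE_WHITEOUT = ".wh..wh..opq"


def apply_opaque_whiteouts(layer_dir, rootfs_dir):
    """Empty every rootfs directory whose counterpart in the layer is opaque.

    Returns the relative paths of the directories marked opaque in the layer.
    """
    marked = []
    for current, _dirs, files in os.walk(layer_dir):
        if OPAQUE_WHITEOUT not in files:
            continue
        relative = os.path.relpath(current, layer_dir)
        target = os.path.normpath(os.path.join(rootfs_dir, relative))
        if os.path.isdir(target):
            # Remove everything the lower layers put into this directory.
            for name in os.listdir(target):
                child = os.path.join(target, name)
                if os.path.isdir(child) and not os.path.islink(child):
                    shutil.rmtree(child)
                else:
                    os.remove(child)
        marked.append(relative)
    return marked
```

The marker file itself is metadata and would be skipped when the layer's contents are subsequently copied into the rootfs.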
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508943#comment-15508943 ]

haosdent commented on MESOS-6180:
---------------------------------

Awesome! Thanks a lot for your help!

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
> Issue Type: Bug
> Components: tests
> Reporter: Greg Mann
> Assignee: haosdent
> Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log, RoleTest.ImplicitRoleRegister.txt, flaky-containerizer-pid-namespace-backward.txt, flaky-containerizer-pid-namespace-forward.txt
>
>
> Following the merging of a large patch chain, it was noticed on our internal
> CI that several tests had become flaky, with a similar pattern in the
> failures: the tests fail early when a future times out. Often, this occurs
> when a test cluster is being spun up and one of the offer futures times out.
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple
> of these.