[jira] [Commented] (MESOS-4428) Get only running tasks from mesos api
[ https://issues.apache.org/jira/browse/MESOS-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106542#comment-15106542 ] Guangya Liu commented on MESOS-4428: I think that you can take a look at MESOS-3307 > Get only running tasks from mesos api > - > > Key: MESOS-4428 > URL: https://issues.apache.org/jira/browse/MESOS-4428 > Project: Mesos > Issue Type: Wish > Components: json api >Reporter: Tymofii >Priority: Trivial > > We're using /state.json for service discovery in our environment. Our > mesos-consul bridge reads /state.json from the current leader and then parses it > to register all running tasks in Consul. > When using the Spark framework, it generates a lot of tasks, all of which go into > /state.json as finished. The file itself can grow very large in a couple of > days of work. > Is there any way to get only running tasks from the mesos leader right now? > If not, could you add such a possibility? > Or would you suggest a different approach for service discovery? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
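Until such an endpoint exists, the filtering can be done client-side against the full /state.json. A minimal sketch, assuming the standard layout of the master's response (frameworks with per-framework "tasks" lists; the helper names and master URL below are illustrative, not part of Mesos):

```python
import json
from urllib.request import urlopen

def running_tasks(state):
    """Collect tasks in state TASK_RUNNING from a parsed /state.json dict."""
    tasks = []
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            if task.get("state") == "TASK_RUNNING":
                tasks.append(task)
    return tasks

def fetch_running_tasks(master_url):
    """Fetch /state.json from the given master and filter it client-side."""
    with urlopen(master_url + "/state.json") as resp:
        return running_tasks(json.load(resp))
```

Usage would be e.g. fetch_running_tasks("http://leader.mesos:5050") against the current leader (address illustrative). This does not solve the document-size problem the reporter describes, since the whole state.json is still transferred and parsed.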
[jira] [Updated] (MESOS-3361) Update MesosContainerizer to dynamically pick/enable isolators
[ https://issues.apache.org/jira/browse/MESOS-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3361: -- Shepherd: (was: Niklas Quarfot Nielsen) > Update MesosContainerizer to dynamically pick/enable isolators > -- > > Key: MESOS-3361 > URL: https://issues.apache.org/jira/browse/MESOS-3361 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > > This would allow the frameworks to opt-in/opt-out of network isolation per > container. Thus, one can launch some containers with their own IPs while > other containers still share the host IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3358) Add TaskStatus label decorator hooks for Master
[ https://issues.apache.org/jira/browse/MESOS-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3358: -- Shepherd: (was: Niklas Quarfot Nielsen) > Add TaskStatus label decorator hooks for Master > --- > > Key: MESOS-3358 > URL: https://issues.apache.org/jira/browse/MESOS-3358 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > > The hook will be triggered when Master receives TaskStatus message from Agent > or when the Master itself generates a TASK_LOST status. The hook should also > provide a list of the previous TaskStatuses to the module. > The use case is to allow a "cleanup" module to release IPs if an agent is > lost. The previous statuses will contain the IP address(es) to be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3362) Allow Isolators to advertise "capabilities" via SlaveInfo
[ https://issues.apache.org/jira/browse/MESOS-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3362: -- Shepherd: (was: Niklas Quarfot Nielsen) > Allow Isolators to advertise "capabilities" via SlaveInfo > - > > Key: MESOS-3362 > URL: https://issues.apache.org/jira/browse/MESOS-3362 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > > A network-isolator module can thus advertise that it can assign per-container > IPs and can provide network isolation. > The SlaveInfo protobuf will be extended to include "Capabilities" similar to > FrameworkInfo::Capabilities. > The isolator interface needs to be extended with an `info()` method that returns an > `IsolatorInfo` message. The `IsolatorInfo` message can include "Capabilities" > to be sent to Frameworks as part of SlaveInfo. > The Isolator::info() interface will be used by the Slave during initialization to > compile SlaveInfo::Capabilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3740) LIBPROCESS_IP not passed to Docker containers
[ https://issues.apache.org/jira/browse/MESOS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3740: -- Shepherd: (was: Niklas Quarfot Nielsen) > LIBPROCESS_IP not passed to Docker containers > - > > Key: MESOS-3740 > URL: https://issues.apache.org/jira/browse/MESOS-3740 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.25.0 > Environment: Mesos 0.24.1 >Reporter: Cody Maloney > Labels: mesosphere > > Docker containers aren't currently passed all the same environment variables > that Mesos Containerizer tasks are. See: > https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L254 > for all the environment variables explicitly set for mesos containers. > While some of them don't necessarily make sense for Docker containers, when > the Docker container has a libprocess process inside it (a Mesos framework > scheduler) and is using {{--net=host}}, the task needs to have LIBPROCESS_IP > set; otherwise the same sort of problems that happen because of MESOS-3553 can > happen (libprocess will try to guess the machine's IP address, with likely bad > results in a number of operating environments). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3585) Add a test module for ip-per-container support
[ https://issues.apache.org/jira/browse/MESOS-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3585: -- Shepherd: (was: Niklas Quarfot Nielsen) > Add a test module for ip-per-container support > -- > > Key: MESOS-3585 > URL: https://issues.apache.org/jira/browse/MESOS-3585 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere > > With the addition of {{NetworkInfo}} to allow frameworks to request > IP-per-container for their tasks, we should add a simple module that mimics > the behavior of a real network-isolation module for testing purposes. We can > then add this module in {{src/examples}} and write some tests against it. > This module can also serve as a template for third-party network > isolation providers building their own network isolator modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2646) Update Master to send revocable resources in separate offers
[ https://issues.apache.org/jira/browse/MESOS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106566#comment-15106566 ] Niklas Quarfot Nielsen commented on MESOS-2646: --- [~JamesYongQiaoWang] Sorry about the delay. Do you still have capacity for some oversubscription work? > Update Master to send revocable resources in separate offers > > > Key: MESOS-2646 > URL: https://issues.apache.org/jira/browse/MESOS-2646 > Project: Mesos > Issue Type: Improvement >Reporter: Vinod Kone >Assignee: Yongqiao Wang > Labels: twitter > Attachments: code-diff.txt > > > Master will send separate offers for revocable and non-revocable/regular > resources. This allows master to rescind revocable offers (e.g, when a new > oversubscribed resources estimate comes from the slave) without impacting > regular offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2688) Slave should kill revocable tasks if oversubscription is disabled
[ https://issues.apache.org/jira/browse/MESOS-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106561#comment-15106561 ] Niklas Quarfot Nielsen commented on MESOS-2688: --- [~bmahler] - so, is that an OK for doing it on the slave? :) > Slave should kill revocable tasks if oversubscription is disabled > - > > Key: MESOS-2688 > URL: https://issues.apache.org/jira/browse/MESOS-2688 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Jie Yu > Labels: twitter > > If oversubscription is disabled on a restarted slave (that had it previously > enabled), it should kill revocable tasks. > Slave knows this information from the Resources of a container that it > checkpoints and recovers. > Add a new reason OVERSUBSCRIPTION_DISABLED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2695) Add master flag to enable/disable oversubscription
[ https://issues.apache.org/jira/browse/MESOS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106565#comment-15106565 ] Niklas Quarfot Nielsen commented on MESOS-2695: --- [~vi...@twitter.com] should we mark as 'won't fix' for now? > Add master flag to enable/disable oversubscription > -- > > Key: MESOS-2695 > URL: https://issues.apache.org/jira/browse/MESOS-2695 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone > Labels: twitter > > This flag lets an operator control cluster level oversubscription. > The master should send revocable offers to framework if this flag is enabled > and the framework opts in to receive them. > Master should ignore revocable resources from slaves if the flag is disabled. > Need tests for all these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid
Sargun Dhillon created MESOS-4427: - Summary: Ensure ip_address in state.json (from NetworkInfo) is valid Key: MESOS-4427 URL: https://issues.apache.org/jira/browse/MESOS-4427 Project: Mesos Issue Type: Bug Reporter: Sargun Dhillon Priority: Critical We have seen a master state.json where the state.json has a field that looks similar to: ``` ---REDACTED--- { "container": { "docker": { "force_pull_image": false, "image": "REDACTED", "network": "HOST", "privileged": false }, "type": "DOCKER" }, "executor_id": "", "framework_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-", "id": "ping-as-a-service.c2d1c17a-be22-11e5-b053-002590e56e25", "name": "ping-as-a-service", "resources": { "cpus": 0.1, "disk": 0, "mem": 64, "ports": "[7907-7907]" }, "slave_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-S76043", "state": "TASK_RUNNING", "statuses": [ { "container_status": { "network_infos": [ { "ip_address": "", "ip_addresses": [ { "ip_address": "" } ] } ] }, "labels": [ { "key": "Docker.NetworkSettings.IPAddress", "value": "" } ], "state": "TASK_RUNNING", "timestamp": 1453149270.95511 } ] } ---REDACTED--- ``` This is invalid, and mesos-core should filter it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
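The filtering asked for here could look roughly like the following on the consuming side, using Python's standard `ipaddress` module to reject empty or malformed entries (a sketch of the validation, not the proposed mesos-core change):

```python
import ipaddress

def valid_ip_addresses(container_status):
    """Return only the syntactically valid, non-empty IPs found in the
    network_infos of a task status as rendered in state.json."""
    ips = []
    for info in container_status.get("network_infos", []):
        for entry in info.get("ip_addresses", []):
            addr = entry.get("ip_address", "")
            try:
                # Raises ValueError for "" and anything that is not an IP.
                ipaddress.ip_address(addr)
            except ValueError:
                continue
            ips.append(addr)
    return ips
```

Applied to the redacted example above, this would return an empty list, since every ip_address field is the empty string.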
[jira] [Updated] (MESOS-4427) Ensure ip_address in state.json (from NetworkInfo) is valid
[ https://issues.apache.org/jira/browse/MESOS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sargun Dhillon updated MESOS-4427: -- Description: We have seen a master state.json where the state.json has a field that looks similar to: ---REDACTED--- {code:json} { "container": { "docker": { "force_pull_image": false, "image": "REDACTED", "network": "HOST", "privileged": false }, "type": "DOCKER" }, "executor_id": "", "framework_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-", "id": "ping-as-a-service.c2d1c17a-be22-11e5-b053-002590e56e25", "name": "ping-as-a-service", "resources": { "cpus": 0.1, "disk": 0, "mem": 64, "ports": "[7907-7907]" }, "slave_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-S76043", "state": "TASK_RUNNING", "statuses": [ { "container_status": { "network_infos": [ { "ip_address": "", "ip_addresses": [ { "ip_address": "" } ] } ] }, "labels": [ { "key": "Docker.NetworkSettings.IPAddress", "value": "" } ], "state": "TASK_RUNNING", "timestamp": 1453149270.95511 } ] } {code} ---REDACTED--- This is invalid, and mesos-core should filter it. 
was: We have seen a master state.json where the state.json has a field that looks similar to: ``` ---REDACTED--- { "container": { "docker": { "force_pull_image": false, "image": "REDACTED", "network": "HOST", "privileged": false }, "type": "DOCKER" }, "executor_id": "", "framework_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-", "id": "ping-as-a-service.c2d1c17a-be22-11e5-b053-002590e56e25", "name": "ping-as-a-service", "resources": { "cpus": 0.1, "disk": 0, "mem": 64, "ports": "[7907-7907]" }, "slave_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-S76043", "state": "TASK_RUNNING", "statuses": [ { "container_status": { "network_infos": [ { "ip_address": "", "ip_addresses": [ { "ip_address": "" } ] } ] }, "labels": [ { "key": "Docker.NetworkSettings.IPAddress", "value": "" } ], "state": "TASK_RUNNING", "timestamp": 1453149270.95511 } ] } ---REDACTED--- ``` This is invalid, and it mesos-core should filter it. > Ensure ip_address in state.json (from NetworkInfo) is valid > --- > > Key: MESOS-4427 > URL: https://issues.apache.org/jira/browse/MESOS-4427 > Project: Mesos > Issue Type: Bug >Reporter: Sargun Dhillon >Priority: Critical > Labels: mesosphere > > We have seen a master state.json where the state.json has a field that looks > similar to: > ---REDACTED--- > {code:json} > { > "container": { > "docker": { > "force_pull_image": false, > "image": "REDACTED", > "network": "HOST", > "privileged": false > }, > "type": "DOCKER" > }, > "executor_id": "", > "framework_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-", > "id": "ping-as-a-service.c2d1c17a-be22-11e5-b053-002590e56e25", > "name": "ping-as-a-service", > "resources": { > "cpus": 0.1, > "disk": 0, > "mem": 64, > "ports": "[7907-7907]" > }, > "slave_id": "9f0e50ea-54b0-44e3-a451-c69e0c1a58fb-S76043", > "state": "TASK_RUNNING", > "statuses": [ > { > "container_status": { > "network_infos": [ > { > "ip_address": "", > "ip_addresses": [ > { > "ip_address": "" > } > ] > } > ] > }, > "labels": [ > { > "key": 
"Docker.NetworkSettings.IPAddress", > "value": "" > } > ], > "state": "TASK_RUNNING", >
[jira] [Created] (MESOS-4428) Get only running tasks from mesos api
Tymofii created MESOS-4428: -- Summary: Get only running tasks from mesos api Key: MESOS-4428 URL: https://issues.apache.org/jira/browse/MESOS-4428 Project: Mesos Issue Type: Wish Components: json api Reporter: Tymofii Priority: Trivial We're using /state.json for service discovery in our environment. Our mesos-consul bridge reads /state.json from the current leader and then parses it to register all running tasks in Consul. When using the Spark framework, it generates a lot of tasks, all of which go into /state.json as finished. The file itself can grow very large in a couple of days of work. Is there any way to get only running tasks from the mesos leader right now? If not, could you add such a possibility? Or would you suggest a different approach for service discovery? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4411) Traverse all roles for quota allocation
[ https://issues.apache.org/jira/browse/MESOS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4411: --- Sprint: Mesosphere Sprint 27 > Traverse all roles for quota allocation > --- > > Key: MESOS-4411 > URL: https://issues.apache.org/jira/browse/MESOS-4411 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Critical > Labels: mesosphere > > There might be a bug in how resources are allocated to multiple quota'ed > roles if one role's quota is met. We need to investigate this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2646) Update Master to send revocable resources in separate offers
[ https://issues.apache.org/jira/browse/MESOS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106625#comment-15106625 ] Yongqiao Wang commented on MESOS-2646: -- [~nnielsen], yes, I think I have. And do you mean I can start to fix this ticket now? > Update Master to send revocable resources in separate offers > > > Key: MESOS-2646 > URL: https://issues.apache.org/jira/browse/MESOS-2646 > Project: Mesos > Issue Type: Improvement >Reporter: Vinod Kone >Assignee: Yongqiao Wang > Labels: twitter > Attachments: code-diff.txt > > > Master will send separate offers for revocable and non-revocable/regular > resources. This allows master to rescind revocable offers (e.g, when a new > oversubscribed resources estimate comes from the slave) without impacting > regular offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-2296: -- Issue Type: Epic (was: Task) > Implement the Events stream on slave for Call endpoint > -- > > Key: MESOS-2296 > URL: https://issues.apache.org/jira/browse/MESOS-2296 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Anand Mazumdar > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors
[ https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-4255: -- Sprint: (was: Mesosphere Sprint 26) > Add mechanism for testing recovery of HTTP based executors > -- > > Key: MESOS-4255 > URL: https://issues.apache.org/jira/browse/MESOS-4255 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > > Currently, the slave process generates a process ID every time it is > initialized via {{process::ID::generate}} function call. This is a problem > for testing HTTP executors as it can't retry if there is a disconnection > after an agent restart since the prefix is incremented. > {code} > Agent PID before: > slave(1)@127.0.0.1:43915 > Agent PID after restart: > slave(2)@127.0.0.1:43915 > {code} > There are a couple of ways to fix this: > - Add a constructor to {{Slave}} exclusively for testing that passes on a > fixed {{ID}} instead of relying on {{ID::generate}}. > - Currently we delegate to slave(1)@ i.e. (1) when nothing is specified as > the URL in libprocess i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate > to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to > (1), we can default to the last known active ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4434) Install 3rdparty package boost, glog, protobuf and picojson when installing Mesos
Kapil Arya created MESOS-4434: - Summary: Install 3rdparty package boost, glog, protobuf and picojson when installing Mesos Key: MESOS-4434 URL: https://issues.apache.org/jira/browse/MESOS-4434 Project: Mesos Issue Type: Bug Components: build, modules Reporter: Kapil Arya Mesos modules depend on having these packages installed with the exact versions that Mesos was compiled with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4410) Introduce protobuf for quota set request.
[ https://issues.apache.org/jira/browse/MESOS-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-4410: Priority: Blocker (was: Major) > Introduce protobuf for quota set request. > - > > Key: MESOS-4410 > URL: https://issues.apache.org/jira/browse/MESOS-4410 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Blocker > Labels: mesosphere > > To document quota request JSON schema and simplify request processing, > introduce a {{QuotaRequest}} protobuf wrapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4185) Revisit the "System Requirements" for all systems in the "Getting Started" guide
[ https://issues.apache.org/jira/browse/MESOS-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-4185: --- Assignee: (was: Kevin Klues) > Revisit the "System Requirements" for all systems in the "Getting Started" > guide > > > Key: MESOS-4185 > URL: https://issues.apache.org/jira/browse/MESOS-4185 > Project: Mesos > Issue Type: Documentation >Reporter: Kevin Klues >Priority: Minor > > The "System Requirements" section of our "Getting Started" guide needs an > overhaul. Much of the information is outdated, and could likely be distilled > down to a simpler set of dependencies (especially for Centos 6.6 and Centos > 7.1). We should take a good hard look at these and see if all of the > dependencies listed are necessary anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4429) Add oversubscription benchmark/stress/test framework
[ https://issues.apache.org/jira/browse/MESOS-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107556#comment-15107556 ] Bartek Plotka commented on MESOS-4429: -- Let's start a `doc` to define the scope and input/output in detail: https://docs.google.com/document/d/1VyjbSXyvxyS95asFjzV5A19B_vcIAiUqgz3y_pMvtPs/edit?usp=sharing (: Some notes on the Serenity framework [~nnielsen] mentioned: As you can see, it can be controlled via a JSON file (quite similar to marathon's REST API input https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps). IMO it gives a useful ability to store previous `tasks` and build reusable scenarios. One of the interesting features in this framework is the ability to stress the slave with different kinds of tasks using logic similar to `shares`. For instance, you can specify that tasks of type A will be run 3 times more often than tasks of type B (type A task shares = 3 & type B task shares = 1). As a result, the framework will spawn as many tasks of both types as possible in the defined "distribution". It also supports targeting tasks to a particular host. It could be a good starting point for us. > Add oversubscription benchmark/stress/test framework > > > Key: MESOS-4429 > URL: https://issues.apache.org/jira/browse/MESOS-4429 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To evaluate the function and quality of oversubscription modules, we could > ship a test framework which can: > 1) Launch on oversubscribed and non-oversubscribed resources in a controlled > manner. For example, register as two different frameworks and see that > slack resources of one framework can be used by the other. > 2) Measure time to react for different scenarios. For example, measure the > time it takes from slack appearing on a slave to the offer being issued with > revocable resources. The time to react for changing usage patterns, e.g. time > to reclaim oversubscribed resources when regular tasks need them back. > 3) Count the number of offer rescinds, preemptions, etc. to judge the stability > of the policy. > 4) Be able to measure the % of extra work able to run. > 5) Work across different resource dimensions such as cpu time, memory, network, > caches. > [~Bartek Plotka] has been working on something similar for Serenity in > https://github.com/mesosphere/serenity/tree/master/src/framework which we can > reuse as a base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
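The shares-style weighting described in the comment above can be sketched as a weighted round-robin picker: with shares A=3 and B=1, type A is scheduled three times for every B (the function and task-type names are illustrative, not Serenity code):

```python
import itertools

def share_schedule(shares):
    """Yield task types in proportion to their shares: e.g. {'A': 3, 'B': 1}
    yields 'A' three times for every 'B', mimicking the `shares` logic
    described for the Serenity test framework (illustrative sketch)."""
    cycle = []
    for task_type, weight in sorted(shares.items()):
        cycle.extend([task_type] * weight)
    return itertools.cycle(cycle)

# Example: draw eight picks; six are type A and two are type B.
picker = share_schedule({"A": 3, "B": 1})
first_eight = [next(picker) for _ in range(8)]
```

A spawner loop would then call next() on the schedule each time it has capacity to launch another task, giving the 3:1 "distribution" the comment describes.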
[jira] [Assigned] (MESOS-920) Set GLOG_drop_log_memory=false in environment prior to logging initialization.
[ https://issues.apache.org/jira/browse/MESOS-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya reassigned MESOS-920: Assignee: Kapil Arya > Set GLOG_drop_log_memory=false in environment prior to logging initialization. > -- > > Key: MESOS-920 > URL: https://issues.apache.org/jira/browse/MESOS-920 > Project: Mesos > Issue Type: Improvement > Components: technical debt >Affects Versions: 0.15.0, 0.16.0 >Reporter: Benjamin Mahler >Assignee: Kapil Arya > > We've observed issues where the masters are slow to respond. Two perf traces > collected while the masters were slow to respond: > {noformat} > 25.84% [kernel][k] default_send_IPI_mask_sequence_phys > 20.44% [kernel][k] native_write_msr_safe > 4.54% [kernel][k] _raw_spin_lock > 2.95% libc-2.5.so [.] _int_malloc > 1.82% libc-2.5.so [.] malloc > 1.55% [kernel][k] apic_timer_interrupt > 1.36% libc-2.5.so [.] _int_free > {noformat} > {noformat} > 29.03% [kernel][k] default_send_IPI_mask_sequence_phys > 9.64% [kernel][k] _raw_spin_lock > 7.38% [kernel][k] native_write_msr_safe > 2.43% libc-2.5.so [.] _int_malloc > 2.05% libc-2.5.so [.] _int_free > 1.67% [kernel][k] apic_timer_interrupt > 1.58% libc-2.5.so [.] malloc > {noformat} > These have been found to be attributed to the posix_fadvise calls made by > glog. We can disable these via the environment: > {noformat} > GLOG_DEFINE_bool(drop_log_memory, true, "Drop in-memory buffers of log > contents. " > "Logs can grow very quickly and they are rarely read before > they " > "need to be evicted from memory. 
Instead, drop them from > memory " > "as soon as they are flushed to disk."); > {noformat} > {code} > if (FLAGS_drop_log_memory) { > if (file_length_ >= logging::kPageSize) { > // don't evict the most recent page > uint32 len = file_length_ & ~(logging::kPageSize - 1); > posix_fadvise(fileno(file_), 0, len, POSIX_FADV_DONTNEED); > } > } > {code} > We should set GLOG_drop_log_memory=false prior to making our call to > google::InitGoogleLogging, to avoid others running into this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
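The length glog hands to posix_fadvise in the snippet above is the file length rounded down to a page boundary, which is what keeps the most recent (possibly partial) page in memory. The same arithmetic in Python, assuming glog's usual 4 KiB kPageSize:

```python
PAGE_SIZE = 4096  # glog's kPageSize on typical Linux systems (assumption)

def fadvise_length(file_length):
    """Round file_length down to a page boundary, mirroring
    `file_length_ & ~(kPageSize - 1)` in glog's flush path; the bytes
    in the last partial page are not evicted."""
    return file_length & ~(PAGE_SIZE - 1)
```

For example, a 10000-byte log file yields a length of 8192, so only the first two full pages are dropped from the page cache.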
[jira] [Comment Edited] (MESOS-4429) Add oversubscription benchmark/stress/test framework
[ https://issues.apache.org/jira/browse/MESOS-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107556#comment-15107556 ] Bartek Plotka edited comment on MESOS-4429 at 1/19/16 10:10 PM: Let's start a `doc` to define the scope and input/output in detail: https://docs.google.com/document/d/1VyjbSXyvxyS95asFjzV5A19B_vcIAiUqgz3y_pMvtPs/edit?usp=sharing (: Some notes on the Serenity framework which was mentioned by [~nnielsen]: As you can see, it can be controlled via a JSON file (quite similar to marathon's REST API input https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps). IMO it gives a useful ability to store previous `tasks` and build reusable scenarios. One of the interesting features in this framework is the ability to stress the slave with different kinds of tasks using logic similar to `shares`. For instance, you can specify that tasks of type A will be run 3 times more often than tasks of type B (type A task shares = 3 & type B task shares = 1). As a result, the framework will spawn as many tasks of both types as possible in the defined "distribution". It also supports targeting tasks to a particular host. It could be a good starting point for us. was (Author: bartek plotka): Let's start a `doc` to define the scope and input/output in details: https://docs.google.com/document/d/1VyjbSXyvxyS95asFjzV5A19B_vcIAiUqgz3y_pMvtPs/edit?usp=sharing (: Some notes on Serenity framework, [~nnielsen] mentioned: As you can see it can be controlled via JSON file (quite similar to marathon's REST API input https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps). IMO it gives useful ability to store previous `tasks` and build certain reusable scenarios. One of the interesting features in this framework is ability to stress slave with different kind of tasks using logic similar to `shares`. For instance you can specify that tasks of type A will be run 3 times more often then tasks of type B (type A task shares = 3 & type B task shares = 1). As a result the framework will be spawning as many as possible tasks of both types in such defined "distribution". It also support targeting the tasks to particular the host. It could be a good starting point for us. > Add oversubscription benchmark/stress/test framework > > > Key: MESOS-4429 > URL: https://issues.apache.org/jira/browse/MESOS-4429 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To evaluate the function and quality of oversubscription modules, we could > ship a test framework which can: > 1) Launch on oversubscribed and non-oversubscribed resources in a controlled > manner. For example, register as two different frameworks and see that > slack resources of one framework can be used by the other. > 2) Measure time to react for different scenarios. For example, measure the > time it takes from slack appearing on a slave to the offer being issued with > revocable resources. The time to react for changing usage patterns, e.g. time > to reclaim oversubscribed resources when regular tasks need them back. > 3) Count the number of offer rescinds, preemptions, etc. to judge the stability > of the policy. > 4) Be able to measure the % of extra work able to run. > 5) Work across different resource dimensions such as cpu time, memory, network, > caches. > [~Bartek Plotka] has been working on something similar for Serenity in > https://github.com/mesosphere/serenity/tree/master/src/framework which we can > reuse as a base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4429) Add oversubscription benchmark/stress/test framework
[ https://issues.apache.org/jira/browse/MESOS-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107556#comment-15107556 ] Bartek Plotka edited comment on MESOS-4429 at 1/19/16 10:10 PM: Let's start a `doc` to define the scope and input/output in detail: https://docs.google.com/document/d/1VyjbSXyvxyS95asFjzV5A19B_vcIAiUqgz3y_pMvtPs/edit?usp=sharing (: Some notes on the mentioned Serenity framework: As you can see, it can be controlled via a JSON file (quite similar to marathon's REST API input https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps). IMO it gives a useful ability to store previous `tasks` and build reusable scenarios. One of the interesting features in this framework is the ability to stress the slave with different kinds of tasks using logic similar to `shares`. For instance, you can specify that tasks of type A will be run 3 times more often than tasks of type B (type A task shares = 3 & type B task shares = 1). As a result, the framework will spawn as many tasks of both types as possible in the defined "distribution". It also supports targeting tasks to a particular host. It could be a good starting point for us. was (Author: bartek plotka): Let's start a `doc` to define the scope and input/output in details: https://docs.google.com/document/d/1VyjbSXyvxyS95asFjzV5A19B_vcIAiUqgz3y_pMvtPs/edit?usp=sharing (: Some notes on the Serenity framework which was mentioned by [~nnielsen]: As you can see it can be controlled via JSON file (quite similar to marathon's REST API input https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps). IMO it gives useful ability to store previous `tasks` and build certain reusable scenarios. One of the interesting features in this framework is ability to stress slave with different kind of tasks using logic similar to `shares`. For instance you can specify that tasks of type A will be run 3 times more often then tasks of type B (type A task shares = 3 & type B task shares = 1). As a result the framework will be spawning as many as possible tasks of both types in such defined "distribution". It also support targeting the tasks to particular the host. It could be a good starting point for us. > Add oversubscription benchmark/stress/test framework > > > Key: MESOS-4429 > URL: https://issues.apache.org/jira/browse/MESOS-4429 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To evaluate the function and quality of oversubscription modules, we could > ship a test framework which can: > 1) Launch on oversubscribed and non-oversubscribed resources in a controlled > manner. For example, register as two different frameworks and see that > slack resources of one framework can be used by the other. > 2) Measure time to react for different scenarios. For example, measure the > time it takes from slack appearing on a slave to the offer being issued with > revocable resources. The time to react for changing usage patterns, e.g. time > to reclaim oversubscribed resources when regular tasks need them back. > 3) Count the number of offer rescinds, preemptions, etc. to judge the stability > of the policy. > 4) Be able to measure the % of extra work able to run. > 5) Work across different resource dimensions such as cpu time, memory, network, > caches. > [~Bartek Plotka] has been working on something similar for Serenity in > https://github.com/mesosphere/serenity/tree/master/src/framework which we can > reuse as a base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4425) Introduce filtering test abstractions for HTTP events to libprocess
[ https://issues.apache.org/jira/browse/MESOS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-4425: -- Sprint: Mesosphere Sprint 27 Story Points: 3 > Introduce filtering test abstractions for HTTP events to libprocess > --- > > Key: MESOS-4425 > URL: https://issues.apache.org/jira/browse/MESOS-4425 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > > We need a test abstraction for {{HttpEvent}} similar to the already existing > ones for {{DispatchEvent}}, {{MessageEvent}} in libprocess. > The abstraction can look similar in semantics to the already existing > {{FUTURE_DISPATCH}}/{{FUTURE_MESSAGE}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4429) Add oversubscription benchmark/stress/test framework
[ https://issues.apache.org/jira/browse/MESOS-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-4429: -- Description: To evaluate the function and quality of oversubscription modules, we could ship a test framework which can: 1) Launch on oversubscribed and non-oversubscribed resources in a controlled manner. For example, register as two different frameworks and see that resources from slack resources of one framework can be used by the other. 2) Measure time to react for different scenarios. For example, measure the time it takes from slack appearing on a slave to the offer being issued with revocable resources. The time to react for changing usage patterns e.g. time to reclaim oversubscribed resources when regular tasks need them back. 3) Count the number of offer rescind, preemptions, etc. to deem the stability of the policy. 4) Be able to measure % extra work being able to run. 5) Work across different resource dimensions as cpu time, memory, network, caches. [~Bartek Plotka] has been working on something similar for Serenity in https://github.com/mesosphere/serenity/tree/master/src/framework which we can reuse as a base. was: To evaluate the function and quality of oversubscription modules, we could ship a test framework which can: 1) Launch on oversubscribed and non-oversubscribed resources in a controlled manner. For example, register as two different frameworks and see that resources from slack resources of one framework can be used by the other. 2) Measure time to react for different scenarios. For example, measure the time it takes from slack appearing on a slave to the offer being issued with revocable resources. The time to react for changing usage patterns e.g. time to reclaim oversubscribed resources when regular tasks need them back. 3) Count the number of offer rescind, preemptions, etc. to deem the stability of the policy. 4) Be able to measure % extra work being able to run. 
5) Work across different resource dimensions as cpu time, memory, network, caches. [~Bartek Plotka] has been working on something similar for Serenity in https://github.com/mesosphere/serenity/tree/master/src/framework which we can reuse as a base. > Add oversubscription benchmark/stress/test framework > > > Key: MESOS-4429 > URL: https://issues.apache.org/jira/browse/MESOS-4429 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To evaluate the function and quality of oversubscription modules, we could > ship a test framework which can: > 1) Launch on oversubscribed and non-oversubscribed resources in a controlled > manner. For example, register as two different frameworks and see that > resources from slack resources of one framework can be used by the other. > 2) Measure time to react for different scenarios. For example, measure the > time it takes from slack appearing on a slave to the offer being issued with > revocable resources. The time to react for changing usage patterns e.g. time > to reclaim oversubscribed resources when regular tasks need them back. > 3) Count the number of offer rescind, preemptions, etc. to deem the stability > of the policy. > 4) Be able to measure % extra work being able to run. > 5) Work across different resource dimensions as cpu time, memory, network, > caches. > [~Bartek Plotka] has been working on something similar for Serenity in > https://github.com/mesosphere/serenity/tree/master/src/framework which we can > reuse as a base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4173) HealthCheckTest.CheckCommandTimeout is slow
[ https://issues.apache.org/jira/browse/MESOS-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107354#comment-15107354 ] Isabel Jimenez commented on MESOS-4173: --- commit f139333db8264f25173c88e5a3f0db76680f3c52 Author: Timothy Chen Date: Wed Jan 13 13:23:50 2016 -0800 Reduced HealthCheckTest.CheckCommandTimeout test duration. Review: https://reviews.apache.org/r/40956/ > HealthCheckTest.CheckCommandTimeout is slow > --- > > Key: MESOS-4173 > URL: https://issues.apache.org/jira/browse/MESOS-4173 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{HealthCheckTest.CheckCommandTimeout}} test takes more than {{15s}}! to > finish on my Mac OS 10.10.4: > {code} > HealthCheckTest.CheckCommandTimeout (15483 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2296: -- Issue Type: Task (was: Epic) > Implement the Events stream on slave for Call endpoint > -- > > Key: MESOS-2296 > URL: https://issues.apache.org/jira/browse/MESOS-2296 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 0.27.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4005) Support workdir runtime configuration from image
[ https://issues.apache.org/jira/browse/MESOS-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107064#comment-15107064 ] Gilbert Song commented on MESOS-4005: - This used to be blocked by Docker v1 manifest parsing via protobuf. Fixed. > Support workdir runtime configuration from image > - > > Key: MESOS-4005 > URL: https://issues.apache.org/jira/browse/MESOS-4005 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to support workdir runtime configuration returned from image such as > Dockerfile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4369) Enhance DockerExecuter to support Docker's user-defined networks
[ https://issues.apache.org/jira/browse/MESOS-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ezra Silvera updated MESOS-4369: Summary: Enhance DockerExecuter to support Docker's user-defined networks (was: Enhance DockContainerizer to support Docker network created with Docker CLI) > Enhance DockerExecuter to support Docker's user-defined networks > > > Key: MESOS-4369 > URL: https://issues.apache.org/jira/browse/MESOS-4369 > Project: Mesos > Issue Type: Improvement > Components: docker >Reporter: Qian Zhang >Assignee: Ezra Silvera > > Currently DockerContainerizer supports the following network options, which > are Docker's built-in networks: > {code} > message DockerInfo { > ... > // Network options. > enum Network { > HOST = 1; > BRIDGE = 2; > NONE = 3; > } > ... > {code} > However, since version 1.9, Docker supports user-defined networks (both > local and overlay), e.g., {{docker network create --driver bridge > my-network}}. The user can then create containers that need to be attached > to these networks, e.g., {{docker run --net=my-network}}. > We need to enhance DockerExecuter to support such network options so that the > Docker container can connect to such networks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4431) Sharing of persistent volumes via reference counting
Anindya Sinha created MESOS-4431: Summary: Sharing of persistent volumes via reference counting Key: MESOS-4431 URL: https://issues.apache.org/jira/browse/MESOS-4431 Project: Mesos Issue Type: Improvement Components: general Affects Versions: 0.25.0 Reporter: Anindya Sinha Assignee: Anindya Sinha Add capability for specific resources to be shared amongst tasks within or across frameworks/roles. Enable this functionality for persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4432) Condense (redundant) log messages related to task launch/status/finish
Kapil Arya created MESOS-4432: - Summary: Condense (redundant) log messages related to task launch/status/finish Key: MESOS-4432 URL: https://issues.apache.org/jira/browse/MESOS-4432 Project: Mesos Issue Type: Bug Reporter: Kapil Arya As can be seen from the following snippet, there were about "25" different log entries for a task from launch to finish. This seems a bit too much. {code} $ grep " 7062 " /run/log/mesos/mesos-master.INFO I0113 23:42:39.464856 15109 master.hpp:176] Adding task 7062 with resources cpus(*):0.008 on slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 (10.0.1.112) I0113 23:42:39.465308 15109 master.cpp:3245] Launching task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (No Executor Framework) at scheduler-009f25ee-1afc-4c20-88c7-d85c46d4da41@10.0.4.15:35526 with resources cpus(*):0.008 on slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:43:04.300138 15110 master.cpp:4414] Status update TASK_RUNNING (UUID: 174415c6-cf82-400a-90fc-31d7dfbf4fdd) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- from slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:43:04.300900 15110 master.cpp:4462] Forwarding status update TASK_RUNNING (UUID: 174415c6-cf82-400a-90fc-31d7dfbf4fdd) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- I0113 23:43:04.301697 15110 master.cpp:6066] Updating the state of task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (latest state: TASK_RUNNING, status update state: TASK_RUNNING) I0113 23:43:17.932242 15110 master.cpp:3571] Processing ACKNOWLEDGE call 174415c6-cf82-400a-90fc-31d7dfbf4fdd for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (No Executor Framework) at scheduler-009f25ee-1afc-4c20-88c7-d85c46d4da41@10.0.4.15:35526 on slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 I0113 23:43:29.625159 15110 master.cpp:4414] Status update TASK_RUNNING (UUID: 
174415c6-cf82-400a-90fc-31d7dfbf4fdd) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- from slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:43:29.626286 15110 master.cpp:4462] Forwarding status update TASK_RUNNING (UUID: 174415c6-cf82-400a-90fc-31d7dfbf4fdd) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- I0113 23:43:29.627462 15110 master.cpp:6066] Updating the state of task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (latest state: TASK_RUNNING, status update state: TASK_RUNNING) I0113 23:44:00.408326 15110 master.cpp:3571] Processing ACKNOWLEDGE call 174415c6-cf82-400a-90fc-31d7dfbf4fdd for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (No Executor Framework) at scheduler-009f25ee-1afc-4c20-88c7-d85c46d4da41@10.0.4.15:35526 on slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 I0113 23:46:51.557840 15109 master.cpp:4414] Status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- from slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:46:51.562408 15109 master.cpp:4462] Forwarding status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- I0113 23:46:51.564661 15109 master.cpp:6066] Updating the state of task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (latest state: TASK_FINISHED, status update state: TASK_FINISHED) I0113 23:48:29.797819 15109 master.cpp:4414] Status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- from slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:48:29.803653 15109 master.cpp:4462] Forwarding status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework 
afb66c28-eddf-4e4e-8b7a-fe822a04eef8- I0113 23:48:29.806558 15109 master.cpp:6066] Updating the state of task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- (latest state: TASK_FINISHED, status update state: TASK_FINISHED) I0113 23:50:39.551422 15109 master.cpp:4414] Status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- from slave 87f9cced-992e-4d35-9b9b-5a89b9563bba-S1 at slave(1)@10.0.1.112:5051 (10.0.1.112) I0113 23:50:39.558599 15109 master.cpp:4462] Forwarding status update TASK_FINISHED (UUID: 33941ab4-117f-4f7c-92eb-19717298bd20) for task 7062 of framework afb66c28-eddf-4e4e-8b7a-fe822a04eef8- I0113
[jira] [Created] (MESOS-4433) Implement a callback testing interface for the Executor Library
Anand Mazumdar created MESOS-4433: - Summary: Implement a callback testing interface for the Executor Library Key: MESOS-4433 URL: https://issues.apache.org/jira/browse/MESOS-4433 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Currently, we do not have a mocking-based callback interface for the executor library. This should look similar to the ongoing work for MESOS-3339, i.e., the corresponding issue for the scheduler library. The interface should allow us to set expectations like we do for the driver. An example: {code} EXPECT_CALL(executor, connected()) .Times(1); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
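As a rough illustration of what such an interface enables, here is a hand-rolled recording test double; the actual work would presumably build on gmock (as the `EXPECT_CALL` snippet above suggests), and the callback names below are only assumptions, not the final API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical executor callback interface; the callback names are
// illustrative, not the final API.
class ExecutorInterface {
public:
  virtual ~ExecutorInterface() {}
  virtual void connected() = 0;
  virtual void disconnected() = 0;
  virtual void message(const std::string& data) = 0;
};

// A hand-rolled test double that records invocations. A gmock-based
// MockExecutor would automate this bookkeeping via EXPECT_CALL.
class RecordingExecutor : public ExecutorInterface {
public:
  void connected() override { calls.push_back("connected"); }
  void disconnected() override { calls.push_back("disconnected"); }
  void message(const std::string& data) override {
    calls.push_back("message:" + data);
  }

  std::vector<std::string> calls;  // invocation log for assertions
};
```

A test would drive the library against such a double and then assert on the recorded calls, mirroring the `EXPECT_CALL(executor, connected()).Times(1)` style of expectation.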
[jira] [Commented] (MESOS-4369) Enhance DockerExecuter to support Docker's user-defined networks
[ https://issues.apache.org/jira/browse/MESOS-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107184#comment-15107184 ] Ezra Silvera commented on MESOS-4369: - See code in https://reviews.apache.org/r/42516/ > Enhance DockerExecuter to support Docker's user-defined networks > > > Key: MESOS-4369 > URL: https://issues.apache.org/jira/browse/MESOS-4369 > Project: Mesos > Issue Type: Improvement > Components: docker >Reporter: Qian Zhang >Assignee: Ezra Silvera > > Currently DockerContainerizer supports the following network options, which > are Docker's built-in networks: > {code} > message DockerInfo { > ... > // Network options. > enum Network { > HOST = 1; > BRIDGE = 2; > NONE = 3; > } > ... > {code} > However, since version 1.9, Docker supports user-defined networks (both > local and overlay), e.g., {{docker network create --driver bridge > my-network}}. The user can then create containers that need to be attached > to these networks, e.g., {{docker run --net=my-network}}. > We need to enhance DockerExecuter to support such network options so that the > Docker container can connect to such networks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4369) Enhance DockerExecuter to support Docker's user-defined networks
[ https://issues.apache.org/jira/browse/MESOS-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107184#comment-15107184 ] Ezra Silvera edited comment on MESOS-4369 at 1/19/16 6:51 PM: -- See details in https://reviews.apache.org/r/42516/ was (Author: ezrasilvera): See code in https://reviews.apache.org/r/42516/ > Enhance DockerExecuter to support Docker's user-defined networks > > > Key: MESOS-4369 > URL: https://issues.apache.org/jira/browse/MESOS-4369 > Project: Mesos > Issue Type: Improvement > Components: docker >Reporter: Qian Zhang >Assignee: Ezra Silvera > > Currently DockerContainerizer supports the following network options, which > are Docker's built-in networks: > {code} > message DockerInfo { > ... > // Network options. > enum Network { > HOST = 1; > BRIDGE = 2; > NONE = 3; > } > ... > {code} > However, since version 1.9, Docker supports user-defined networks (both > local and overlay), e.g., {{docker network create --driver bridge > my-network}}. The user can then create containers that need to be attached > to these networks, e.g., {{docker run --net=my-network}}. > We need to enhance DockerExecuter to support such network options so that the > Docker container can connect to such networks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3889) Modify Oversubscription documentation to explicitly forbid the QoS Controller from killing executors running on optimistically offered resources.
[ https://issues.apache.org/jira/browse/MESOS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3889: - Description: The oversubscription documentation currently assumes that oversubscribed resources ({{USAGE_SLACK}}) are the only type of revocable resources. Optimistic offers will add a second type of revocable resource ({{ALLOCATION_SLACK}}) that should not be acted upon by oversubscription components. For example, the [oversubscription doc|http://mesos.apache.org/documentation/latest/oversubscription/] says the following: {quote} NOTE: If any resource used by a task or executor is revocable, the whole container is treated as a revocable container and can therefore be killed or throttled by the QoS Controller. {quote} which we may amend to something like: {quote} NOTE: If any resource used by a task or executor is revocable usage slack, the whole container is treated as an oversubscribed container and can therefore be killed or throttled by the QoS Controller. {quote} > Modify Oversubscription documentation to explicitly forbid the QoS Controller > from killing executors running on optimistically offered resources. > - > > Key: MESOS-3889 > URL: https://issues.apache.org/jira/browse/MESOS-3889 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Klaus Ma > Labels: mesosphere > > The oversubscription documentation currently assumes that oversubscribed > resources ({{USAGE_SLACK}}) are the only type of revocable resources. > Optimistic offers will add a second type of revocable resource > ({{ALLOCATION_SLACK}}) that should not be acted upon by oversubscription > components. > For example, the [oversubscription > doc|http://mesos.apache.org/documentation/latest/oversubscription/] says the > following: > {quote} > NOTE: If any resource used by a task or executor is revocable, the whole > container is treated as a revocable container and can therefore be killed or > throttled by the QoS Controller. 
> {quote} > which we may amend to something like: > {quote} > NOTE: If any resource used by a task or executor is revocable usage slack, > the whole container is treated as an oversubscribed container and can > therefore be killed or throttled by the QoS Controller. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4111) Provide a means for libprocess users to exit while ensuring messages are flushed.
[ https://issues.apache.org/jira/browse/MESOS-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107194#comment-15107194 ] Joseph Wu commented on MESOS-4111: -- {{process::finalize}} only waits for the event queue on all processes to finish. (It does this by putting a {{TerminateEvent}} at the back of the queue.) Writes to a socket (or any FD) do not have events. So you'd need to augment {{process::finalize}} to clean up and flush sockets too. This [patch|https://reviews.apache.org/r/40266] is part of a chain to do something similar. > Provide a means for libprocess users to exit while ensuring messages are > flushed. > - > > Key: MESOS-4111 > URL: https://issues.apache.org/jira/browse/MESOS-4111 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Benjamin Mahler >Priority: Minor > > Currently after a {{send}} there is no way to ensure that the message is > flushed on the socket before terminating. We work around this by inserting > {{os::sleep}} calls (see MESOS-243, MESOS-4106). > There are a number of approaches to this: > (1) Return a Future from send that notifies when the message is flushed from > the system. > (2) Call process::finalize before exiting. This would require that > process::finalize flushes all of the outstanding data on any active sockets, > which may block. > Regardless of the approach, there needs to be a timer if we want to guarantee > termination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-4390) Shared Volumes Design Doc
[ https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anindya Sinha updated MESOS-4390: - Comment: was deleted (was: Design document updated for review... https://docs.google.com/document/d/18O4SH3H4BQriW6CTrg3TlQTiVC-rBRsMePhz99Y_bss/edit) > Shared Volumes Design Doc > - > > Key: MESOS-4390 > URL: https://issues.apache.org/jira/browse/MESOS-4390 > Project: Mesos > Issue Type: Task >Reporter: Adam B >Assignee: Anindya Sinha > Labels: mesosphere > > Review & Approve design doc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4436) Propose design doc for fixed-point scalar resources
Neil Conway created MESOS-4436: -- Summary: Propose design doc for fixed-point scalar resources Key: MESOS-4436 URL: https://issues.apache.org/jira/browse/MESOS-4436 Project: Mesos Issue Type: Task Components: general Reporter: Neil Conway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4438) Add 'dependency' message to 'AppcImageManifest' protobuf.
Jojy Varghese created MESOS-4438: Summary: Add 'dependency' message to 'AppcImageManifest' protobuf. Key: MESOS-4438 URL: https://issues.apache.org/jira/browse/MESOS-4438 Project: Mesos Issue Type: Task Components: containerization Reporter: Jojy Varghese Assignee: Jojy Varghese AppcImageManifest protobuf currently lacks 'dependencies' which is necessary for image discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
[ https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-3570: -- Sprint: Mesosphere Sprint 27 Story Points: 3 > Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess > > > Key: MESOS-3570 > URL: https://issues.apache.org/jira/browse/MESOS-3570 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Vinod Kone > Labels: mesosphere, newbie > > Currently, the scheduler library sends calls in order by chaining them and > sending them only when it has received a response for the earlier call. This > was done because there was no HTTP Pipelining abstraction in Libprocess > {{process::post}}. > However, once {{MESOS-3332}} is resolved, we should now be able to use the new > abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
[ https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone reassigned MESOS-3570: - Assignee: Vinod Kone > Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess > > > Key: MESOS-3570 > URL: https://issues.apache.org/jira/browse/MESOS-3570 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Vinod Kone > Labels: mesosphere, newbie > > Currently, the scheduler library sends calls in order by chaining them and > sending them only when it has received a response for the earlier call. This > was done because there was no HTTP Pipelining abstraction in Libprocess > {{process::post}}. > However, once {{MESOS-3332}} is resolved, we should now be able to use the new > abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4228) Use std::is_bind_expression to reroute the result of std::bind.
[ https://issues.apache.org/jira/browse/MESOS-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107697#comment-15107697 ] Michael Park commented on MESOS-4228: - {noformat} commit a5a42d5e861ee848c79370ed2408f8382ab1010a Author: Michael Park Date: Tue Jan 5 18:07:26 2016 -0800 Used `std::is_bind_expression` to SFINAE correctly. Review: https://reviews.apache.org/r/41460 {noformat} > Use std::is_bind_expression to reroute the result of std::bind. > --- > > Key: MESOS-4228 > URL: https://issues.apache.org/jira/browse/MESOS-4228 > Project: Mesos > Issue Type: Task > Components: libprocess >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > The Standard (C++11 through 17) does not require {{std::bind}}'s function > call operator to SFINAE, and VS 2015's doesn't. {{std::is_bind_expression}} > can be used to manually reroute bind expressions to the 1-arg overload, where > (conveniently) the argument will be ignored if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
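The rerouting described in the issue can be sketched as follows. This is a simplified standalone illustration, not the actual libprocess change: `std::is_bind_expression` selects a dedicated overload for bind expressions, relying on the fact that `std::bind` results discard surplus call arguments.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

int answer() { return 42; }

// Bind expressions are detected explicitly (their call operator is not
// required to SFINAE), and routed to this overload. std::bind discards
// surplus arguments, so passing `arg` to a nullary bind expression is
// well-formed.
template <typename F>
typename std::enable_if<
    std::is_bind_expression<typename std::decay<F>::type>::value, int>::type
dispatch(F&& f, int arg) {
  return f(arg);  // surplus argument is evaluated and discarded
}

// All other callables take the normal path and receive the argument.
template <typename F>
typename std::enable_if<
    !std::is_bind_expression<typename std::decay<F>::type>::value, int>::type
dispatch(F&& f, int arg) {
  return f(arg);
}
```

Here a nullary `std::bind(answer)` can be dispatched with an argument it simply ignores, while an ordinary one-argument callable receives the argument as usual.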
[jira] [Commented] (MESOS-4220) Introduce result_of with C++14 semantics to stout.
[ https://issues.apache.org/jira/browse/MESOS-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107700#comment-15107700 ] Michael Park commented on MESOS-4220: - {noformat} commit 1565096f2fba4e6ac83e6ee44a81e0290b8f7f58 Author: Michael Park Date: Sat Dec 12 11:29:36 2015 -0500 Used SFINAE-friendly `result_of` in libprocess. Review: https://reviews.apache.org/r/41462 {noformat} {noformat} commit 576fa0ee11f81006950094d4e35d231e7cb11472 Author: Michael Park Date: Sat Dec 12 11:29:12 2015 -0500 Added SFINAE-friendly `result_of` in stout. Review: https://reviews.apache.org/r/41461 {noformat} > Introduce result_of with C++14 semantics to stout. > -- > > Key: MESOS-4220 > URL: https://issues.apache.org/jira/browse/MESOS-4220 > Project: Mesos > Issue Type: Task > Components: stout >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > The {{std::result_of}} in VS 2015 Update 1 implements C++11 semantics which > does not allow it to be used in SFINAE contexts. > Introduce a C++14 {{std::result_of}} into stout until we get to VS 2015 > Update 2, at which point we can switch back to simply using > {{std::result_of}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
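The C++14 semantics ("no `type` member when the call expression is ill-formed, instead of a hard error") can be sketched with the usual `void_t` detection idiom. This is a rough standalone illustration, not the actual stout code; it handles only plain function-object calls, not pointers to members.

```cpp
#include <type_traits>
#include <utility>

template <typename...>
using void_t = void;  // C++17 std::void_t, spelled out for older standards

// SFINAE-friendly result_of: when F(Args...) is not a valid call, the
// `type` member is simply absent instead of triggering a hard error.
template <typename T, typename = void>
struct result_of {};  // not invocable: no `type` member

template <typename F, typename... Args>
struct result_of<
    F(Args...),
    void_t<decltype(std::declval<F>()(std::declval<Args>()...))>> {
  using type = decltype(std::declval<F>()(std::declval<Args>()...));
};

// Example callable used below.
struct Doubler {
  int operator()(int x) const { return 2 * x; }
};

// Because result_of is SFINAE-friendly, it can itself drive detection:
// has_type<T> is true exactly when result_of<T>::type exists.
template <typename T, typename = void>
struct has_type : std::false_type {};

template <typename T>
struct has_type<T, void_t<typename result_of<T>::type>> : std::true_type {};
```

With C++11 semantics, `result_of<Doubler(void*)>` would be a hard error; here it merely lacks `::type`, so it composes with `enable_if` and similar SFINAE machinery.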
[jira] [Commented] (MESOS-4221) Invoke _Deferred's implicit conversion operator explicitly.
[ https://issues.apache.org/jira/browse/MESOS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107699#comment-15107699 ] Michael Park commented on MESOS-4221: - {noformat} commit b15161eea964c196276b51ef24763fee2f409d57 Author: Michael Park Date: Tue Dec 15 02:54:56 2015 + Invoked `_Deferred`'s `operator F()` explicitly. Review: https://reviews.apache.org/r/41459 {noformat} > Invoke _Deferred's implicit conversion operator explicitly. > --- > > Key: MESOS-4221 > URL: https://issues.apache.org/jira/browse/MESOS-4221 > Project: Mesos > Issue Type: Task > Components: libprocess >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > As of VS 2015 Update 1, MSVC implements C++11 semantics for > {{std::function}}'s {{Callable}} constructor which does not SFINAE. In the > short term, we call the implicit conversion operator from {{_Deferred}} to > {{std::function}} explicitly. > Going forward, I propose to make {{_Deferred}} callable which will bring us > to a state where {{process::defer}} is similar to {{std::bind}} in that the > objects returned from them are "implementation-defined" (i.e., {{_Deferred}} > and something like {{_Bind}}), and that they were both callable. {{Deferred}} > and {{std::function}} are similar in that they perform type-erasure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
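The workaround can be illustrated with a toy stand-in for `_Deferred`; the type, members, and function names here are assumptions for illustration, not libprocess's actual code.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for _Deferred: a type with an implicit conversion operator
// to std::function.
struct Deferred {
  int value;

  operator std::function<int()>() const {
    int v = value;
    return [v]() { return v; };
  }
};

// Since MSVC's std::function Callable constructor does not SFINAE, the
// conversion operator is invoked explicitly here, rather than relying on
// implicit conversion during overload resolution.
std::function<int()> toFunction(const Deferred& d) {
  return d.operator std::function<int()>();
}
```

The explicit `d.operator std::function<int()>()` call selects the conversion directly, sidestepping the problematic constructor overload set.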
[jira] [Updated] (MESOS-4191) Investigate switching to fixed point scalar resources
[ https://issues.apache.org/jira/browse/MESOS-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4191: --- Summary: Investigate switching to fixed point scalar resources (was: Design doc for fixed point resources) > Investigate switching to fixed point scalar resources > - > > Key: MESOS-4191 > URL: https://issues.apache.org/jira/browse/MESOS-4191 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere, resources > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4371) Enhance DockContainerizer to support Docker volume created with Docker CLI
[ https://issues.apache.org/jira/browse/MESOS-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Zhang reassigned MESOS-4371: - Assignee: Qian Zhang > Enhance DockContainerizer to support Docker volume created with Docker CLI > -- > > Key: MESOS-4371 > URL: https://issues.apache.org/jira/browse/MESOS-4371 > Project: Mesos > Issue Type: Improvement > Components: docker, volumes >Reporter: Qian Zhang >Assignee: Qian Zhang > > In Docker, a user can create a volume with the Docker CLI, e.g., {{docker volume > create --name my-volume}}. We need to enhance the DockerContainerizer so that the > Docker container launched by it can use such a volume. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-4421: -- Assignee: Neil Conway > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3273) EventCall Test Framework is flaky
[ https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107747#comment-15107747 ] Vinod Kone commented on MESOS-3273: --- commit 147895b4bd6c421ac15db043b8d243c07e44fd7c Author: Vinod Kone Date: Tue Jan 19 16:05:59 2016 -0800 Temporarily disabled EventCallFramework test due to MESOS-3273. > EventCall Test Framework is flaky > - > > Key: MESOS-3273 > URL: https://issues.apache.org/jira/browse/MESOS-3273 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0 > Environment: > https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull >Reporter: Vinod Kone >Assignee: Vinod Kone > Labels: flaky-test, mesosphere, tech-debt > Attachments: asan.log > > > Observed this on ASF CI. h/t [~haosd...@gmail.com] > Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master. > {code} > [ RUN ] ExamplesTest.EventCallFramework > Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx' > I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the > driver is aborted! 
> Shutting down > Sending SIGTERM to process tree at pid 26061 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26062 > Shutting down > Killing the following process trees: > [ > ] > Sending SIGTERM to process tree at pid 26063 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26098 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26099 > Killing the following process trees: > [ > ] > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on > 172.17.2.10:60249 for 16 cpus > I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR > I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0 > I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms > I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms > I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns > I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in > 8429ns > I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the > db in 4219ns > I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery > I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status > I0813 19:55:17.181970 26126 master.cpp:378] Master > 20150813-195517-167907756-60249-26100 (297daca2d01a) started on > 172.17.2.10:60249 > I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: > --acls="permissive: false > register_frameworks { > principals { > type: SOME > values: "test-principal" > } > roles { > type: SOME > values: "*" > } > } > run_tasks { > principals { > type: SOME > values: "test-principal" > } > users { > type: SOME > values: "mesos" 
> } > } > " --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" > --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" > --zk_session_timeout="10secs" > I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated > frameworks to register > I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated > slaves to register > I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for > authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' > W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials > file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. > It is recommended that your credentials file is NOT accessible by others. > I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received > a
[jira] [Updated] (MESOS-4437) Disable the test RegistryClientTest.BadTokenServerAddress.
[ https://issues.apache.org/jira/browse/MESOS-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-4437: - Labels: mesosphere (was: ) > Disable the test RegistryClientTest.BadTokenServerAddress. > -- > > Key: MESOS-4437 > URL: https://issues.apache.org/jira/browse/MESOS-4437 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere > > As we are retiring registry client, disable this test which looks flaky. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4437) Disable the test RegistryClientTest.BadTokenServerAddress.
Jojy Varghese created MESOS-4437: Summary: Disable the test RegistryClientTest.BadTokenServerAddress. Key: MESOS-4437 URL: https://issues.apache.org/jira/browse/MESOS-4437 Project: Mesos Issue Type: Task Components: containerization Reporter: Jojy Varghese Assignee: Jojy Varghese As we are retiring registry client, disable this test which looks flaky. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4439) Fix appc CachedImage image validation
[ https://issues.apache.org/jira/browse/MESOS-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-4439: - Description: Currently image validation is done assuming that the image's filename will have digest (SHA-512) information. This is not part of the spec (https://github.com/appc/spec/blob/master/spec/discovery.md). The spec specifies the tuple as unique identifier for discovering an image. was: Currently image validation is done assuming that the image's filename will have digest (SHA-512) information. This is not part of the spec (https://github.com/appc/spec/blob/master/spec/discovery.md). The spec specifies the tuple as unique identifier for discovering an image. > Fix appc CachedImage image validation > - > > Key: MESOS-4439 > URL: https://issues.apache.org/jira/browse/MESOS-4439 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > > Currently image validation is done assuming that the image's filename will > have digest (SHA-512) information. This is not part of the spec > (https://github.com/appc/spec/blob/master/spec/discovery.md). > > The spec specifies the tuple as unique identifier > for discovering an image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4439) Fix appc CachedImage image validation
Jojy Varghese created MESOS-4439: Summary: Fix appc CachedImage image validation Key: MESOS-4439 URL: https://issues.apache.org/jira/browse/MESOS-4439 Project: Mesos Issue Type: Task Components: containerization Reporter: Jojy Varghese Assignee: Jojy Varghese Currently image validation is done assuming that the image's filename will have digest (SHA-512) information. This is not part of the spec (https://github.com/appc/spec/blob/master/spec/discovery.md). The spec specifies the tuple as unique identifier for discovering an image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3763) Need for http::put request method
[ https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107899#comment-15107899 ] Adam B commented on MESOS-3763: --- [~jvanremoortere] Are you going to have time to shepherd this patch, or should I take it over and commit it? > Need for http::put request method > - > > Key: MESOS-3763 > URL: https://issues.apache.org/jira/browse/MESOS-3763 > Project: Mesos > Issue Type: Task >Reporter: Joerg Schad >Assignee: Yongqiao Wang >Priority: Minor > Labels: mesosphere > > As we decided to create a more RESTful API for managing Quota requests, we > also want to use the HTTP PUT request and hence need to enable > libprocess/http to send PUT requests besides GET and POST requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4435) Update `Master::Http::stateSummary` to use `jsonify`.
Michael Park created MESOS-4435: --- Summary: Update `Master::Http::stateSummary` to use `jsonify`. Key: MESOS-4435 URL: https://issues.apache.org/jira/browse/MESOS-4435 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Update {{state-summary}} to use {{jsonify}} to stay consistent with {{state}} HTTP endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4421: --- Sprint: Mesosphere Sprint 27 Story Points: 3 > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4421: --- Shepherd: Jie Yu > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4111) Provide a means for libprocess users to exit while ensuring messages are flushed.
[ https://issues.apache.org/jira/browse/MESOS-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107856#comment-15107856 ] haosdent commented on MESOS-4111: - oh, thank you very much. > Provide a means for libprocess users to exit while ensuring messages are > flushed. > - > > Key: MESOS-4111 > URL: https://issues.apache.org/jira/browse/MESOS-4111 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Benjamin Mahler >Priority: Minor > > Currently after a {{send}} there is no way to ensure that the message is > flushed on the socket before terminating. We work around this by inserting > {{os::sleep}} calls (see MESOS-243, MESOS-4106). > There are a number of approaches to this: > (1) Return a Future from send that notifies when the message is flushed from > the system. > (2) Call process::finalize before exiting. This would require that > process::finalize flushes all of the outstanding data on any active sockets, > which may block. > Regardless of the approach, there needs to be a timer if we want to guarantee > termination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4349) GMock warning in SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor
[ https://issues.apache.org/jira/browse/MESOS-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108131#comment-15108131 ] Timothy Chen commented on MESOS-4349: - commit d31f9152a7250583c51f2e0568aa0b5a09cc88e9 Author: Neil Conway Date: Tue Jan 19 22:27:14 2016 -0800 Fixed more tests that didn't set a shutdown expect for MockExecutor. Specifically, the following tests: MasterTest.OfferNotRescindedOnceUsed OversubscriptionTest.FetchResourceUsageFromMonitor OversubscriptionTest.QoSFetchResourceUsageFromMonitor SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor Review: https://reviews.apache.org/r/42265/ > GMock warning in SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor > > > Key: MESOS-4349 > URL: https://issues.apache.org/jira/browse/MESOS-4349 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere, tests > Fix For: 0.27.0 > > > {noformat} > [ RUN ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor > GMOCK WARNING: > Uninteresting mock function call - returning directly. > Function call: shutdown(0x7fe189cae850) > Stack trace: > [ OK ] SlaveTest.ContainerUpdatedBeforeTaskReachesExecutor (51 ms) > {noformat} > Occurs non-deterministically for me on OSX 10.10, perhaps one run in ten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4389) Master "roles" endpoint only shows active role
[ https://issues.apache.org/jira/browse/MESOS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108147#comment-15108147 ] Fan Du commented on MESOS-4389: --- Based on the code review, it's by design; it doesn't matter much in practice, though. Just a random puzzle :) > Master "roles" endpoint only shows active role > -- > > Key: MESOS-4389 > URL: https://issues.apache.org/jira/browse/MESOS-4389 > Project: Mesos > Issue Type: Improvement > Components: HTTP API, master >Reporter: Fan Du > > Register two slaves to the master with role "busybox" and "ubuntu" respectively, > then run Marathon with role "busybox"; after this, check the master "roles" > endpoint. It only returns the default and active roles; could this be improved to > show all available roles for easy checking? > {code} > { > "roles": [ > { > "frameworks": [], > "name": "*", > "resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > }, > "weight": 1.0 > }, > { > "frameworks": [ > "2caebb14-161f-4941-b8ab-8990cef01ac0-" > ], > "name": "busybox", > "resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > }, > "weight": 1.0 > } > ] > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4435) Update `Master::Http::stateSummary` to use `jsonify`.
[ https://issues.apache.org/jira/browse/MESOS-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108003#comment-15108003 ] Michael Park commented on MESOS-4435: - https://reviews.apache.org/r/42543/ https://reviews.apache.org/r/42546/ > Update `Master::Http::stateSummary` to use `jsonify`. > - > > Key: MESOS-4435 > URL: https://issues.apache.org/jira/browse/MESOS-4435 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Michael Park >Assignee: Michael Park > > Update {{state-summary}} to use {{jsonify}} to stay consistent with {{state}} > HTTP endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4339) Add weight support for framework sorter
[ https://issues.apache.org/jira/browse/MESOS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108142#comment-15108142 ] Fan Du commented on MESOS-4339: --- [~adam-mesos] and [~bbannier] Based on the proposal documentation from MESOS-4284, it's well justified to enable the weighted DRF framework sorter in a multi-role scenario, to keep the allocation decision fair across roles and frameworks. Although the work to support the weighted DRF framework sorter is independent of multi-role frameworks in its design logic (which is what I previously thought, incompletely), the former apparently needs to be done *AFTER* multi-role frameworks in implementation. So, if you don't mind, I would still like to contribute this ticket to multi-role frameworks. > Add weight support for framework sorter > --- > > Key: MESOS-4339 > URL: https://issues.apache.org/jira/browse/MESOS-4339 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Fan Du >Assignee: Fan Du > > The current framework sorter doesn't take weights into account when sorting > frameworks belonging to a particular role, i.e., all frameworks have an equal > weight of 1. Considering the role weight is controlled by the operator, > enabling framework weights does not impact the role-level allocation decision > from any greedy frameworks, but it will be beneficial to frameworks that could > get more resources within a specific role. > The framework weight will come from the FrameworkInfo message when the framework > gets registered, and FrameworkSorters will "add" the framework with its weight; > this will eventually result in a weighted framework sorting flow when the master > makes the final allocation decision. > Please review this ticket, which I will work on if it's considered acceptable. > Thanks a lot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4146) Distinguish usage slack and allocation slack revocable resources
[ https://issues.apache.org/jira/browse/MESOS-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055424#comment-15055424 ] Guangya Liu edited comment on MESOS-4146 at 1/20/16 7:43 AM: - https://reviews.apache.org/r/41333/ https://reviews.apache.org/r/41334/ https://reviews.apache.org/r/42547/ was (Author: gyliu): https://reviews.apache.org/r/41333/ https://reviews.apache.org/r/41334/ > Distinguish usage slack and allocation slack revocable resources > > > Key: MESOS-4146 > URL: https://issues.apache.org/jira/browse/MESOS-4146 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > The API revocable() can now return resources which are revocable including > both allocation slack and usage slack, it is better add two new APIs to > return revocable resources for both allocation slack and usage slack. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4440) Clean up get/post/deleteRequest functions and let the caller use the general function.
Yongqiao Wang created MESOS-4440: Summary: Clean up get/post/deleteRequest functions and let the caller use the general function. Key: MESOS-4440 URL: https://issues.apache.org/jira/browse/MESOS-4440 Project: Mesos Issue Type: Bug Reporter: Yongqiao Wang Assignee: Yongqiao Wang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4440) Clean up get/post/deleteRequest functions and let the caller use the general function.
[ https://issues.apache.org/jira/browse/MESOS-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108162#comment-15108162 ] Yongqiao Wang commented on MESOS-4440: -- In the MESOS-3763 ticket, we exposed the internal::http::request function in the header, so in this ticket the other instances of post/get need to be cleaned up to use the http::request method. > Clean up get/post/deleteRequest functions and let the caller use the general > function. > > > Key: MESOS-4440 > URL: https://issues.apache.org/jira/browse/MESOS-4440 > Project: Mesos > Issue Type: Bug >Reporter: Yongqiao Wang >Assignee: Yongqiao Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2930) Allow the Resource Estimator to express over-allocation of revocable resources.
[ https://issues.apache.org/jira/browse/MESOS-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106688#comment-15106688 ] Niklas Quarfot Nielsen commented on MESOS-2930: --- Hi [~bmahler] - sorry for the super tardy reply. For Serenity, the Estimator and QoS controllers act as edges on a shared pipeline of filters (which lives in its own actor). In short, the estimator pushes usage statistics in and awaits estimates, whereas the QoS controller awaits corrections from the pipeline. > Allow the Resource Estimator to express over-allocation of revocable > resources. > --- > > Key: MESOS-2930 > URL: https://issues.apache.org/jira/browse/MESOS-2930 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Benjamin Mahler >Assignee: Klaus Ma > > Currently the resource estimator returns the amount of oversubscription > resources that are available; since resources cannot be negative, this allows > the resource estimator to express the following: > (1) Return empty resources: We are fully allocated for oversubscription > resources. > (2) Return non-empty resources: We are under-allocated for oversubscription > resources. In other words, some are available. > However, there is an additional situation that we cannot express: > (3) Analogous to returning non-empty "negative" resources: We are > over-allocated for oversubscription resources. Do not re-offer any of the > over-allocated oversubscription resources that are recovered. > Without (3), the slave can only shrink the total pool of oversubscription > resources by returning (1) as resources are recovered, until the pool is > shrunk to the desired size. However, this approach is only best-effort; it's > possible for a framework to launch more tasks in the window of time (15 > seconds by default) that the slave polls the estimator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3889) Modify Oversubscription documentation to explicitly forbid the QoS Controller from killing executors running on optimistically offered resources.
[ https://issues.apache.org/jira/browse/MESOS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106748#comment-15106748 ] Niklas Quarfot Nielsen commented on MESOS-3889: --- [~hartem] [~klaus1982] Can you add a bit of context on this ticket? :) > Modify Oversubscription documentation to explicitly forbid the QoS Controller > from killing executors running on optimistically offered resources. > - > > Key: MESOS-3889 > URL: https://issues.apache.org/jira/browse/MESOS-3889 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Klaus Ma > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2930) Allow the Resource Estimator to express over-allocation of revocable resources.
[ https://issues.apache.org/jira/browse/MESOS-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106863#comment-15106863 ] Klaus Ma edited comment on MESOS-2930 at 1/19/16 3:21 PM: -- [~nnielsen], there's a case we need to handle regarding this JIRA: * T1: in cluster, {{cpus=2}}: one is revocable and the other one is nonRevocable * T2: {{framework1}} gets offer {{cpus=2}}, but did NOT launch tasks * T3: Estimator reports empty resources; {{slave.total}} is updated to {{cpus=1}} in {{HierarchicalAllocatorProcess::updateSlave}} * T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2), the resources {{cpus=1}} will be re-offered to the framework because {{operator-}} will return the first item if {{subtractable}} is false. Any comments? was (Author: klaus1982): [~nnielsen], there's a case we need to handle regarding this JIRA: * T1: in cluster, {{cpus=2}}: one is revocable and the other one is nonRevocable * T2: {{framework1}} gets offer {{cpus=2}}, but did NOT launch tasks * T3: Estimator reports empty resources; {{slave.total}} is updated to {{cpus=1}} in {{HierarchicalAllocatorProcess::updateSlave}} * T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2), the resources {{cpus=1}} will be re-offered to the framework because {{operator-}} will return the first item is {{subtractable}} is false. Any comments? > Allow the Resource Estimator to express over-allocation of revocable > resources. > --- > > Key: MESOS-2930 > URL: https://issues.apache.org/jira/browse/MESOS-2930 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Benjamin Mahler >Assignee: Klaus Ma > > Currently the resource estimator returns the amount of oversubscription > resources that are available, since resources cannot be negative, this allows > the resource estimator to express the following: > (1) Return empty resources: We are fully allocated for oversubscription > resources. 
> (2) Return non-empty resources: We are under-allocated for oversubscription > resources. In other words, some are available. > However, there is an additional situation that we cannot express: > (3) Analogous to returning non-empty "negative" resources: We are > over-allocated for oversubscription resources. Do not re-offer any of the > over-allocated oversubscription resources that are recovered. > Without (3), the slave can only shrink the total pool of oversubscription > resources by returning (1) as resources are recovered, until the pool is > shrunk to the desired size. However, this approach is only best-effort, it's > possible for a framework to launch more tasks in the window of time (15 > seconds by default) that the slave polls the estimator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2930) Allow the Resource Estimator to express over-allocation of revocable resources.
[ https://issues.apache.org/jira/browse/MESOS-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106863#comment-15106863 ] Klaus Ma commented on MESOS-2930: - [~nnielsen], there's a case we need to handle regarding this JIRA: * T1: in cluster, {{cpus=2}}: one is revocable and the other one is nonRevocable * T2: {{framework1}} gets offer {{cpus=2}}, but did NOT launch tasks * T3: Estimator reports empty resources; {{slave.total}} is updated to {{cpus=1}} in {{HierarchicalAllocatorProcess::updateSlave}} * T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2), the resources {{cpus=1}} will be re-offered to the framework because {{operator-}} will return the first item if {{subtractable}} is false. Any comments? > Allow the Resource Estimator to express over-allocation of revocable > resources. > --- > > Key: MESOS-2930 > URL: https://issues.apache.org/jira/browse/MESOS-2930 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Benjamin Mahler >Assignee: Klaus Ma > > Currently the resource estimator returns the amount of oversubscription > resources that are available, since resources cannot be negative, this allows > the resource estimator to express the following: > (1) Return empty resources: We are fully allocated for oversubscription > resources. > (2) Return non-empty resources: We are under-allocated for oversubscription > resources. In other words, some are available. > However, there is an additional situation that we cannot express: > (3) Analogous to returning non-empty "negative" resources: We are > over-allocated for oversubscription resources. Do not re-offer any of the > over-allocated oversubscription resources that are recovered. > Without (3), the slave can only shrink the total pool of oversubscription > resources by returning (1) as resources are recovered, until the pool is > shrunk to the desired size. 
However, this approach is only best-effort, it's > possible for a framework to launch more tasks in the window of time (15 > seconds by default) that the slave polls the estimator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4429) Add oversubscription benchmark/stress/test framework
Niklas Quarfot Nielsen created MESOS-4429: - Summary: Add oversubscription benchmark/stress/test framework Key: MESOS-4429 URL: https://issues.apache.org/jira/browse/MESOS-4429 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen To evaluate the function and quality of oversubscription modules, we could ship a test framework which can: 1) Launch on oversubscribed and non-oversubscribed resources in a controlled manner. For example, register as two different frameworks and see that slack resources from one framework can be used by the other. 2) Measure time to react for different scenarios. For example, measure the time it takes from slack appearing on a slave to the offer being issued with revocable resources, and the time to react to changing usage patterns, e.g. the time to reclaim oversubscribed resources when regular tasks need them back. 3) Count the number of offer rescinds, preemptions, etc. to judge the stability of the policy. 4) Measure the percentage of extra work that can be run. 5) Work across different resource dimensions such as CPU time, memory, network, and caches. [~Bartek Plotka] has been working on something similar for Serenity in https://github.com/mesosphere/serenity/tree/master/src/framework which we can reuse as a base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4102) Quota doesn't allocate resources on slave joining.
[ https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4102: --- Assignee: Klaus Ma (was: Alexander Rukletsov) > Quota doesn't allocate resources on slave joining. > -- > > Key: MESOS-4102 > URL: https://issues.apache.org/jira/browse/MESOS-4102 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Neil Conway >Assignee: Klaus Ma >Priority: Blocker > Labels: mesosphere, quota > Attachments: quota_absent_framework_test-1.patch > > > See attached patch. {{framework1}} is not allocated any resources, despite > the fact that the resources on {{agent2}} can safely be allocated to it > without risk of violating {{quota1}}. If I understand the intended quota > behavior correctly, this doesn't seem intended. > Note that if the framework is added _after_ the slaves are added, the > resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4369) Enhance DockerContainerizer to support Docker network created with Docker CLI
[ https://issues.apache.org/jira/browse/MESOS-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ezra Silvera updated MESOS-4369: Description: Currently DockerContainerizer supports the following network options which are Docker built-in networks: {code} message DockerInfo { ... // Network options. enum Network { HOST = 1; BRIDGE = 2; NONE = 3; } ... {code} However, since Docker 1.9, Docker supports user-defined networks (both local and overlays) - e.g., {{docker network create --driver bridge my-network}}. The user can then create containers that need to be attached to these networks, e.g., {{docker run --net=my-network}}. We need to enhance the DockerExecutor to support such a network option so that the Docker container can connect to such a network. was: Currently DockerContainerizer supports the following network options which are Docker built-in networks: {code} message DockerInfo { ... // Network options. enum Network { HOST = 1; BRIDGE = 2; NONE = 3; } ... {code} However, with Docker CLI, user can create a customized network, e.g., {{docker network create my-network}}, we need to enhance DockerContainerizer to support such network so that the Docker container that user creates in Mesos with DockerContainerizer can connect into such network. > Enhance DockerContainerizer to support Docker network created with Docker CLI > --- > > Key: MESOS-4369 > URL: https://issues.apache.org/jira/browse/MESOS-4369 > Project: Mesos > Issue Type: Improvement > Components: docker >Reporter: Qian Zhang >Assignee: Ezra Silvera > > Currently DockerContainerizer supports the following network options which > are Docker built-in networks: > {code} > message DockerInfo { > ... > // Network options. > enum Network { > HOST = 1; > BRIDGE = 2; > NONE = 3; > } > ... > {code} > However, since Docker 1.9, Docker supports user-defined networks (both > local and overlays) - e.g., {{docker network create --driver bridge > my-network}}. 
The user can then create containers that need to be attached > to these networks, e.g., {{docker run --net=my-network}}. > We need to enhance the DockerExecutor to support such a network option so that the > Docker container can connect to such a network. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4430) Identify and change logging level for messages that don't contain specific task/framework/slave info
Kapil Arya created MESOS-4430: - Summary: Identify and change logging level for messages that don't contain specific task/framework/slave info Key: MESOS-4430 URL: https://issues.apache.org/jira/browse/MESOS-4430 Project: Mesos Issue Type: Bug Reporter: Kapil Arya The idea is to identify messages such as: {code} mesos-slave[37891]: I0117 15:20:15.357344 37941 slave.cpp:4200] Received oversubscribable resources from the resource estimator mesos-slave[37891]: I0117 15:20:30.357959 37957 slave.cpp:4186] Querying resource estimator for oversubscribable resources {code} and remove them from the default logging level. These messages don't provide any value to the sysadmin, etc., and fill up logs. In one incident, we observed over 12K lines of such messages in the log over a 33hr run of a cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4422) Use adaptor::reverse for reverse iteration in the code base.
[ https://issues.apache.org/jira/browse/MESOS-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106926#comment-15106926 ] haosdent commented on MESOS-4422: - A naive question: I saw [~tnachen] add reverse_foreach in [r42379|https://reviews.apache.org/r/42379/] before. So why did we give up that proposal? > Use adaptor::reverse for reverse iteration in the code base. > > > Key: MESOS-4422 > URL: https://issues.apache.org/jira/browse/MESOS-4422 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: haosdent > > It would be good to be consistent in our looping structure. > Currently, we use foreach for forward iteration and use rbegin/rend for > reverse iteration. We recently added adaptor::reverse > (https://reviews.apache.org/r/42450) in stout, which allows us to do: > {noformat} > vector<int> input = {}; > foreach (int i, adaptor::reverse(input)) { > ... > } > {noformat} > We should clean up our code to consistently use this structure for reverse > iteration. > {noformat} > jie$ grep -R rbegin src > src/common/protobuf_utils.cpp: for (auto status = task.statuses().rbegin(); > src/slave/containerizer/mesos/containerizer.cpp: for (auto it = > isolators.crbegin(); it != isolators.crend(); ++it) { > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4422) Use adaptor::reverse for reverse iteration in the code base.
[ https://issues.apache.org/jira/browse/MESOS-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4422: --- Assignee: haosdent > Use adaptor::reverse for reverse iteration in the code base. > > > Key: MESOS-4422 > URL: https://issues.apache.org/jira/browse/MESOS-4422 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: haosdent > > It would be good to be consistent in our looping structure. > Currently, we use foreach for forward iteration and use rbegin/rend for > reverse iteration. We recently added adaptor::reverse > (https://reviews.apache.org/r/42450) in stout, which allows us to do: > {noformat} > vector<int> input = {}; > foreach (int i, adaptor::reverse(input)) { > ... > } > {noformat} > We should clean up our code to consistently use this structure for reverse > iteration. > {noformat} > jie$ grep -R rbegin src > src/common/protobuf_utils.cpp: for (auto status = task.statuses().rbegin(); > src/slave/containerizer/mesos/containerizer.cpp: for (auto it = > isolators.crbegin(); it != isolators.crend(); ++it) { > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1718) Command executor can overcommit the slave.
[ https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107782#comment-15107782 ] Vinod Kone commented on MESOS-1718: --- I don't have cycles to shepherd this at the moment :( > Command executor can overcommit the slave. > -- > > Key: MESOS-1718 > URL: https://issues.apache.org/jira/browse/MESOS-1718 > Project: Mesos > Issue Type: Bug > Components: slave >Reporter: Benjamin Mahler >Assignee: Klaus Ma > > Currently we give a small amount of resources to the command executor, in > addition to resources used by the command task: > https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448 > {code} > ExecutorInfo Slave::getExecutorInfo( > const FrameworkID& frameworkId, > const TaskInfo& task) > { > ... > // Add an allowance for the command executor. This does lead to a > // small overcommit of resources. > executor.mutable_resources()->MergeFrom( > Resources::parse( > "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" + > "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get()); > ... > } > {code} > This leads to an overcommit of the slave. Ideally, for command tasks we can > "transfer" all of the task resources to the executor at the slave / isolation > level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1718) Command executor can overcommit the slave.
[ https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107811#comment-15107811 ] Klaus Ma commented on MESOS-1718: - NP :). Your comments are always helpful. I'll find a volunteer for this JIRA. > Command executor can overcommit the slave. > -- > > Key: MESOS-1718 > URL: https://issues.apache.org/jira/browse/MESOS-1718 > Project: Mesos > Issue Type: Bug > Components: slave >Reporter: Benjamin Mahler >Assignee: Klaus Ma > > Currently we give a small amount of resources to the command executor, in > addition to resources used by the command task: > https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448 > {code} > ExecutorInfo Slave::getExecutorInfo( > const FrameworkID& frameworkId, > const TaskInfo& task) > { > ... > // Add an allowance for the command executor. This does lead to a > // small overcommit of resources. > executor.mutable_resources()->MergeFrom( > Resources::parse( > "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" + > "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get()); > ... > } > {code} > This leads to an overcommit of the slave. Ideally, for command tasks we can > "transfer" all of the task resources to the executor at the slave / isolation > level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4417) Prevent allocator from crashing on successful recovery.
[ https://issues.apache.org/jira/browse/MESOS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4417: --- Summary: Prevent allocator from crashing on successful recovery. (was: Refactor allocator recovery.) > Prevent allocator from crashing on successful recovery. > --- > > Key: MESOS-4417 > URL: https://issues.apache.org/jira/browse/MESOS-4417 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Blocker > Labels: mesosphere > > There might be a bug that crashes the master, as pointed out by [~bmahler] > in https://reviews.apache.org/r/4/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4382) Change the `principal` in `ReservationInfo` to optional
[ https://issues.apache.org/jira/browse/MESOS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4382: - Sprint: Mesosphere Sprint 26 (was: Mesosphere Sprint 27) > Change the `principal` in `ReservationInfo` to optional > --- > > Key: MESOS-4382 > URL: https://issues.apache.org/jira/browse/MESOS-4382 > Project: Mesos > Issue Type: Improvement >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere, reservations > > With the addition of HTTP endpoints for {{/reserve}} and {{/unreserve}}, it > is now desirable to allow dynamic reservations without a principal, in the > case where HTTP authentication is disabled. To allow for this, we will change > the {{principal}} field in {{ReservationInfo}} from required to optional. For > backwards-compatibility, however, the master should currently invalidate any > {{ReservationInfo}} messages that do not have this field set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4102) Quota doesn't allocate resources on slave joining.
[ https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107026#comment-15107026 ] Alexander Rukletsov commented on MESOS-4102: https://reviews.apache.org/r/42510/ > Quota doesn't allocate resources on slave joining. > -- > > Key: MESOS-4102 > URL: https://issues.apache.org/jira/browse/MESOS-4102 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Neil Conway >Assignee: Klaus Ma >Priority: Blocker > Labels: mesosphere, quota > Attachments: quota_absent_framework_test-1.patch > > > See attached patch. {{framework1}} is not allocated any resources, despite > the fact that the resources on {{agent2}} can safely be allocated to it > without risk of violating {{quota1}}. If I understand the intended quota > behavior correctly, this doesn't seem intended. > Note that if the framework is added _after_ the slaves are added, the > resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-4102) Quota doesn't allocate resources on slave joining.
[ https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4102: --- Comment: was deleted (was: https://reviews.apache.org/r/42510/) > Quota doesn't allocate resources on slave joining. > -- > > Key: MESOS-4102 > URL: https://issues.apache.org/jira/browse/MESOS-4102 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Neil Conway >Assignee: Klaus Ma >Priority: Blocker > Labels: mesosphere, quota > Attachments: quota_absent_framework_test-1.patch > > > See attached patch. {{framework1}} is not allocated any resources, despite > the fact that the resources on {{agent2}} can safely be allocated to it > without risk of violating {{quota1}}. If I understand the intended quota > behavior correctly, this doesn't seem intended. > Note that if the framework is added _after_ the slaves are added, the > resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4102) Quota doesn't allocate resources on slave joining.
[ https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106854#comment-15106854 ] Alexander Rukletsov edited comment on MESOS-4102 at 1/19/16 5:17 PM: - https://reviews.apache.org/r/42289 https://reviews.apache.org/r/42510/ was (Author: alexr): https://reviews.apache.org/r/42289 > Quota doesn't allocate resources on slave joining. > -- > > Key: MESOS-4102 > URL: https://issues.apache.org/jira/browse/MESOS-4102 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Neil Conway >Assignee: Klaus Ma >Priority: Blocker > Labels: mesosphere, quota > Attachments: quota_absent_framework_test-1.patch > > > See attached patch. {{framework1}} is not allocated any resources, despite > the fact that the resources on {{agent2}} can safely be allocated to it > without risk of violating {{quota1}}. If I understand the intended quota > behavior correctly, this doesn't seem intended. > Note that if the framework is added _after_ the slaves are added, the > resources on {{agent2}} are allocated to {{framework1}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4411) Traverse all roles for quota allocation
[ https://issues.apache.org/jira/browse/MESOS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107025#comment-15107025 ] Alexander Rukletsov commented on MESOS-4411: https://reviews.apache.org/r/42511/ > Traverse all roles for quota allocation > --- > > Key: MESOS-4411 > URL: https://issues.apache.org/jira/browse/MESOS-4411 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Critical > Labels: mesosphere > > There might be a bug in how resources are allocated to multiple quota'ed > roles if one role's quota is met. We need to investigate this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4411) Traverse all roles for quota allocation.
[ https://issues.apache.org/jira/browse/MESOS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-4411: --- Priority: Blocker (was: Critical) Summary: Traverse all roles for quota allocation. (was: Traverse all roles for quota allocation) > Traverse all roles for quota allocation. > > > Key: MESOS-4411 > URL: https://issues.apache.org/jira/browse/MESOS-4411 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Blocker > Labels: mesosphere > > There might be a bug in how resources are allocated to multiple quota'ed > roles if one role's quota is met. We need to investigate this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)