[jira] [Commented] (MESOS-2501) Doxygen style for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588033#comment-14588033 ] Joerg Schad commented on MESOS-2501: https://reviews.apache.org/r/35509/ Doxygen style for libprocess Key: MESOS-2501 URL: https://issues.apache.org/jira/browse/MESOS-2501 Project: Mesos Issue Type: Documentation Components: libprocess Reporter: Bernd Mathiske Assignee: Joerg Schad Original Estimate: 7m Remaining Estimate: 7m Create a description of the Doxygen style to use for libprocess documentation. It is expected that this will later also become the Doxygen style for stout and Mesos, but we are working on libprocess only for now. Possible outcome: a file named docs/doxygen-style.md We hope for much input and expect a lot of discussion! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
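As a starting point for that discussion, here is a sketch of one possible Doxygen comment layout for a libprocess-style function. The function `clampTimeoutMs` is hypothetical and does not exist in libprocess. Only the comment conventions matter here: a one-line `@brief`, a blank line, optional details, then `@param` and `@return` tags. Whether libprocess settles on `/** */` or `///` blocks is exactly what the style guide would decide.

```cpp
#include <cassert>
#include <cstdint>

/**
 * @brief Clamps a requested timeout to an operator-configured upper bound.
 *
 * Hypothetical helper, used only to illustrate a possible Doxygen style:
 * a one-line brief, a blank line, details, then tagged parameters.
 *
 * @param requestedMs The timeout requested by the caller, in milliseconds.
 * @param maxMs The upper bound configured by the operator, in milliseconds.
 * @return The smaller of the two values.
 */
inline int64_t clampTimeoutMs(int64_t requestedMs, int64_t maxMs)
{
  assert(maxMs >= 0);  // A negative upper bound is a caller error.
  return requestedMs < maxMs ? requestedMs : maxMs;
}
```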
[jira] [Commented] (MESOS-2394) Create styleguide for documentation
[ https://issues.apache.org/jira/browse/MESOS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588035#comment-14588035 ] Joerg Schad commented on MESOS-2394: https://reviews.apache.org/r/35510/ Create styleguide for documentation --- Key: MESOS-2394 URL: https://issues.apache.org/jira/browse/MESOS-2394 Project: Mesos Issue Type: Documentation Reporter: Joerg Schad Assignee: Joerg Schad Priority: Minor As of right now, different pages in our documentation use quite different styles. Consider, for example, the different emphasis for NOTE:
* {noformat} NOTE: http://mesos.apache.org/documentation/latest/slave-recovery/{noformat}
* {noformat}*NOTE*: http://mesos.apache.org/documentation/latest/upgrades/ {noformat}
It would be great to establish a common style for the documentation! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding
Paul Brett created MESOS-2874: - Summary: Convert PortMappingStatistics to use automatic JSON encoding/decoding Key: MESOS-2874 URL: https://issues.apache.org/jira/browse/MESOS-2874 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Simplify PortMappingStatistics by using JSON::Protocol and protobuf::parse to convert ResourceStatistics to/from line format. This change will simplify the implementation of MESOS-2332. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding
[ https://issues.apache.org/jira/browse/MESOS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2874: -- Component/s: test isolation Convert PortMappingStatistics to use automatic JSON encoding/decoding - Key: MESOS-2874 URL: https://issues.apache.org/jira/browse/MESOS-2874 Project: Mesos Issue Type: Bug Components: isolation, test Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Simplify PortMappingStatistics by using JSON::Protocol and protobuf::parse to convert ResourceStatistics to/from line format. This change will simplify the implementation of MESOS-2332. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2246) Improve slave health-checking
[ https://issues.apache.org/jira/browse/MESOS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588361#comment-14588361 ] Joe Smith commented on MESOS-2246: -- [~vinodkone] [~jieyu] given that the tickets in this epic are completed, can this be resolved? Improve slave health-checking - Key: MESOS-2246 URL: https://issues.apache.org/jira/browse/MESOS-2246 Project: Mesos Issue Type: Epic Components: master, slave Reporter: Dominic Hamon In the event of a network partition, or other systemic issues, we may see widespread slave removal. There are several approaches we can take to mitigate this issue including, but not limited to:
* rate limit the slave removal
* change how we do health checking to not rely on a single point of view
* work with frameworks to determine SLA of running services before removing the slave
* manual control to allow operator intervention
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2653) Slave should act on correction events from QoS controller
[ https://issues.apache.org/jira/browse/MESOS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2653: -- Shepherd: Jie Yu Slave should act on correction events from QoS controller - Key: MESOS-2653 URL: https://issues.apache.org/jira/browse/MESOS-2653 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere Slave might want to kill revocable tasks based on correction events from the QoS controller. The QoS controller tells the slave, through a stream (or process::Queue), which corrections it needs to carry out in order to mitigate interference with production tasks. The correction is communicated through a message:
{code}
message QoSCorrection {
  enum CorrectionType {
    KillExecutor = 1;
    // KillTask = 2;
    // Resize, throttle task
  }

  optional string reason = X;
  optional ExecutorID executor_id = X;
  // optional TaskID task_id = X;
}
{code}
The slave will set up a handler to process these events. Initially, only executor termination is supported; it causes the slave to issue 'containerizer->destroy()'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
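The handler described in this ticket can be modeled in plain C++ to show the control flow. This is a minimal, hypothetical sketch, not the slave's actual code: `QoSCorrection` here is a simplified struct mirroring the proposed message, a `std::queue` stands in for the stream or `process::Queue`, and the `destroy` callback stands in for `containerizer->destroy()`.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>

// Hypothetical, simplified mirror of the proposed QoSCorrection message.
struct QoSCorrection
{
  enum class Type { KillExecutor };  // KillTask, resize, throttle: future work.
  Type type;
  std::string reason;
  std::string executorId;
};

// Drains pending corrections and acts on them. `destroy` stands in for the
// slave's call to containerizer->destroy(). Returns how many corrections
// were carried out.
inline int processCorrections(
    std::queue<QoSCorrection>& corrections,
    const std::function<void(const std::string&)>& destroy)
{
  int handled = 0;
  while (!corrections.empty()) {
    const QoSCorrection correction = corrections.front();
    corrections.pop();
    switch (correction.type) {
      case QoSCorrection::Type::KillExecutor:
        destroy(correction.executorId);
        ++handled;
        break;
    }
  }
  return handled;
}
```

Keeping the dispatch in a `switch` over the correction type leaves an obvious place to add `KillTask` or throttling later.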
[jira] [Updated] (MESOS-2704) Add tests for QoS controller corrections
[ https://issues.apache.org/jira/browse/MESOS-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-2704: -- Shepherd: Jie Yu Add tests for QoS controller corrections Key: MESOS-2704 URL: https://issues.apache.org/jira/browse/MESOS-2704 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2246) Improve slave health-checking
[ https://issues.apache.org/jira/browse/MESOS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588449#comment-14588449 ] Vinod Kone commented on MESOS-2246: --- I think we solved the first part of the problem, rate limiting slave removals. We still haven't solved improving the scalability of health checks and being SLA aware. Since the latter can be epics in themselves, we can resolve this one and open new ones. Improve slave health-checking - Key: MESOS-2246 URL: https://issues.apache.org/jira/browse/MESOS-2246 Project: Mesos Issue Type: Epic Components: master, slave Reporter: Dominic Hamon In the event of a network partition, or other systemic issues, we may see widespread slave removal. There are several approaches we can take to mitigate this issue including, but not limited to:
* rate limit the slave removal
* change how we do health checking to not rely on a single point of view
* work with frameworks to determine SLA of running services before removing the slave
* manual control to allow operator intervention
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
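The first mitigation listed, rate limiting slave removals, can be illustrated with a sliding-window limiter. This is a hypothetical sketch and not the master's implementation: the class name, the "at most N removals per window" semantics, and passing timestamps in explicitly (to keep it testable) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sliding-window limiter: permits at most `maxRemovals` slave
// removals within any window of `windowSecs` seconds.
class RemovalRateLimiter
{
public:
  RemovalRateLimiter(size_t maxRemovals, int64_t windowSecs)
    : maxRemovals_(maxRemovals), windowSecs_(windowSecs) {}

  // Returns true (and records the removal) if a removal at `nowSecs` is
  // allowed; returns false if the caller should defer it.
  bool tryRemove(int64_t nowSecs)
  {
    // Forget removals that have fallen out of the window.
    while (!recent_.empty() && nowSecs - recent_.front() >= windowSecs_) {
      recent_.pop_front();
    }
    if (recent_.size() >= maxRemovals_) {
      return false;
    }
    recent_.push_back(nowSecs);
    return true;
  }

private:
  size_t maxRemovals_;
  int64_t windowSecs_;
  std::deque<int64_t> recent_;  // Timestamps of recent removals, oldest first.
};
```

During a partition, most removals would be deferred rather than applied at once, giving operators or a second health-check opinion time to intervene.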
[jira] [Created] (MESOS-2876) Log process list when OOM killing a container
Ian Downes created MESOS-2876: - Summary: Log process list when OOM killing a container Key: MESOS-2876 URL: https://issues.apache.org/jira/browse/MESOS-2876 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.1 Reporter: Ian Downes Priority: Minor When the kernel notifies us that it has OOM-killed a process in a container, we currently log the memory statistics of the remaining processes. We could also log information about all the remaining processes, as it may be helpful in debugging the cause of the OOM. The notification is asynchronous and we may detect the executor terminating first, so this will be best effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
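One way to gather the process list is to read the container cgroup's `cgroup.procs` file, which lists one PID per line on Linux. The sketch below only parses that content and formats a log line. Reading the file, and optionally `/proc/<pid>/cmdline` for each PID, is left out, and the helper names are hypothetical. Malformed or empty lines are skipped because, as noted above, the collection is best effort.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <vector>

// Parses the contents of a cgroup's `cgroup.procs` file (one PID per line).
// Lines that are not PIDs are skipped: processes may exit while we look.
inline std::vector<pid_t> parseCgroupProcs(const std::string& contents)
{
  std::vector<pid_t> pids;
  std::istringstream in(contents);
  std::string line;
  while (std::getline(in, line)) {
    try {
      pids.push_back(static_cast<pid_t>(std::stol(line)));
    } catch (const std::exception&) {
      // Ignore blank or malformed lines.
    }
  }
  return pids;
}

// Formats the PIDs into a single log line.
inline std::string formatProcessList(const std::vector<pid_t>& pids)
{
  std::ostringstream out;
  out << "Remaining processes (" << pids.size() << "):";
  for (pid_t pid : pids) {
    out << ' ' << pid;
  }
  return out.str();
}
```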
[jira] [Commented] (MESOS-2613) Change docker rm command
[ https://issues.apache.org/jira/browse/MESOS-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588771#comment-14588771 ] Vaibhav Khanduja commented on MESOS-2613: - Docker plans to support volume extensions starting with version 1.9 (I guess). With volume extensions supported, docker rm will call into the plugin module implementing volume support to delete the persistent volumes. Change docker rm command Key: MESOS-2613 URL: https://issues.apache.org/jira/browse/MESOS-2613 Project: Mesos Issue Type: Improvement Components: containerization, docker Reporter: Mike Michel Priority: Minor Right now it seems Mesos is using "docker rm -f ID" to delete containers, so bind mounts are not deleted. This means thousands of dirs in /var/lib/docker/vfs/dir. I would like to have the option to change it to "docker rm -f -v ID". This deletes bind mounts but not persistent volumes. Best, Mike -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-999) Slave should wait() and start executor registration timeout after launch
[ https://issues.apache.org/jira/browse/MESOS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588765#comment-14588765 ] Yan Xu commented on MESOS-999: -- [~idownes]: In [~nsuneja]'s reviews there is a new flag {{--executor_launch_timeout}} to guard against the launcher taking forever to prepare the executor. I think this is an appropriate approach: even though this timeout feels like something specific to each container / provisioner, a single upper bound that is configurable by the cluster operator seems sufficient. [~nsuneja], would you like to revive your reviews? Otherwise I can take it over and push it forward. Slave should wait() and start executor registration timeout after launch - Key: MESOS-999 URL: https://issues.apache.org/jira/browse/MESOS-999 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.18.0 Reporter: Ian Downes Assignee: Yan Xu Priority: Minor Labels: twitter The current code launches a container and waits on it before the launch is complete. We should do this only after the container has successfully launched. Likewise for the executor registration timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
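The ordering this ticket asks for, waiting for the launch under an operator-configured bound and only then starting the registration timeout, can be sketched with standard futures. This is a hypothetical illustration: `awaitLaunch` and `LaunchOutcome` are invented names, and `std::future` stands in for libprocess futures. The `launchTimeout` parameter plays the role of the proposed `--executor_launch_timeout` flag.

```cpp
#include <cassert>
#include <chrono>
#include <future>

enum class LaunchOutcome { Launched, Failed, TimedOut };

// Waits for an asynchronous container launch, bounded by `launchTimeout`.
// Only a `Launched` outcome should be followed by starting the executor
// registration timeout; on `TimedOut` the caller would destroy the container.
inline LaunchOutcome awaitLaunch(
    std::future<bool>& launch,
    std::chrono::milliseconds launchTimeout)
{
  if (launch.wait_for(launchTimeout) != std::future_status::ready) {
    return LaunchOutcome::TimedOut;
  }
  return launch.get() ? LaunchOutcome::Launched : LaunchOutcome::Failed;
}
```

A single operator-wide bound keeps the logic independent of the containerizer or provisioner in use, which matches the trade-off discussed in the comment.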
[jira] [Created] (MESOS-2875) Add containerId to ResourceUsage to enable QoS controller to target a container
Niklas Quarfot Nielsen created MESOS-2875: - Summary: Add containerId to ResourceUsage to enable QoS controller to target a container Key: MESOS-2875 URL: https://issues.apache.org/jira/browse/MESOS-2875 Project: Mesos Issue Type: Improvement Reporter: Niklas Quarfot Nielsen We should ensure that we are addressing the _container_ which the QoS controller intended to kill. Without this check, we may run into a scenario where the executor has terminated and one with the same ID has started in the interim, i.e., it is running in a different container than the one the QoS controller targeted. This most likely requires us to add containerId to the ResourceUsage message and encode the containerID in the QoS Correction message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
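The check proposed here amounts to comparing the container the controller observed with the container currently running the executor. The sketch below assumes the correction carries the targeted container ID (as the ticket suggests) and models the slave's bookkeeping as a plain map; the type alias and function name are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical view of the slave's bookkeeping:
// executor ID -> ID of the container currently running that executor.
using ExecutorContainers = std::map<std::string, std::string>;

// Returns true only if the executor is still running in the container that
// the QoS controller observed. If the executor was restarted under the same
// ID in a new container, the stale correction is ignored.
inline bool shouldApplyCorrection(
    const ExecutorContainers& running,
    const std::string& executorId,
    const std::string& targetedContainerId)
{
  auto it = running.find(executorId);
  return it != running.end() && it->second == targetedContainerId;
}
```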
[jira] [Comment Edited] (MESOS-2613) Change docker rm command
[ https://issues.apache.org/jira/browse/MESOS-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588771#comment-14588771 ] Vaibhav Khanduja edited comment on MESOS-2613 at 6/16/15 9:02 PM: -- Docker plans to support volume extensions starting with version 1.9 (I guess). With volume extensions supported, docker rm will call into the plugin module implementing volume support to delete the persistent volumes. https://github.com/docker/docker/pull/13161 was (Author: vaibhavkhanduja): Docker plans to support volume extensions starting with version 1.9 (I guess). With volume extensions supported, docker rm will call into the plugin module implementing volume support to delete the persistent volumes. Change docker rm command Key: MESOS-2613 URL: https://issues.apache.org/jira/browse/MESOS-2613 Project: Mesos Issue Type: Improvement Components: containerization, docker Reporter: Mike Michel Priority: Minor Right now it seems Mesos is using "docker rm -f ID" to delete containers, so bind mounts are not deleted. This means thousands of dirs in /var/lib/docker/vfs/dir. I would like to have the option to change it to "docker rm -f -v ID". This deletes bind mounts but not persistent volumes. Best, Mike -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-1859) src/examples/docker_no_executor_framework.cpp uses wrong ContainerInfo
[ https://issues.apache.org/jira/browse/MESOS-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen closed MESOS-1859. --- Resolution: Fixed commit 3d63f63156f73d375d8a85a7c485b72c0f739ab0 Author: Jay Buffington jayb...@apple.com Date: Wed May 6 23:03:23 2015 + Fixed docker-no-executor-framework example to send ContainerInfo. src/examples/docker_no_executor_framework.cpp uses wrong ContainerInfo -- Key: MESOS-1859 URL: https://issues.apache.org/jira/browse/MESOS-1859 Project: Mesos Issue Type: Bug Components: docker Reporter: Kevin Matzen Priority: Minor src/examples/docker_no_executor_framework.cpp sets up the docker image using:
{code}
CommandInfo::ContainerInfo* container = task.mutable_command()->mutable_container();
container->set_image("docker:///busybox");
{code}
As far as I can tell, the slave expects it to be configured as follows:
{code}
ContainerInfo* container = task.mutable_container();
container->set_type(ContainerInfo::DOCKER);
container->mutable_docker()->set_image("busybox");
{code}
Did I understand correctly? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1859) src/examples/docker_no_executor_framework.cpp uses wrong ContainerInfo
[ https://issues.apache.org/jira/browse/MESOS-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588790#comment-14588790 ] Timothy Chen commented on MESOS-1859: - Yes, it looks like it's fixed; I'll close this. src/examples/docker_no_executor_framework.cpp uses wrong ContainerInfo -- Key: MESOS-1859 URL: https://issues.apache.org/jira/browse/MESOS-1859 Project: Mesos Issue Type: Bug Components: docker Reporter: Kevin Matzen Priority: Minor src/examples/docker_no_executor_framework.cpp sets up the docker image using:
{code}
CommandInfo::ContainerInfo* container = task.mutable_command()->mutable_container();
container->set_image("docker:///busybox");
{code}
As far as I can tell, the slave expects it to be configured as follows:
{code}
ContainerInfo* container = task.mutable_container();
container->set_type(ContainerInfo::DOCKER);
container->mutable_docker()->set_image("busybox");
{code}
Did I understand correctly? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588838#comment-14588838 ] Marco Massenzio commented on MESOS-1988: By all means, if you have the headway and the will, feel free to take this over. Thanks! *Marco Massenzio* *Distributed Systems Engineer* On Tue, Jun 16, 2015 at 1:04 PM, Anand Mazumdar (JIRA) j...@apache.org Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: twitter Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2794) Implement filesystem isolators
[ https://issues.apache.org/jira/browse/MESOS-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588635#comment-14588635 ] Stephan Erb commented on MESOS-2794: Ian, is there a document describing the filesystem isolation or jail story implemented here? I am seeing the different tickets, but I am not sure if I have understood the big picture. Implement filesystem isolators -- Key: MESOS-2794 URL: https://issues.apache.org/jira/browse/MESOS-2794 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.1 Reporter: Ian Downes Assignee: Ian Downes Labels: twitter Move persistent volume support from Mesos containerizer to separate filesystem isolators, including support for container rootfs, where possible. Use symlinks for posix systems without container rootfs. Use bind mounts for Linux with/without container rootfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
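The "symlinks for POSIX systems without container rootfs" strategy can be sketched with C++17 `std::filesystem`. This is a hypothetical illustration of the idea, not the isolator's code: the function name and the `<sandbox>/<containerPath>` layout are assumptions, and the Linux bind-mount variant (which needs `mount(2)` and privileges) is not shown.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Exposes a persistent volume inside an executor sandbox by creating the
// symlink <sandbox>/<containerPath> -> <volumeHostPath>. Reports failure
// through `error` instead of throwing (e.g. if the link already exists).
inline bool linkPersistentVolume(
    const fs::path& volumeHostPath,
    const fs::path& sandbox,
    const std::string& containerPath,
    std::error_code& error)
{
  const fs::path link = sandbox / containerPath;
  fs::create_directory_symlink(volumeHostPath, link, error);
  return !error;
}
```

A symlink works without mount namespaces or root, but a task that resolves the link can see the host path, which is one reason bind mounts are preferred on Linux.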
[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588666#comment-14588666 ] Anand Mazumdar commented on MESOS-1988: --- It seems that the library (src/scheduler/scheduler.cpp) already does the right thing by dropping calls silently. I would go ahead and nuke the second overload(...) that took a TaskInfo as an argument, as it's no longer being used in the code. Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: twitter Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1246) Convert stout hashmap and hashset to use std:: or std::tr1 instead of boost
[ https://issues.apache.org/jira/browse/MESOS-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-1246: -- Assignee: (was: Dominic Hamon) Convert stout hashmap and hashset to use std:: or std::tr1 instead of boost --- Key: MESOS-1246 URL: https://issues.apache.org/jira/browse/MESOS-1246 Project: Mesos Issue Type: Task Components: stout, technical debt Reporter: Dominic Hamon Priority: Minor Currently, parts of Mesos are failing to build with older compilers (g++ 4.1.2) due to duplicate definitions of 'ref' from boost and std::tr1. We're going to hit these issues more regularly so wherever we can replace boost with std:: or std::tr1 we should. This also sets us up nicely for the move to C++11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
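A sketch of what a boost-free `hashmap` could look like, built directly on `std::unordered_map`. This is hypothetical and only loosely modeled on stout's convenience helpers (`contains()`, `get()`). For brevity it uses C++17 `std::optional` where stout would use its own `Option`, so it does not match the C++11/tr1 constraints of the time exactly.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stout-style hashmap on top of std::unordered_map instead of
// boost::unordered_map. Inheriting from a standard container is a pragmatic
// shortcut: it adds no state, so the lack of a virtual destructor is harmless
// as long as nobody deletes it through a base-class pointer.
template <typename Key, typename Value>
class hashmap : public std::unordered_map<Key, Value>
{
public:
  bool contains(const Key& key) const
  {
    return this->count(key) > 0;
  }

  // Returns the value for `key`, or std::nullopt if it is absent.
  std::optional<Value> get(const Key& key) const
  {
    auto it = this->find(key);
    if (it == this->end()) {
      return std::nullopt;
    }
    return it->second;
  }
};
```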
[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589000#comment-14589000 ] Vinod Kone commented on MESOS-1988: --- commit fa0d564a9fd813d51917d21daaf25fe34c4154ee Author: Anand Mazumdar mazumdar.an...@gmail.com Date: Tue Jun 16 16:24:46 2015 -0700 Removed unused drop(...) overload in scheduler library. The library (src/scheduler/scheduler.cpp) already does the right thing of not doing anything and just dropping requests when it is disconnected from the master. This change just deletes the unused overload that is no longer being used. Review: https://reviews.apache.org/r/35538 Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: twitter Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
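The behavior described in this commit, dropping calls while disconnected instead of synthesizing TASK_LOST, can be modeled in a few lines. This is a hypothetical sketch: `SchedulerLibrary`, `send()`, and string-typed calls are stand-ins, not the real library's interface. The point is that the framework learns the true task state through reconciliation rather than from a locally generated status update.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: while disconnected from the master, calls are silently
// dropped rather than answered with a synthesized TASK_LOST.
class SchedulerLibrary
{
public:
  void setConnected(bool connected) { connected_ = connected; }

  // Returns true if the call was forwarded to the master.
  bool send(const std::string& call)
  {
    if (!connected_) {
      return false;  // Dropped; no TASK_LOST is generated.
    }
    sent_.push_back(call);
    return true;
  }

  const std::vector<std::string>& sent() const { return sent_; }

private:
  bool connected_ = false;
  std::vector<std::string> sent_;  // Calls forwarded to the master.
};
```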
[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588993#comment-14588993 ] Anand Mazumdar commented on MESOS-1988: --- Deleted overload for review here: https://reviews.apache.org/r/35538 Left:
- Send email to the dev mailing list to apprise them of the change in the driver (0.24?)
- Delete the relevant fragment of code that returns TASK_LOST from sched/sched.cpp.
Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: twitter Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2664) Modernize the codebase to C++11
[ https://issues.apache.org/jira/browse/MESOS-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588998#comment-14588998 ] Benjamin Mahler commented on MESOS-2664: Another question, not related to C++11: are we able to enable [Wswitch-enum|https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html] as a warning? Would love to have a compile-time warning if we miss any switch cases, rather than crashing in the default case at run time. If we relied on it, we wouldn't even need the defaults, but that might not be clear to those reading the code. Modernize the codebase to C++11 --- Key: MESOS-2664 URL: https://issues.apache.org/jira/browse/MESOS-2664 Project: Mesos Issue Type: Epic Components: technical debt Reporter: Michael Park Assignee: Michael Park Labels: mesosphere Since [this commit|https://github.com/apache/mesos/commit/0f5c78fad3423181f7227027eb42d162811514e7], we officially require GCC-4.8+ and Clang-3.5+. This means that we now have full C++11 support and therefore can start to modernize our codebase to be more readable, safer, and more efficient! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
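A small illustration of the compile-time check being asked about. For reference, GCC's `-Wswitch` (enabled by `-Wall`) already warns when a `switch` on an enum has no `default:` label and misses an enumerator; `-Wswitch-enum` also warns when a `default:` is present. The `TaskState` enum below is a hypothetical stand-in, not the Mesos protobuf enum.

```cpp
#include <cassert>
#include <string>

enum class TaskState { Staging, Running, Finished };

// No `default:` label, so -Wswitch (part of -Wall) flags this function at
// compile time if an enumerator is added to TaskState but not handled here.
inline std::string describe(TaskState state)
{
  switch (state) {
    case TaskState::Staging:  return "staging";
    case TaskState::Running:  return "running";
    case TaskState::Finished: return "finished";
  }
  // Only reachable if `state` holds a value outside the enumerators
  // (e.g. produced by a cast), so fail loudly in debug builds.
  assert(false && "unexpected TaskState");
  return "unknown";
}
```

The trailing statement after the `switch` also keeps `-Wreturn-type` quiet without reintroducing a `default:` that would mask missing cases under plain `-Wswitch`.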
[jira] [Resolved] (MESOS-2704) Add tests for QoS controller corrections
[ https://issues.apache.org/jira/browse/MESOS-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen resolved MESOS-2704. --- Resolution: Fixed commit 02160a1ceb7d54b851b623e669e6a648be5471c1 Author: Niklas Nielsen n...@qni.dk Date: Tue Jun 16 17:02:46 2015 -0700 Added QoS kill executor correction test. Review: https://reviews.apache.org/r/34721 Add tests for QoS controller corrections Key: MESOS-2704 URL: https://issues.apache.org/jira/browse/MESOS-2704 Project: Mesos Issue Type: Task Reporter: Niklas Quarfot Nielsen Assignee: Niklas Quarfot Nielsen Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2653) Slave should act on correction events from QoS controller
[ https://issues.apache.org/jira/browse/MESOS-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen resolved MESOS-2653. --- Resolution: Fixed commit 8cbbf84068e02cfb5899de8255ac6227713cf7e0 Author: Niklas Nielsen n...@qni.dk Date: Tue Jun 16 17:02:35 2015 -0700 Added kill executor correction to slave. Review: https://reviews.apache.org/r/34720 commit 1a7d815cdf4fb959d0b30782dc3024000e44fba8 Author: Niklas Nielsen n...@qni.dk Date: Tue Jun 16 17:02:26 2015 -0700 Added REASON_EXECUTOR_PREEMPTED as status reason. Review: https://reviews.apache.org/r/34719 Slave should act on correction events from QoS controller - Key: MESOS-2653 URL: https://issues.apache.org/jira/browse/MESOS-2653 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: mesosphere Slave might want to kill revocable tasks based on correction events from the QoS controller. The QoS controller tells the slave, through a stream (or process::Queue), which corrections it needs to carry out in order to mitigate interference with production tasks. The correction is communicated through a message:
{code}
message QoSCorrection {
  enum CorrectionType {
    KillExecutor = 1;
    // KillTask = 2;
    // Resize, throttle task
  }

  optional string reason = X;
  optional ExecutorID executor_id = X;
  // optional TaskID task_id = X;
}
{code}
The slave will set up a handler to process these events. Initially, only executor termination is supported; it causes the slave to issue 'containerizer->destroy()'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1648) mesos-slave and mesos-master should have a --pidfile option
[ https://issues.apache.org/jira/browse/MESOS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589142#comment-14589142 ] Benjamin Mahler commented on MESOS-1648: Yes, looks like a duplicate; mind closing one and updating the descriptions? I'd suggest breaking the patches up. As a first approach, we can add this to the common logging flags and write the file during logging initialization. I'd imagine this would be good enough for a lot of people. Locking and removal of the file would be good to tackle in further patches ({{finalize}} does not execute in most production setups; rather, the slave gets a signal or calls exit). I'd imagine we may want to {{unlink}} from within an {{atexit}} handler and our common [signal handler|https://github.com/apache/mesos/blob/02160a1ceb7d54b851b623e669e6a648be5471c1/src/logging/logging.cpp#L74]; curious what other projects do as well. mesos-slave and mesos-master should have a --pidfile option --- Key: MESOS-1648 URL: https://issues.apache.org/jira/browse/MESOS-1648 Project: Mesos Issue Type: Improvement Components: master, slave Reporter: Tobias Weingartner Assignee: Greg Mann Labels: newbie, twitter Right now we use a number of wrapper scripts to try and keep up a {{/var/run/mesos/mesos-slave.pid}} in order to be able to monitor the process. It would be nice if this extra (somewhat fragile) wrapper was not necessary. Having a {{--pidfile}} option would eliminate some of this pain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
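The first-step approach suggested in the comment, writing the PID during initialization and unlinking it from an `atexit` handler, might look like the sketch below. It is hypothetical: the `pidfile` namespace and function names are invented, and no locking is done. As the comment notes, `atexit` handlers do not run when the process dies from a signal, so the common signal handler would still need to remove the file separately.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <unistd.h>

namespace pidfile {

// Path remembered so the atexit handler knows what to remove.
inline std::string& path()
{
  static std::string p;
  return p;
}

// atexit handler: remove the pidfile on normal exit.
inline void removeAtExit()
{
  if (!path().empty()) {
    ::unlink(path().c_str());
  }
}

// Writes the current PID to `file` and registers cleanup at exit.
// Returns false if the file could not be written.
inline bool write(const std::string& file)
{
  std::ofstream out(file, std::ios::trunc);
  if (!out) {
    return false;
  }
  out << ::getpid() << '\n';
  out.close();
  if (!out) {
    return false;
  }
  path() = file;
  std::atexit(removeAtExit);
  return true;
}

} // namespace pidfile
```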
[jira] [Commented] (MESOS-2841) FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata
[ https://issues.apache.org/jira/browse/MESOS-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589318#comment-14589318 ] James DeFelice commented on MESOS-2841: --- Agree that it's useful for API or UI (http(s) things). URIs don't have to be URLs, though; they could just be unique names that announce capabilities. For example:
{code}
// define a handle that describes a storage driver capability of a framework
storageCap = mesos.Handle{
  uri: "urn:mesos:capability:storage", // declares specific capability of framework
  labels: { { "version", "v1" }, { "version", "v1.1" } }, // supported capability API versions
  visibility: { VisibilityMesos }, // only mesos masters and slaves can see this handle
}

// define a handle that describes a plugin capability of a framework
mesosDnsPlugin = mesos.Handle{
  uri: "urn:mesos:discovery:plugin", // declares specific capability of framework
  // supported capability API versions, name of the plugin, location of plugin discovery metadata
  labels: {
    { "version", "v1" },
    { "name", "k8s-services-discovery-plugin" },
    { "location", "http://hostname/service/kubernetes/mesos/discovery" },
  },
  visibility: { VisibilityCluster }, // visible within the mesos cluster, not outside the cluster
}
{code}
/cc [~joerg84] FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata -- Key: MESOS-2841 URL: https://issues.apache.org/jira/browse/MESOS-2841 Project: Mesos Issue Type: Epic Reporter: James DeFelice Labels: mesosphere A framework instance may offer specific capabilities to the cluster: storage, smartly-balanced request handling across deployed tasks, access to 3rd party services outside of the cluster, etc. These capabilities may or may not be utilized by all, or even most, mesos clusters.
However, it should be possible for processes running in the cluster to discover capabilities or features of frameworks in order to achieve a higher level of functionality and a more seamless integration experience across the cluster. A rich discovery API attached to the FrameworkInfo could result in some form of early lock-in: there are probably many ways to realize cross-framework integration and external services integration that we haven't considered yet. Rather than over-specify a discovery info message type at the framework level I think FrameworkInfo should expose a **very generic** way to supply metadata for interested consumers (other processes, tasks, etc). Adding a Labels field to FrameworkInfo reuses an existing message type and seems to fit well with the overall intent: attaching generic metadata to a framework instance. These labels should be visible when querying a mesos master's state.json endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
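To make the consumer side of the proposal concrete, here is how a process might scan a framework's labels for the capability versions it advertises. This is a hypothetical sketch: the real `Labels` type is a protobuf message (a repeated `Label` with a key and an optional value), mirrored here as a vector of string pairs, and `valuesFor` is an invented helper. Repeated keys are allowed, which is how the "version" example in the discussion lists several API versions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-C++ stand-in for the Labels message: an ordered list of
// key/value pairs in which keys may repeat.
using Labels = std::vector<std::pair<std::string, std::string>>;

// Returns every value stored under `key`, in order, e.g. all capability API
// versions a framework advertises.
inline std::vector<std::string> valuesFor(
    const Labels& labels,
    const std::string& key)
{
  std::vector<std::string> values;
  for (const auto& label : labels) {
    if (label.first == key) {
      values.push_back(label.second);
    }
  }
  return values;
}
```

Keeping the data generic like this means no discovery schema is baked into FrameworkInfo; interpretation is left to the consumers, which is the stated goal of avoiding early lock-in.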
[jira] [Comment Edited] (MESOS-2841) FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata
[ https://issues.apache.org/jira/browse/MESOS-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587365#comment-14587365 ] James DeFelice edited comment on MESOS-2841 at 6/17/15 4:46 AM: [[edited]] What about exposing Endpoints that are labeled?
{code}
type Endpoint struct {
  uri        string     // something identifiable, possibly a URL
  labels     Labels     // describe the uri
  visibility Visibility // who sees this info
}
{code}
I think frameworks offering resources is a cool idea. Not sure how well it aligns with discovery. Unless you were talking about how to advertise resource capabilities through some endpoint? Maybe Endpoint isn't quite right. Perhaps Handle? Service (already overloaded term)? was (Author: jdef): What about exposing Endpoints that are labeled? type Endpoint struct { uri string // something locatable labels Labels // describe the uri visibility Visibility // who sees this info } I think frameworks that offer resources is a cool idea. Not sure how well it aligns with discovery. Unless you were talking about how to advertise resource capabilities through some endpoint? Maybe endpoint isn't quite right. Perhaps Handle? Service (already overloaded term)? FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata -- Key: MESOS-2841 URL: https://issues.apache.org/jira/browse/MESOS-2841 Project: Mesos Issue Type: Epic Reporter: James DeFelice Labels: mesosphere A framework instance may offer specific capabilities to the cluster: storage, smartly-balanced request handling across deployed tasks, access to 3rd party services outside of the cluster, etc. These capabilities may or may not be utilized by all, or even most mesos clusters. However, it should be possible for processes running in the cluster to discover capabilities or features of frameworks in order to achieve a higher level of functionality and a more seamless integration experience across the cluster.
A rich discovery API attached to the FrameworkInfo could result in some form of early lock-in: there are probably many ways to realize cross-framework integration and external services integration that we haven't considered yet. Rather than over-specify a discovery info message type at the framework level, I think FrameworkInfo should expose a **very generic** way to supply metadata for interested consumers (other processes, tasks, etc.). Adding a Labels field to FrameworkInfo reuses an existing message type and seems to fit well with the overall intent: attaching generic metadata to a framework instance. These labels should be visible when querying a Mesos master's state.json endpoint.
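To make the proposal concrete, here is a minimal C++ sketch of generic key/value labels attached to a framework description and rendered as JSON for an HTTP consumer. It is an illustration under assumptions: `Label`, `FrameworkInfoSketch`, `quote()` and `labelsToJson()` are hypothetical names, the real Mesos Labels type is a protobuf message rather than these plain structs, and the JSON shape shown is not necessarily what state.json emits.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical plain-struct stand-ins; Mesos models labels as protobuf messages.
struct Label {
  std::string key;
  std::optional<std::string> value;  // a label may carry only a key
};

struct FrameworkInfoSketch {
  std::string name;
  std::vector<Label> labels;  // arbitrary, lightweight metadata
};

// Wraps `s` in double quotes, escaping only quotes and backslashes
// (enough for this sketch, not a complete JSON encoder).
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
    }
    out += c;
  }
  out += '"';
  return out;
}

// Renders labels as a JSON array of {"key": ..., "value": ...} objects,
// omitting "value" when it is absent.
std::string labelsToJson(const std::vector<Label>& labels) {
  std::string out = "[";
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (i > 0) {
      out += ",";
    }
    out += "{\"key\":" + quote(labels[i].key);
    if (labels[i].value) {
      out += ",\"value\":" + quote(*labels[i].value);
    }
    out += "}";
  }
  out += "]";
  return out;
}
```

For example, a label {capability: block-storage} plus a key-only label {tier} render as [{"key":"capability","value":"block-storage"},{"key":"tier"}]. Keeping the value free-form is what keeps the field generic: consumers agree on keys among themselves instead of Mesos fixing a discovery schema.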
[jira] [Commented] (MESOS-2841) FrameworkInfo should include a Labels field to support arbitrary, lightweight metadata
[ https://issues.apache.org/jira/browse/MESOS-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589093#comment-14589093 ] Ben Whitehead commented on MESOS-2841: -- I think something like that would probably be useful for API, UI (http compatible things) but there are others that still wouldn't be covered. I also don't know what form this would take in terms of mesos message changes. Perhaps [~kozyraki] has some guidance on this.
[jira] [Commented] (MESOS-1648) mesos-slave and mesos-master should have a --pidfile option
[ https://issues.apache.org/jira/browse/MESOS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589058#comment-14589058 ] Greg Mann commented on MESOS-1648: -- After a bit of digging, my plan of action is the following:
- Create a new test that launches a Master and Slave with --pidfile set, checking for the appropriate PID files after launch and checking that they are removed after shutdown.
- Add --pidfile flags to master/slave.
- Add code in Master/Slave::initialize() and Master/Slave::finalize() to create and remove the PID file. It looks like os::write() and os::rm() would be suitable to accomplish this.
If all this looks appropriate, I'll get started. mesos-slave and mesos-master should have a --pidfile option --- Key: MESOS-1648 URL: https://issues.apache.org/jira/browse/MESOS-1648 Project: Mesos Issue Type: Improvement Components: master, slave Reporter: Tobias Weingartner Assignee: Greg Mann Labels: newbie, twitter Right now we use a number of wrapper scripts to try to maintain a {{/var/run/mesos/mesos-slave.pid}} in order to be able to monitor the process. It would be nice if this extra (somewhat fragile) wrapper were not necessary. Having a {{--pidfile}} option would eliminate some of this pain.
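The create-on-initialize, remove-on-finalize lifecycle in the plan can be sketched as a small RAII helper. This is a hypothetical sketch in standard C++ plus POSIX getpid(): `PidFile` and `readPid()` are illustrative names, and inside Mesos the stout helpers os::write() and os::rm() mentioned in the plan would take the place of std::ofstream and std::remove.

```cpp
#include <cassert>
#include <cstdio>    // std::remove
#include <fstream>
#include <string>
#include <utility>
#include <unistd.h>  // getpid() (POSIX)

// Hypothetical RAII helper: writes the current PID when constructed (the
// role of Master/Slave::initialize() in the plan) and deletes the file when
// destroyed (the role of finalize()).
class PidFile {
 public:
  explicit PidFile(std::string path) : path_(std::move(path)) {
    std::ofstream out(path_, std::ios::out | std::ios::trunc);
    out << ::getpid() << '\n';
    // Best effort: reflects whether opening and formatting succeeded.
    ok_ = static_cast<bool>(out);
  }  // `out` is flushed and closed here, so the file is complete on return.

  ~PidFile() {
    if (ok_) {
      std::remove(path_.c_str());
    }
  }

  PidFile(const PidFile&) = delete;
  PidFile& operator=(const PidFile&) = delete;

  bool ok() const { return ok_; }

 private:
  std::string path_;
  bool ok_ = false;
};

// Reads a PID back from `path`; returns -1 if the file is missing or malformed.
long readPid(const std::string& path) {
  std::ifstream in(path);
  long pid = -1;
  if (!(in >> pid)) {
    return -1;
  }
  return pid;
}
```

A test along the lines of the plan would construct a PidFile, check that readPid() returns getpid(), let the object go out of scope, and check that the file is gone. Tying removal to a destructor also covers early returns, but not a crash or SIGKILL, so a real implementation still has to cope with a stale file left behind (truncating on open, as here, is one way).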
[jira] [Resolved] (MESOS-1306) Support Framework API Rate Limiting on Master
[ https://issues.apache.org/jira/browse/MESOS-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone resolved MESOS-1306. --- Resolution: Fixed Fix Version/s: 0.20.0 Resolving this as most of the work is done. Informing frameworks will be done as part of the HTTP API. Support Framework API Rate Limiting on Master - Key: MESOS-1306 URL: https://issues.apache.org/jira/browse/MESOS-1306 Project: Mesos Issue Type: Epic Components: master Reporter: Yan Xu Assignee: Yan Xu Fix For: 0.20.0
h2. Motivation
In a multi-framework environment where frameworks have different SLA requirements (production vs. development, service vs. batch), it would be nice if Mesos could protect the high-SLA frameworks' throughput by limiting other frameworks' QPS.
h2. Requirements
- The rate limit configuration should survive Master failover.
- Should support online tuning of rate limits.
- Should provide a way for operators to monitor the framework QPS.
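One common way to cap a framework's QPS is a token bucket per framework. The sketch below is a generic illustration, not the limiter that shipped in Mesos 0.20.0: `TokenBucket`, `FrameworkRateLimiter`, and keying limits by a principal string are assumptions, and the caller passes timestamps explicitly so the logic is easy to test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Token bucket: holds up to `burst` tokens and refills at `qps` tokens per
// second. Each admitted message consumes one token.
class TokenBucket {
 public:
  using Clock = std::chrono::steady_clock;

  TokenBucket(double qps, double burst, Clock::time_point now)
      : qps_(qps), burst_(burst), tokens_(burst), last_(now) {
    assert(qps > 0.0 && burst >= 1.0);
  }

  // Refills for the time elapsed since the last call, then tries to take a token.
  bool tryAcquire(Clock::time_point now) {
    if (now > last_) {
      std::chrono::duration<double> elapsed = now - last_;
      tokens_ = std::min(burst_, tokens_ + elapsed.count() * qps_);
      last_ = now;
    }
    if (tokens_ < 1.0) {
      return false;
    }
    tokens_ -= 1.0;
    return true;
  }

 private:
  double qps_;
  double burst_;
  double tokens_;
  Clock::time_point last_;
};

// Hypothetical per-framework limiter keyed by a principal string.
class FrameworkRateLimiter {
 public:
  void setLimit(const std::string& principal, double qps, double burst,
                TokenBucket::Clock::time_point now) {
    buckets_.insert_or_assign(principal, TokenBucket(qps, burst, now));
  }

  bool allow(const std::string& principal, TokenBucket::Clock::time_point now) {
    auto it = buckets_.find(principal);
    if (it == buckets_.end()) {
      return true;  // no configured limit: never throttled
    }
    return it->second.tryAcquire(now);
  }

 private:
  std::map<std::string, TokenBucket> buckets_;
};
```

With a limit of 2 QPS and a burst of 2, a framework can send two messages back to back, the third is rejected, and one more is admitted 500 ms later; a framework with no configured limit (for example a high-SLA production framework) is never throttled. Surviving Master failover and online tuning, both listed as requirements, are outside this sketch.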
[jira] [Resolved] (MESOS-2246) Improve slave health-checking
[ https://issues.apache.org/jira/browse/MESOS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone resolved MESOS-2246. --- Resolution: Fixed Fix Version/s: 0.22.0 Assignee: Vinod Kone Improve slave health-checking - Key: MESOS-2246 URL: https://issues.apache.org/jira/browse/MESOS-2246 Project: Mesos Issue Type: Epic Components: master, slave Reporter: Dominic Hamon Assignee: Vinod Kone Fix For: 0.22.0 In the event of a network partition, or other systemic issues, we may see widespread slave removal. There are several approaches we can take to mitigate this issue including, but not limited to:
- rate limit the slave removal
- change how we do health checking to not rely on a single point of view
- work with frameworks to determine SLA of running services before removing the slave
- manual control to allow operator intervention
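The first mitigation, rate limiting slave removal, can be sketched as a sliding-window guard that admits at most N removals per time window and tells the caller to defer the rest. `RemovalLimiter` and its parameters are hypothetical; this does not describe how the fix in 0.22.0 was implemented.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical guard: permit at most `maxRemovals` slave removals within
// any sliding window of length `window`.
class RemovalLimiter {
 public:
  using Clock = std::chrono::steady_clock;

  RemovalLimiter(std::size_t maxRemovals, Clock::duration window)
      : max_(maxRemovals), window_(window) {
    assert(maxRemovals > 0);
  }

  // Returns true (and records the removal) if removing a slave at `now`
  // stays within the limit; otherwise the caller should defer the removal.
  bool tryRemove(Clock::time_point now) {
    // Forget removals that have aged out of the window.
    while (!recent_.empty() && now - recent_.front() >= window_) {
      recent_.pop_front();
    }
    if (recent_.size() >= max_) {
      return false;
    }
    recent_.push_back(now);
    return true;
  }

 private:
  std::size_t max_;
  Clock::duration window_;
  std::deque<Clock::time_point> recent_;
};
```

With a limit of two removals per minute, removals at 0 s and 10 s succeed, one at 20 s is deferred, and one at 60 s succeeds again because the first removal has left the window. During a partition this turns a sudden mass removal into a slow trickle, leaving time for the other mitigations (such as operator intervention) to kick in.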
[jira] [Commented] (MESOS-2705) Add correct format template declarations to the styleguide
[ https://issues.apache.org/jira/browse/MESOS-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587667#comment-14587667 ] Joerg Schad commented on MESOS-2705: +1 for the question (see my comment in the review). Add correct format template declarations to the styleguide -- Key: MESOS-2705 URL: https://issues.apache.org/jira/browse/MESOS-2705 Project: Mesos Issue Type: Documentation Reporter: Alexander Rojas Assignee: Alexander Rojas The general rule to format templates is to declare them as:
{code}
template <typename T> // notice the space between template and <
class Foo {
  …
};
{code}
However, the style is not documented anywhere, nor is it inherited from the Google style guide.
[jira] [Commented] (MESOS-2545) Developer guide for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587723#comment-14587723 ] Joerg Schad commented on MESOS-2545: https://reviews.apache.org/r/34936/ https://reviews.apache.org/r/35363/ Developer guide for libprocess -- Key: MESOS-2545 URL: https://issues.apache.org/jira/browse/MESOS-2545 Project: Mesos Issue Type: Documentation Components: libprocess Reporter: Bernd Mathiske Assignee: Joerg Schad Labels: documentation Create a developer guide for libprocess that explains the philosophy behind it and explains the most important features as well as the prevalent use patterns in Mesos with examples. This could be similar to stout/README.md.
[jira] [Updated] (MESOS-2545) Developer guide for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2545: -- Description: Create a developer guide for libprocess that explains the philosophy behind it and explains the most important features as well as the prevalent use patterns in Mesos with examples. This could be similar to stout/README.md. was: Create a user guide for libprocess that explains the philosophy behind it and explains the most important features as well as the prevalent use patterns in Mesos with examples. This could be similar to stout/README.md.
[jira] [Reopened] (MESOS-2501) Doxygen style for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske reopened MESOS-2501: --- The file exists and it contains the most critical content to make use of it, but editing it is not done yet.
[jira] [Commented] (MESOS-2705) Add correct format template declarations to the styleguide
[ https://issues.apache.org/jira/browse/MESOS-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587662#comment-14587662 ] Alexander Rojas commented on MESOS-2705: The big question here is, why do we do it like this? Most code omits this space.
[jira] [Updated] (MESOS-2545) Developer guide for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2545: -- Summary: Developer guide for libprocess (was: User guide for libprocess)
[jira] [Commented] (MESOS-2819) Doxygen generation is not integrated in make process.
[ https://issues.apache.org/jira/browse/MESOS-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587842#comment-14587842 ] Joerg Schad commented on MESOS-2819: Please also update the Building Doxygen section in the Doxygen style guide. Doxygen generation is not integrated in make process. - Key: MESOS-2819 URL: https://issues.apache.org/jira/browse/MESOS-2819 Project: Mesos Issue Type: Documentation Reporter: Joerg Schad It would be nice to integrate the Doxygen generation into the make process.
[jira] [Created] (MESOS-2873) style hook prevents valid markdown files from getting committed
Alexander Rojas created MESOS-2873: -- Summary: style hook prevents valid markdown files from getting committed Key: MESOS-2873 URL: https://issues.apache.org/jira/browse/MESOS-2873 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Priority: Trivial According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the HTML code {{<br/>}}. However, there's a hook in Mesos which prevents files with trailing whitespace from being committed.
[jira] [Commented] (MESOS-2705) Add correct format template declarations to the styleguide
[ https://issues.apache.org/jira/browse/MESOS-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587888#comment-14587888 ] Michael Park commented on MESOS-2705: - Looked into this a little bit, and I'm shocked at how inconsistent this style is, at least in the places I looked. The [latest draft of the standard|http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4431.pdf] has 1545 occurrences of {{template<}} and 2252 occurrences of {{template <}}. The LLVM codebase has 785 occurrences of {{template<}} and 2573 occurrences of {{template <}}. The Clang codebase has 9120 occurrences of {{template<}} and 4541 occurrences of {{template <}}. In Mesos, we've got 19 occurrences of {{template<}} and 738 occurrences of {{template <}}. So I'm not sure that "most code omits this space" is true; it seems to me like most codebases don't actually care. Having said that, I think arguments can be made in favor of either side. I would say let's go with what {{clang-format}} does, which is {{template <}}.
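Counts like the ones above can be gathered with a plain substring count. The hypothetical `countOccurrences()` helper below is only an assumption about the method (the comment does not say how the numbers were obtained), and it also counts matches inside comments and string literals. Because neither pattern is a substring of the other, the counts for {{template<}} and {{template <}} never overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of `needle` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& needle) {
  assert(!needle.empty());
  std::size_t count = 0;
  for (std::size_t pos = text.find(needle); pos != std::string::npos;
       pos = text.find(needle, pos + needle.size())) {
    ++count;
  }
  return count;
}
```

Run over a file containing "template <typename T> class A;", "template<typename U> class B;" and "template <class V> void f();", it reports one {{template<}} and two {{template <}}.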
[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed
[ https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587896#comment-14587896 ] Alexander Rojas commented on MESOS-2873: [r/35506/|https://reviews.apache.org/r/35506/] - Excludes md files from style hook.
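The idea behind excluding Markdown from the whitespace check can be sketched as follows. This is not the code in r/35506 or the actual Mesos hook: `endsWith()` and `trailingWhitespaceLines()` are hypothetical C++ helpers that report lines ending in a space or tab and skip files with a .md suffix, where two trailing spaces are a deliberate hard line break.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns true if `path` ends with `suffix`.
bool endsWith(const std::string& path, const std::string& suffix) {
  return path.size() >= suffix.size() &&
         path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical check: returns the 1-based numbers of lines that end in a
// space or tab, and skips Markdown files entirely.
std::vector<std::size_t> trailingWhitespaceLines(const std::string& path,
                                                 const std::string& contents) {
  std::vector<std::size_t> offenders;
  if (endsWith(path, ".md")) {
    return offenders;  // trailing spaces may be an intentional hard line break
  }
  std::istringstream in(contents);
  std::string line;
  for (std::size_t n = 1; std::getline(in, line); ++n) {
    if (!line.empty() && (line.back() == ' ' || line.back() == '\t')) {
      offenders.push_back(n);
    }
  }
  return offenders;
}
```

Skipping .md files wholesale is the simplest rule, but it also lets accidental trailing whitespace through in Markdown; a narrower alternative would allow exactly two trailing spaces there and still flag everything else.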