[jira] [Created] (MESOS-2027) make distcheck on OSX 10.10
Till Toenshoff created MESOS-2027: - Summary: make distcheck on OSX 10.10 Key: MESOS-2027 URL: https://issues.apache.org/jira/browse/MESOS-2027 Project: Mesos Issue Type: Bug Components: build Environment: OSX 10.10 Reporter: Till Toenshoff Assignee: Till Toenshoff Priority: Minor It seems our ZooKeeper Yosemite hotfix does not correctly get applied when doing a make distcheck on OSX 10.10. {noformat} config.status: executing depfiles commands /Applications/Xcode.app/Contents/Developer/usr/bin/make all-am if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c; \ then mv -f .deps/zookeeper.Tpo .deps/zookeeper.Plo; else rm -f .deps/zookeeper.Tpo; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fno-common -DPIC -o zookeeper.o In file included from src/zookeeper.c:27: In file included from ./include/zookeeper.h:34: ./include/recordio.h:76:9: error: expected ')' int64_t htonll(int64_t v); ^ /usr/include/sys/_endian.h:141:25: note: expanded from macro 'htonll' #define htonll(x) __DARWIN_OSSwapInt64(x) ^ /usr/include/libkern/_OSByteOrder.h:78:30: note: expanded from macro '__DARWIN_OSSwapInt64' (__builtin_constant_p(x) ? __DARWIN_OSSwapConstInt64(x) : _OSSwapInt64(x)) ^ ./include/recordio.h:76:9: note: to match this '(' /usr/include/sys/_endian.h:141:25: note: expanded from macro 'htonll' #define htonll(x) __DARWIN_OSSwapInt64(x) ^ /usr/include/libkern/_OSByteOrder.h:78:5: note: expanded from macro '__DARWIN_OSSwapInt64' (__builtin_constant_p(x) ? 
__DARWIN_OSSwapConstInt64(x) : _OSSwapInt64(x)) ^ In file included from src/zookeeper.c:27: In file included from ./include/zookeeper.h:34: {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
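The errors show that OSX 10.10's `<sys/_endian.h>` defines `htonll` as a macro. As a result, the plain declaration `int64_t htonll(int64_t v);` in `recordio.h` is macro-expanded into invalid syntax. One general way to avoid this kind of clash is to declare the function only when no macro of that name exists. The sketch below illustrates that pattern. `portable_htonll` and the `#ifndef` guard are assumptions for illustration, not the actual ZooKeeper or Mesos hotfix.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical portable 64-bit host-to-network conversion. It writes the
// value most significant byte first, so it is correct on any host byte order.
inline std::int64_t portable_htonll(std::int64_t v) {
  std::uint64_t u;
  std::memcpy(&u, &v, sizeof u);
  unsigned char bytes[sizeof u];
  for (std::size_t i = 0; i < sizeof u; ++i) {
    bytes[i] = static_cast<unsigned char>(u >> (8 * (sizeof u - 1 - i)));
  }
  std::int64_t out;
  std::memcpy(&out, bytes, sizeof out);
  return out;
}

// Only provide htonll when the system headers have not already defined it
// as a macro (as OSX 10.10 does); otherwise this declaration would be
// macro-expanded into the syntax error shown above.
#ifndef htonll
inline std::int64_t htonll(std::int64_t v) { return portable_htonll(v); }
#endif
```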
[jira] [Commented] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool
[ https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194647#comment-14194647 ] Bernd Mathiske commented on MESOS-1316: --- New patch for this issue: https://reviews.apache.org/r/27516/. This one is rebased onto the latest master and has a couple of minor edits. It replaces and is based on https://reviews.apache.org/r/21233 from [~benjaminhindman]. Implement decent unit test coverage for the mesos-fetcher tool -- Key: MESOS-1316 URL: https://issues.apache.org/jira/browse/MESOS-1316 Project: Mesos Issue Type: Improvement Components: technical debt, test Reporter: Tom Arnfeld Assignee: Bernd Mathiske There are currently no tests that cover the {{mesos-fetcher}} tool itself, and hence bugs like MESOS-1313 have accidentally slipped through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2028) containerizer_pb2.py is empty
Thomas Rampelberg created MESOS-2028: Summary: containerizer_pb2.py is empty Key: MESOS-2028 URL: https://issues.apache.org/jira/browse/MESOS-2028 Project: Mesos Issue Type: Bug Components: python api Affects Versions: 0.20.1, 0.20.0, 0.21.0 Reporter: Thomas Rampelberg Priority: Minor The sed command to replace mesos.mesos_pb2 with mesos_pb2 is making containerizer_pb2.py blank. This has resulted in `mesos.interface` not being usable for containerizer work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2029) Allow slave to checkpoint resources.
Jie Yu created MESOS-2029: - Summary: Allow slave to checkpoint resources. Key: MESOS-2029 URL: https://issues.apache.org/jira/browse/MESOS-2029 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu The checkpointed resources are independent of the slave lifecycle. In other words, even if the slave host reboots, it'll still recover the checkpointed resources (unlike other checkpointed data). The slave needs to verify during startup that the checkpointed resources are compatible with the resources of the slave (specified using --resources flag). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
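A minimal sketch of the startup check described above follows. It assumes a simplified model where resources are plain name-to-amount scalars. The `ScalarResources` alias and the `compatible` function are hypothetical, and the real Mesos `Resources` class also carries roles, ranges and sets. Here "compatible" is taken to mean that every checkpointed resource is still covered by what `--resources` advertises.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified view of resources: scalar name -> amount.
using ScalarResources = std::map<std::string, double>;

// Returns true if every checkpointed resource is still available, in at
// least the checkpointed amount, among the resources the slave was
// started with (e.g. via --resources).
bool compatible(const ScalarResources& checkpointed,
                const ScalarResources& slaveResources) {
  for (const auto& [name, amount] : checkpointed) {
    auto it = slaveResources.find(name);
    if (it == slaveResources.end() || it->second < amount) {
      return false;  // missing or shrunk since the checkpoint was written
    }
  }
  return true;
}
```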
[jira] [Updated] (MESOS-1805) change const pass-by-value to const reference in stout
[ https://issues.apache.org/jira/browse/MESOS-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Downes updated MESOS-1805: -- Target Version/s: 0.22.0 (was: 0.21.0) change const pass-by-value to const reference in stout -- Key: MESOS-1805 URL: https://issues.apache.org/jira/browse/MESOS-1805 Project: Mesos Issue Type: Improvement Components: stout Affects Versions: 0.20.0 Reporter: Kamil Domański Assignee: Kamil Domański Priority: Trivial Labels: easyfix, patch, performance {{os::shell}} and an overload of {{strings::internal::fmt}} in stout take a {{const std::string}} parameter by value instead of a {{const std::string&}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
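The difference matters because a by-value parameter is copied on every call, even when it is `const`. A small sketch with a hypothetical copy-counting type makes the extra copy visible. `Tracked` and the two helper functions are illustrative only and are not part of stout.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copied.
struct Tracked {
  static inline int copies = 0;
  std::string value;
  explicit Tracked(std::string v) : value(std::move(v)) {}
  Tracked(const Tracked& other) : value(other.value) { ++copies; }
};

// By value: the caller's object is copied on every call; 'const' only stops
// the callee from modifying its own copy.
std::size_t lengthByValue(const Tracked t) { return t.value.size(); }

// By const reference: no copy is made, and the callee still cannot modify it.
std::size_t lengthByRef(const Tracked& t) { return t.value.size(); }
```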
[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Downes updated MESOS-1873: -- Target Version/s: 0.22.0 (was: 0.21.0) Don't pass task-related arguments to mesos-executor --- Key: MESOS-1873 URL: https://issues.apache.org/jira/browse/MESOS-1873 Project: Mesos Issue Type: Bug Components: slave Affects Versions: 0.20.1 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise Reporter: R.B. Boyer *TL;DR:* When a command executor is used with {{shell=false}} and an array of arguments, those same arguments are directly passed to {{mesos-executor}} which fails miserably. --- Attempting to launch a task using the command executor with {{shell=false}} and passing arguments fails strangely. {noformat:title=CommandInfo proto} command { value: /my_program user: app shell: false arguments: my_program arguments: --start arguments: 2014-10-06 arguments: --end arguments: 2014-10-07 } {noformat} Dies with: {noformat:title=stderr} Failed to load unknown flag 'end' Usage: my_program [...] Supported options: --[no-]help Prints this help message (default: false) --[no-]override Whether or not to override the command the executor should run when the task is launched. Only this flag is expected to be on the command line and all arguments after the flag will be used as the subsequent 'argv' to be used with 'execvp' (default: false) {noformat} This is coming from a failed attempt to have the slave launch {{mesos-executor}}. This is due to an adverse interaction between new {{CommandInfo}} features and this blurb from {{src/slave/slave.cpp}}: {code} // Copy the CommandInfo to get the URIs and environment, but // update it to invoke 'mesos-executor' (unless we couldn't // resolve 'mesos-executor' via 'realpath', in which case just // echo the error and exit). 
executor.mutable_command()->MergeFrom(task.command());

Result<string> path = os::realpath(
    path::join(flags.launcher_dir, "mesos-executor"));

if (path.isSome()) {
  executor.mutable_command()->set_value(path.get());
} else {
  executor.mutable_command()->set_value(
      "echo '" +
      (path.isError() ? path.error() : "No such file or directory") +
      "'; exit 1");
}
{code}
This is failing to:
* clear the {{arguments}} field
* probably explicitly restore {{shell=true}}
* clear {{container}}?
* clear {{user}}?

I was able to quickly fix this locally by making a man-in-the-middle program at {{/usr/local/libexec/mesos/mesos-executor}} that stripped all args before exec-ing the real {{mesos-executor}} binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
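A minimal sketch of the first two missing steps follows. It uses a hypothetical `Command` struct in place of the real `CommandInfo` protobuf. The struct, its fields and `executorCommand` are illustrative assumptions, not Mesos code. The sketch keeps the URIs, drops the task's argv and restores `shell=true`. Whether `user` or `container` should also be cleared is left open, as in the report.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the CommandInfo protobuf.
struct Command {
  std::string value;
  bool shell = true;
  std::vector<std::string> arguments;
  std::optional<std::string> user;
  std::vector<std::string> uris;
};

// Derives the command that launches mesos-executor from the task's command.
Command executorCommand(const Command& task, const std::string& executorPath) {
  Command executor = task;        // keeps URIs; user is kept (open question)
  executor.value = executorPath;  // run mesos-executor, not the task program
  executor.arguments.clear();     // the task's argv must not reach mesos-executor
  executor.shell = true;          // explicitly restore the shell default
  return executor;
}
```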
[jira] [Updated] (MESOS-1856) Support specifying libnl3 install location.
[ https://issues.apache.org/jira/browse/MESOS-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Downes updated MESOS-1856: -- Target Version/s: 0.22.0 (was: 0.21.0) Support specifying libnl3 install location. --- Key: MESOS-1856 URL: https://issues.apache.org/jira/browse/MESOS-1856 Project: Mesos Issue Type: Task Reporter: Jie Yu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1971) Switch cgroups_limit_swap default to true
[ https://issues.apache.org/jira/browse/MESOS-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Downes updated MESOS-1971: -- Target Version/s: 0.22.0 (was: 0.21.0) Switch cgroups_limit_swap default to true - Key: MESOS-1971 URL: https://issues.apache.org/jira/browse/MESOS-1971 Project: Mesos Issue Type: Improvement Reporter: Anton Lindström Assignee: Anton Lindström Priority: Trivial Switch cgroups_limit_swap to true by default; see MESOS-1662 for more information. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2030) Support persistent disk resource in master.
Jie Yu created MESOS-2030: - Summary: Support persistent disk resource in master. Key: MESOS-2030 URL: https://issues.apache.org/jira/browse/MESOS-2030 Project: Mesos Issue Type: Task Reporter: Jie Yu We need to do the following in master in order to support persistent disk resource: 1) Add an API allowing the framework to release a persistent disk resource. 2) Maintain an in-memory data structure to track persistent disk resources on each slave. Update this data structure when slaves register/re-register/disconnect, etc. 3) Relay releasing of persistent disk resource to the corresponding slave according to the data structure maintained in 2) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
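Step 2 could look roughly like the following sketch. It is a per-slave index that is refreshed on (re-)registration, dropped when a slave is removed, and consulted to find which slave should receive a release (step 3). Slave and disk identifiers are plain strings here, and the class name `PersistentDiskIndex` is hypothetical. How disconnected, but not removed, slaves should be handled is left open.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory index of persistent disk resources per slave.
class PersistentDiskIndex {
 public:
  // On (re-)registration the slave reports its full set of persistent disks.
  void slaveRegistered(const std::string& slaveId, std::set<std::string> disks) {
    disks_[slaveId] = std::move(disks);
  }

  // Forget everything known about a slave that has been removed.
  void slaveRemoved(const std::string& slaveId) { disks_.erase(slaveId); }

  // Handles a framework's release request: removes the disk from the index
  // and returns the slave the release should be relayed to, if any.
  std::optional<std::string> release(const std::string& diskId) {
    for (auto& [slaveId, disks] : disks_) {
      if (disks.erase(diskId) > 0) {
        return slaveId;
      }
    }
    return std::nullopt;
  }

 private:
  std::map<std::string, std::set<std::string>> disks_;
};
```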
[jira] [Assigned] (MESOS-2030) Support persistent disk resource in master.
[ https://issues.apache.org/jira/browse/MESOS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-2030: - Assignee: Jie Yu Support persistent disk resource in master. --- Key: MESOS-2030 URL: https://issues.apache.org/jira/browse/MESOS-2030 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2031) Manage persistent directories on slave.
[ https://issues.apache.org/jira/browse/MESOS-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-2031: - Assignee: Jie Yu Manage persistent directories on slave. --- Key: MESOS-2031 URL: https://issues.apache.org/jira/browse/MESOS-2031 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu Whenever a slave sees a persistent disk resource (in ExecutorInfo or TaskInfo) that is new to it, it will create a persistent directory in which tasks can store persistent data. After the directory is created, the slave needs to: 1) symlink it into the executor sandbox so that the tasks/executor can see it; 2) garbage collect it once it is released by the framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
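Step 1 can be sketched with C++17 `std::filesystem` under an assumed layout. In this layout, persistent directories live outside the sandbox under a hypothetical `persistentRoot` and are exposed in the sandbox through a symlink. Because the sandbox only holds the link, removing the sandbox leaves the persistent data in place. Garbage collection (step 2) is not shown, and the function name is illustrative.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Creates (if needed) the persistent directory 'persistentRoot/name' and
// exposes it inside 'sandbox' via a symlink of the same name. The root
// should be absolute: a relative symlink target would be resolved relative
// to the sandbox directory.
fs::path linkPersistentDirectory(const fs::path& persistentRoot,
                                 const std::string& name,
                                 const fs::path& sandbox) {
  const fs::path persistent = persistentRoot / name;
  fs::create_directories(persistent);  // lives outside the sandbox

  const fs::path link = sandbox / name;
  if (!fs::exists(fs::symlink_status(link))) {
    fs::create_directory_symlink(persistent, link);
  }
  return link;
}
```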
[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-1873: -- Fix Version/s: 0.21.0 Don't pass task-related arguments to mesos-executor --- Key: MESOS-1873 URL: https://issues.apache.org/jira/browse/MESOS-1873 Project: Mesos Issue Type: Bug Components: slave Affects Versions: 0.20.1 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise Reporter: R.B. Boyer Fix For: 0.21.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1873) Don't pass task-related arguments to mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-1873: -- Target Version/s: (was: 0.22.0) Don't pass task-related arguments to mesos-executor --- Key: MESOS-1873 URL: https://issues.apache.org/jira/browse/MESOS-1873 Project: Mesos Issue Type: Bug Components: slave Affects Versions: 0.20.1 Environment: Linux 3.13.0-35-generic x86_64 Ubuntu-Precise Reporter: R.B. Boyer Fix For: 0.21.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1143) Add a TASK_ERROR task status.
[ https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1143: - Sprint: Twitter Mesos Q4 Sprint 3 Add a TASK_ERROR task status. - Key: MESOS-1143 URL: https://issues.apache.org/jira/browse/MESOS-1143 Project: Mesos Issue Type: Improvement Components: framework, master Reporter: Benjamin Hindman Assignee: Dominic Hamon During task validation we drop tasks that have errors and send TASK_LOST status updates. In most circumstances a framework will want to relaunch a task that has gone lost, and in the event the task is actually malformed (thus invalid) this will result in an infinite loop of sending a task and having it go lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
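To make the loop concrete, consider a typical framework retry policy that relaunches tasks that are lost. It is sketched below with a hypothetical `TaskState` enum and `shouldRelaunch` helper. With a distinct TASK_ERROR state, such a policy could stop retrying tasks that were rejected as invalid, since resubmitting them unchanged would be rejected again.

```cpp
// Hypothetical subset of task states, including the proposed TASK_ERROR.
enum class TaskState { TASK_RUNNING, TASK_FINISHED, TASK_LOST, TASK_ERROR };

// Relaunching may help a task that was lost, but a task rejected as invalid
// would be rejected again, so retrying it would loop forever.
inline bool shouldRelaunch(TaskState state) {
  return state == TaskState::TASK_LOST;
}
```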
[jira] [Updated] (MESOS-1830) Expose master stats differentiating between master-generated and slave-generated LOST tasks
[ https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1830: - Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2) Expose master stats differentiating between master-generated and slave-generated LOST tasks --- Key: MESOS-1830 URL: https://issues.apache.org/jira/browse/MESOS-1830 Project: Mesos Issue Type: Story Components: master Reporter: Bill Farner Assignee: Dominic Hamon Priority: Minor The master exports a monotonically-increasing counter of tasks transitioned to TASK_LOST. This loses fidelity of the source of the lost task. A first step in exposing the source of lost tasks might be to just differentiate between TASK_LOST transitions initiated by the master vs the slave (and maybe bad input from the scheduler). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
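The following sketch shows one possible shape for such stats, using hypothetical names (`LostSource`, `LostTaskStats`). It keeps the existing aggregate counter and adds one counter per source, so existing consumers of the total are unaffected.

```cpp
#include <map>

// Hypothetical origins of a TASK_LOST transition, as suggested above.
enum class LostSource { MASTER, SLAVE, SCHEDULER_INPUT };

// Keeps the existing monotonically-increasing total and a per-source count.
class LostTaskStats {
 public:
  void record(LostSource source) {
    ++total_;
    ++bySource_[source];
  }

  long total() const { return total_; }

  long count(LostSource source) const {
    auto it = bySource_.find(source);
    return it == bySource_.end() ? 0 : it->second;
  }

 private:
  long total_ = 0;
  std::map<LostSource, long> bySource_;
};
```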
[jira] [Updated] (MESOS-1941) Make executor's user owner of executor's cgroup directory
[ https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1941: - Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Mesos Q4 Sprint 2) Make executor's user owner of executor's cgroup directory - Key: MESOS-1941 URL: https://issues.apache.org/jira/browse/MESOS-1941 Project: Mesos Issue Type: Improvement Components: isolation, slave Reporter: Mohit Soni Assignee: Ian Downes Priority: Minor Currently, when cgroups are enabled and an executor is spawned, it is placed under a cgroup directory such as /sys/fs/cgroup/cpu/mesos/mesos-id. In the current implementation this directory is writable only by the root user. This prevents processes launched by the executor from placing their child processes under this cgroup. To let a process spawned by the executor place its child processes under its cgroup directory, the cgroup directory should be made writable by the executor's user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1456) Metric lifetime should be tied to process runstate, not lifetime.
[ https://issues.apache.org/jira/browse/MESOS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1456: - Sprint: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2) Metric lifetime should be tied to process runstate, not lifetime. - Key: MESOS-1456 URL: https://issues.apache.org/jira/browse/MESOS-1456 Project: Mesos Issue Type: Bug Components: statistics Affects Versions: 0.19.0 Reporter: Dominic Hamon Assignee: Dominic Hamon The usual pattern for termination of processes is {{terminate(..); wait(..); delete ..;}} but the {{SchedulerProcess}} is terminated and then deleted some time later. If the metrics endpoint is accessed within that period, it never returns as it tries to access a {{Gauge}} that has a reference to a valid PID that is not getting any timeslices (the {{SchedulerProcess}}). A one-off fix can be made to the {{SchedulerProcess}} to move the metrics add/remove calls to {{initialize}} and {{finalize}}, but this should be the general pattern for every process with metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1903) Add backoff to framework re-registration retries
[ https://issues.apache.org/jira/browse/MESOS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1903: - Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Mesos Q4 Sprint 2) Add backoff to framework re-registration retries Key: MESOS-1903 URL: https://issues.apache.org/jira/browse/MESOS-1903 Project: Mesos Issue Type: Task Reporter: Dominic Hamon Assignee: Vinod Kone To avoid so many duplicate framework re-registration attempts (and thus offer rescinds) we should add backoff to re-registration retries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
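Below is a minimal sketch of capped exponential backoff with random jitter, which is one common way to implement this. The function names are hypothetical. The choice of drawing the delay uniformly from `[0, bound]` is also an assumption, not necessarily the scheme Mesos adopted.

```cpp
#include <algorithm>
#include <chrono>
#include <random>

using std::chrono::milliseconds;

// Upper bound on the delay before retry number 'attempt' (0-based):
// initial * 2^attempt, capped at 'cap'.
milliseconds backoffBound(int attempt, milliseconds initial, milliseconds cap) {
  milliseconds bound = initial;
  for (int i = 0; i < attempt && bound < cap; ++i) {
    bound *= 2;
  }
  return std::min(bound, cap);
}

// Actual delay: drawn uniformly from [0, bound] so that retries from many
// frameworks do not stay synchronized.
milliseconds nextDelay(int attempt, milliseconds initial, milliseconds cap,
                       std::mt19937& rng) {
  std::uniform_int_distribution<long long> dist(
      0, backoffBound(attempt, initial, cap).count());
  return milliseconds(dist(rng));
}
```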
[jira] [Updated] (MESOS-1974) Refactor the C++ 'Resources' abstraction.
[ https://issues.apache.org/jira/browse/MESOS-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1974: - Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Mesos Q4 Sprint 2) Refactor the C++ 'Resources' abstraction. - Key: MESOS-1974 URL: https://issues.apache.org/jira/browse/MESOS-1974 Project: Mesos Issue Type: Improvement Reporter: Jie Yu Assignee: Jie Yu The existing C++ 'Resources' interfaces are poorly designed. Some of them are confusing and unintuitive. Some of them are overloaded with too much functionality. For instance, {noformat} bool operator <= (const Resource& left, const Resource& right); {noformat} This interface is non-intuitive because A = B doesn't imply !(B = A). {noformat} Resource operator + (const Resource& left, const Resource& right); {noformat} This one is also non-intuitive because if 'left' is not compatible with 'right', the result is 'left' (why not right???). The same applies to operator '-'. {noformat} Option<Resource> Resources::get(const Resource& r) const; {noformat} This one assumes Resources is flattened, but it might not be. As we start to introduce persistent disk resources (MESOS-1554), things will get more complicated. For example, one may want two types of 'disk()' functions: one that returns the ephemeral disk bytes (with no disk info), and one that returns the total disk bytes (including ones that have disk info). We may want to introduce a concept for Resource indicating that a resource cannot be merged or split (e.g., atomic?). Since we need to change this class anyway, I want to take this chance to refactor it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
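Here is a sketch of a less surprising alternative to an `operator +` that returns `left` when the operands are incompatible. The hypothetical helper below reports incompatibility explicitly through `std::optional`. `ScalarResource` and its `name`/`role` compatibility rule are simplifications for illustration, not the refactored Mesos API.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified scalar resource.
struct ScalarResource {
  std::string name;
  std::string role;
  double value;
};

// Two resources can be combined only if they describe the same kind of
// resource (same name and role in this simplified model).
bool addable(const ScalarResource& a, const ScalarResource& b) {
  return a.name == b.name && a.role == b.role;
}

// Instead of silently returning one operand on mismatch, the caller is
// forced to handle the incompatible case.
std::optional<ScalarResource> add(const ScalarResource& a,
                                  const ScalarResource& b) {
  if (!addable(a, b)) {
    return std::nullopt;
  }
  return ScalarResource{a.name, a.role, a.value + b.value};
}
```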
[jira] [Updated] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1807: - Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2) Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: newbie Currently the master allows executors to be launched with only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual Unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set, then it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
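The proposed validation could be sketched as below. The sketch assumes a simplified name-to-amount map for the executor's resources, and the `ExecutorResources` alias and `validateExecutorResources` function are hypothetical. It rejects an executor unless it requests a positive amount of both `cpus` and `mem`. Treating zero as missing is an assumption that follows from the `Containerizer::update()` failure mentioned above.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical simplified executor resources: name -> scalar amount.
using ExecutorResources = std::map<std::string, double>;

// Returns an error message, or an empty string if the executor is valid.
std::string validateExecutorResources(const ExecutorResources& resources) {
  for (const char* name : {"cpus", "mem"}) {
    auto it = resources.find(name);
    if (it == resources.end() || it->second <= 0) {
      return std::string("Executor must request a positive amount of '") +
             name + "'";
    }
  }
  return "";
}
```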
[jira] [Updated] (MESOS-1751) Request for stats.json cannot be fulfilled after stopping the framework
[ https://issues.apache.org/jira/browse/MESOS-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1751: - Sprint: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Mesos Q3 Sprint 6, Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2) Request for stats.json cannot be fulfilled after stopping the framework -- Key: MESOS-1751 URL: https://issues.apache.org/jira/browse/MESOS-1751 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Environment: Test case launched on Mac OS X Mavericks. Reporter: Alexander Rukletsov Assignee: Dominic Hamon Priority: Minor A request for stats.json to the master from a test case doesn't work after calling the framework's {{driver.stop()}}. However, it works for state.json. I think the problem is related to the {{stats()}} continuation {{_stats()}}. The following test illustrates the issue:
{code:title=TestCase.cpp|borderStyle=solid}
TEST_F(MasterTest, RequestAfterDriverStop)
{
  Try<PID<Master> > master = StartMaster();
  ASSERT_SOME(master);

  Try<PID<Slave> > slave = StartSlave();
  ASSERT_SOME(slave);

  MockScheduler sched;
  MesosSchedulerDriver driver(
      &sched, DEFAULT_FRAMEWORK_INFO, master.get(), DEFAULT_CREDENTIAL);

  driver.start();

  Future<process::http::Response> response_before =
    process::http::get(master.get(), "stats.json");
  AWAIT_READY(response_before);

  driver.stop();

  Future<process::http::Response> response_after =
    process::http::get(master.get(), "stats.json");
  AWAIT_READY(response_after);

  driver.join();

  Shutdown(); // Must shutdown before 'containerizer' gets deallocated.
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1930) Expose TASK_KILLED reason.
[ https://issues.apache.org/jira/browse/MESOS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1930: - Sprint: Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3 (was: Twitter Mesos Q4 Sprint 2) Expose TASK_KILLED reason. -- Key: MESOS-1930 URL: https://issues.apache.org/jira/browse/MESOS-1930 Project: Mesos Issue Type: Story Reporter: Alexander Rukletsov Assignee: Dominic Hamon Priority: Minor A task process may be killed by a SIGTERM or SIGKILL. The only way to check how the task process has exited is to examine the message: {{status.message().find("Terminated")}}. However, a task may not run in its own process, hence the executor may not be able to provide an exit status. What we actually want is an artificial task exit status that is rendered by the executor. This may be resolved by adding second-tier states or state explanations. Here is a link to a discussion: https://reviews.apache.org/r/26382/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-487) Balloon framework fails to run due to bad flags
[ https://issues.apache.org/jira/browse/MESOS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-487: Sprint: Twitter Mesos Q4 Sprint 3 Balloon framework fails to run due to bad flags --- Key: MESOS-487 URL: https://issues.apache.org/jira/browse/MESOS-487 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Vinod Kone Labels: twitter I suspect this has to do with the latest flags refactor. [vinod@smfd-bkq-03-sr4 build]$ sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter=*Balloon* --verbose WARNING: Logging before InitGoogleLogging() is written to STDERR I0529 22:28:13.094351 31506 process.cpp:1426] libprocess is initialized on 10.37.184.103:53425 for 24 cpus I0529 22:28:13.095010 31506 logging.cpp:91] Logging to STDERR Source directory: /home/vinod/mesos Build directory: /home/vinod/mesos/build - We cannot run any cgroups tests that require mounting hierarchies because you have the following hierarchies mounted: /cgroup We'll disable the CgroupsNoHierarchyTest test fixture for now. - Note: Google Test filter = *Balloon*-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy: [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from CgroupsIsolatorTest [ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_pWWdE1' Launched master at 31574 Failed to load unknown flag 'build_dir' Usage: lt-mesos-master [...] Supported options: --allocation_interval=VALUE Amount of time to wait between performing (batch) allocations (e.g., 500ms, 1sec, etc) (default: 1secs) --cluster=VALUE Human readable name for the cluster, displayed in the webui --framework_sorter=VALUEPolicy to use for allocating resources between a given user's frameworks. 
Options are the same as for user_allocator (default: drf) --[no-]help Prints this help message (default: false) --ip=VALUE IP address to listen on --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr) --logbufsecs=VALUE How many seconds to buffer log messages for (default: 0) --port=VALUEPort to listen on (default: 5050) --[no-]quietDisable logging to stderr (default: false) --[no-]root_submissions Can root submit frameworks? (default: true) --slaves=VALUE Initial slaves that should be considered part of this cluster (or if using ZooKeeper a URL) (default: *) --user_sorter=VALUE Policy to use for allocating resources between users. May be one of: dominant_resource_fairness (drf) (default: drf) --webui_dir=VALUE Location of the webui files/assets (default: /usr/local/share/mesos/webui) --whitelist=VALUE Path to a file with a list of slaves (one per line) to advertise offers for; should be of the form: file://path/to/file (default: *) --zk=VALUE ZooKeeper URL (used for leader election amongst masters) May be one of: zk://host1:port1,host2:port2,.../path zk://username:password@host1:port1,host2:port2,.../path file://path/to/file (where file contains one of the above) (default: ) {RED}Master crashed; failing test /home/vinod/mesos/src/tests/balloon_framework_test.sh: line 31: kill: (31574) - No such process ../../src/tests/script.cpp:76: Failure Failed balloon_framework_test.sh exited with status 2 [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (2031 ms) [--] 1 test from CgroupsIsolatorTest (2031 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (2031 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework 1 FAILED TEST -- This message was sent by
[jira] [Updated] (MESOS-723) Expose total number of resources allocated to the slave in its endpoint
[ https://issues.apache.org/jira/browse/MESOS-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-723: Sprint: Twitter Mesos Q4 Sprint 3 Expose total number of resources allocated to the slave in its endpoint --- Key: MESOS-723 URL: https://issues.apache.org/jira/browse/MESOS-723 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Labels: twitter This could be useful information if there are bugs in the master/slave that cause slaves to overcommit their resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1718) Command executor can overcommit the slave.
[ https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1718: - Sprint: Twitter Mesos Q4 Sprint 3 Command executor can overcommit the slave. -- Key: MESOS-1718 URL: https://issues.apache.org/jira/browse/MESOS-1718 Project: Mesos Issue Type: Bug Components: slave Reporter: Benjamin Mahler Assignee: Ian Downes Currently we give a small amount of resources to the command executor, in addition to the resources used by the command task: https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448
{code:title=}
ExecutorInfo Slave::getExecutorInfo(
    const FrameworkID& frameworkId,
    const TaskInfo& task)
{
  ...
  // Add an allowance for the command executor. This does lead to a
  // small overcommit of resources.
  executor.mutable_resources()->MergeFrom(
      Resources::parse(
        "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" +
        "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get());
  ...
}
{code}
This leads to an overcommit of the slave. Ideally, for command tasks we can transfer all of the task resources to the executor at the slave / isolation level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
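To make the overcommit concrete, here is a small arithmetic sketch. The two-field `Scalars` struct and the allowance values in the usage example (0.1 CPUs, 32 MB) are illustrative assumptions, not Mesos's `Resources` class or its actual `DEFAULT_EXECUTOR_*` constants. If a task is launched with everything the slave offered, adding the executor allowance exceeds the slave's capacity, while charging the executor exactly the task's resources does not.

```cpp
#include <cassert>

// Hypothetical, simplified resource vector; Mesos's real Resources class
// also handles named scalars, ranges, sets and roles.
struct Scalars {
  double cpus;
  double memMB;
};

Scalars operator+(const Scalars& a, const Scalars& b) {
  return Scalars{a.cpus + b.cpus, a.memMB + b.memMB};
}

// True if `used` does not exceed `total` in any dimension.
bool fits(const Scalars& used, const Scalars& total) {
  return used.cpus <= total.cpus && used.memMB <= total.memMB;
}

// Behaviour described in the issue: the command executor is charged the
// task's resources plus an extra allowance.
Scalars withAllowance(const Scalars& task, const Scalars& allowance) {
  return task + allowance;
}

// Proposed behaviour: the executor is charged exactly the task's resources.
Scalars transferTaskResources(const Scalars& task) {
  return task;
}
```

For example, on a slave with 1 CPU and 1024 MB, a task using all of it plus the allowance comes to 1.1 CPUs and 1056 MB, which no longer fits.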
[jira] [Updated] (MESOS-2008) MasterAuthorizationTest.DuplicateReregistration is flaky
[ https://issues.apache.org/jira/browse/MESOS-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-2008: - Sprint: Twitter Mesos Q4 Sprint 3 MasterAuthorizationTest.DuplicateReregistration is flaky Key: MESOS-2008 URL: https://issues.apache.org/jira/browse/MESOS-2008 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Environment: https://builds.apache.org/computer/ubuntu-4/ Reporter: Yan Xu Assignee: Vinod Kone {noformat:title=} [ RUN ] MasterAuthorizationTest.DuplicateReregistration Using temporary directory '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX' I1029 08:25:26.021766 32232 leveldb.cpp:176] Opened db in 3.066621ms I1029 08:25:26.022734 32232 leveldb.cpp:183] Compacted db in 935019ns I1029 08:25:26.022766 32232 leveldb.cpp:198] Created db iterator in 4350ns I1029 08:25:26.022785 32232 leveldb.cpp:204] Seeked to beginning of db in 902ns I1029 08:25:26.022799 32232 leveldb.cpp:273] Iterated through 0 keys in the db in 387ns I1029 08:25:26.022831 32232 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I1029 08:25:26.023305 32248 recover.cpp:437] Starting replica recovery I1029 08:25:26.023598 32248 recover.cpp:463] Replica is in EMPTY status I1029 08:25:26.025059 32260 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1029 08:25:26.025320 32247 recover.cpp:188] Received a recover response from a replica in EMPTY status I1029 08:25:26.025585 32256 recover.cpp:554] Updating replica status to STARTING I1029 08:25:26.026546 32249 master.cpp:312] Master 20141029-082526-3142697795-40696-32232 (pomona.apache.org) started on 67.195.81.187:40696 I1029 08:25:26.026561 32261 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 69ns I1029 08:25:26.026592 32249 master.cpp:358] Master only allowing authenticated frameworks to register I1029 08:25:26.026592 32261 replica.cpp:320] Persisted replica status to 
STARTING I1029 08:25:26.026605 32249 master.cpp:363] Master only allowing authenticated slaves to register I1029 08:25:26.026639 32249 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterAuthorizationTest_DuplicateReregistration_DLOmYX/credentials' I1029 08:25:26.026877 32249 master.cpp:392] Authorization enabled I1029 08:25:26.026901 32260 recover.cpp:463] Replica is in STARTING status I1029 08:25:26.027498 32261 master.cpp:120] No whitelist given. Advertising offers for all slaves I1029 08:25:26.027541 32248 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.187:40696 I1029 08:25:26.028055 32252 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1029 08:25:26.028451 32247 recover.cpp:188] Received a recover response from a replica in STARTING status I1029 08:25:26.028733 32249 master.cpp:1242] The newly elected leader is master@67.195.81.187:40696 with id 20141029-082526-3142697795-40696-32232 I1029 08:25:26.028764 32249 master.cpp:1255] Elected as the leading master! 
I1029 08:25:26.028781 32249 master.cpp:1073] Recovering from registrar I1029 08:25:26.028904 32246 recover.cpp:554] Updating replica status to VOTING I1029 08:25:26.029163 32257 registrar.cpp:313] Recovering registrar I1029 08:25:26.029556 32251 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 485711ns I1029 08:25:26.029588 32251 replica.cpp:320] Persisted replica status to VOTING I1029 08:25:26.029726 32253 recover.cpp:568] Successfully joined the Paxos group I1029 08:25:26.029932 32253 recover.cpp:452] Recover process terminated I1029 08:25:26.030436 32250 log.cpp:656] Attempting to start the writer I1029 08:25:26.032152 32248 replica.cpp:474] Replica received implicit promise request with proposal 1 I1029 08:25:26.032778 32248 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 597030ns I1029 08:25:26.032807 32248 replica.cpp:342] Persisted promised to 1 I1029 08:25:26.033481 32254 coordinator.cpp:230] Coordinator attemping to fill missing position I1029 08:25:26.035429 32247 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1029 08:25:26.036154 32247 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 690208ns I1029 08:25:26.036181 32247 replica.cpp:676] Persisted action at 0 I1029 08:25:26.037344 32249 replica.cpp:508] Replica received write request for position 0 I1029 08:25:26.037395 32249 leveldb.cpp:438] Reading position from leveldb took 22607ns I1029 08:25:26.038074 32249 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 647429ns I1029
[jira] [Updated] (MESOS-2030) Support persistent disk resource in master.
[ https://issues.apache.org/jira/browse/MESOS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-2030: - Sprint: Twitter Mesos Q4 Sprint 3 Support persistent disk resource in master. --- Key: MESOS-2030 URL: https://issues.apache.org/jira/browse/MESOS-2030 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu We need to do the following in the master in order to support persistent disk resources:
1) Add an API allowing a framework to release a persistent disk resource.
2) Maintain an in-memory data structure to track persistent disk resources on each slave. Update this data structure when slaves register/re-register/disconnect, etc.
3) Relay the release of a persistent disk resource to the corresponding slave according to the data structure maintained in 2).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
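A minimal sketch of the master-side bookkeeping in the numbered steps of this issue. The class and method names are hypothetical, slave and persistence IDs are simplified to strings, and the lookup on release is a linear scan over slaves (a reverse index would avoid that).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical master-side tracker for persistent disk resources.
class PersistentDiskTracker {
public:
  // Step 2: on slave (re-)registration, record the disks the slave reports.
  void slaveRegistered(const std::string& slaveId,
                       const std::set<std::string>& persistenceIds) {
    disks_[slaveId] = persistenceIds;
  }

  // Step 2: forget a slave's disks once the slave is removed. Whether a
  // mere disconnect should also drop them is left open in this sketch.
  void slaveRemoved(const std::string& slaveId) {
    disks_.erase(slaveId);
  }

  // Steps 1 and 3: a framework releases a persistent disk. Returns the
  // slave the release must be relayed to, or "" if the disk is unknown.
  std::string release(const std::string& persistenceId) {
    for (auto& entry : disks_) {
      if (entry.second.erase(persistenceId) > 0) {
        return entry.first;
      }
    }
    return "";
  }

private:
  std::map<std::string, std::set<std::string>> disks_;  // slave -> disks
};
```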
[jira] [Updated] (MESOS-1902) Support persistent disk resource.
[ https://issues.apache.org/jira/browse/MESOS-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1902: - Sprint: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2 (was: Twitter Q4 Sprint 1, Twitter Mesos Q4 Sprint 2, Twitter Mesos Q4 Sprint 3) Support persistent disk resource. - Key: MESOS-1902 URL: https://issues.apache.org/jira/browse/MESOS-1902 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu Mesos needs to provide a way to allow tasks to write persistent data which won’t be garbage collected. For example, a task can write its persistent data to some predefined directory. When this task finishes, the framework can launch a new task which is able to access the persistent data written by the previous task, data which Mesos would usually have garbage-collected. One way to achieve this is to provide a new type of disk resource that is persistent, which we call a persistent disk resource. When a framework launches a task using persistent disk resources, the data the task writes will be persisted. When the framework launches a new task using the same persistent disk resource (after the previous task finishes), the new task will be able to access the data written by the previous task. The persistent disk resource should be able to survive a slave reboot or a slave info/ID change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-487) Balloon framework fails to run due to bad flags
[ https://issues.apache.org/jira/browse/MESOS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-487: Story Points: 1 Balloon framework fails to run due to bad flags --- Key: MESOS-487 URL: https://issues.apache.org/jira/browse/MESOS-487 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Vinod Kone Labels: twitter I suspect this has to do with the latest flags refactor. [vinod@smfd-bkq-03-sr4 build]$ sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter=*Balloon* --verbose WARNING: Logging before InitGoogleLogging() is written to STDERR I0529 22:28:13.094351 31506 process.cpp:1426] libprocess is initialized on 10.37.184.103:53425 for 24 cpus I0529 22:28:13.095010 31506 logging.cpp:91] Logging to STDERR Source directory: /home/vinod/mesos Build directory: /home/vinod/mesos/build - We cannot run any cgroups tests that require mounting hierarchies because you have the following hierarchies mounted: /cgroup We'll disable the CgroupsNoHierarchyTest test fixture for now. - Note: Google Test filter = *Balloon*-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy: [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from CgroupsIsolatorTest [ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_pWWdE1' Launched master at 31574 Failed to load unknown flag 'build_dir' Usage: lt-mesos-master [...] Supported options: --allocation_interval=VALUE Amount of time to wait between performing (batch) allocations (e.g., 500ms, 1sec, etc) (default: 1secs) --cluster=VALUE Human readable name for the cluster, displayed in the webui --framework_sorter=VALUE Policy to use for allocating resources between a given user's frameworks.
Options are the same as for user_allocator (default: drf) --[no-]help Prints this help message (default: false) --ip=VALUE IP address to listen on --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr) --logbufsecs=VALUE How many seconds to buffer log messages for (default: 0) --port=VALUE Port to listen on (default: 5050) --[no-]quiet Disable logging to stderr (default: false) --[no-]root_submissions Can root submit frameworks? (default: true) --slaves=VALUE Initial slaves that should be considered part of this cluster (or if using ZooKeeper a URL) (default: *) --user_sorter=VALUE Policy to use for allocating resources between users. May be one of: dominant_resource_fairness (drf) (default: drf) --webui_dir=VALUE Location of the webui files/assets (default: /usr/local/share/mesos/webui) --whitelist=VALUE Path to a file with a list of slaves (one per line) to advertise offers for; should be of the form: file://path/to/file (default: *) --zk=VALUE ZooKeeper URL (used for leader election amongst masters) May be one of: zk://host1:port1,host2:port2,.../path zk://username:password@host1:port1,host2:port2,.../path file://path/to/file (where file contains one of the above) (default: ) {RED}Master crashed; failing test /home/vinod/mesos/src/tests/balloon_framework_test.sh: line 31: kill: (31574) - No such process ../../src/tests/script.cpp:76: Failure Failed balloon_framework_test.sh exited with status 2 [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (2031 ms) [----------] 1 test from CgroupsIsolatorTest (2031 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (2031 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework 1 FAILED TEST -- This message was sent by Atlassian JIRA
[jira] [Updated] (MESOS-1718) Command executor can overcommit the slave.
[ https://issues.apache.org/jira/browse/MESOS-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1718: - Story Points: 3 Command executor can overcommit the slave. -- Key: MESOS-1718 URL: https://issues.apache.org/jira/browse/MESOS-1718 Project: Mesos Issue Type: Bug Components: slave Reporter: Benjamin Mahler Assignee: Ian Downes Currently we give a small amount of resources to the command executor, in addition to the resources used by the command task: https://github.com/apache/mesos/blob/0.20.0-rc1/src/slave/slave.cpp#L2448
{code:title=}
ExecutorInfo Slave::getExecutorInfo(
    const FrameworkID& frameworkId,
    const TaskInfo& task)
{
  ...
  // Add an allowance for the command executor. This does lead to a
  // small overcommit of resources.
  executor.mutable_resources()->MergeFrom(
      Resources::parse(
        "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" +
        "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get());
  ...
}
{code}
This leads to an overcommit of the slave. Ideally, for command tasks we can transfer all of the task resources to the executor at the slave / isolation level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2017) Segfault with Pure virtual method called when tests fail
[ https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-2017: - Story Points: 5 Segfault with Pure virtual method called when tests fail -- Key: MESOS-2017 URL: https://issues.apache.org/jira/browse/MESOS-2017 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.21.0 Reporter: Yan Xu Assignee: Yan Xu Labels: twitter The most recent one: {noformat:title=DRFAllocatorTest.DRFAllocatorProcess} [ RUN ] DRFAllocatorTest.DRFAllocatorProcess Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j' I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 2018ns I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the db in 335ns I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from a replica in EMPTY status I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to STARTING I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 591981ns I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to STARTING I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status I1030 05:55:06.940820 24489 master.cpp:312] Master 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 67.195.81.187:40429 I1030 05:55:06.940871 24489 
master.cpp:358] Master only allowing authenticated frameworks to register I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing authenticated slaves to register I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for authentication from '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials' I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising offers for all slaves I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.187:40429 I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from a replica in STARTING status I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459 I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master! 
I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 536365ns I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to VOTING I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos group I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit promise request with proposal 1 I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 806463ns I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1 I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to fill missing position I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 603843ns I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0 I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request for position 0 I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb took 28437ns I1030 05:55:06.952896 24476 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 623980ns I1030 05:55:06.952926 24476 replica.cpp:676]
[jira] [Updated] (MESOS-2032) Update Maintenance design to account for persistent resources.
[ https://issues.apache.org/jira/browse/MESOS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-2032: - Sprint: Twitter Mesos Q4 Sprint 3 Update Maintenance design to account for persistent resources. -- Key: MESOS-2032 URL: https://issues.apache.org/jira/browse/MESOS-2032 Project: Mesos Issue Type: Task Components: framework, master, slave Reporter: Benjamin Mahler Assignee: Benjamin Mahler With persistent resources and dynamic reservations, frameworks need to know how long resources will be unavailable during maintenance operations, because how a framework should react depends on how long its persistent resources will be unavailable. For example, if there will be a 10-minute reboot for a kernel upgrade, the framework will not want to re-replicate all of its persistent data on the machine; tolerating one unavailable replica for the maintenance window would be preferable. I'd like to revisit the design to ensure it works well for persistent resources as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
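A hypothetical sketch of the decision that maintenance durations would let a framework make: compare the announced unavailability window with the time it would take to re-replicate the data elsewhere. The function, the enum, and the assumption that the framework knows its re-replication time are all illustrative, not part of any Mesos API.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical framework-side choice for a replica on a machine that is
// about to go into maintenance.
enum class MaintenanceAction { TOLERATE_UNAVAILABLE_REPLICA, RE_REPLICATE };

MaintenanceAction decide(std::chrono::minutes unavailableFor,
                         std::chrono::minutes reReplicationTime) {
  // If the machine returns before a new replica could be built, waiting out
  // the window is cheaper than copying all of the persistent data.
  return unavailableFor < reReplicationTime
      ? MaintenanceAction::TOLERATE_UNAVAILABLE_REPLICA
      : MaintenanceAction::RE_REPLICATE;
}
```

With a 10-minute kernel-upgrade reboot and, say, two hours to rebuild a replica, the sketch tolerates the unavailable replica; a 48-hour outage would trigger re-replication.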
[jira] [Created] (MESOS-2034) Documentation for isolator namespaces/pid.
Ian Downes created MESOS-2034: - Summary: Documentation for isolator namespaces/pid. Key: MESOS-2034 URL: https://issues.apache.org/jira/browse/MESOS-2034 Project: Mesos Issue Type: Documentation Affects Versions: 0.21.0 Reporter: Ian Downes Assignee: Ian Downes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2033) Documentation for isolator filesystem/shared.
Ian Downes created MESOS-2033: - Summary: Documentation for isolator filesystem/shared. Key: MESOS-2033 URL: https://issues.apache.org/jira/browse/MESOS-2033 Project: Mesos Issue Type: Documentation Affects Versions: 0.21.0 Reporter: Ian Downes Assignee: Ian Downes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1941) Make executor's user owner of executor's cgroup directory
[ https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Downes updated MESOS-1941: -- Labels: twitter (was: ) Make executor's user owner of executor's cgroup directory - Key: MESOS-1941 URL: https://issues.apache.org/jira/browse/MESOS-1941 Project: Mesos Issue Type: Improvement Components: isolation, slave Reporter: Mohit Soni Assignee: Ian Downes Priority: Minor Labels: twitter Currently, when cgroups are enabled and an executor is spawned, it is placed under a cgroup directory such as /sys/fs/cgroup/cpu/mesos/mesos-id. In the current implementation this directory is writable only by root, which prevents processes launched by the executor from moving their child processes into this cgroup. To enable an executor-spawned process to move its child processes into its cgroup directory, the cgroup directory should be made writable by the user the executor runs as. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
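A minimal sketch of the ownership change using POSIX `getpwnam(3)`, `chown(2)` and `stat(2)`. The helper name is hypothetical and this is not necessarily how Mesos implements the fix; it assumes the caller (the slave, running as root) knows the cgroup directory path and the executor's user, and it reports success only if the directory ends up owned by that user.

```cpp
#include <cassert>
#include <pwd.h>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: make `user` (and its primary group) the owner of
// `path`, e.g. an executor's cgroup directory, so processes running as that
// user can write to it. Returns false if the user is unknown, chown fails,
// or the directory is not owned by the user afterwards.
bool chownToUser(const std::string& path, const std::string& user) {
  struct passwd* pw = getpwnam(user.c_str());
  if (pw == nullptr) {
    return false;  // no such user
  }
  const uid_t uid = pw->pw_uid;
  if (chown(path.c_str(), uid, pw->pw_gid) != 0) {
    return false;  // e.g. EPERM when not running as root
  }
  struct stat s;
  return stat(path.c_str(), &s) == 0 && s.st_uid == uid;
}
```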
[jira] [Commented] (MESOS-2036) Fix the Json format for the --modules and update the help message
[ https://issues.apache.org/jira/browse/MESOS-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195370#comment-14195370 ] Kapil Arya commented on MESOS-2036: --- Updated the Json format and adjusted the --modules help message accordingly. https://reviews.apache.org/r/27481 Fix the Json format for the --modules and update the help message - Key: MESOS-2036 URL: https://issues.apache.org/jira/browse/MESOS-2036 Project: Mesos Issue Type: Bug Reporter: Kapil Arya Assignee: Kapil Arya Priority: Blocker The Json format for specifying module-specific parameters is not correctly reflected in the help message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2025) OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1
[ https://issues.apache.org/jira/browse/MESOS-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-2025: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1 Key: MESOS-2025 URL: https://issues.apache.org/jira/browse/MESOS-2025 Project: Mesos Issue Type: Bug Components: stout Environment: Ubuntu 14.04 with graphical interface Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Minor Reparenting does not always assign pid 1 (/sbin/init) as the new parent. If there is a user init such as init --user with some other pid, this will be the new parent. Modify os_tests to check up the parent tree, and succeed if there is a path to pid 1 without zombies along the way. This is not the cleanest fix, but I have not found a way to determine the appropriate init to check for. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
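The parent-tree check described in this issue could look roughly like the sketch below. It is Linux-specific: it reads `/proc/<pid>/stat`, whose fields after the parenthesized command name start with the process state and the parent pid. The function names and the hop limit are assumptions of this sketch, not stout's `os` API.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Read the state and parent pid of `pid` from /proc/<pid>/stat, whose
// format is "pid (comm) state ppid ...". The command name may contain
// spaces or ')', so parsing starts after the last ')'.
bool readStat(pid_t pid, char* state, pid_t* ppid) {
  std::ifstream file("/proc/" + std::to_string(pid) + "/stat");
  std::string line;
  if (!std::getline(file, line)) {
    return false;  // no such process (or /proc unavailable)
  }
  const std::string::size_type close = line.rfind(')');
  if (close == std::string::npos) {
    return false;
  }
  std::istringstream rest(line.substr(close + 1));
  long parent = 0;
  if (!(rest >> *state >> parent)) {
    return false;
  }
  *ppid = static_cast<pid_t>(parent);
  return true;
}

// Succeed if walking up from `pid` reaches pid 1 without passing through a
// zombie or a process whose stat cannot be read. The hop limit guards
// against looping forever on unexpected data.
bool hasLiveAncestryToInit(pid_t pid) {
  for (int hops = 0; hops < 4096 && pid > 1; ++hops) {
    char state = '?';
    pid_t ppid = 0;
    if (!readStat(pid, &state, &ppid) || state == 'Z') {
      return false;
    }
    pid = ppid;
  }
  return pid == 1;
}
```

Note that inside a PID namespace, a process whose parent lives outside the namespace reports a parent pid of 0, so this walk would fail there.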
[jira] [Updated] (MESOS-1991) Remove dynamic allocation from Option
[ https://issues.apache.org/jira/browse/MESOS-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1991: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Remove dynamic allocation from Option - Key: MESOS-1991 URL: https://issues.apache.org/jira/browse/MESOS-1991 Project: Mesos Issue Type: Improvement Components: stout Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Minor Remove dynamic allocations from the Option class. The motivation for this is 3-fold:
1. Reduce dynamic allocations. These can cause latency jitter as process lifetime grows. This kind of jitter can make it hard to grasp the upper bound of latency on certain operations under locks. This modification only moves the allocated space of T; it does not reduce or increase the number of actual construction / move calls unless the new move constructor is used.
2. The commonly understood implication of Optional / Option / Nullable is that it augments the type field by 1 bit in order to allow representation of an unknown or null state. This is handy in cases where a type such as int64_t fully utilizes its 64-bit storage space, and representing unknown would otherwise require us to steal a number (such as INT64_MAX). This class should not take on the additional responsibility of managing memory for the augmented type.
3. It can be very deceptive to a newcomer when Option<int64_t> does a dynamic allocation. Intuitively you would not expect a type such as int64_t to do a dynamic allocation or be expensive to copy. Naturally Option<BigType> would be expected to be expensive to copy, and so a developer would be more inclined to do something like std::shared_ptr<Option<BigType>>.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
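A minimal sketch of an allocation-free `Option<T>`, assuming C++11: the value lives in an anonymous union inside the object and is constructed with placement new only when set, so an `Option<int64_t>` costs `sizeof(int64_t)` plus a flag (and padding) with no heap allocation. The method names mirror stout's `isSome()`/`isNone()`/`get()` for readability, but this is not stout's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>
#include <utility>

// Sketch: Option<T> storing T inline; T is constructed only when set.
template <typename T>
class Option {
public:
  Option() : some_(false) {}
  Option(const T& t) : some_(true) { new (&t_) T(t); }
  Option(T&& t) : some_(true) { new (&t_) T(std::move(t)); }

  Option(const Option& that) : some_(that.some_) {
    if (some_) new (&t_) T(that.t_);
  }
  Option(Option&& that) : some_(that.some_) {
    if (some_) new (&t_) T(std::move(that.t_));
  }

  ~Option() { reset(); }

  Option& operator=(const Option& that) {
    if (this != &that) {
      reset();
      if (that.some_) { new (&t_) T(that.t_); some_ = true; }
    }
    return *this;
  }

  Option& operator=(Option&& that) {
    if (this != &that) {
      reset();
      if (that.some_) { new (&t_) T(std::move(that.t_)); some_ = true; }
    }
    return *this;
  }

  bool isSome() const { return some_; }
  bool isNone() const { return !some_; }

  const T& get() const {
    assert(some_);
    return t_;
  }

private:
  // Destroy the stored value, if any, and mark the option as empty.
  void reset() {
    if (some_) { t_.~T(); some_ = false; }
  }

  bool some_;  // the extra "is set" bit (occupies at least a byte)
  union {
    T t_;      // inline storage; live only while some_ is true
  };
};
```

Using a union rather than a raw aligned byte buffer avoids casts on every access; the trade-off is that each constructor, assignment operator and the destructor must track by hand whether the value is live.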
[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool
[ https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1316: Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 2 (was: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31) Implement decent unit test coverage for the mesos-fetcher tool -- Key: MESOS-1316 URL: https://issues.apache.org/jira/browse/MESOS-1316 Project: Mesos Issue Type: Improvement Components: technical debt, test Reporter: Tom Arnfeld Assignee: Bernd Mathiske There are currently no tests that cover the {{mesos-fetcher}} tool itself, and hence bugs like MESOS-1313 have accidentally slipped through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2010) Libprocess: Introduce enable_shared_from_this
[ https://issues.apache.org/jira/browse/MESOS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-2010: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Libprocess: Introduce enable_shared_from_this - Key: MESOS-2010 URL: https://issues.apache.org/jira/browse/MESOS-2010 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere add enable_shared_from_this to the configure check -- This message was sent by Atlassian JIRA (v6.3.4#6332)
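One possible shape for the program a configure-time check could compile and run; the actual check in Mesos's configure script may differ. It only compiles if the toolchain provides `std::enable_shared_from_this`, and it verifies that `shared_from_this()` shares ownership with the original `std::shared_ptr` rather than creating an independent owner. The `Node` type and the function name are hypothetical; a configure test would call the function from `main`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical probe type: obtains a shared_ptr to itself.
struct Node : std::enable_shared_from_this<Node> {
  std::shared_ptr<Node> self() { return shared_from_this(); }
};

// True if shared_from_this() returns a pointer that shares ownership
// (same object, common reference count) with the original shared_ptr.
bool sharedFromThisWorks() {
  std::shared_ptr<Node> node = std::make_shared<Node>();
  std::shared_ptr<Node> again = node->self();
  return again.get() == node.get() && node.use_count() == 2;
}
```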
[jira] [Updated] (MESOS-1330) Introduce stream abstraction to libprocess
[ https://issues.apache.org/jira/browse/MESOS-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1330: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Introduce stream abstraction to libprocess -- Key: MESOS-1330 URL: https://issues.apache.org/jira/browse/MESOS-1330 Project: Mesos Issue Type: Task Components: general, libprocess Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere Labels: libprocess, network I think it makes sense to think in terms of different low- or middle-layer transports (which can accommodate channels like SSL). We could capture connection life-cycles and network send/receive primitives in a much more explicit manner than libprocess currently does. I have a proof-of-concept transport/connection abstraction ready, which we can use to iterate on a design. Notably, there are opportunities to change the current SocketManager/Socket abstractions to explicit ConnectionManager/Connection abstractions, which would allow several composable communication layers. I am proposing to own this ticket and am looking for a shepherd to (thoroughly) go over design considerations before jumping into an actual implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
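A rough sketch of the layering idea, under stated assumptions: `Connection`, `LoopbackConnection` and `LengthPrefixLayer` are hypothetical names (the proposal itself only mentions ConnectionManager/Connection), the transport is an in-memory queue, and length-prefix framing stands in for a real layer such as SSL. The point is that a layer is itself a `Connection` wrapping another one, so layers compose.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical transport interface: whole messages in, whole messages out.
class Connection {
public:
  virtual ~Connection() = default;
  virtual void send(const std::string& data) = 0;
  virtual std::string receive() = 0;  // throws if nothing is available
};

// In-memory loopback: whatever is sent comes back from receive().
class LoopbackConnection : public Connection {
public:
  void send(const std::string& data) override { queue_.push_back(data); }
  std::string receive() override {
    if (queue_.empty()) throw std::runtime_error("nothing to receive");
    std::string data = queue_.front();
    queue_.pop_front();
    return data;
  }
private:
  std::deque<std::string> queue_;
};

// Example layer: prefixes each message with its 4-byte big-endian length
// and checks it on receipt. Wraps any other Connection.
class LengthPrefixLayer : public Connection {
public:
  explicit LengthPrefixLayer(std::unique_ptr<Connection> below)
    : below_(std::move(below)) {}

  void send(const std::string& data) override {
    const std::uint32_t n = static_cast<std::uint32_t>(data.size());
    std::string frame;
    for (int shift = 24; shift >= 0; shift -= 8) {
      frame.push_back(static_cast<char>((n >> shift) & 0xff));
    }
    below_->send(frame + data);
  }

  std::string receive() override {
    const std::string frame = below_->receive();
    if (frame.size() < 4) throw std::runtime_error("short frame");
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
      n = (n << 8) | static_cast<unsigned char>(frame[i]);
    }
    if (frame.size() - 4 != n) throw std::runtime_error("length mismatch");
    return frame.substr(4);
  }

private:
  std::unique_ptr<Connection> below_;
};
```

Because both the loopback and the framing layer implement the same interface, a second layer (for example, encryption) could be stacked on top without either of them knowing.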
[jira] [Updated] (MESOS-2009) Libprocess: Introduce mutex
[ https://issues.apache.org/jira/browse/MESOS-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-2009: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Libprocess: Introduce mutex --- Key: MESOS-2009 URL: https://issues.apache.org/jira/browse/MESOS-2009 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere add mutex to the configure check -- This message was sent by Atlassian JIRA (v6.3.4#6332)
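A hypothetical probe for the mutex configure check: it compiles only if `<mutex>` provides `std::mutex` and `std::unique_lock`, and it exercises lock, unlock and re-lock on a single thread. The function name is an assumption, and the real configure test may only need to compile and link such a program.

```cpp
#include <cassert>
#include <mutex>

// Returns true if a std::mutex can be locked, unlocked and locked again
// through std::unique_lock.
bool mutexUsable() {
  std::mutex mutex;
  std::unique_lock<std::mutex> lock(mutex);  // acquires the mutex
  const bool lockedFirst = lock.owns_lock();
  lock.unlock();                             // releases it
  const bool releasedAfterUnlock = !lock.owns_lock();
  lock.lock();                               // re-acquires it
  return lockedFirst && releasedAfterUnlock && lock.owns_lock();
}  // the unique_lock destructor releases the mutex before it is destroyed
```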
[jira] [Updated] (MESOS-1571) Signal escalation timeout is not configurable
[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1571: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Signal escalation timeout is not configurable - Key: MESOS-1571 URL: https://issues.apache.org/jira/browse/MESOS-1571 Project: Mesos Issue Type: Bug Reporter: Niklas Quarfot Nielsen Assignee: Alexander Rukletsov Even though the executor shutdown grace period is set to a larger interval, the signal escalation timeout will still be 3 seconds. It should either be configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
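One possible policy, sketched with `std::chrono`: use an explicitly configured escalation timeout if there is one, otherwise follow the executor shutdown grace period, never going below the current fixed 3 seconds. The function name, its parameters and the 3-second floor are assumptions for illustration, not an existing Mesos flag or API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical: time to wait between SIGTERM and SIGKILL. A non-positive
// `configured` value means "not configured" in this sketch.
std::chrono::milliseconds escalationTimeout(
    std::chrono::milliseconds shutdownGracePeriod,
    std::chrono::milliseconds configured = std::chrono::milliseconds::zero()) {
  if (configured > std::chrono::milliseconds::zero()) {
    return configured;  // an explicit setting always wins
  }
  // Otherwise follow the grace period, but never drop below the old
  // hard-coded 3 seconds.
  const std::chrono::milliseconds oldDefault = std::chrono::seconds(3);
  return std::max(oldDefault, shutdownGracePeriod);
}
```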
[jira] [Updated] (MESOS-2011) Introduce mutex
[ https://issues.apache.org/jira/browse/MESOS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-2011: Sprint: Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Introduce mutex --- Key: MESOS-2011 URL: https://issues.apache.org/jira/browse/MESOS-2011 Project: Mesos Issue Type: Improvement Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere
* add mutex to the configure check
* document use of mutex in style guide
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool
[ https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1316: Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31 (was: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 2) Implement decent unit test coverage for the mesos-fetcher tool -- Key: MESOS-1316 URL: https://issues.apache.org/jira/browse/MESOS-1316 Project: Mesos Issue Type: Improvement Components: technical debt, test Reporter: Tom Arnfeld Assignee: Bernd Mathiske There are currently no tests that cover the {{mesos-fetcher}} tool itself, and hence bugs like MESOS-1313 have accidentally slipped through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1316) Implement decent unit test coverage for the mesos-fetcher tool
[ https://issues.apache.org/jira/browse/MESOS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-1316: Sprint: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31, Mesosphere Q4 Sprint 2 (was: Q3 Sprint 1, Q3 Sprint 2, Mesosphere Q4 Sprint 1 10/31) Implement decent unit test coverage for the mesos-fetcher tool -- Key: MESOS-1316 URL: https://issues.apache.org/jira/browse/MESOS-1316 Project: Mesos Issue Type: Improvement Components: technical debt, test Reporter: Tom Arnfeld Assignee: Bernd Mathiske There are currently no tests that cover the {{mesos-fetcher}} tool itself, and hence bugs like MESOS-1313 have accidentally slipped through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2035) Add reason to containerizer proto Termination
[ https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-2035: - Description: When an isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason is set to a general one but ideally we would have the termination reason to pass through to the status update. (was: When an isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason is set to a general one but ideally we would have the termination reason to pass through to the status update. We could also differentiate a bad command (using the Command executor) from a termination from an isolator.) Add reason to containerizer proto Termination - Key: MESOS-2035 URL: https://issues.apache.org/jira/browse/MESOS-2035 Project: Mesos Issue Type: Improvement Components: slave Affects Versions: 0.21.0 Reporter: Dominic Hamon Assignee: Dominic Hamon Priority: Minor When an isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason is set to a general one but ideally we would have the termination reason to pass through to the status update. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1941) Make executor's user owner of executor's cgroup directory
[ https://issues.apache.org/jira/browse/MESOS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195491#comment-14195491 ] Ian Downes commented on MESOS-1941: --- https://reviews.apache.org/r/27557/ https://reviews.apache.org/r/27558/ Make executor's user owner of executor's cgroup directory - Key: MESOS-1941 URL: https://issues.apache.org/jira/browse/MESOS-1941 Project: Mesos Issue Type: Improvement Components: isolation, slave Reporter: Mohit Soni Assignee: Ian Downes Priority: Minor Labels: twitter Currently, when cgroups are enabled and an executor is spawned, it is placed in a cgroup directory such as /sys/fs/cgroup/cpu/mesos/mesos-id. In the current implementation this directory is writable only by the root user, which prevents processes launched by the executor from placing their own child processes under this cgroup. To allow an executor-spawned process to place its child processes under its cgroup directory, the cgroup directory should be made writable by the executor's user.
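The ownership change requested above could look roughly like the following sketch. It is not the code from the linked review requests: the helper name `chownCgroupToUser` is hypothetical, and it assumes the cgroups v1 layout, where moving a process into a cgroup requires write access to that cgroup's `tasks` or `cgroup.procs` file, so those files are handed to the user along with the directory.

```cpp
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not the Mesos API): give `user` ownership of a
// cgroup directory plus its `tasks` and `cgroup.procs` control files
// (cgroups v1 layout), so processes running as `user` can create child
// cgroups and move processes into this cgroup.
// Returns an empty string on success, or an error description.
std::string chownCgroupToUser(const std::string& cgroupDir,
                              const std::string& user)
{
  errno = 0;
  const struct passwd* pw = ::getpwnam(user.c_str());
  if (pw == nullptr) {
    return errno != 0 ? "getpwnam failed: " + std::string(std::strerror(errno))
                      : "unknown user: " + user;
  }

  const std::string paths[] = {
    cgroupDir, cgroupDir + "/tasks", cgroupDir + "/cgroup.procs"};

  for (const std::string& path : paths) {
    if (::chown(path.c_str(), pw->pw_uid, pw->pw_gid) != 0) {
      // The control files may be absent (e.g. on a plain test directory);
      // only a failure on the directory itself is fatal in that case.
      if (errno == ENOENT && path != cgroupDir) {
        continue;
      }
      return "chown " + path + " failed: " + std::strerror(errno);
    }
  }
  return "";
}
```

Returning an error string keeps the sketch dependency-free; inside Mesos the same logic would more naturally return a stout `Try<Nothing>`.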
[jira] [Created] (MESOS-2037) Update docs/configuration.md
Kapil Arya created MESOS-2037: - Summary: Update docs/configuration.md Key: MESOS-2037 URL: https://issues.apache.org/jira/browse/MESOS-2037 Project: Mesos Issue Type: Documentation Reporter: Kapil Arya Assignee: Kapil Arya Priority: Blocker Update documentation for configuration flags (docs/configuration.md) to reflect the current state. https://reviews.apache.org/r/27556/
[jira] [Updated] (MESOS-2025) OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1
[ https://issues.apache.org/jira/browse/MESOS-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2025: -- Target Version/s: 0.21.0 OsTest.killtreeNoRoot: Process reparent assumes new parent is init pid 1 Key: MESOS-2025 URL: https://issues.apache.org/jira/browse/MESOS-2025 Project: Mesos Issue Type: Bug Components: stout Environment: Ubuntu 14.04 with graphical interface Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Minor Reparenting does not always assign pid 1 (/sbin/init). If there is a user init such as init --user with some other pid, this will be the new parent. Modify os_tests to check that the subtree has been reparented to a process different from its original parent (a.k.a. child) and that it is not a zombie.
[jira] [Commented] (MESOS-1981) Create docs/modules.md to record module API changes
[ https://issues.apache.org/jira/browse/MESOS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195568#comment-14195568 ] Till Toenshoff commented on MESOS-1981: --- Given that Niklas is currently unavailable, I will take the liberty of committing this now. Create docs/modules.md to record module API changes --- Key: MESOS-1981 URL: https://issues.apache.org/jira/browse/MESOS-1981 Project: Mesos Issue Type: Bug Reporter: Kapil Arya Assignee: Kapil Arya The docs/modules.md file keeps a history of all module API changes.
[jira] [Assigned] (MESOS-1950) Add module writers guide
[ https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya reassigned MESOS-1950: - Assignee: Kapil Arya Add module writers guide Key: MESOS-1950 URL: https://issues.apache.org/jira/browse/MESOS-1950 Project: Mesos Issue Type: Documentation Components: modules Reporter: Niklas Quarfot Nielsen Assignee: Kapil Arya Priority: Critical Similar to Apache Webserver's Developing Modules guide (http://httpd.apache.org/docs/2.4/developer/modguide.html), we should write up a comprehensive guide to writing robust modules. I started a draft here: https://cwiki.apache.org/confluence/display/MESOS/Mesos+Modules+Developer+Guide It should be completed and/or copied (or moved) to docs/modules.md. Keeping both may be useful.
[jira] [Commented] (MESOS-1937) Create a document explaining the --modules flag
[ https://issues.apache.org/jira/browse/MESOS-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195570#comment-14195570 ] Kapil Arya commented on MESOS-1937: --- RR: https://reviews.apache.org/r/27453/ Create a document explaining the --modules flag --- Key: MESOS-1937 URL: https://issues.apache.org/jira/browse/MESOS-1937 Project: Mesos Issue Type: Documentation Reporter: Kapil Arya Assignee: Kapil Arya Priority: Blocker As the protobuf/JSON format for --modules evolves, it is becoming harder to explain everything in the command-line help. We should create a man-page-style document that explains all the intricacies of the --modules flag and refer to that document from the command-line help.
[jira] [Updated] (MESOS-1950) Add module writers guide
[ https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-1950: -- Shepherd: Till Toenshoff
[jira] [Commented] (MESOS-1950) Add module writers guide
[ https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195574#comment-14195574 ] Kapil Arya commented on MESOS-1950: --- RR: https://reviews.apache.org/r/27453/
[jira] [Updated] (MESOS-1950) Add module writers guide
[ https://issues.apache.org/jira/browse/MESOS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-1950: -- Target Version/s: 0.21.0
[jira] [Updated] (MESOS-2001) Authenticatee modules similar to Authenticator modules
[ https://issues.apache.org/jira/browse/MESOS-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2001: -- Sprint: Mesosphere Q4 Sprint 1 10/31, Mesosphere Q4 Sprint 2 (was: Mesosphere Q4 Sprint 1 10/31) Authenticatee modules similar to Authenticator modules -- Key: MESOS-2001 URL: https://issues.apache.org/jira/browse/MESOS-2001 Project: Mesos Issue Type: Epic Components: modules Reporter: Till Toenshoff Labels: authentication, module To cover complete module-based authentication, we need to allow authenticatee modules just as we do authenticator modules. h4. Motivation Allow third parties to quickly develop and plug in new authentication methods. The modularized Authenticatee API will lower the barrier for the community to provide new methods to Mesos. An example of such an additional, next-step module could be PAM-backed (LDAP, MySQL, NIS, UNIX) authentication. cyrus-sasl2 itself already offers more than half a dozen mechanisms via its standard plugins, and these could be triggered by additional Authenticator / Authenticatee modules. cyrus-sasl2 supports even more mechanisms (about a full dozen) when custom built, but we do not want to bundle cyrus-sasl2 and thereby force custom builds. Alternative (especially non-SASL-based) authentication methods may bring in new dependencies that we don't want to force on all of our users. Mesos users may be required to use custom authentication techniques due to strict security policies.