[jira] [Created] (MESOS-6774) Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.
Yan Xu created MESOS-6774:

Summary: Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.
Key: MESOS-6774
URL: https://issues.apache.org/jira/browse/MESOS-6774
Project: Mesos
Issue Type: Improvement
Components: allocation
Reporter: Yan Xu

Shared resources support in the allocator works by allocating multiple copies of the shared resources so that multiple frameworks can receive them. Multiple copies of the same shared resource don't affect the quantity of the sorter's allocations or total pool, so they have no impact on DRF. To make resource accounting work, though, when copies of the same resource are added to a framework's allocation, we also increase the size of the total pool in the sorter (again, adding these copies doesn't affect quantity) so that the *allocations in a sorter are always bounded by the total pool in the sorter*. This invariant is a requirement for the following logic in the allocator to work:

{code:title=Remove the resources from the framework sorter when they are unallocated from the framework}
frameworkSorters[role]->unallocated(
    frameworkId.value(), slaveId, resources);
frameworkSorters[role]->remove(slaveId, resources);
{code}

E.g., if there are 2 copies of a shared disk allocated to framework1, the sorter's total pool has 2 copies of the disk as well.

However, we currently only do this for the framework sorter below a role, because the allocator (implicitly) assumes that the role sorter, being the root-level sorter, has a total pool that is unchanged during allocation or resource recovery. This is not a problem right now because, for this reason, {{Sorter::add(const SlaveID& slaveId, const Resources& resources)}} and {{Sorter::remove(const SlaveID& slaveId, const Resources& resources)}} are not called during allocation or resource recovery. This will likely change with MESOS-6375, when role sorters form a hierarchy and not all of them are bound to the physical size of the cluster.

We should revisit the shared resource allocation logic then to make sure the invariant *allocations in a sorter are always bounded by the total pool in the sorter* holds.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
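The invariant discussed in MESOS-6774 can be illustrated with a toy model. This is only a sketch, not Mesos code: `ToySorter` and its methods are hypothetical stand-ins that count copies of a named resource, assuming the total pool is grown whenever an extra copy of a shared resource is allocated.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a sorter. It only counts copies per resource name
// and is not the real Mesos `Sorter` interface.
class ToySorter {
public:
  // Adds `copies` of `resource` to the total pool.
  void add(const std::string& resource, int copies = 1) {
    total_[resource] += copies;
  }

  // Removes `copies` of `resource` from the total pool.
  void remove(const std::string& resource, int copies = 1) {
    assert(count(total_, resource) >= copies);
    total_[resource] -= copies;
  }

  // Records `copies` of `resource` as allocated.
  void allocated(const std::string& resource, int copies = 1) {
    allocated_[resource] += copies;
  }

  // Releases `copies` of `resource` from the allocations.
  void unallocated(const std::string& resource, int copies = 1) {
    assert(count(allocated_, resource) >= copies);
    allocated_[resource] -= copies;
  }

  // True if, for every resource, the allocated copies do not exceed the
  // copies in the total pool.
  bool invariantHolds() const {
    for (const auto& entry : allocated_) {
      if (entry.second > count(total_, entry.first)) {
        return false;
      }
    }
    return true;
  }

private:
  // Number of copies of `resource` in `pool`, or 0 if it is absent.
  static int count(const std::map<std::string, int>& pool,
                   const std::string& resource) {
    const auto it = pool.find(resource);
    return it == pool.end() ? 0 : it->second;
  }

  std::map<std::string, int> total_;
  std::map<std::string, int> allocated_;
};
```

In this model, a sorter that calls `add("disk")` alongside every extra `allocated("disk")` keeps `invariantHolds()` true, so the later `unallocated` followed by `remove` is safe. A sorter whose total pool is left unchanged while a second copy is allocated ends up with more copies allocated than in total, which is the situation the issue warns about for the role sorter.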
[jira] [Comment Edited] (MESOS-6717) Add Windows support to agent test harness
[ https://issues.apache.org/jira/browse/MESOS-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737005#comment-15737005 ]

Joseph Wu edited comment on MESOS-6717 at 12/10/16 2:07 AM:

{code}
commit b9ef614c53373fdc3aadc0f237e7533b1d2a7209
Author: Alex Clemmer
Date: Thu Dec 8 17:18:01 2016 -0800

Stout: Moved `os::getenv` from `os.hpp` to `os/getenv.hpp`.

This commit moves `os::getenv` to its own file under the `stout/os/` directory in preparation for a functional change to `os::temp`; which should look for the standard environment variable `TMPDIR` before falling back to `/tmp` on POSIX environments.

In other words, `stout/os/temp.hpp` needs to call `os::getenv`, but doing so prior to this commit would introduce a circular header dependency with `stout/os.hpp`. (`stout/os.hpp` aggregates all header files in `stout/os/` and therefore, no files in `stout/os/` should be taking a dependency on it.)

Review: https://reviews.apache.org/r/54519/
{code}
{code}
commit ca0a8d552cd098593c3c3b0f76d5215846de7120
Author: Alex Clemmer
Date: Thu Dec 8 17:25:06 2016 -0800

Stout: Added logic for TMPDIR environment variable in `os::temp`.

`TMPDIR` is a POSIX-standard environment variable which can be used to specify a temporary directory. This variable is currently read in the agent tests, but ignored in other parts of the codebase. (`os::temp` is commonly used by `os::mkdtemp`.)

This commit is one of two commits that will normalize the location of the temporary directory.

Review: https://reviews.apache.org/r/54489/
{code}
{code}
commit 883f5d2e31eb3f73e808e58c020f5b68ca7b2e1d
Author: Alex Clemmer
Date: Thu Dec 8 17:38:41 2016 -0800

Normalized how temporary directories are determined in tests.

This changes the Mesos tests to use the updated `os::temp` helper, which (on POSIX) now checks the `TMPDIR` environment variable. On Windows, this changes the temporary directory to an appropriate location (`/tmp` does not exist on Windows by default).

Review: https://reviews.apache.org/r/54490/
{code}
{code}
commit 3511b5407710e9a0d0a668ce1663a8d89cc028ca
Author: Joseph Wu
Date: Fri Dec 9 17:50:38 2016 -0800

Removed the UUID from IO Switchboard tests.

The IO switchboard server creates a UNIX socket at a given path. Due to OS constraints, this path must be less than 104 characters long. In the tests, the path is set to a value based on the test directory. If the test directory is too long, the UNIX socket creation will fail, as observed in OSX, where the standard temporary directory does not default to `/tmp` (as is the case on most Linux's).

The test directory was changed to provide platform-specific values in this review: https://reviews.apache.org/r/54490/

This commit shortens the UNIX socket address by removing the UUID. This is safe because we are not running multiple IO switchboards in the same test, in the same directory.
{code}
[jira] [Commented] (MESOS-6717) Add Windows support to agent test harness
[ https://issues.apache.org/jira/browse/MESOS-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737005#comment-15737005 ] Joseph Wu commented on MESOS-6717: -- {code} commit ca0a8d552cd098593c3c3b0f76d5215846de7120 Author: Alex ClemmerDate: Thu Dec 8 17:25:06 2016 -0800 Stout: Added logic for TMPDIR environment variable in `os::temp`. `TMPDIR` is a POSIX-standard environment variable which can be used to specify a temporary directory. This variable is currently read in the agent tests, but ignored in other parts of the codebase. (`os::temp` is commonly used by `os::mkdtemp`.) This commit is one of two commits that will normalize the location of the temporary directory. Review: https://reviews.apache.org/r/54489/ {code} {code} commit 883f5d2e31eb3f73e808e58c020f5b68ca7b2e1d Author: Alex Clemmer Date: Thu Dec 8 17:38:41 2016 -0800 Normalized how temporary directories are determined in tests. This changes the Mesos tests to use the updated `os::temp` helper, which (on POSIX) now checks the `TMPDIR` environment variable. On Windows, this changes the temporary directory to an appropriate location (`/tmp` does not exist on Windows by default). Review: https://reviews.apache.org/r/54490/ {code} {code} commit 3511b5407710e9a0d0a668ce1663a8d89cc028ca Author: Joseph Wu Date: Fri Dec 9 17:50:38 2016 -0800 Removed the UUID from IO Switchboard tests. The IO switchboard server creates a UNIX socket at a given path. Due to OS constraints, this path must be less than 104 characters long. In the tests, the path is set to a value based on the test directory. If the test directory is too long, the UNIX socket creation will fail, as observed in OSX, where the standard temporary directory does not default to `/tmp` (as is the case on most Linux's). The test directory was changed to provide platform-specific values in this review: https://reviews.apache.org/r/54490/ This commit shortens the UNIX socket address by removing the UUID. 
This is safe because we are not running multiple IO switchboards in the same test, in the same directory. {code}

> Add Windows support to agent test harness
> -----------------------------------------
>
> Key: MESOS-6717
> URL: https://issues.apache.org/jira/browse/MESOS-6717
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Alex Clemmer
> Assignee: Alex Clemmer
> Priority: Blocker
> Labels: microsoft, windows-mvp
>
> Of particular interest in `src/tests/CMakeLists.txt` is supporting enough of
> the following that we can successfully run agent tests:
> TEST_HELPER_SRC
> MESOS_TESTS_UTILS_SRC
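The two ideas in the commits quoted in this thread (prefer `TMPDIR` over `/tmp`, and keep UNIX socket paths under the length limit) can be sketched as follows. None of this is the stout or Mesos implementation: `tempDirFrom`, `tempDir`, `fitsUnixSocketPath` and `kMaxSocketPathLength` are hypothetical names, treating an empty `TMPDIR` like an unset one is an assumption, and the 104 limit is simply the figure from the commit message (the actual bound depends on the platform's `sockaddr_un`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper, not stout's `os::temp`: returns `tmpdir` when it is set
// and non-empty, otherwise falls back to "/tmp".
std::string tempDirFrom(const char* tmpdir) {
  return (tmpdir != nullptr && *tmpdir != '\0') ? std::string(tmpdir)
                                                 : std::string("/tmp");
}

// Resolves the temporary directory from the process environment.
std::string tempDir() {
  return tempDirFrom(std::getenv("TMPDIR"));
}

// Limit taken from the commit message; assumed here, not queried from the OS.
const std::size_t kMaxSocketPathLength = 104;

// True if `path` is strictly shorter than `limit` characters.
bool fitsUnixSocketPath(const std::string& path,
                        std::size_t limit = kMaxSocketPathLength) {
  return path.size() < limit;
}
```

A test harness could call `fitsUnixSocketPath(tempDir() + "/" + name)` before creating the socket and fail early with a clear message, rather than letting socket creation fail on a long temporary directory such as the default one on OSX.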
[jira] [Created] (MESOS-6773) Provide REST-style endpoints that map to v1 master/agent Calls.
Benjamin Mahler created MESOS-6773:

Summary: Provide REST-style endpoints that map to v1 master/agent Calls.
Key: MESOS-6773
URL: https://issues.apache.org/jira/browse/MESOS-6773
Project: Mesos
Issue Type: Improvement
Components: HTTP API
Reporter: Benjamin Mahler

With the addition of V1 {{master::Call}} and {{agent::Call}} to replace the V0 REST-style endpoints (e.g. /state, /metrics/snapshot, etc.), users can no longer hit these endpoints in their browser or use query parameters. Also, tooling has to send POST data, which is a bit more onerous in most libraries than simply using a URL with query parameters.

Per the [design doc|https://docs.google.com/document/d/1XfgF4jDXZDVIEWQPx6Y4glgeTTswAAxw6j8dPDAtoeI], we can add a mapping to REST-style endpoints to provide users with a means to hit these endpoints without POST data.
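As a rough illustration of what such a mapping could look like, the sketch below turns a GET request target into a call type name plus its query string. The table entries and the names `MappedCall` and `mapGetRequest` are hypothetical; the actual mapping is whatever the design doc specifies, and this is not Mesos code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Result of mapping a request target such as "/state?foo=bar".
struct MappedCall {
  std::string type;   // call type name; empty when the path is not mapped
  std::string query;  // raw query string, without the leading '?'
};

// Maps a REST-style GET target onto a call type (illustrative table only).
MappedCall mapGetRequest(const std::string& target) {
  static const std::map<std::string, std::string> table = {
      {"/state", "GET_STATE"},
      {"/metrics/snapshot", "GET_METRICS"},
  };

  // Split the target into path and query at the first '?', if any.
  const std::size_t mark = target.find('?');
  const std::string path = target.substr(0, mark);

  MappedCall call;
  if (mark != std::string::npos) {
    call.query = target.substr(mark + 1);
  }

  const auto it = table.find(path);
  if (it != table.end()) {
    call.type = it->second;
  }
  return call;
}
```

With a table like this, a browser request for `/metrics/snapshot?timeout=5secs` would resolve to a metrics call with the query parameters carried along, with no POST body required.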
[jira] [Commented] (MESOS-6772) Stop building mesos-slave.
[ https://issues.apache.org/jira/browse/MESOS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736652#comment-15736652 ] James Peach commented on MESOS-6772: Rather than building binaries for both {{mesos-agent}} and {{mesos-slave}}, just install a symlink from the latter to the former. > Stop building mesos-slave. > -- > > Key: MESOS-6772 > URL: https://issues.apache.org/jira/browse/MESOS-6772 > Project: Mesos > Issue Type: Bug >Reporter: James Peach >Assignee: James Peach > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6772) Stop building mesos-slave.
James Peach created MESOS-6772: -- Summary: Stop building mesos-slave. Key: MESOS-6772 URL: https://issues.apache.org/jira/browse/MESOS-6772 Project: Mesos Issue Type: Bug Reporter: James Peach Assignee: James Peach -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5849) Agent sandboxes on Windows surpass the 260 character path length limit
[ https://issues.apache.org/jira/browse/MESOS-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Clemmer updated MESOS-5849:

Assignee: Daniel Pravat (was: Alex Clemmer)

> Agent sandboxes on Windows surpass the 260 character path length limit
> ----------------------------------------------------------------------
>
> Key: MESOS-5849
> URL: https://issues.apache.org/jira/browse/MESOS-5849
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Environment: Windows Server 2012, Windows Server 2016 RC
> Reporter: Lokendra Malik
> Assignee: Daniel Pravat
> Priority: Blocker
> Labels: microsoft, tech-debt, windows
> Attachments: Pasted image at 2016_07_14 09_02 PM.png, mesoscrash.jpg
>
> When I tried to deploy an application on mesos-agent (Windows), the mesos-agent
> service on the Windows node crashed the moment the application was deployed,
> and the logs show this error:
> I0714 07:20:09.788785 5640 containerizer.cpp:781] Starting container '031878d5-32fa-41ed-8b23-d0d91fe34f05' for executor 'windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf' of framework '5c83c39f-75a0-4f38-9e47-633767b47976-'
> F0714 07:20:09.797576 5480 slave.cpp:6174] CHECK_SOME(state::checkpoint(path, t)): Failed to create directory 'E:\agentlogs\meta\slaves\803264d5-8f2d-46bb-8019-de0f9565c971-S5\frameworks\5c83c39f-75a0-4f38-9e47-633767b47976-\executors\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf\runs\031878d5-32fa-41ed-8b23-d0d91fe34f05\tasks\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf': No such file or directory
> We debugged the issue and found that the file name had reached the maximum
> file path length:
> E:\agentlogs\meta\slaves\803264d5-8f2d-46bb-8019-de0f9565c971-S5\frameworks\5c83c39f-75a0-4f38-9e47-633767b47976-\executors\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf\runs\031878d5-32fa-41ed-8b23-d0d91fe34f05\tasks\windemo.10cc3e54-49ce-11e6-a2a2-08002786cbbf
> I think the path length limit on Windows (260 characters) is exceeded here,
> which crashes the service, while the same deployment works fine for Linux
> Mesos agents. We may have to make the output of the current UUID.toString()
> method shorter.
[jira] [Created] (MESOS-6771) Add and vet `install` target
Alex Clemmer created MESOS-6771:

Summary: Add and vet `install` target
Key: MESOS-6771
URL: https://issues.apache.org/jira/browse/MESOS-6771
Project: Mesos
Issue Type: Bug
Components: cmake
Reporter: Alex Clemmer
Assignee: Alex Clemmer

We need to be able to do something like `make install`, and while CMake comes with something like this out of the box, we do need to vet it (at the very least).

As a general note (as jpeach suggests), we should take care not to generate separate binaries for `mesos-slave` and `mesos-agent`. If `mesos-slave` exists at all, it should be a symlink generated upon `install`.
[jira] [Created] (MESOS-6770) Handle SSL socket read and write events separately
Greg Mann created MESOS-6770: Summary: Handle SSL socket read and write events separately Key: MESOS-6770 URL: https://issues.apache.org/jira/browse/MESOS-6770 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Greg Mann The SSL socket code in libprocess currently does not distinguish between events received during reading and those received during writing. However, libevent does provide event flags with this information: {{BEV_EVENT_READING}} and {{BEV_EVENT_WRITING}}. We should make use of these flags to handle read and write events differently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
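A minimal sketch of what distinguishing the two directions might look like is given below. It does not use libprocess or link against libevent: the constants are local stand-ins that mirror the libevent flags named in the issue (real code would take them from `<event2/bufferevent.h>`), and `Direction`, `directionOf` and `describe` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Local stand-ins mirroring libevent's bufferevent flags. The values are
// assumed to match libevent's; real code should use the library's macros.
const short kEventReading = 0x01;  // stands in for BEV_EVENT_READING
const short kEventWriting = 0x02;  // stands in for BEV_EVENT_WRITING
const short kEventError = 0x20;    // stands in for BEV_EVENT_ERROR

enum class Direction { Read, Write, Unknown };

// Determines which direction an event callback's `events` bitmask refers to.
Direction directionOf(short events) {
  if (events & kEventReading) {
    return Direction::Read;
  }
  if (events & kEventWriting) {
    return Direction::Write;
  }
  return Direction::Unknown;
}

// Produces a short description so read and write problems can be handled
// (or at least reported) separately.
std::string describe(short events) {
  const std::string outcome = (events & kEventError) ? "failed" : "event";
  switch (directionOf(events)) {
    case Direction::Read:
      return "read " + outcome;
    case Direction::Write:
      return "write " + outcome;
    case Direction::Unknown:
      break;
  }
  return "unknown " + outcome;
}
```

An event callback could branch on `directionOf(events)` so that, for example, a failure while writing discards only the pending send, while a failure while reading fails the pending receive.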
[jira] [Updated] (MESOS-6769) The server does not close its end of the connection after returning a response to a streaming request.
[ https://issues.apache.org/jira/browse/MESOS-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-6769:

Description:
Consider this scenario:
- The client starts to send a streaming request to the agent with the {{Connection: close}} header set. This means that the client is relying on the server to close its end of the connection after sending the response.
- The request fails on the server, e.g., due to validation errors. The server sends the response but does not close its end of the socket.
- Some client libraries, e.g., Python Requests, rely on the server to close its end of the socket after sending the response. Otherwise, the connection just hangs on the client when it has no more streaming data to send.

Libprocess should close its end of the connection after sending the response in such cases.

> The server does not close its end of the connection after returning a response to a streaming request.
> ------------------------------------------------------------------------------------------------------
>
> Key: MESOS-6769
> URL: https://issues.apache.org/jira/browse/MESOS-6769
> Project: Mesos
> Issue Type: Bug
> Reporter: Anand Mazumdar
> Labels: libprocess, mesosphere
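The server-side decision described in this issue can be sketched as a small predicate over the request headers. This is not libprocess code: `lowercase` and `clientRequestedClose` are hypothetical helpers, and the sketch only handles a header whose value is exactly `close` (a list of several connection options is not parsed).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Returns a lower-cased copy of `s`, for case-insensitive comparisons.
std::string lowercase(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// True if the request carries a `Connection: close` header. Header names and
// the `close` option are compared case-insensitively.
bool clientRequestedClose(const std::map<std::string, std::string>& headers) {
  for (const auto& header : headers) {
    if (lowercase(header.first) == "connection" &&
        lowercase(header.second) == "close") {
      return true;
    }
  }
  return false;
}
```

A server following the proposal would evaluate such a predicate once the response has been written, including error responses sent before the streaming request body has finished, and then close its end of the socket.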
[jira] [Created] (MESOS-6769) The server does not close its end of the connection after returning a response to a streaming request.
Anand Mazumdar created MESOS-6769:

Summary: The server does not close its end of the connection after returning a response to a streaming request.
Key: MESOS-6769
URL: https://issues.apache.org/jira/browse/MESOS-6769
Project: Mesos
Issue Type: Bug
Reporter: Anand Mazumdar

Consider this scenario:
- The client starts to send a streaming request to the agent with the {{Connection: close}} header set. This means that the client is relying on the server to close its end of the connection after sending the response.
- The request fails on the server, e.g., due to validation errors. The server sends the response but does not close its end of the socket.
- Some client libraries, e.g., Python Requests, rely on the server to close its end of the socket after sending the response. Otherwise, the connection just hangs on the client when it has no more streaming data to send.

Libprocess should close its end of the
[jira] [Updated] (MESOS-6350) Raise minimum required cmake version
[ https://issues.apache.org/jira/browse/MESOS-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Clemmer updated MESOS-6350:

Labels: mesosphere microsoft tech-debt (was: mesosphere tech-debt)

> Raise minimum required cmake version
> ------------------------------------
>
> Key: MESOS-6350
> URL: https://issues.apache.org/jira/browse/MESOS-6350
> Project: Mesos
> Issue Type: Improvement
> Components: cmake
> Reporter: Benjamin Bannier
> Labels: mesosphere, microsoft, tech-debt
>
> We currently require at least cmake-2.8, which had its first point release in
> 2010 and its last update in 2013. Meanwhile, upstream is preparing the release
> of 3.7.0. While cmake support in Mesos is still experimental, we should
> evaluate how much we can increase the minimum required version so we are not
> locked into an old version lacking desirable features.
[jira] [Created] (MESOS-6768) Introduce a containerizer and executor suitable for scale testing
Ilya Pronin created MESOS-6768:

Summary: Introduce a containerizer and executor suitable for scale testing
Key: MESOS-6768
URL: https://issues.apache.org/jira/browse/MESOS-6768
Project: Mesos
Issue Type: Improvement
Components: containerization
Reporter: Ilya Pronin
Assignee: Ilya Pronin
Priority: Minor

The {{ScaleTestContainerizer}} and {{ScaleTestExecutor}} implement the basic behaviors of a containerizer and executor while consuming no resources. They are intended for use in scale testing a large number of agents interacting with a master.

The containerizer does no actual containerization, nor does it actually run the tasks. It simply runs a {{ScaleTestExecutor}}, which sends {{TASK_RUNNING}} on {{Executor::launchTask()}} and {{TASK_KILLED}} on {{Executor::killTask()}}.

Because no resources are actually consumed (but are accounted for by the agent), any amount of resources can be offered by the agent through {{--resources}} so that any desired number of tasks can be run.

Enable it with {{--containerizers=scale-test}}.

[~jieyu], can you shepherd this, please?
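The executor behavior described in this issue is simple enough to model in a few lines. The class below is a toy, not the proposed {{ScaleTestExecutor}} and not the Mesos `Executor` interface: `ToyScaleTestExecutor` is a hypothetical name, tasks are plain strings, and status updates are recorded in a vector instead of being sent to an agent.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model of an executor that runs nothing and only reports task states.
class ToyScaleTestExecutor {
public:
  // Reports the task as running without starting any process.
  void launchTask(const std::string& taskId) {
    updates_.push_back(std::make_pair(taskId, std::string("TASK_RUNNING")));
  }

  // Reports the task as killed; there is no process to stop.
  void killTask(const std::string& taskId) {
    updates_.push_back(std::make_pair(taskId, std::string("TASK_KILLED")));
  }

  // Status updates in the order they were produced, as (task id, state).
  const std::vector<std::pair<std::string, std::string>>& updates() const {
    return updates_;
  }

private:
  std::vector<std::pair<std::string, std::string>> updates_;
};
```

Because each call only appends a status update, the cost per task is independent of the resources the task claims, which is what would let an agent advertise arbitrary {{--resources}} in a scale test.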
[jira] [Updated] (MESOS-5387) mesos-execute exit status is always success
[ https://issues.apache.org/jira/browse/MESOS-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-5387:

Shepherd: Till Toenshoff

> mesos-execute exit status is always success
> -------------------------------------------
>
> Key: MESOS-5387
> URL: https://issues.apache.org/jira/browse/MESOS-5387
> Project: Mesos
> Issue Type: Bug
> Components: cli
> Affects Versions: 0.28.1
> Reporter: Luca Bruno
>
> mesos-execute should be able to return an exit status based on the status of
> the task. Currently it always exits with 0.
> It's very hard for a caller to know the status of the task, for example from
> a bash script calling mesos-execute.
> I believe mesos-execute is a simple but very useful cli tool for one-shot
> tasks, and as such deserves more attention.
> Thanks.
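One possible shape for the requested behavior is a mapping from the task's terminal state to a process exit status. The sketch below is an assumption, not what mesos-execute does: `TaskState` is a local stand-in for the Mesos task states, and the policy (success only for a finished task, failure for everything else) is just one reasonable choice.

```cpp
#include <cassert>
#include <cstdlib>

// Local stand-in for the terminal task states a one-shot task can end in.
enum class TaskState { Finished, Failed, Killed, Lost, Error };

// Maps a terminal task state to a process exit status: success only when the
// task finished normally, failure in every other case.
int exitStatusFor(TaskState state) {
  return state == TaskState::Finished ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A caller such as a bash script could then test the command's exit status directly instead of parsing its output.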
[jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
[ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-6743: Description: If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return. An interesting question is _how_ to react. Here are possible solutions. 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail. 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}? 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running. was: If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return. An interesting question is _how_ to react. Here are possible solutions. 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail. 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}? 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running. > Docker executor hangs forever if `docker stop` fails. 
> - > > Key: MESOS-6743 > URL: https://issues.apache.org/jira/browse/MESOS-6743 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.1, 1.1.0 >Reporter: Alexander Rukletsov > Labels: mesosphere > > If {{docker stop}} finishes with an error status, the executor should catch > this and react instead of indefinitely waiting for {{reaped}} to return. > An interesting question is _how_ to react. Here are possible solutions. > 1. Retry {{docker stop}}. In this case it is unclear how many times to retry > and what to do if {{docker stop}} continues to fail. > 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. > However, in this case it is unclear what status updates we should send: > {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill > a task? or set a specific reason in {{TASK_KILLING}}? > 3. Clean up and exit. In this case we should make sure the task container is > killed or notify the framework and the operator that the container may still > be running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
[ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-6743: Description: If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return. An interesting question is _how_ to react. Here are possible solutions. 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail. 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}? 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running. was: If {{docker stop}} finishes with an error status, the executor should catch this and react instead of indefinitely waiting for {{reaped}} to return. An interesting question is _how_ to react. Here are possible solutions. 1. Retry {{docker stop}}. In this case it is unclear how many times to retry and what to do if {{docker stop}} continues to fail. 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. However, in this case it is unclear what status updates we should send: {[TASK_KILLING}} for every kill retry? an extra update when we failed to kill a task? or set a specific reason in {{TASK_KILLING}}? 3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running. > Docker executor hangs forever if `docker stop` fails. 
> - > > Key: MESOS-6743 > URL: https://issues.apache.org/jira/browse/MESOS-6743 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.1, 1.1.0 >Reporter: Alexander Rukletsov > Labels: mesosphere > > If {{docker stop}} finishes with an error status, the executor should catch > this and react instead of indefinitely waiting for {{reaped}} to return. > An interesting question is _how_ to react. Here are possible solutions. > 1. Retry {{docker stop}}. In this case it is unclear how many times to retry > and what to do if {{docker stop}} continues to fail. > 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. > However, in this case it is unclear what status updates we should send: > {TASK_KILLING}} for every kill retry? an extra update when we failed to kill > a task? or set a specific reason in {{TASK_KILLING}}? > 3. Clean up and exit. In this case we should make sure the task container is > killed or notify the framework and the operator that the container may still > be running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766
[ https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6767: --- Description: This error message can pop up in unexpected places (e.g. when running a LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). We should likely just remove the UNREACHABLE() statement here as it's obviously reachable in cases where the command we are trying to launch is not found. was: This error message can pop up in unexpected places (e.g. when running a LACUNH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). We should likely just remove the UNREACHABLE() statement here as it's obviously reachable in cases where the command we are trying to launch is not found. > Reached unreachable statement at > /mesos/src/slave/containerizer/mesos/launch.cpp:766 > -- > > Key: MESOS-6767 > URL: https://issues.apache.org/jira/browse/MESOS-6767 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Jie Yu > Labels: containerizer, mesosphere > > This error message can pop up in unexpected places (e.g. when running a > LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). > We should likely just remove the UNREACHABLE() statement here as it's > obviously reachable in cases where the command we are trying to launch is not > found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
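Why a statement placed after launching a command can be reached is easy to show with the POSIX `exec` family: the call returns only when it fails, for example because the command does not exist. The sketch below is POSIX-only and illustrative; `execOrErrno` is a hypothetical helper, and whether `launch.cpp:766` follows an exec-style call is an assumption, not something the report states.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Tries to replace the current process image with `command`. On success this
// function never returns. It returns only if the exec call failed, in which
// case the `errno` set by the failed call is handed back to the caller.
int execOrErrno(const std::string& command) {
  char* const argv[] = {const_cast<char*>(command.c_str()), nullptr};
  ::execvp(command.c_str(), argv);

  // Reached whenever exec fails, so this point is not "unreachable".
  return errno;
}
```

Code in this position would typically report the error and exit with a non-zero status rather than assert that it cannot be reached.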
[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766
[ https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6767: --- Priority: Blocker (was: Major) > Reached unreachable statement at > /mesos/src/slave/containerizer/mesos/launch.cpp:766 > -- > > Key: MESOS-6767 > URL: https://issues.apache.org/jira/browse/MESOS-6767 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Jie Yu >Priority: Blocker > Labels: containerizer, mesosphere > > This error message can pop up in unexpected places (e.g. when running a > LAUNCH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). > We should likely just remove the UNREACHABLE() statement here as it's > obviously reachable in cases where the command we are trying to launch is not > found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766
[ https://issues.apache.org/jira/browse/MESOS-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6767: --- Story Points: 1 Labels: containerizer mesosphere (was: ) > Reached unreachable statement at > /mesos/src/slave/containerizer/mesos/launch.cpp:766 > -- > > Key: MESOS-6767 > URL: https://issues.apache.org/jira/browse/MESOS-6767 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Jie Yu > Labels: containerizer, mesosphere > > This error message can pop up in unexpected places (e.g. when running a > LACUNH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). > We should likely just remove the UNREACHABLE() statement here as it's > obviously reachable in cases where the command we are trying to launch is not > found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6767) Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766
Kevin Klues created MESOS-6767: -- Summary: Reached unreachable statement at /mesos/src/slave/containerizer/mesos/launch.cpp:766 Key: MESOS-6767 URL: https://issues.apache.org/jira/browse/MESOS-6767 Project: Mesos Issue Type: Bug Reporter: Kevin Klues Assignee: Jie Yu This error message can pop up in unexpected places (e.g. when running a LACUNH_NESTED_CONTAINER_SESSION and an invalid command is passed to it). We should likely just remove the UNREACHABLE() statement here as it's obviously reachable in cases where the command we are trying to launch is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-907) Add Kerberos Authentication support
[ https://issues.apache.org/jira/browse/MESOS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734795#comment-15734795 ]

Klaus Ma commented on MESOS-907:

I think Mesos prefers to delegate this feature to users, who can implement it as a module :).

> Add Kerberos Authentication support
> -----------------------------------
>
> Key: MESOS-907
> URL: https://issues.apache.org/jira/browse/MESOS-907
> Project: Mesos
> Issue Type: Story
> Components: general
> Reporter: Adam B
> Assignee: Tim Anderegg
> Labels: security, twitter
>
> MESOS-704 added basic authentication support using CRAM-MD5 through SASL. Now
> we should integrate Kerberos authentication using GSS-API, which is already
> supported by SASL. Kerberos is a widely-used industry standard authentication
> service, and integration with Mesos will make it easier for customers to
> integrate their existing security process with Mesos.
[jira] [Commented] (MESOS-907) Add Kerberos Authentication support
[ https://issues.apache.org/jira/browse/MESOS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734791#comment-15734791 ]

haosdent commented on MESOS-907:

Mesos does not support Kerberos yet.

> Add Kerberos Authentication support
> -----------------------------------
>
> Key: MESOS-907
> URL: https://issues.apache.org/jira/browse/MESOS-907
> Project: Mesos
> Issue Type: Story
> Components: general
> Reporter: Adam B
> Assignee: Tim Anderegg
> Labels: security, twitter
>
> MESOS-704 added basic authentication support using CRAM-MD5 through SASL. Now
> we should integrate Kerberos authentication using GSS-API, which is already
> supported by SASL. Kerberos is a widely-used industry standard authentication
> service, and integration with Mesos will make it easier for customers to
> integrate their existing security process with Mesos.
[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734764#comment-15734764 ]

Elve Xu commented on MESOS-6213:

This also fails on version 1.1.0 with macOS Sierra.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> -----------------------------------------------------------
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
> Issue Type: Bug
> Components: build
> Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}