[jira] [Assigned] (MESOS-7300) Mesos failed to build on Windows due to error C2440: 'return': cannot convert from 'Error' to 'bool'
[ https://issues.apache.org/jira/browse/MESOS-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu reassigned MESOS-7300: Resolution: Fixed Assignee: Andrew Schwartzmeyer Fix Version/s: 1.3.0 {code} commit 1de39e676a0dc5f78eeff303cb0eba5467168b9f Author: Andrew Schwartzmeyer Date: Fri Mar 24 21:07:30 2017 -0700 Windows: Fixed return of bad types in stat.hpp. Commit 5f159cdcb introduced `return Error(...)` logic to functions which return `bool`, not `Try`, which broke the Windows build. Furthermore, in the instances of `isdir` and `isfile`, erroring when asked to not follow a symlink is not correct. The semantics of symlinks provide clear answers to `isdir` and `isfile` when the target is a link, and is not being followed (it is neither a regular file nor a directory). We explicitly match the POSIX semantics for `isfile` where `S_IFREG` returns `false` for symbolic links. For the functions `mode` and `dev`, which return types wrapped by `Try`, we should only error if asked not to follow symlinks, and the target is actually a symlink. If it is not a symlink to begin with, we should not prematurely error. If it is a symlink, we should error because there is no equivalent of `lstat` on Windows to obtain `st_mode` or `st_dev` of a symlink itself. Review: https://reviews.apache.org/r/57926/ {code} > Mesos failed to build on Windows due to error C2440: 'return': cannot convert > from 'Error' to 'bool' > > > Key: MESOS-7300 > URL: https://issues.apache.org/jira/browse/MESOS-7300 > Project: Mesos > Issue Type: Bug > Components: build > Environment: Windows Server 2012 R2 + VS2015 Update 3 >Reporter: Karen Huang >Assignee: Andrew Schwartzmeyer >Priority: Blocker > Fix For: 1.3.0 > > > I tried to build Mesos (master branch revision 322300f) with VS2015 Update 3 on > Windows. 
It failed to build with the following error: > D:\Mesos\src\3rdparty\stout\include\stout/os/windows/stat.hpp(41): error > C2440: 'return': cannot convert from 'Error' to 'bool' (compiling source file > D:\Mesos\src\3rdparty\libprocess\src\time.cpp) > [D:\Mesos\build_x64\3rdparty\libprocess\src\process-0.0.1.vcxproj] > D:\Mesos\src\3rdparty\stout\include\stout/os/windows/stat.hpp(59): error > C2440: 'return': cannot convert from 'Error' to 'bool' (compiling source file > D:\Mesos\src\3rdparty\libprocess\src\time.cpp) > [D:\Mesos\build_x64\3rdparty\libprocess\src\process-0.0.1.vcxproj] > This issue started to reproduce from master branch revision "82e4077" > (https://github.com/apache/mesos/commit/82e4077ceb40e84c2796be43f1448eec0bfd7c69#diff-76a72473075f57f8d0d3b3bf6f150672) > I presume this is an issue in your source code. The function "inline bool > isdir( const std::string& path, const FollowSymlink follow = FOLLOW_SYMLINK)" > must return a bool. But the type of > "Error("Non-following stat not supported for '" + path + "'")" is not bool. > In D:\Mesos\src\3rdparty\stout\include\stout\os\windows\stat.hpp file: > inline bool isdir( > const std::string& path, > const FollowSymlink follow = FOLLOW_SYMLINK) > { > struct _stat s; > if (follow == DO_NOT_FOLLOW_SYMLINK) { > return Error("Non-following stat not supported for '" + path + "'"); > } > if (::_stat(path.c_str(), &s) < 0) { > return false; > } > return S_ISDIR(s.st_mode); > } > In D:\Mesos\src\3rdparty\stout\include\stout\errorbase.hpp file: > class Error > { > public: > explicit Error(const std::string& _message) : message(_message) {} > const std::string message; > }; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
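The type mismatch and the fix described in the commit message can be illustrated with a small, self-contained sketch. Everything here except the shape of `Error` is an assumption made for illustration: `MiniTry` is a simplified stand-in and not stout's real `Try`, and `isdirSketch` / `modeSketch` take the filesystem facts as parameters instead of calling `_stat`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Same shape as the Error class quoted in the report.
class Error
{
public:
  explicit Error(const std::string& _message) : message(_message) {}
  const std::string message;
};

// Hypothetical, minimal Try-like wrapper (NOT stout's real Try).
template <typename T>
class MiniTry
{
public:
  MiniTry(const T& value) : value_(value), error_(false) {}
  MiniTry(const Error& error)
    : value_(), error_(true), message_(error.message) {}

  bool isError() const { return error_; }
  const T& get() const { return value_; }
  const std::string& error() const { return message_; }

private:
  T value_;
  bool error_;
  std::string message_;
};

// Error offers no conversion to bool, which is what C2440 complains about.
static_assert(!std::is_convertible<Error, bool>::value,
              "Error must not convert to bool");
// A Try-like return type can absorb an Error.
static_assert(std::is_convertible<Error, MiniTry<bool>>::value,
              "Error converts to the Try-like wrapper");

enum FollowSymlink { DO_NOT_FOLLOW_SYMLINK, FOLLOW_SYMLINK };

// bool-returning predicate: an unfollowed symlink is simply not a directory.
inline bool isdirSketch(
    bool pathIsSymlink, bool targetIsDir, FollowSymlink follow)
{
  if (follow == DO_NOT_FOLLOW_SYMLINK && pathIsSymlink) {
    return false;
  }
  return targetIsDir;
}

// Try-returning accessor: error only when the path really is a symlink
// and the caller asked not to follow it.
inline MiniTry<int> modeSketch(
    bool pathIsSymlink, int targetMode, FollowSymlink follow)
{
  if (follow == DO_NOT_FOLLOW_SYMLINK && pathIsSymlink) {
    return Error("Non-following stat not supported for symlink");
  }
  return targetMode;
}
```

The two `static_assert`s capture the compile-time side of the bug; the two functions capture the behavior the commit message argues for.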
[jira] [Created] (MESOS-7311) CopyFetcherPluginTest.FetchExistingFile
Andrew Schwartzmeyer created MESOS-7311: --- Summary: CopyFetcherPluginTest.FetchExistingFile Key: MESOS-7311 URL: https://issues.apache.org/jira/browse/MESOS-7311 Project: Mesos Issue Type: Bug Components: fetcher Environment: Windows 10 Reporter: Andrew Schwartzmeyer The CopyFetcherPluginTest.FetchExistingFile unit test (from mesos-tests) is routinely failing on Windows.
[jira] [Updated] (MESOS-7310) Implement a separate python client library for the new cli
[ https://issues.apache.org/jira/browse/MESOS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Chung updated MESOS-7310: -- Description: cli_new in its current form is very difficult to package for the following reasons: 1. src/cli_new/lib/mesos imports plugins using relative imports, which fails if it is built into a pip package 2. there is no setup.py script which defines what should be installed 3. plugins/tests are unnecessarily included in the package, although consumers of the package shouldn’t be able to import them. Having such a package would allow external consumers to add application-specific wrappers on top of it, e.g. integration with ACL libraries of their choice. The plan as discussed is to create a `mesos` package under `src/python/lib`, potentially including a `setup.py` for building the package into a PyPI package. > Implement a separate python client library for the new cli > -- > > Key: MESOS-7310 > URL: https://issues.apache.org/jira/browse/MESOS-7310 > Project: Mesos > Issue Type: Task > Components: cli >Affects Versions: 1.3.0 >Reporter: Eric Chung >Assignee: Eric Chung > > cli_new in its current form is very difficult to package for the following > reasons: > 1. src/cli_new/lib/mesos imports plugins using relative imports, which fails > if it is built into a pip package > 2. there is no setup.py script which defines what should be installed > 3. plugins/tests are unnecessarily included in the package, although > consumers of the package shouldn’t be able to import them. > Having such a package would allow external consumers to add > application-specific wrappers on top of it, e.g. integration with ACL libraries of > their choice. > The plan as discussed is to create a `mesos` package under `src/python/lib`, > potentially including a `setup.py` for building the package into a PyPI > package.
[jira] [Created] (MESOS-7310) Implement a separate python client library for the new cli
Eric Chung created MESOS-7310: - Summary: Implement a separate python client library for the new cli Key: MESOS-7310 URL: https://issues.apache.org/jira/browse/MESOS-7310 Project: Mesos Issue Type: Task Components: cli Affects Versions: 1.3.0 Reporter: Eric Chung Assignee: Eric Chung -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7309) Support specifying devices for a container.
Jie Yu created MESOS-7309: - Summary: Support specifying devices for a container. Key: MESOS-7309 URL: https://issues.apache.org/jira/browse/MESOS-7309 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Jie Yu Some containers require certain devices to be available in the container (e.g., /dev/fuse). Currently, the default devices are hard-coded if a rootfs image is specified for the container. We should allow frameworks to specify additional devices that will be made available to the container. Besides bind mounting the device file, the devices cgroup needs to be configured properly to allow access to that device.
[jira] [Created] (MESOS-7308) Race condition in `updateAllocation()` on DESTROY of a shared volume.
Anindya Sinha created MESOS-7308: Summary: Race condition in `updateAllocation()` on DESTROY of a shared volume. Key: MESOS-7308 URL: https://issues.apache.org/jira/browse/MESOS-7308 Project: Mesos Issue Type: Bug Components: general Reporter: Anindya Sinha Assignee: Anindya Sinha When a {{DESTROY}} (for a shared volume) is processed in the master actor, we rescind pending offers that already include the volume to be destroyed. Before the allocator executes the {{updateAllocation()}} API, offers with the same shared volume can still be sent to frameworks, since the destroyed shared volume is not removed from {{slaves.total}} until {{updateAllocation()}} completes. As a result, the following check can fail: {code} CHECK_EQ( frameworkAllocation.flatten().createStrippedScalarQuantity(), updatedFrameworkAllocation.flatten().createStrippedScalarQuantity()); {code} We need to address this condition by not failing the {{CHECK_EQ}}, and also by ensuring that the master's state is restored to honor the {{DESTROY}} of the shared volume.
[jira] [Created] (MESOS-7307) Fix Windows build break by stat.hpp changes
Andrew Schwartzmeyer created MESOS-7307: --- Summary: Fix Windows build break by stat.hpp changes Key: MESOS-7307 URL: https://issues.apache.org/jira/browse/MESOS-7307 Project: Mesos Issue Type: Bug Environment: Windows 10 Reporter: Andrew Schwartzmeyer Assignee: Andrew Schwartzmeyer Commit 5f159cdcb introduced `return Error(...)` logic to functions which return `bool`, not `Try` in stat.hpp, which broke the Windows build. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7306) Support mount propagation for Volumes.
Jie Yu created MESOS-7306: - Summary: Support mount propagation for Volumes. Key: MESOS-7306 URL: https://issues.apache.org/jira/browse/MESOS-7306 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Jie Yu Currently, all mounts in a container are marked as 'slave' by default. However, in some cases, we may want mounts under a certain directory in a container to be propagated back to the root mount namespace. This is useful when we want the mounts to survive container failures. See more documentation about mount propagation in: https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
[jira] [Updated] (MESOS-5995) Protobuf JSON deserialisation does not accept numbers formatted as strings
[ https://issues.apache.org/jira/browse/MESOS-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-5995: --- Priority: Critical (was: Minor) > Protobuf JSON deserialisation does not accept numbers formatted as strings > - > > Key: MESOS-5995 > URL: https://issues.apache.org/jira/browse/MESOS-5995 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 1.0.0 >Reporter: Tomasz Janiszewski >Assignee: Tomasz Janiszewski >Priority: Critical > > Proto2 does not specify JSON mappings but > [Proto3|https://developers.google.com/protocol-buffers/docs/proto3#json] does, > and it recommends mapping 64-bit numbers to strings. Unfortunately, Mesos does > not accept strings in place of uint64 and returns 400 Bad > {quote} > Request error Failed to convert JSON into Call protobuf: Not expecting a JSON > string for field 'value'. > {quote} > Is this on purpose or is this a bug?
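One way a deserializer could accept a uint64 that arrives as a JSON string is sketched below. This is only an illustration under stated assumptions: `parseUint64` is a hypothetical helper, not part of Mesos or stout, and it accepts plain decimal digits only. The motivation for the string form is that a JSON number read as a double carries 53 bits of mantissa, so large 64-bit values cannot all be represented exactly.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: parse the contents of a JSON string value as a
// decimal uint64. Returns false for empty input, signs, non-digits,
// or values that do not fit into 64 bits.
inline bool parseUint64(const std::string& text, uint64_t* out)
{
  if (text.empty()) {
    return false;
  }

  // Reject anything but digits up front; strtoull alone would accept
  // leading whitespace and a sign.
  for (char c : text) {
    if (c < '0' || c > '9') {
      return false;
    }
  }

  errno = 0;
  char* end = nullptr;
  const unsigned long long value = std::strtoull(text.c_str(), &end, 10);

  if (errno == ERANGE || *end != '\0') {
    return false;
  }

  // Guard for platforms where unsigned long long is wider than 64 bits.
  if (value > std::numeric_limits<uint64_t>::max()) {
    return false;
  }

  *out = static_cast<uint64_t>(value);
  return true;
}
```

A field handler for uint64 could fall back to such a helper when it encounters a JSON string instead of rejecting the request outright.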
[jira] [Updated] (MESOS-7303) Support Isolator capabilities.
[ https://issues.apache.org/jira/browse/MESOS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7303: -- Description: Currently, isolators have one capability: whether they support nesting or not. To support launching containers that are not tied to Mesos tasks or executors (standalone containers), we need to add another capability to the Isolator interface so that, when launching standalone containers, we can avoid invoking isolators that do not yet support them. (was: Currently, isolators have one capability: whether they support nesting or not. To support launching containers that are not tied to Mesos tasks or executors, we need to add another capability to the Isolator interface so that, when launching such containers, we can avoid invoking isolators that do not yet support them.) > Support Isolator capabilities. > -- > > Key: MESOS-7303 > URL: https://issues.apache.org/jira/browse/MESOS-7303 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu > > Currently, isolators have one capability: whether they support nesting or not. > To support launching containers that are not tied to Mesos tasks or executors > (standalone containers), we need to add another capability to the Isolator > interface so that, when launching standalone containers, we can avoid invoking > isolators that do not yet support them.
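The capability idea can be sketched as follows. This is a simplified illustration, not the real Mesos Isolator interface: `IsolatorSketch`, `FixedIsolator`, `selectIsolators` and the isolator names used with them are hypothetical, and the two capability methods are assumptions about how such a flag might be exposed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified isolator interface with two capabilities.
class IsolatorSketch
{
public:
  virtual ~IsolatorSketch() {}
  virtual std::string name() const = 0;

  // Both capabilities default to "not supported", so an isolator has
  // to opt in explicitly.
  virtual bool supportsNesting() const { return false; }
  virtual bool supportsStandalone() const { return false; }
};

// Test double whose capabilities are fixed at construction time.
class FixedIsolator : public IsolatorSketch
{
public:
  FixedIsolator(const std::string& name, bool nesting, bool standalone)
    : name_(name), nesting_(nesting), standalone_(standalone) {}

  std::string name() const override { return name_; }
  bool supportsNesting() const override { return nesting_; }
  bool supportsStandalone() const override { return standalone_; }

private:
  std::string name_;
  bool nesting_;
  bool standalone_;
};

// Returns the names of the isolators that may be invoked for a
// container with the given properties; the rest are skipped.
inline std::vector<std::string> selectIsolators(
    const std::vector<std::shared_ptr<IsolatorSketch>>& isolators,
    bool nested,
    bool standalone)
{
  std::vector<std::string> selected;
  for (const std::shared_ptr<IsolatorSketch>& isolator : isolators) {
    if (nested && !isolator->supportsNesting()) {
      continue;
    }
    if (standalone && !isolator->supportsStandalone()) {
      continue;
    }
    selected.push_back(isolator->name());
  }
  return selected;
}
```

Defaulting both capabilities to false keeps existing isolators out of the standalone code path until they are updated, which matches the intent stated in the description.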
[jira] [Updated] (MESOS-7302) Support launching standalone containers.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7302: -- Epic Name: Standalone Container (was: Taskless and Executorless Containers) > Support launching standalone containers. > > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` > interface depends on them. Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. > This is also one necessary step towards running MesosContainerizer in > standalone mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7305) Adjust the recover logic of MesosContainerizer to allow standalone containers.
Jie Yu created MESOS-7305: - Summary: Adjust the recover logic of MesosContainerizer to allow standalone containers. Key: MESOS-7305 URL: https://issues.apache.org/jira/browse/MESOS-7305 Project: Mesos Issue Type: Task Components: containerization Reporter: Jie Yu The current recovery logic in MesosContainerizer assumes that all top level containers are tied to some Mesos executors. Adding standalone containers will invalidate this assumption. The recovery logic must be changed to account for that.
[jira] [Updated] (MESOS-7302) Support launching standalone containers.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7302: -- Summary: Support launching standalone containers. (was: Support launching containers that are not tied to Mesos tasks or executors.) > Support launching standalone containers. > > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` > interface depends on them. Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. > This is also one necessary step towards running MesosContainerizer in > standalone mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7302) Support launching containers that are not tied to Mesos tasks or executors.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7302: -- Description: Containerizer should support launching containers (both top level and nested) that are not tied to a particular Mesos task or executor. This is for the case where the agent wants to launch some system containers (e.g., for CSI plugin) that will be managed by Mesos. More specifically, the Containerizer interfaces should be refactored so that they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` interface depends on them. Instead, we should consistently use ContainerInfo and CommandInfo in Containerizer and isolators. This is also one necessary step towards running MesosContainerizer in standalone mode. was: Containerizer should support launching containers (both top level and nested) that are not tied to a particular Mesos task or executor. This is for the case where the agent wants to launch some system containers (e.g., for CSI plugin) that will be managed by Mesos. More specifically, the Containerizer interfaces should be refactored so that they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` interface depends on them. Instead, we should consistently use ContainerInfo and CommandInfo in Containerizer and isolators. > Support launching containers that are not tied to Mesos tasks or executors. > --- > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. 
Currently, the `launch` > interface depends on them. Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. > This is also one necessary step towards running MesosContainerizer in > standalone mode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7304) Fetcher should not depend on SlaveID.
Jie Yu created MESOS-7304: - Summary: Fetcher should not depend on SlaveID. Key: MESOS-7304 URL: https://issues.apache.org/jira/browse/MESOS-7304 Project: Mesos Issue Type: Task Components: containerization, fetcher Reporter: Jie Yu Currently, various Fetcher interfaces depend on SlaveID, which is an unnecessary coupling. For instance: {code} Try<Nothing> Fetcher::recover(const SlaveID& slaveId, const Flags& flags); Future<Nothing> Fetcher::fetch( const ContainerID& containerId, const CommandInfo& commandInfo, const string& sandboxDirectory, const Option<string>& user, const SlaveID& slaveId, const Flags& flags); {code} It looks like the only reason we need a SlaveID is that the fetcher cache directory is calculated from it. We should calculate the fetcher cache directory in the caller and pass that directory to Fetcher.
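The proposed decoupling can be sketched like this. All names are hypothetical (`cacheDirectoryFor`, `FetcherSketch`, `cachePathFor`), and the directory layout is invented for illustration; it is not claimed to match how Mesos lays out the fetcher cache.

```cpp
#include <cassert>
#include <string>

// Hypothetical caller-side helper: the agent knows its own ID and the
// configured cache root, so it derives the cache directory itself.
// The layout used here is an assumption for illustration only.
inline std::string cacheDirectoryFor(
    const std::string& cacheRoot, const std::string& agentId)
{
  return cacheRoot + "/" + agentId;
}

// Hypothetical fetcher that only ever sees the resulting directory and
// therefore has no dependency on the agent ID type.
class FetcherSketch
{
public:
  explicit FetcherSketch(const std::string& cacheDirectory)
    : cacheDirectory_(cacheDirectory) {}

  const std::string& cacheDirectory() const { return cacheDirectory_; }

  // Where a cached artifact with the given file name would live.
  std::string cachePathFor(const std::string& filename) const
  {
    return cacheDirectory_ + "/" + filename;
  }

private:
  std::string cacheDirectory_;
};
```

With this shape, a caller would compute the directory once, for example `FetcherSketch fetcher(cacheDirectoryFor(root, id));`, and the fetcher's interface would no longer mention the agent ID at all.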
[jira] [Commented] (MESOS-6127) Implement support for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940982#comment-15940982 ] Aaron Wood commented on MESOS-6127: --- Current design doc is here https://docs.google.com/document/d/1vD8x2l5X6LzrHMCIOcWnx8ky_nT_Y2UIrQUXQB7DdLw/edit I don't have the bandwidth at work to take this on right now but I believe [~ipronin] was interested in taking this further. > Implement support for HTTP/2 > - > > Key: MESOS-6127 > URL: https://issues.apache.org/jira/browse/MESOS-6127 > Project: Mesos > Issue Type: Epic > Components: HTTP API, libprocess >Reporter: Aaron Wood > Labels: performance > > HTTP/2 will allow us to take advantage of connection multiplexing, header > compression, streams, server push, etc. Add support for communication over > HTTP/2 between masters and agents, framework endpoints, etc. > Should we support HTTP/2 without TLS? The spec allows for this but most major > browser vendors, libraries, and implementations aren't supporting it unless > TLS is used. If we do require TLS, what can be done to reduce the performance > hit of the TLS handshake? Might need to change more code to make sure that we > are taking advantage of connection sharing so that we can (ideally) only ever > have a one-time TLS handshake per shared connection. > Some ideas for libs: > https://nghttp2.org/documentation/package_README.html - Has encoders/decoders > supporting HPACK https://nghttp2.org/documentation/tutorial-hpack.html > https://nghttp2.org/documentation/libnghttp2_asio.html - Currently marked as > experimental by the nghttp2 docs
[jira] [Created] (MESOS-7303) Support Isolator capabilities.
Jie Yu created MESOS-7303: - Summary: Support Isolator capabilities. Key: MESOS-7303 URL: https://issues.apache.org/jira/browse/MESOS-7303 Project: Mesos Issue Type: Task Components: containerization Reporter: Jie Yu Currently, isolators have one capability: whether they support nesting or not. To support launching containers that are not tied to Mesos tasks or executors, we need to add another capability to the Isolator interface so that, when launching such containers, we can avoid invoking isolators that do not yet support them.
[jira] [Updated] (MESOS-7302) Support launching containers that are not tied to Mesos tasks or executors.
[ https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7302: -- Description: Containerizer should support launching containers (both top level and nested) that are not tied to a particular Mesos task or executor. This is for the case where the agent wants to launch some system containers (e.g., for CSI plugin) that will be managed by Mesos. More specifically, the Containerizer interfaces should be refactored so that they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` interface depends on them. Instead, we should consistently use ContainerInfo and CommandInfo in Containerizer and isolators. was: Containerizer should support launching containers (both top level and nested) that are not tied to a particular Mesos task or executor. This is for the case where the agent wants to launch some system containers (e.g., for CSI plugin) that will be managed by Mesos. More specifically, the Containerizer interfaces should be refactored so that they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` interface depends on them. Instead, we should consistently use ContainerInfo and CommandInfo in Containerizer and isolators. > Support launching containers that are not tied to Mesos tasks or executors. > --- > > Key: MESOS-7302 > URL: https://issues.apache.org/jira/browse/MESOS-7302 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > > Containerizer should support launching containers (both top level and nested) > that are not tied to a particular Mesos task or executor. This is for the > case where the agent wants to launch some system containers (e.g., for CSI > plugin) that will be managed by Mesos. > More specifically, the Containerizer interfaces should be refactored so that > they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` > interface depends on them. 
Instead, we should consistently use ContainerInfo > and CommandInfo in Containerizer and isolators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7302) Support launching containers that are not tied to Mesos tasks or executors.
Jie Yu created MESOS-7302: - Summary: Support launching containers that are not tied to Mesos tasks or executors. Key: MESOS-7302 URL: https://issues.apache.org/jira/browse/MESOS-7302 Project: Mesos Issue Type: Epic Components: containerization Reporter: Jie Yu Containerizer should support launching containers (both top level and nested) that are not tied to a particular Mesos task or executor. This is for the case where the agent wants to launch some system containers (e.g., for CSI plugin) that will be managed by Mesos. More specifically, the Containerizer interfaces should be refactored so that they do not depend on TaskInfo or ExecutorInfo. Currently, the `launch` interface depends on them. Instead, we should consistently use ContainerInfo and CommandInfo in Containerizer and isolators. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7279) State Diagrams for V1 Schedulers & Executors
[ https://issues.apache.org/jira/browse/MESOS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940901#comment-15940901 ] Chun-Hung Hsiao commented on MESOS-7279: Is it more clear now? > State Diagrams for V1 Schedulers & Executors > > > Key: MESOS-7279 > URL: https://issues.apache.org/jira/browse/MESOS-7279 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Minor > Labels: documentation > > State diagrams for schedulers' and executors' finite state machines in the > Mesos V1 Framework API to show that what events they would receive under what > situations. For example, when a scheduler is in a certain state, it would > only receive certain events from the master or connected/disconnected events, > and the diagram can show that what action it should take for each scenario. > This would be useful for framework developers. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7279) State Diagrams for V1 Schedulers & Executors
[ https://issues.apache.org/jira/browse/MESOS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940767#comment-15940767 ] Anand Mazumdar commented on MESOS-7279: --- It's not immediately clear from the ticket what type of state diagrams we are looking for. Can we modify the description accordingly? > State Diagrams for V1 Schedulers & Executors > > > Key: MESOS-7279 > URL: https://issues.apache.org/jira/browse/MESOS-7279 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Minor > Labels: documentation > > State diagrams for schedulers and executors in the Mesos V1 Framework API to > show that what events they would receive under what situations. This would be > useful for framework developers.
[jira] [Commented] (MESOS-7279) State Diagrams for V1 Schedulers & Executors
[ https://issues.apache.org/jira/browse/MESOS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940750#comment-15940750 ] Chun-Hung Hsiao commented on MESOS-7279: Sure but I'll put it at a lower priority ;) > State Diagrams for V1 Schedulers & Executors > > > Key: MESOS-7279 > URL: https://issues.apache.org/jira/browse/MESOS-7279 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Minor > Labels: documentation > > State diagrams for schedulers and executors in the Mesos V1 Framework API to > show that what events they would receive under what situations. This would be > useful for framework developers. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7279) State Diagrams for V1 Schedulers & Executors
[ https://issues.apache.org/jira/browse/MESOS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao reassigned MESOS-7279: -- Assignee: Chun-Hung Hsiao > State Diagrams for V1 Schedulers & Executors > > > Key: MESOS-7279 > URL: https://issues.apache.org/jira/browse/MESOS-7279 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Minor > Labels: documentation > > State diagrams for schedulers and executors in the Mesos V1 Framework API to > show that what events they would receive under what situations. This would be > useful for framework developers. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7271) JNI SIGSEGV failed when connecting spark to mesos master
[ https://issues.apache.org/jira/browse/MESOS-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940744#comment-15940744 ] Michael Gummelt commented on MESOS-7271: I don't know, but I've been running Spark 2.1 against Mesos 1.2 w/o any problems, so I can't repro this. > JNI SIGSEGV failed when connecting spark to mesos master > > > Key: MESOS-7271 > URL: https://issues.apache.org/jira/browse/MESOS-7271 > Project: Mesos > Issue Type: Bug > Components: java api >Affects Versions: 1.1.0, 1.2.0 > Environment: Ubuntu 16.04, OpenJDK 8, Spark 2.1.1 >Reporter: Qi Cui > > Run starting. Expected test count is: 1 > SampleDataFrameTest: > 17/03/20 11:53:16 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0320 11:53:19.775842 4679 process.cpp:1071] libprocess is initialized on > 192.168.0.99:38293 with 8 worker threads > I0320 11:53:19.775975 4679 logging.cpp:199] Logging to STDERR > I0320 11:53:19.789871 4725 sched.cpp:226] Version: 1.1.0 > I0320 11:53:19.832826 4717 sched.cpp:330] New master detected at > master@192.168.0.50:5050 > I0320 11:53:19.838253 4717 sched.cpp:341] No credentials provided. 
> Attempting to register without authentication > I0320 11:53:19.838337 4717 sched.cpp:820] Sending SUBSCRIBE call to > master@192.168.0.50:5050 > I0320 11:53:19.840265 4717 sched.cpp:853] Will retry registration in > 32.354951ms if necessary > I0320 11:53:19.844734 4717 sched.cpp:743] Framework registered with > 6e147824-5d88-411b-9c09-a7137565c309-0001 > I0320 11:53:19.864850 4717 sched.cpp:757] Scheduler::registered took > 20.022604ms > ERROR: exception pending on entry to FindMesosClass() > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7ffa06fea4a6, pid=4677, tid=0x7ff9a1a46700 > # > # JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build > 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13) > # Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # V [libjvm.so+0x6744a6] > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /media/sf_G_DRIVE/src/spark-testing-base/hs_err_pid4677.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7277) General checker does not support command checks via agent.
[ https://issues.apache.org/jira/browse/MESOS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940605#comment-15940605 ] Gastón Kleiman commented on MESOS-7277: --- https://reviews.apache.org/r/57912/ > General checker does not support command checks via agent. > -- > > Key: MESOS-7277 > URL: https://issues.apache.org/jira/browse/MESOS-7277 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Gastón Kleiman > Labels: health-check, mesosphere > > Command checks via agent are necessary for executors, that launch their tasks > via agent, e.g., default executor. General checker should support launching > command as nested containers via agent in order to be used by such executors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7301) CommandExecutorTest.NoTransitionFromKillingToRunning is flaky.
Alexander Rukletsov created MESOS-7301: -- Summary: CommandExecutorTest.NoTransitionFromKillingToRunning is flaky. Key: MESOS-7301 URL: https://issues.apache.org/jira/browse/MESOS-7301 Project: Mesos Issue Type: Bug Components: test Affects Versions: 1.3.0 Environment: Mac Mini with Mac OS 10.11.6 with SSL enabled Reporter: Alexander Rukletsov I see {{CommandExecutorTest.NoTransitionFromKillingToRunning}} failing often on Mac. According to the logs, the task is not transitioning to {{KILLED}} from {{KILLING}}. The reason, however, is unclear at first glance. From a single `make check` session: * "good" run http://pastebin.com/88Ar34Lz * "bad" run http://pastebin.com/RKMmYV8z I've seen both versions of the test (with the HTTP and the non-HTTP command executor) fail. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-7290) make fails at protobuf stage
[ https://issues.apache.org/jira/browse/MESOS-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940442#comment-15940442 ] Till Toenshoff edited comment on MESOS-7290 at 3/24/17 2:42 PM: [~rharnasch] the compilation error {{g++: internal compiler error: Killed (program cc1plus)}} is indeed a common problem when using too many compilation processes for the available RAM on your build machine. You were obviously hitting two problems and the internal error should be entirely unrelated to the python egg / proto problem. I do not have exact, current numbers but generally g++ needs more memory than clang++. As a rule of thumb, make sure that you have about 2 GB per compilation process for clang++ and 2.5 GB for g++. You can adjust the number of processes by providing the {{-j N}} flag to {{make}}. So for 16 GB of RAM with clang++ you may go all the way up to {{-j 8}} - certainly also depending on the number of cores/threads on the build machine's CPU. Numbers may vary for your system... If it fails again, reduce {{N}}. The python egg version comparison issue, to me, indeed sounds like a bug that should get triaged and fixed. was (Author: tillt): [~rharnasch] the compilation error {{g++: internal compiler error: Killed (program cc1plus)}} is indeed a common problem when using too many compilation processes for the available RAM on your build machine. I do not have exact, current numbers but generally g++ needs more memory than clang++. As a rule of thumb, make sure that you have about 2 GB per compilation process for clang++ and 2.5 GB for g++. You can adjust the number of processes by providing the {{-j N}} flag to {{make}}. So for 16 GB of RAM with clang++ you may go all the way up to {{-j 8}} - certainly also depending on the number of cores/threads on the build machine's CPU. Numbers may vary for your system... If it fails again, reduce {{N}}.
> make fails at protobuf stage > > > Key: MESOS-7290 > URL: https://issues.apache.org/jira/browse/MESOS-7290 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: CentOS 7.3 (built from 1611 image) >Reporter: Raul Harnasch >Assignee: Kapil Arya > > {noformat} > Building protobuf Python egg ... > cd ../3rdparty/protobuf-2.6.1/python && \ > CC="gcc"\ > CXX="g++" \ > CFLAGS="-g1 -O0 -Wno-unused-local-typedefs" \ > CXXFLAGS="-g1 -O0 -Wno-unused-local-typedefs -std=c++11" > \ > PYTHONPATH=/opt/mesos/build/3rdparty/setuptools-20.9.0 \ > /bin/python setup.py build bdist_egg > Installed > /opt/mesos/build/3rdparty/protobuf-2.6.1/python/.eggs/google_apputils-0.4.2-py2.7.egg > Traceback (most recent call last): > File "setup.py", line 200, in > "Protocol Buffers are Google's data interchange format.", > File "/usr/lib64/python2.7/distutils/core.py", line 112, in setup > _setup_distribution = dist = klass(attrs) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 269, in __init__ > self.fetch_build_eggs(attrs['setup_requires']) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 313, in fetch_build_eggs > replace_conflicting=True, > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 826, in resolve > dist = best[req.key] = env.best_match(req, ws, installer) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 1085, in best_match > dist = working_set.find(req) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 695, in find > raise VersionConflict(dist, req) > pkg_resources.VersionConflict: (pytz 2012d > (/usr/lib/python2.7/site-packages), Requirement.parse('pytz>=2010')) > make[2]: *** > [../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg] Error 1 > make[2]: *** Waiting for unfinished jobs > make[2]: Leaving directory `/opt/mesos/build/src' > make[1]: *** [all] Error 2 > make[1]: 
Leaving directory `/opt/mesos/build/src' > make: *** [all-recursive] Error 1 > {noformat} > Looks like a dependency issue, but as the error suggests, I have pytz 2012d > when the minimum requirement is 2010. Pip confirms this: > {noformat} > $ pip freeze | grep pytz > pytz===2012d > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
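The memory rule of thumb from Till Toenshoff's comment on this issue (about 2 GB per compilation process for clang++, 2.5 GB for g++, capped by the available CPU threads) can be turned into a small calculation. The following is only a sketch: the helper name `suggested_make_jobs` and the `GB_PER_JOB` table are hypothetical, and the per-process figures are the rough estimates quoted in the comment, not measurements.

```python
import os

# Rule-of-thumb memory per compiler process, in GB, as quoted in the
# comment on this issue. Rough guidance only, not measured values.
GB_PER_JOB = {"clang++": 2.0, "g++": 2.5}


def suggested_make_jobs(ram_gb, compiler, cpu_threads=None):
    """Return a conservative N for `make -j N` (hypothetical helper)."""
    if cpu_threads is None:
        # Fall back to the local machine's thread count.
        cpu_threads = os.cpu_count() or 1
    # How many compiler processes fit into RAM at the quoted estimate.
    by_memory = int(ram_gb // GB_PER_JOB[compiler])
    # Never exceed the CPU thread count, and never go below one job.
    return max(1, min(by_memory, cpu_threads))
```

For example, `suggested_make_jobs(16, "clang++", cpu_threads=16)` yields 8, matching the `-j 8` figure in the comment. If the build is still killed, lowering the result further is the safer choice.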
[jira] [Commented] (MESOS-7290) make fails at protobuf stage
[ https://issues.apache.org/jira/browse/MESOS-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940442#comment-15940442 ] Till Toenshoff commented on MESOS-7290: --- [~rharnasch] the compilation error {{g++: internal compiler error: Killed (program cc1plus)}} is indeed a common problem when using too many compilation processes for the available RAM on your build machine. I do not have exact, current numbers but generally g++ needs more memory than clang++. As a rule of thumb, make sure that you have about 2 GB per compilation process for clang++ and 2.5 GB for g++. You can adjust the number of processes by providing the {{-j N}} flag to {{make}}. So for 16 GB of RAM with clang++ you may go all the way up to {{-j 8}} - certainly also depending on the number of cores/threads on the build machine's CPU. Numbers may vary for your system... If it fails again, reduce {{N}}. > make fails at protobuf stage > > > Key: MESOS-7290 > URL: https://issues.apache.org/jira/browse/MESOS-7290 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: CentOS 7.3 (built from 1611 image) >Reporter: Raul Harnasch >Assignee: Kapil Arya > > {noformat} > Building protobuf Python egg ...
> cd ../3rdparty/protobuf-2.6.1/python && \ > CC="gcc"\ > CXX="g++" \ > CFLAGS="-g1 -O0 -Wno-unused-local-typedefs" \ > CXXFLAGS="-g1 -O0 -Wno-unused-local-typedefs -std=c++11" > \ > PYTHONPATH=/opt/mesos/build/3rdparty/setuptools-20.9.0 \ > /bin/python setup.py build bdist_egg > Installed > /opt/mesos/build/3rdparty/protobuf-2.6.1/python/.eggs/google_apputils-0.4.2-py2.7.egg > Traceback (most recent call last): > File "setup.py", line 200, in > "Protocol Buffers are Google's data interchange format.", > File "/usr/lib64/python2.7/distutils/core.py", line 112, in setup > _setup_distribution = dist = klass(attrs) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 269, in __init__ > self.fetch_build_eggs(attrs['setup_requires']) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 313, in fetch_build_eggs > replace_conflicting=True, > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 826, in resolve > dist = best[req.key] = env.best_match(req, ws, installer) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 1085, in best_match > dist = working_set.find(req) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 695, in find > raise VersionConflict(dist, req) > pkg_resources.VersionConflict: (pytz 2012d > (/usr/lib/python2.7/site-packages), Requirement.parse('pytz>=2010')) > make[2]: *** > [../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg] Error 1 > make[2]: *** Waiting for unfinished jobs > make[2]: Leaving directory `/opt/mesos/build/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/opt/mesos/build/src' > make: *** [all-recursive] Error 1 > {noformat} > Looks like a dependency issue, but as the error suggests, I have pytz 2012d > when the minimum requirement is 2010. 
Pip confirms this: > {noformat} > $ pip freeze | grep pytz > pytz===2012d > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7290) make fails at protobuf stage
[ https://issues.apache.org/jira/browse/MESOS-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940418#comment-15940418 ] Raul Harnasch commented on MESOS-7290: -- Yes. I'm telling you, the error has to do with a comparison in the version numbers. {noformat} pkg_resources.VersionConflict: (pytz 2012d (/usr/lib/python2.7/site-packages), Requirement.parse('pytz>=2010')) {noformat} That error -- at least to me -- looks like it wants a version >= 2010. But, obviously, I have 2012d. The comparison doesn't like the {{d}} in {{2012d}}. The fact that I can go in and change the version number listed in the egg-info file within the pytz library, and Mesos then builds, tells me that the issue has nothing to do with which dependencies I have installed and everything to do with the version checking that's going on. It failed without reason. > make fails at protobuf stage > > > Key: MESOS-7290 > URL: https://issues.apache.org/jira/browse/MESOS-7290 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: CentOS 7.3 (built from 1611 image) >Reporter: Raul Harnasch >Assignee: Kapil Arya > > {noformat} > Building protobuf Python egg ...
> cd ../3rdparty/protobuf-2.6.1/python && \ > CC="gcc"\ > CXX="g++" \ > CFLAGS="-g1 -O0 -Wno-unused-local-typedefs" \ > CXXFLAGS="-g1 -O0 -Wno-unused-local-typedefs -std=c++11" > \ > PYTHONPATH=/opt/mesos/build/3rdparty/setuptools-20.9.0 \ > /bin/python setup.py build bdist_egg > Installed > /opt/mesos/build/3rdparty/protobuf-2.6.1/python/.eggs/google_apputils-0.4.2-py2.7.egg > Traceback (most recent call last): > File "setup.py", line 200, in > "Protocol Buffers are Google's data interchange format.", > File "/usr/lib64/python2.7/distutils/core.py", line 112, in setup > _setup_distribution = dist = klass(attrs) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 269, in __init__ > self.fetch_build_eggs(attrs['setup_requires']) > File "/opt/mesos/build/3rdparty/setuptools-20.9.0/setuptools/dist.py", line > 313, in fetch_build_eggs > replace_conflicting=True, > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 826, in resolve > dist = best[req.key] = env.best_match(req, ws, installer) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 1085, in best_match > dist = working_set.find(req) > File > "/opt/mesos/build/3rdparty/setuptools-20.9.0/pkg_resources/__init__.py", line > 695, in find > raise VersionConflict(dist, req) > pkg_resources.VersionConflict: (pytz 2012d > (/usr/lib/python2.7/site-packages), Requirement.parse('pytz>=2010')) > make[2]: *** > [../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg] Error 1 > make[2]: *** Waiting for unfinished jobs > make[2]: Leaving directory `/opt/mesos/build/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/opt/mesos/build/src' > make: *** [all-recursive] Error 1 > {noformat} > Looks like a dependency issue, but as the error suggests, I have pytz 2012d > when the minimum requirement is 2010. 
Pip confirms this: > {noformat} > $ pip freeze | grep pytz > pytz===2012d > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
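One possible explanation for the observation that the comparison "doesn't like the {{d}} in {{2012d}}", offered as an assumption and not a confirmed diagnosis of this issue: version parsers that follow PEP 440 treat strings they cannot parse as "legacy" versions and order them before every well-formed version, so {{2012d}} would compare as older than {{2010}}. The sketch below is a deliberately simplified model of that ordering. It does not call setuptools or pkg_resources, the names `sort_key` and `satisfies_minimum` are hypothetical, and the regular expression accepts only dotted numeric releases, which is narrower than real PEP 440.

```python
import re

# Simplified stand-in for PEP 440 parsing: accept only dotted numeric
# releases such as "2010" or "2016.10". Real PEP 440 allows more forms.
_NUMERIC_RELEASE = re.compile(r"^\d+(\.\d+)*$")


def sort_key(version):
    """Hypothetical key: unparseable strings sort before numeric releases."""
    if _NUMERIC_RELEASE.match(version):
        numbers = tuple(int(part) for part in version.split("."))
        return (0, numbers, "")
    # "Legacy" bucket: always ordered below every well-formed version.
    return (-1, (), version)


def satisfies_minimum(installed, minimum):
    """Model of a `pkg>=minimum` check under the assumed ordering."""
    return sort_key(installed) >= sort_key(minimum)
```

Under this model `satisfies_minimum("2012d", "2010")` is false while `satisfies_minimum("2012", "2010")` is true, which is consistent with the report that editing the version string in the egg-info file lets the build proceed.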
[jira] [Created] (MESOS-7300) Mesos failed to build on Windows due to error C2440: 'return': cannot convert from 'Error' to 'bool'
Karen Huang created MESOS-7300: -- Summary: Mesos failed to build on Windows due to error C2440: 'return': cannot convert from 'Error' to 'bool' Key: MESOS-7300 URL: https://issues.apache.org/jira/browse/MESOS-7300 Project: Mesos Issue Type: Bug Components: build Environment: Windows Server 2012 R2 + VS2015 Update 3 Reporter: Karen Huang Priority: Blocker I tried to build Mesos (master branch revision 322300f) with VS2015 Update 3 on Windows. It failed to build with the following error: D:\Mesos\src\3rdparty\stout\include\stout/os/windows/stat.hpp(41): error C2440: 'return': cannot convert from 'Error' to 'bool' (compiling source file D:\Mesos\src\3rdparty\libprocess\src\time.cpp) [D:\Mesos\build_x64\3rdparty\libprocess\src\process-0.0.1.vcxproj] D:\Mesos\src\3rdparty\stout\include\stout/os/windows/stat.hpp(59): error C2440: 'return': cannot convert from 'Error' to 'bool' (compiling source file D:\Mesos\src\3rdparty\libprocess\src\time.cpp) [D:\Mesos\build_x64\3rdparty\libprocess\src\process-0.0.1.vcxproj] This issue can be reproduced starting from master branch revision "82e4077" (https://github.com/apache/mesos/commit/82e4077ceb40e84c2796be43f1448eec0bfd7c69#diff-76a72473075f57f8d0d3b3bf6f150672) I presume this is an issue in your source code. The function "inline bool isdir( const std::string& path, const FollowSymlink follow = FOLLOW_SYMLINK)" must return a value of type bool, but the type of "Error("Non-following stat not supported for '" + path + "'")" is not bool.
In D:\Mesos\src\3rdparty\stout\include\stout\os\windows\stat.hpp file: inline bool isdir( const std::string& path, const FollowSymlink follow = FOLLOW_SYMLINK) { struct _stat s; if (follow == DO_NOT_FOLLOW_SYMLINK) { return Error("Non-following stat not supported for '" + path + "'"); } if (::_stat(path.c_str(), &s) < 0) { return false; } return S_ISDIR(s.st_mode); } In D:\Mesos\src\3rdparty\stout\include\stout\errorbase.hpp file: class Error { public: explicit Error(const std::string& _message) : message(_message) {} const std::string message; }; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7181) Stale frameworks seen on Mesos, but not known to scheduler
[ https://issues.apache.org/jira/browse/MESOS-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939922#comment-15939922 ] Yan Xu commented on MESOS-7181: --- Yeah, so I meant that on the receiving end the process manager doesn't know whether an actor is being {{link}}ed or not, so it has to send {{TargetPIDExited}} in all situations; this is different from the current local {{PID}} behavior. Also, this message is sent not when the actor dies but when a message arrives, so I guess if a framework dies while it's suppressed and has no pending status updates, the master will not find out about it because it doesn't send messages? Perhaps we can have a {{Link}} message sent to the linkee, based on which it can send a special {{Exited}} message to the sender when the actor terminates? > Stale frameworks seen on Mesos, but not known to scheduler > -- > > Key: MESOS-7181 > URL: https://issues.apache.org/jira/browse/MESOS-7181 > Project: Mesos > Issue Type: Bug > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > > Using a scheduler which launches multiple frameworks using scheduler driver, > we occasionally observe that a framework exists on Mesos which is not known > to the scheduler. Since there is no entity that acts on the offers, this > framework ends up hogging all the offers leading to starvation in the cluster. > This particular scenario is as follows: > 1) Scheduler does a driver.start() which results in the 1st SUBSCRIBE sent to > master. > 2) The scheduler driver resends the SUBSCRIBE (since the framework has not > yet registered) which is a result of the exponential backoff. > 3) Framework is registered based on the 1st SUBSCRIBE, but the scheduler > issues a driver.stop() immediately which results in a TEARDOWN sent to the > master. > 4) Master processes the TEARDOWN which removes the framework. > 5) Master now processes the 2nd SUBSCRIBE (after authorization) and tries to > add this framework.
This succeeds and a new framework ID is generated (since > the original framework is no longer registered after the TEARDOWN), but by now the > scheduler driver has already terminated, because the scheduler issued > driver.stop(). So the master continues to send offers to this 2nd framework, which > holds on to them until the offer timeout. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
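The message ordering described in MESOS-7181 can be replayed against a toy model to show how the stale framework appears. This is only an illustrative sketch and not Mesos code: `ToyMaster`, its methods, and the generated ID format are hypothetical, and the assumption that a repeated SUBSCRIBE from a still-registered scheduler maps to the existing framework is a simplification.

```python
import itertools


class ToyMaster:
    """Toy model of a master's framework registry (not Mesos code)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.frameworks = {}  # framework ID -> subscribing scheduler

    def subscribe(self, scheduler):
        # Assumption: a retried SUBSCRIBE from a scheduler that is still
        # registered is recognized and maps to the existing framework.
        for framework_id, owner in self.frameworks.items():
            if owner == scheduler:
                return framework_id
        # Otherwise a new framework with a fresh ID is registered.
        framework_id = "framework-%04d" % next(self._ids)
        self.frameworks[framework_id] = scheduler
        return framework_id

    def teardown(self, framework_id):
        self.frameworks.pop(framework_id, None)


def replay():
    """Replay steps 3 to 5 of the scenario in the order the master sees them."""
    master = ToyMaster()
    first = master.subscribe("scheduler-A")  # 1st SUBSCRIBE registers
    master.teardown(first)                   # TEARDOWN removes the framework
    stale = master.subscribe("scheduler-A")  # retried SUBSCRIBE arrives late
    return first, stale, master.frameworks
```

After `replay()`, the registry still contains one framework whose ID differs from the one the scheduler tore down, and no driver is left to decline or accept its offers. Had the retried SUBSCRIBE been processed before the TEARDOWN, the model would have returned the existing ID instead.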
[jira] [Updated] (MESOS-7263) User supplied task environment variables cause warnings in sandbox stdout.
[ https://issues.apache.org/jira/browse/MESOS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7263: --- Story Points: 3 (was: 5) > User supplied task environment variables cause warnings in sandbox stdout. > -- > > Key: MESOS-7263 > URL: https://issues.apache.org/jira/browse/MESOS-7263 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > The default executor causes task/command environment variables to get > duplicated internally, causing warnings in the resulting sandbox {{stdout}}. > {noformat} > $ ./src/mesos-execute --name="test" --env='{"key1":"value1"}' > --command='sleep 1000' --master=127.0.0.1:5050 > {noformat} > Result in {{stdout}} of the sandbox: > {noformat} > Overwriting environment variable 'key1', original: 'value1', new: 'value1' > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-6951) Docker containerizer: mangled environment when env value contains LF byte.
[ https://issues.apache.org/jira/browse/MESOS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6951: --- Summary: Docker containerizer: mangled environment when env value contains LF byte. (was: Docker containerizer: mangled environment when env value contains LF byte) > Docker containerizer: mangled environment when env value contains LF byte. > -- > > Key: MESOS-6951 > URL: https://issues.apache.org/jira/browse/MESOS-6951 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Jan-Philip Gehrcke >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > Consider this Marathon app definition: > {code} > { > "id": "/testapp", > "cmd": "env && tail -f /dev/null", > "env":{ > "TESTVAR":"line1\nline2" > }, > "cpus": 0.1, > "mem": 10, > "instances": 1, > "container": { > "type": "DOCKER", > "docker": { > "image": "alpine" > } > } > } > {code} > The JSON-encoded newline in the value of the {{TESTVAR}} environment variable > leads to a corrupted task environment. What follows is a subset of the > resulting task environment (as printed via {{env}}, i.e. in key=value > notation): > {code} > line2= > TESTVAR=line1 > {code} > That is, the trailing part of the intended value ended up being interpreted > as variable name, and only the leading part of the intended value was used as > actual value for {{TESTVAR}}. > Common application scenarios that would badly break with that involve > pretty-printed JSON documents or YAML documents passed along via the > environment. > Following the code and information flow led to the conclusion that Docker's > {{--env-file}} command line interface is the weak point in the flow. 
It is > currently used in Mesos' Docker containerizer for passing the environment to > the container: > {code} > argv.push_back("--env-file"); > argv.push_back(environmentFile); > {code} > (Ref: > [code|https://github.com/apache/mesos/blob/c0aee8cc10b1d1f4b2db5ff12b771372fdd5b1f3/src/docker/docker.cpp#L584]) > Docker's {{--env-file}} argument behavior is documented via > {quote} > The --env-file flag takes a filename as an argument > and expects each line to be in the VAR=VAL format, > {quote} > (Ref: https://docs.docker.com/engine/reference/commandline/run/) > That is, Docker identifies individual environment variable key/value pair > definitions based on newline bytes in that file which explains the observed > environment variable value fragmentation. Notably, Docker does not provide a > mechanism for escaping newline bytes in the values specified in this > environment file. > I think it is important to understand that Docker's {{--env-file}} mechanism > is ill-posed in the sense that it is not capable of transmitting the whole > range of environment variable values allowed by POSIX. That's what the Single > UNIX Specification, Version 3 has to say about environment variable values: > {quote} > the value shall be composed of characters from the > portable character set (except NUL and as indicated below). > {quote} > (Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html) > About "The portable character set": > http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3 > It includes (among others) the LF byte. 
Understandably, the current Docker > {{--env-file}} behavior will not change, so this is not an issue that can be > deferred to Docker: https://github.com/docker/docker/issues/12997 > Notably, the {{--env-file}} method for communicating environment variables to > Docker containers was just recently introduced to Mesos as of > https://issues.apache.org/jira/browse/MESOS-6566, for not leaking secrets > through the process listing. Previously, we specified env key/value pairs on > the command line which leaked secrets to the process list and probably also > did not support the full range of valid environment variable values. > We need a solution that > 1) does not leak sensitive values (i.e. is compliant with MESOS-6566). > 2) allows for passing arbitrary environment variable values. > It seems that Docker's {{--env}} method can be used for that. It can be used > to define _just the names of the environment variables_ to-be-passed-along, > in which case the docker binary will read the corresponding values from its > own environment, which we can clearly prepare appropriately when we invoke > the corresponding child process. This method would still leak environment > variable _names_ to the process listing, but (especially if documented) this > should be fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
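The two mechanisms discussed in MESOS-6951 can be contrasted in a short sketch. The first pair of functions models a file with one VAR=VAL entry per line and shows how a value containing LF is split; it is a simplified model, not Docker's actual parser. The third function builds a command line that passes only variable names via `--env NAME` and places the values in the child process environment, as the issue proposes. All function names here are hypothetical, and nothing is executed.

```python
def write_env_file(env):
    """Render variables as one VAR=VAL entry per line."""
    return "".join("%s=%s\n" % (name, value) for name, value in env.items())


def read_env_file(text):
    """Simplified line-based reader; a model, not Docker's parser."""
    parsed = {}
    for line in text.splitlines():
        if not line:
            continue
        # Everything before the first '=' is the name, the rest the value.
        name, _, value = line.partition("=")
        parsed[name] = value
    return parsed


def docker_run_with_env_names(image, env, base_env):
    """Build argv and child environment for `docker run --env NAME ...`."""
    argv = ["docker", "run"]
    for name in env:
        # Name only: the docker binary reads the value from its own
        # environment, so the value never appears in the process listing.
        argv += ["--env", name]
    argv.append(image)
    child_env = dict(base_env)
    child_env.update(env)
    return argv, child_env
```

With `{"TESTVAR": "line1\nline2"}`, the file-based round trip produces `TESTVAR=line1` plus a stray `line2` entry with an empty value, mirroring the mangled environment in the report. The name-only variant keeps the full value intact in `child_env`, which a caller could hand to something like `subprocess.Popen(argv, env=child_env)`; only the variable names appear in `argv`.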
[jira] [Updated] (MESOS-7265) Containerizer startup may cause sensitive data to leak into sandbox logs.
[ https://issues.apache.org/jira/browse/MESOS-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7265: --- Fix Version/s: 1.2.1 > Containerizer startup may cause sensitive data to leak into sandbox logs. > - > > Key: MESOS-7265 > URL: https://issues.apache.org/jira/browse/MESOS-7265 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > The task sandbox logging shows the invocation of the containerizer launch > with all of its flags. > This is not safe, since we do not want to leak sensitive data > into the sandbox logs. > Example: > {noformat} > Received SUBSCRIBED event > Subscribed executor on lobomacpro2.fritz.box > Received LAUNCH event > Starting task test > /Users/till/Development/mesos-private/build/src/mesos-containerizer launch > --help="false" > --launch_info="{"command":{"environment":{"variables":[{"name":"key1","type":"VALUE","value":"value1"}]},"shell":true,"value":"sleep > >
1000"},"environment":{"variables":[{"name":"BIN_SH","type":"VALUE","value":"xpg4"},{"name":"DUALCASE","type":"VALUE","value":"1"},{"name":"DYLD_LIBRARY_PATH","type":"VALUE","value":"\/Users\/till\/Development\/mesos-private\/build\/src\/.libs"},{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"192.168.178.20:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"test"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"4c8a82d4-8a5b-47f5-a660-5fef15da71a5-"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@192.168.178.20:5051"},{"name":"PATH","type":"VALUE","value":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"},{"name":"PWD","type":"VALUE","value":"\/private\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"SHLVL","type":"VALUE","value":"0"},{"name":"__CF_USER_TEXT_ENCODING","type":"VALUE","value":"0x1F5:0x0:0x0"},{"name":"key1","type":"VALUE","value":"value1"},{"name":"key1","type":"VALUE","value":"value1"}]}}" > Forked command at 16329 > {noformat} -- This message was sent by Atlassian JIRA 
(v6.3.15#6346)
[jira] [Updated] (MESOS-7263) User supplied task environment variables cause warnings in sandbox stdout.
[ https://issues.apache.org/jira/browse/MESOS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7263: --- Fix Version/s: 1.3.0 > User supplied task environment variables cause warnings in sandbox stdout. > -- > > Key: MESOS-7263 > URL: https://issues.apache.org/jira/browse/MESOS-7263 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.3.0 > > > The default executor causes task/command environment variables to get > duplicated internally, causing warnings in the resulting sandbox {{stdout}}. > {noformat} > $ ./src/mesos-execute --name="test" --env='{"key1":"value1"}' > --command='sleep 1000' --master=127.0.0.1:5050 > {noformat} > Result in {{stdout}} of the sandbox: > {noformat} > Overwriting environment variable 'key1', original: 'value1', new: 'value1' > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7263) User supplied task environment variables cause warnings in sandbox stdout.
[ https://issues.apache.org/jira/browse/MESOS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939880#comment-15939880 ] Alexander Rukletsov commented on MESOS-7263: {noformat} Commit: 71a9feffd768d8857da34e2d6d06cd765403ccbc [71a9fef] Author: Till ToenshoffDate: 24 March 2017 at 06:57:42 GMT+1 Committer: Alexander Rukletsov Commit Date: 24 March 2017 at 07:05:25 GMT+1 Fixed environment duplication in command executor. Review: https://reviews.apache.org/r/57762/ {noformat} > User supplied task environment variables cause warnings in sandbox stdout. > -- > > Key: MESOS-7263 > URL: https://issues.apache.org/jira/browse/MESOS-7263 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.3.0 > > > The default executor causes task/command environment variables to get > duplicated internally, causing warnings in the resulting sandbox {{stdout}}. > {noformat} > $ ./src/mesos-execute --name="test" --env='{"key1":"value1"}' > --command='sleep 1000' --master=127.0.0.1:5050 > {noformat} > Result in {{stdout}} of the sandbox: > {noformat} > Overwriting environment variable 'key1', original: 'value1', new: 'value1' > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)