[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC
[ https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518125#comment-15518125 ] Joseph Wu commented on MESOS-5821:
--
{code}
commit 439db8c36c50fd294b2c978cdc877d9bd77301b3
Author: Daniel Pravat
Date:   Fri Sep 23 18:36:34 2016 -0700

    Windows: Fixed warnings in `shell.hpp`.

    The `spawn` functions return an `intptr_t`. This patch deals with
    warnings about implicitly casting it to `int`.

    Review: https://reviews.apache.org/r/52065/
{code}
{code}
commit 8dcd13d9f4a6023af574e15c1af42b0dd799e847
Author: Daniel Pravat
Date:   Fri Sep 23 18:43:35 2016 -0700

    Windows: Fixed warnings in `windows.hpp`.

    The `_write` function takes an `unsigned int` as its third argument,
    while the underlying type of `size_t` depends on the platform
    architecture.

    Review: https://reviews.apache.org/r/52193/
{code}
{code}
commit 8bfb11f0711826e9cb899c7d162cf47de911c719
Author: Daniel Pravat
Date:   Fri Sep 23 18:49:27 2016 -0700

    Fixed warnings in StatisticsTest.Statistics.

    This removes some implicit double-to-float casts.

    Review: https://reviews.apache.org/r/52198/
{code}

> Clean up the billions of compiler warnings on MSVC
> --
>
> Key: MESOS-5821
> URL: https://issues.apache.org/jira/browse/MESOS-5821
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Alex Clemmer
> Assignee: Daniel Pravat
> Labels: mesosphere, slave
>
> Clean builds of Mesos on Windows will result in approximately {{5800 Warning(s)}} or more.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
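The three commits above all silence implicit-narrowing warnings. A minimal sketch of the pattern (the helper names below are hypothetical, not the actual Mesos code): keep `spawn`'s `intptr_t` result at full width, and narrow a `size_t` count explicitly before handing it to an API such as `_write` that takes an `unsigned int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper illustrating the `size_t` -> `unsigned int` fix:
// clamp and cast explicitly instead of letting MSVC warn about an
// implicit truncation (warning C4267).
inline unsigned int clamp_to_uint(std::size_t count) {
  const std::size_t max = std::numeric_limits<unsigned int>::max();
  return static_cast<unsigned int>(count < max ? count : max);
}

// Hypothetical illustration of the `intptr_t` fix: keep the handle at
// full width and narrow only where an `int` is genuinely required,
// making the cast visible (avoids warning C4244).
inline int to_exit_status(std::intptr_t handle) {
  return static_cast<int>(handle);
}
```

The point in both cases is that the conversion still happens, but it is now an explicit, reviewable decision rather than a compiler-flagged accident.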
[jira] [Commented] (MESOS-6246) Libprocess links will not generate an ExitedEvent if the socket creation fails
[ https://issues.apache.org/jira/browse/MESOS-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517963#comment-15517963 ] Joseph Wu commented on MESOS-6246:
--
| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + {{ExitedEvent}} |

> Libprocess links will not generate an ExitedEvent if the socket creation fails
> --
>
> Key: MESOS-6246
> URL: https://issues.apache.org/jira/browse/MESOS-6246
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 0.27.3, 0.28.2, 1.0.1
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: libprocess, mesosphere
>
> Noticed this while inspecting nearby code for potential races.
> Normally, when a libprocess actor (the "linkee") links to a remote process, it does the following:
> 1) Create a socket.
> 2) Connect to the remote process (asynchronous).
> 3) Check that the connection succeeded.
> If (2) or (3) fail, the linkee will receive an {{ExitedEvent}}, which indicates that the link broke. If (1) fails, however, no {{ExitedEvent}} is generated:
> https://github.com/apache/mesos/blob/7c833abbec9c9e4eb51d67f7a8e7a8d0870825f8/3rdparty/libprocess/src/process.cpp#L1558-L1562

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
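The shape of the fix can be sketched with a toy model (hypothetical names only; the real libprocess {{link}} path is far more involved): every failure path, including socket creation, should funnel into the same broken-link notification so the linkee always observes an {{ExitedEvent}}.

```cpp
#include <string>
#include <vector>

// Toy model of the linkee's view (illustrative, not the libprocess API).
struct LinkResult {
  bool linked = false;
  std::vector<std::string> events;  // Events delivered to the linkee.
};

// Sketch of the corrected control flow: steps (1), (2), and (3) all
// report a broken link the same way, instead of step (1) failing silently.
inline LinkResult tryLink(bool socketCreateFails, bool connectFails) {
  LinkResult result;
  auto exited = [&result]() { result.events.push_back("ExitedEvent"); };

  if (socketCreateFails) { exited(); return result; }  // Previously missed.
  if (connectFails)      { exited(); return result; }  // Already handled.

  result.linked = true;
  return result;
}
```

With this structure, a linkee no longer needs to special-case the "socket creation failed" outcome: all failures look the same to it.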
[jira] [Updated] (MESOS-6245) Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.
[ https://issues.apache.org/jira/browse/MESOS-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6245:
--
Description:
It seems that the agent code sets {{StatusUpdate}}->{{slave_id}} but does not set {{TaskStatus}}->{{slave_id}} if it is not already set. When the driver receives such a status update with explicit acknowledgements enabled, it passes the {{TaskStatus}} to the scheduler. The scheduler then has no way of acknowledging the update because the {{slave_id}} is missing. Note that implicit acknowledgements still work, since they use the {{slave_id}} from {{StatusUpdate}}; hence we never noticed this in our tests, as all of them use implicit acknowledgements on the driver.

(was: It seems that the driver has an old check relying on the `PID`. The `PID` is always `UPID()` for HTTP based executors. If a scheduler is using explicit acknowledgements, it won't ever be able to acknowledge the update since the driver would clean up the {{uuid}} field! Note that all our tests use implicit acknowledgements and we never got around to catching this issue until Marathon started using the HTTP based executors.)

> Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.
> --
>
> Key: MESOS-6245
> URL: https://issues.apache.org/jira/browse/MESOS-6245
> Project: Mesos
> Issue Type: Bug
> Reporter: Anand Mazumdar
> Assignee: Anand Mazumdar
> Labels: mesosphere
> Fix For: 1.1.0, 1.0.2
>
> It seems that the agent code sets {{StatusUpdate}}->{{slave_id}} but does not set {{TaskStatus}}->{{slave_id}} if it is not already set. When the driver receives such a status update with explicit acknowledgements enabled, it passes the {{TaskStatus}} to the scheduler. The scheduler then has no way of acknowledging the update because the {{slave_id}} is missing. Note that implicit acknowledgements still work, since they use the {{slave_id}} from {{StatusUpdate}}; hence we never noticed this in our tests, as all of them use implicit acknowledgements on the driver.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
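A simplified mirror of the agent-side fix described above (plain structs stand in for the real protobuf messages, so all names here are illustrative): copy the {{slave_id}} from the enclosing {{StatusUpdate}} into the embedded {{TaskStatus}} when it is unset, so a driver doing explicit acknowledgements has the agent identity it needs.

```cpp
#include <string>

// Illustrative stand-ins for the protobuf messages (not the real types).
struct TaskStatus {
  std::string slave_id;
  bool has_slave_id() const { return !slave_id.empty(); }
};

struct StatusUpdate {
  std::string slave_id;
  TaskStatus status;
};

// The fix in miniature: propagate the outer slave_id into the embedded
// TaskStatus, but never overwrite a slave_id that is already set.
inline void propagateSlaveId(StatusUpdate& update) {
  if (!update.status.has_slave_id()) {
    update.status.slave_id = update.slave_id;
  }
}
```

Because the inner message now always carries the {{slave_id}}, explicit and implicit acknowledgement paths see the same information.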
[jira] [Comment Edited] (MESOS-6234) Potential socket leak during Zookeeper network changes
[ https://issues.apache.org/jira/browse/MESOS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517678#comment-15517678 ] Joseph Wu edited comment on MESOS-6234 at 9/24/16 12:12 AM:
--
| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |

was (Author: kaysoky):
| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + {{ExitedEvent}} |
| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |

> Potential socket leak during Zookeeper network changes
> --
>
> Key: MESOS-6234
> URL: https://issues.apache.org/jira/browse/MESOS-6234
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 0.28.3, 1.0.0
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: libprocess, mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
> There is a potential leak when using the version of {{link}} with {{RemoteConnection::RECONNECT}}. This was originally implemented to refresh links during master recovery.
> The leak occurs here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1592-L1597
> ^ The comment here is not correct, as that is *not* the last reference to the {{existing}} socket.
> At this point, the {{existing}} socket may be a perfectly valid link. Valid links will all have a reference inside a callback loop created here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1503-L1509
> We need to stop the callback loop while preventing any resulting {{ExitedEvents}} from being sent. This means discarding the callback loop's future after we have called {{swap_implementing_socket}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6245) Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.
[ https://issues.apache.org/jira/browse/MESOS-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6245: -- Summary: Driver based schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors. (was: Schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.) > Driver based schedulers performing explicit acknowledgements cannot > acknowledge updates from HTTP based executors. > -- > > Key: MESOS-6245 > URL: https://issues.apache.org/jira/browse/MESOS-6245 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 1.1.0, 1.0.2 > > > It seems that the driver has an old check relying on the `PID`. The `PID` is > always `UPID()` for HTTP based executors. If a scheduler is using explicit > acknowledgements, it won't ever be able to acknowledge the update since the > driver would clean up the {{uuid}} field! > Note that all our tests use implicit acknowledgements and we never got around > to catching this issue till Marathon started using the HTTP based executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6245) Schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors.
Anand Mazumdar created MESOS-6245: - Summary: Schedulers performing explicit acknowledgements cannot acknowledge updates from HTTP based executors. Key: MESOS-6245 URL: https://issues.apache.org/jira/browse/MESOS-6245 Project: Mesos Issue Type: Bug Reporter: Anand Mazumdar Assignee: Anand Mazumdar Fix For: 1.1.0, 1.0.2 It seems that the driver has an old check relying on the `PID`. The `PID` is always `UPID()` for HTTP based executors. If a scheduler is using explicit acknowledgements, it won't ever be able to acknowledge the update since the driver would clean up the {{uuid}} field! Note that all our tests use implicit acknowledgements and we never got around to catching this issue till Marathon started using the HTTP based executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6233) Master CHECK fails during recovery while relinking to other masters
[ https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517685#comment-15517685 ] Joseph Wu edited comment on MESOS-6233 at 9/23/16 9:58 PM: --- | https://reviews.apache.org/r/52182/ | Fix relink race that causes this check failure | was (Author: kaysoky): | https://reviews.apache.org/r/52180/ | Fix relink race that causes this check failure | > Master CHECK fails during recovery while relinking to other masters > --- > > Key: MESOS-6233 > URL: https://issues.apache.org/jira/browse/MESOS-6233 > Project: Mesos > Issue Type: Bug > Components: general, master >Affects Versions: 0.28.3, 1.0.1 >Reporter: Alex Kaplan >Assignee: Joseph Wu >Priority: Blocker > Labels: mesosphere > Fix For: 0.28.3, 1.1.0, 1.0.2 > > > Mesos Version: 1.0.1 > OS: CoreOS 1068 > {code} > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 > 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not > sending an update to agentoverlay-agent@10.4.4.1:5051 > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 > 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0 > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure > stack trace: *** > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908829fd google::LogMessage::Fail() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19088482d google::LogMessage::SendToLog() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908825ec google::LogMessage::Flush() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190885129 google::LogMessageFatal::~LogMessageFatal() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908171dd process::SocketManager::swap_implementing_socket() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19081aa90 process::SocketManager::link_connect() > Sep 22 20:05:17 
node-44a84215535c mesos-master[104478]: @ > 0x7fc1908227f9 > _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x41eb26 > _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_ > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x42a36f process::Future<>::fail() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19085283c process::network::LibeventSSLSocketImpl::event_callback() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190852f17 process::network::LibeventSSLSocketImpl::event_callback() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18d616631 bufferevent_run_deferred_callbacks_locked > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18d60cc5d event_base_loop > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190865a1d process::EventLoop::run() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18eeabd73 (unknown) > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18e6a852c (unknown) > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18e3e61dd (unknown) > Sep 22 20:05:18 node-44a84215535c systemd[1]: > [0;1;39mdcos-mesos-master.service: Main process exited, code=killed, > status=6/ABRT > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6233) Master CHECK fails during recovery while relinking to other masters
[ https://issues.apache.org/jira/browse/MESOS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517685#comment-15517685 ] Joseph Wu commented on MESOS-6233: -- | https://reviews.apache.org/r/52180/ | Fix relink race that causes this check failure | > Master CHECK fails during recovery while relinking to other masters > --- > > Key: MESOS-6233 > URL: https://issues.apache.org/jira/browse/MESOS-6233 > Project: Mesos > Issue Type: Bug > Components: general, master >Affects Versions: 0.28.3, 1.0.1 >Reporter: Alex Kaplan >Assignee: Joseph Wu >Priority: Blocker > Labels: mesosphere > Fix For: 0.28.3, 1.1.0, 1.0.2 > > > Mesos Version: 1.0.1 > OS: CoreOS 1068 > {code} > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: I0922 20:05:17.948004 > 104495 manager.cpp:795] overlay-master in `RECOVERING` state . Hence, not > sending an update to agentoverlay-agent@10.4.4.1:5051 > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: F0922 20:05:17.948120 > 104529 process.cpp:2243] Check failed: sockets.count(from_fd) > 0 > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: *** Check failure > stack trace: *** > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908829fd google::LogMessage::Fail() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19088482d google::LogMessage::SendToLog() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908825ec google::LogMessage::Flush() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190885129 google::LogMessageFatal::~LogMessageFatal() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908171dd process::SocketManager::swap_implementing_socket() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19081aa90 process::SocketManager::link_connect() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc1908227f9 > 
_ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyISt5_BindIFSt7_Mem_fnIMNS0_13SocketManagerEFvS5_NS0_7network6SocketERKNS0_4UPIDEEEPSA_St12_PlaceholderILi1EESC_SD_EEvEES5_OT_NS3_6PreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x41eb26 > _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_ > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x42a36f process::Future<>::fail() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc19085283c process::network::LibeventSSLSocketImpl::event_callback() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190852f17 process::network::LibeventSSLSocketImpl::event_callback() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18d616631 bufferevent_run_deferred_callbacks_locked > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18d60cc5d event_base_loop > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc190865a1d process::EventLoop::run() > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18eeabd73 (unknown) > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18e6a852c (unknown) > Sep 22 20:05:17 node-44a84215535c mesos-master[104478]: @ > 0x7fc18e3e61dd (unknown) > Sep 22 20:05:18 node-44a84215535c systemd[1]: > [0;1;39mdcos-mesos-master.service: Main process exited, code=killed, > status=6/ABRT > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6234) Potential socket leak during Zookeeper network changes
[ https://issues.apache.org/jira/browse/MESOS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517678#comment-15517678 ] Joseph Wu commented on MESOS-6234:
--
| https://reviews.apache.org/r/52180/ | Fix edge case of {{link}} + {{ExitedEvent}} |
| https://reviews.apache.org/r/52181/ | Prevent relinking races -> leaks |

> Potential socket leak during Zookeeper network changes
> --
>
> Key: MESOS-6234
> URL: https://issues.apache.org/jira/browse/MESOS-6234
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 0.28.3, 1.0.0
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: libprocess, mesosphere
> Fix For: 0.28.3, 1.1.0, 1.0.2
>
> There is a potential leak when using the version of {{link}} with {{RemoteConnection::RECONNECT}}. This was originally implemented to refresh links during master recovery.
> The leak occurs here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1592-L1597
> ^ The comment here is not correct, as that is *not* the last reference to the {{existing}} socket.
> At this point, the {{existing}} socket may be a perfectly valid link. Valid links will all have a reference inside a callback loop created here:
> https://github.com/apache/mesos/blob/5e23edd513caec51ce3e94b3d785d714052525e8/3rdparty/libprocess/src/process.cpp#L1503-L1509
> We need to stop the callback loop while preventing any resulting {{ExitedEvents}} from being sent. This means discarding the callback loop's future after we have called {{swap_implementing_socket}}.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6244) Add support for streaming HTTP request bodies in libprocess.
Benjamin Mahler created MESOS-6244:
--
Summary: Add support for streaming HTTP request bodies in libprocess.
Key: MESOS-6244
URL: https://issues.apache.org/jira/browse/MESOS-6244
Project: Mesos
Issue Type: Improvement
Components: libprocess
Reporter: Benjamin Mahler

We currently have support for streaming responses (see MESOS-2438): servers can start sending the response body before the body is complete, and clients can start reading a response before the body is complete. This is an optimization for large responses and is a requirement for infinite "streaming" style endpoints.

We currently do not have support for streaming requests. This would allow a client to stream a large or infinite request body to the server without having the complete body in hand, and it would allow a server to read request bodies before they have been completely received over the connection. This is a requirement if we want to allow clients to "stream" data into a server, i.e. an infinite request body.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3454) Remove duplicated logic in Flags::load
[ https://issues.apache.org/jira/browse/MESOS-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517541#comment-15517541 ] Michael Park commented on MESOS-3454:
--
[~greggomann] I can be the shepherd, but I'll have to schedule the review for a few weeks' time. If you want to put together a patch and have discussions before then, I'm happy to help.

> Remove duplicated logic in Flags::load
> --
>
> Key: MESOS-3454
> URL: https://issues.apache.org/jira/browse/MESOS-3454
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Reporter: Klaus Ma
> Priority: Minor
>
> In {{flags.hpp}}, there are two functions with almost the same logic; this ticket tracks merging the duplicated part.
> {code}
> inline Try FlagsBase::load(
>     const Option& prefix,
>     int* argc,
>     char*** argv,
>     bool unknowns,
>     bool duplicates)
> ...
> inline Try FlagsBase::load(
>     const Option& prefix,
>     int argc,
>     const char* const *argv,
>     bool unknowns,
>     bool duplicates)
> ...
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
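The usual shape of such a merge (a generic sketch, not the actual {{FlagsBase}} signatures) is to keep a single shared implementation and have the mutable overload delegate to it, rather than repeating the parsing loop in both:

```cpp
#include <string>

// Generic sketch of de-duplicating two overloads: all logic lives in the
// const-pointer variant; here it just counts "--flag" style arguments.
inline int countFlags(int argc, const char* const* argv) {
  int flags = 0;
  for (int i = 1; i < argc; ++i) {
    if (std::string(argv[i]).rfind("--", 0) == 0) {
      ++flags;  // Argument starts with "--".
    }
  }
  return flags;
}

// The mutable overload simply forwards, so there is no duplicated loop
// for the two signatures to keep in sync.
inline int countFlags(int* argc, char*** argv) {
  return countFlags(*argc, *argv);
}
```

Applied to {{FlagsBase::load}}, the same idea would leave one body to maintain, with the other overload reduced to a one-line delegation.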
[jira] [Created] (MESOS-6243) Expose failures and unknown container cases from Containerizer::destroy.
Benjamin Mahler created MESOS-6243:
--
Summary: Expose failures and unknown container cases from Containerizer::destroy.
Key: MESOS-6243
URL: https://issues.apache.org/jira/browse/MESOS-6243
Project: Mesos
Issue Type: Improvement
Components: containerization
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

Currently the callers of `destroy` cannot determine whether the call succeeded or failed (without a secondary call to `wait()`). Exposing the result also allows the caller to distinguish between a failure and waiting on an unknown container. This is important for the upcoming agent child container API, as the end-user would benefit from the distinction.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517376#comment-15517376 ] Jamie Briant commented on MESOS-6118:
--
I can't give you any code, but if you can give me a version that dumps the output to a file rather than the logging (because it's the logging that's truncating), then I'll be happy to run it.

> Agent would crash with docker container tasks due to host mount table read.
> --
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
> Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Jamie Briant
> Assignee: Kevin Klues
> Priority: Critical
> Labels: linux, slave
> Fix For: 1.1.0, 1.0.2
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, cycle6.log, slave-crash.log
>
> I have a framework which schedules thousands of short-running tasks (a few seconds to a few minutes each) over a period of several minutes. In 1.0.1, the slave process will crash every few minutes (with systemd restarting it).
> The crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678 1232 fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: ***
> Version 1.0.0 works without this issue.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6242) Expose unknown container case on Containerizer::wait.
Benjamin Mahler created MESOS-6242: -- Summary: Expose unknown container case on Containerizer::wait. Key: MESOS-6242 URL: https://issues.apache.org/jira/browse/MESOS-6242 Project: Mesos Issue Type: Improvement Components: containerization Reporter: Benjamin Mahler Assignee: Benjamin Mahler This allows the caller to distinguish between a failure and waiting on an unknown container. This is important for the upcoming agent nested container API, as the end-user would benefit from the distinction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6241) Add agent::Call / agent::Response API for managing nested containers.
Benjamin Mahler created MESOS-6241: -- Summary: Add agent::Call / agent::Response API for managing nested containers. Key: MESOS-6241 URL: https://issues.apache.org/jira/browse/MESOS-6241 Project: Mesos Issue Type: Task Components: HTTP API, slave Reporter: Benjamin Mahler Assignee: Benjamin Mahler In order to manage nested containers from executors or from tooling, we'll need to have an API for managing nested containers. Per the [design doc|https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/] we will start with the following: * Launch: create a new nested container underneath the parent container. * Wait: wait for the nested container to terminate. * Kill: kill a non-terminal nested container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6236) Launch subprocesses associated with specified namespaces.
[ https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517296#comment-15517296 ] Jie Yu commented on MESOS-6236:
--
That means we need to move the ns-related functions from src/linux/ns.hpp to stout.

> Launch subprocesses associated with specified namespaces.
> --
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
> Issue Type: Improvement
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Labels: mesosphere
> Fix For: 1.1.0
>
> Currently there is no standard way in Mesos to launch a child process in a different namespace (e.g. {{net}}, {{mnt}}). A user may leverage {{Subprocess}} and provide their own {{clone}} callback, but this approach is error-prone.
> One possible solution is to implement a {{Subprocess}} child hook. In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we introduced a child hook framework in subprocess and implemented three child hooks: {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing another child hook, {{SETNS}}, so that other components (e.g., health checks) can use it to enter the namespaces of a specific process.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
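A hedged sketch of what a {{SETNS}} child hook might do under the hood (the helper name and error handling here are hypothetical; the actual child-hook API in {{Subprocess}} differs): open {{/proc/<pid>/ns/<type>}} and call {{setns(2)}} so the child enters that namespace before exec.

```cpp
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <cstdio>
#include <string>

// Hypothetical helper sketching a SETNS child hook (Linux-only): join the
// namespace of `pid` of the given type ("net", "mnt", ...) via setns(2).
// Returns false if the namespace file cannot be opened or joined.
inline bool enterNamespace(pid_t pid, const std::string& type) {
  char path[64];
  std::snprintf(path, sizeof(path), "/proc/%d/ns/%s",
                static_cast<int>(pid), type.c_str());

  int fd = ::open(path, O_RDONLY);
  if (fd < 0) {
    return false;
  }

  // Passing 0 allows any namespace type; actually joining most namespaces
  // requires CAP_SYS_ADMIN, so callers must handle failure.
  bool ok = ::setns(fd, 0) == 0;
  ::close(fd);
  return ok;
}
```

Running this in the child between {{clone}}/{{fork}} and {{exec}} is precisely why a child hook is the natural place for it.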
[jira] [Updated] (MESOS-6240) Allow executor/agent communication over domain sockets and named PIPES
[ https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avinash Sridharan updated MESOS-6240:
--
Summary: Allow executor/agent communication over domain sockets and named PIPES (was: Allow executor/agent communication over domain socker and named PIPES)

> Allow executor/agent communication over domain sockets and named PIPES
> --
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Environment: Linux and Windows
> Reporter: Avinash Sridharan
> Assignee: Avinash Sridharan
> Labels: mesosphere
>
> Currently, executor/agent communication happens exclusively over TCP sockets. This works fine in most cases, but for the `MesosContainerizer`, when containers are running on CNI networks, this mode of communication imposes constraints on the CNI network, since there now has to be connectivity between the CNI network (on which the executor is running) and the agent. Introducing paths from a CNI network to the underlying agent at best creates headaches for operators and at worst introduces serious security holes in the network, since it breaks the isolation between the container's CNI network and the host network (on which the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI networks, we therefore need to move away from using TCP/IP sockets for executor/agent communication. Since the executor and agent are guaranteed to run on the same host, the above problems can be resolved if, for the `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of TCP/IP sockets.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6240) Allow executor/agent communication over domain socker and named PIPES
Avinash Sridharan created MESOS-6240:
--
Summary: Allow executor/agent communication over domain socker and named PIPES
Key: MESOS-6240
URL: https://issues.apache.org/jira/browse/MESOS-6240
Project: Mesos
Issue Type: Improvement
Components: containerization
Environment: Linux and Windows
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan

Currently, executor/agent communication happens exclusively over TCP sockets. This works fine in most cases, but for the `MesosContainerizer`, when containers are running on CNI networks, this mode of communication imposes constraints on the CNI network, since there now has to be connectivity between the CNI network (on which the executor is running) and the agent. Introducing paths from a CNI network to the underlying agent at best creates headaches for operators and at worst introduces serious security holes in the network, since it breaks the isolation between the container's CNI network and the host network (on which the agent is running).

In order to simplify and strengthen the deployment of Mesos containers on CNI networks, we therefore need to move away from using TCP/IP sockets for executor/agent communication. Since the executor and agent are guaranteed to run on the same host, the above problems can be resolved if, for the `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of TCP/IP sockets.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
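For illustration only (this is plain POSIX, not the proposed Mesos transport): a connected {{AF_UNIX}} pair never touches the IP stack, which is exactly why it sidesteps the CNI connectivity problem described above.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <string>

// Minimal demonstration that two endpoints on the same host can exchange
// bytes over an AF_UNIX channel with no IP connectivity involved.
inline std::string echoOverUnixPair(const std::string& message) {
  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return "";
  }

  ssize_t written = ::write(fds[0], message.data(), message.size());
  (void)written;  // Small messages fit in the socket buffer.

  char buffer[256] = {0};
  ssize_t n = ::read(fds[1], buffer, sizeof(buffer) - 1);

  ::close(fds[0]);
  ::close(fds[1]);
  return n > 0 ? std::string(buffer, static_cast<size_t>(n)) : "";
}
```

In a real executor/agent split the two ends would be a named socket path (or a named pipe on Windows) rather than a `socketpair`, but the isolation property is the same: no route between the CNI network and the host network is required.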
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517212#comment-15517212 ] Kevin Klues commented on MESOS-6118: Ack -- ~Kevin > Agent would crash with docker container tasks due to host mount table read. > --- > > Key: MESOS-6118 > URL: https://issues.apache.org/jira/browse/MESOS-6118 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 1.0.1 > Environment: Build: 2016-08-26 23:06:27 by centos > Version: 1.0.1 > Git tag: 1.0.1 > Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3 > systemd version `219` detected > Inializing systemd state > Created systemd slice: `/run/systemd/system/mesos_executors.slice` > Started systemd slice `mesos_executors.slice` > Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni > Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher > Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 > UTC 2016 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jamie Briant >Assignee: Kevin Klues >Priority: Critical > Labels: linux, slave > Fix For: 1.1.0, 1.0.2 > > Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, > cycle6.log, slave-crash.log > > > I have a framework which schedules thousands of short running (a few seconds > to a few minutes) of tasks, over a period of several minutes. In 1.0.1, the > slave process will crash every few minutes (with systemd restarting it). > Crash is: > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678 1232 > fs.cpp:140] Check failed: !visitedParents.contains(parentId) > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: > *** > Version 1.0.0 works without this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6239) Fix warnings and errors produced by new hardened CXXFLAGS
Aaron Wood created MESOS-6239: - Summary: Fix warnings and errors produced by new hardened CXXFLAGS Key: MESOS-6239 URL: https://issues.apache.org/jira/browse/MESOS-6239 Project: Mesos Issue Type: Improvement Reporter: Aaron Wood Assignee: Aaron Wood Priority: Minor Most of the new warnings/errors come from libprocess/stout as there were never any CXXFLAGS propagated to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
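For context, the hardening flags in question are visible in the build logs quoted in MESOS-6229 below. A hedged sketch of passing them at configure time so they also propagate to libprocess/stout (flag set taken from those logs; the exact mechanism in the patch may differ):

```shell
# Illustrative only: hardened flags as seen in the MESOS-6229 build logs,
# exported before configure so the 3rdparty bundles inherit them too.
export CXXFLAGS="-Wformat-security -Wstack-protector -fstack-protector-strong \
  -fno-omit-frame-pointer -fPIE -D_FORTIFY_SOURCE=2"
export LDFLAGS="-pie"
../configure
```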
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517186#comment-15517186 ] Jie Yu commented on MESOS-6118: --- Thanks! We definitely didn't handle this case. Will make sure a fix to this land in 1.0.2. > Agent would crash with docker container tasks due to host mount table read. > --- > > Key: MESOS-6118 > URL: https://issues.apache.org/jira/browse/MESOS-6118 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 1.0.1 > Environment: Build: 2016-08-26 23:06:27 by centos > Version: 1.0.1 > Git tag: 1.0.1 > Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3 > systemd version `219` detected > Inializing systemd state > Created systemd slice: `/run/systemd/system/mesos_executors.slice` > Started systemd slice `mesos_executors.slice` > Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni > Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher > Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 > UTC 2016 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jamie Briant >Assignee: Kevin Klues >Priority: Critical > Labels: linux, slave > Fix For: 1.1.0, 1.0.2 > > Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, > cycle6.log, slave-crash.log > > > I have a framework which schedules thousands of short running (a few seconds > to a few minutes) of tasks, over a period of several minutes. In 1.0.1, the > slave process will crash every few minutes (with systemd restarting it). > Crash is: > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678 1232 > fs.cpp:140] Check failed: !visitedParents.contains(parentId) > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: > *** > Version 1.0.0 works without this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6229) Default to using hardened compilation flags
[ https://issues.apache.org/jira/browse/MESOS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517039#comment-15517039 ] Aaron Wood edited comment on MESOS-6229 at 9/23/16 5:48 PM: Looks like there will need to be some fixes made ahead of time before this patch goes in (probably many more than this one): /bin/sh ../../libtool --tag=CXX --mode=compile g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" -DPACKAGE_STRING=\"mesos\ 1.1.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-I../../../3rdparty/libprocess -DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" -I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I../../../3rdparty/libprocess/../stout/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare -Wformat-security -Wstack-protector -fno-omit-frame-pointer -fstack-protector-strong -pie -fPIE -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF .deps/libprocess_la-reap.Tpo -c -o libprocess_la-reap.lo `test -f 'src/reap.cpp' || echo '../../../3rdparty/libprocess/'`src/reap.cpp ../../../3rdparty/libprocess/src/profiler.cpp:35:12: error: unused variable 'PROFILE_FILE' [-Werror,-Wunused-const-variable] const char PROFILE_FILE[] = "perftools.out"; ^ In file included from ../../../3rdparty/libprocess/src/profiler.cpp:24: ../../../3rdparty/libprocess/include/process/profiler.hpp:80:8: error: private field 'started' is not used [-Werror,-Wunused-private-field] bool started; ^ 2 errors generated. 
make[5]: *** [libprocess_la-profiler.lo] Error 1 make[5]: *** Waiting for unfinished jobs mv -f .deps/libprocess_la-logging.Tpo .deps/libprocess_la-logging.Plo mv -f .deps/libprocess_la-io.Tpo .deps/libprocess_la-io.Plo libtool: compile: g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" "-DPACKAGE_STRING=\"mesos 1.1.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-I../../../3rdparty/libprocess -DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" -I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I../../../3rdparty/libprocess/../stout/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare -Wformat-security -Wstack-protector -fno-omit-frame-pointer -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF .deps/libprocess_la-reap.Tpo -c ../../../3rdparty/libprocess/src/reap.cpp -fno-common -DPIC -o .libs/libprocess_la-reap.o In file included from ../../../3rdparty/libprocess/src/process.cpp:108: ../../../3rdparty/libprocess/src/encoder.hpp:278:15: error: comparison of integers of different signs: 'off_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] if (index >= length) { ~ ^ ~~ ../../../3rdparty/libprocess/src/process.cpp:3501:23: error: comparison of integers of different signs:
[jira] [Commented] (MESOS-6229) Default to using hardened compilation flags
[ https://issues.apache.org/jira/browse/MESOS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517039#comment-15517039 ] Aaron Wood commented on MESOS-6229: --- Looks like there will need to be some fixes made ahead of time before this patch goes in: ``` /bin/sh ../../libtool --tag=CXX --mode=compile g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" -DPACKAGE_STRING=\"mesos\ 1.1.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-I../../../3rdparty/libprocess -DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" -I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I../../../3rdparty/libprocess/../stout/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare -Wformat-security -Wstack-protector -fno-omit-frame-pointer -fstack-protector-strong -pie -fPIE -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF .deps/libprocess_la-reap.Tpo -c -o libprocess_la-reap.lo `test -f 'src/reap.cpp' || echo '../../../3rdparty/libprocess/'`src/reap.cpp ../../../3rdparty/libprocess/src/profiler.cpp:35:12: error: unused variable 'PROFILE_FILE' [-Werror,-Wunused-const-variable] const char PROFILE_FILE[] = "perftools.out"; ^ In file included from ../../../3rdparty/libprocess/src/profiler.cpp:24: ../../../3rdparty/libprocess/include/process/profiler.hpp:80:8: error: private field 'started' is not used [-Werror,-Wunused-private-field] bool started; ^ 2 errors generated. 
make[5]: *** [libprocess_la-profiler.lo] Error 1 make[5]: *** Waiting for unfinished jobs mv -f .deps/libprocess_la-logging.Tpo .deps/libprocess_la-logging.Plo mv -f .deps/libprocess_la-io.Tpo .deps/libprocess_la-io.Plo libtool: compile: g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.1.0\" "-DPACKAGE_STRING=\"mesos 1.1.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBSASL2=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBZ=1 -I. 
-I../../../3rdparty/libprocess -DBUILD_DIR=\"/Users//Code/src/mesos/build/3rdparty/libprocess\" -I../../../3rdparty/libprocess/include -isystem ../boost-1.53.0 -I../elfio-3.2 -I../glog-0.3.3/src -I../http-parser-2.6.2 -I../libev-4.22 -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS -I../picojson-1.3.0 -I../../../3rdparty/libprocess/../stout/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Wall -Werror -Wsign-compare -Wformat-security -Wstack-protector -fno-omit-frame-pointer -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O3 -g1 -O0 -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT libprocess_la-reap.lo -MD -MP -MF .deps/libprocess_la-reap.Tpo -c ../../../3rdparty/libprocess/src/reap.cpp -fno-common -DPIC -o .libs/libprocess_la-reap.o In file included from ../../../3rdparty/libprocess/src/process.cpp:108: ../../../3rdparty/libprocess/src/encoder.hpp:278:15: error: comparison of integers of different signs: 'off_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] if (index >= length) { ~ ^ ~~ ../../../3rdparty/libprocess/src/process.cpp:3501:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare] for (int i
[jira] [Updated] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Loesche updated MESOS-6237: - Summary: Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6 (was: Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6) > Agent Sandbox inaccessible when using IPv6 address in patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6237 > URL: https://issues.apache.org/jira/browse/MESOS-6237 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > When using IPs instead of hostnames the Agent Sandbox is inaccessible in the > Web UI. The problem seems to be that there's no brackets around the IP so it > tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of > http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Loesche updated MESOS-6237: - Description: Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 When using IPs instead of hostnames the Agent Sandbox is inaccessible in the Web UI. The problem seems to be that there's no brackets around the IP so it tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of http://[2001:41d0:1000:ab9::]:5051 was: Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 When using IPs instead of hostnames the Agent Sandbox is inaccessible. The problem seems to be that there's no brackets around the IP so it tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of http://[2001:41d0:1000:ab9::]:5051 > Slave Sandbox inaccessible when using IPv6 address in patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6237 > URL: https://issues.apache.org/jira/browse/MESOS-6237 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > When using IPs instead of hostnames the Agent Sandbox is inaccessible in the > Web UI. The problem seems to be that there's no brackets around the IP so it > tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of > http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere updated MESOS-6237: Summary: Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6 (was: Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6) > Slave Sandbox inaccessible when using IPv6 address in patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6237 > URL: https://issues.apache.org/jira/browse/MESOS-6237 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > When using IPs instead of hostnames the Agent Sandbox is inaccessible. The > problem seems to be that there's no brackets around the IP so it tries to > access e.g. http://2001:41d0:1000:ab9:::5051 instead of > http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Loesche updated MESOS-6237: - Summary: Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6 (was: Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6) > Agent Sandbox inaccessible when using IPv6 address in patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6237 > URL: https://issues.apache.org/jira/browse/MESOS-6237 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > When using IPs instead of hostnames the Agent Sandbox is inaccessible. The > problem seems to be that there's no brackets around the IP so it tries to > access e.g. http://2001:41d0:1000:ab9:::5051 instead of > http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Loesche updated MESOS-6238: - Description: Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 make fails when configure options --enable-ssl --enable-libevent were given. Error message: {noformat} ... ... ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void process::SocketManager::link_connect(const process::Future&, process::network::Socket, const process::UPID&)’: ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not declared in this scope Try ip = url.ip; ^ Makefile:997: recipe for target 'libprocess_la-process.lo' failed make[5]: *** [libprocess_la-process.lo] Error 1 ... ... {noformat} was: make fails when configure options --enable-ssl --enable-libevent were given. Error message: {noformat} ... ... ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void process::SocketManager::link_connect(const process::Future&, process::network::Socket, const process::UPID&)’: ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not declared in this scope Try ip = url.ip; ^ Makefile:997: recipe for target 'libprocess_la-process.lo' failed make[5]: *** [libprocess_la-process.lo] Error 1 ... ... {noformat} > SSL / libevent support broken in IPv6 patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6238 > URL: https://issues.apache.org/jira/browse/MESOS-6238 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > make fails when configure options --enable-ssl --enable-libevent were given. > Error message: > {noformat} > ... > ... 
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void > process::SocketManager::link_connect(const process::Future&, > process::network::Socket, const process::UPID&)’: > ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not > declared in this scope >Try ip = url.ip; > ^ > Makefile:997: recipe for target 'libprocess_la-process.lo' failed > make[5]: *** [libprocess_la-process.lo] Error 1 > ... > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517001#comment-15517001 ] Lukas Loesche commented on MESOS-6237: -- Cc [~bennoe] > Slave Sandbox inaccessible when using IPv6 address in patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6237 > URL: https://issues.apache.org/jira/browse/MESOS-6237 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit > 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 > When using IPs instead of hostnames the Agent Sandbox is inaccessible. The > problem seems to be that there's no brackets around the IP so it tries to > access e.g. http://2001:41d0:1000:ab9:::5051 instead of > http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6
[ https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517002#comment-15517002 ] Lukas Loesche commented on MESOS-6238: -- Cc [~bennoe] > SSL / libevent support broken in IPv6 patch from > https://github.com/lava/mesos/tree/bennoe/ipv6 > --- > > Key: MESOS-6238 > URL: https://issues.apache.org/jira/browse/MESOS-6238 > Project: Mesos > Issue Type: Bug >Reporter: Lukas Loesche > > make fails when configure options --enable-ssl --enable-libevent were given. > Error message: > {noformat} > ... > ... > ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void > process::SocketManager::link_connect(const process::Future&, > process::network::Socket, const process::UPID&)’: > ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not > declared in this scope >Try ip = url.ip; > ^ > Makefile:997: recipe for target 'libprocess_la-process.lo' failed > make[5]: *** [libprocess_la-process.lo] Error 1 > ... > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6
Lukas Loesche created MESOS-6238: Summary: SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6 Key: MESOS-6238 URL: https://issues.apache.org/jira/browse/MESOS-6238 Project: Mesos Issue Type: Bug Reporter: Lukas Loesche make fails when configure options --enable-ssl --enable-libevent were given. Error message: {noformat} ... ... ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void process::SocketManager::link_connect(const process::Future&, process::network::Socket, const process::UPID&)’: ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not declared in this scope Try ip = url.ip; ^ Makefile:997: recipe for target 'libprocess_la-process.lo' failed make[5]: *** [libprocess_la-process.lo] Error 1 ... ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6237) Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6
Lukas Loesche created MESOS-6237: Summary: Slave Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6 Key: MESOS-6237 URL: https://issues.apache.org/jira/browse/MESOS-6237 Project: Mesos Issue Type: Bug Reporter: Lukas Loesche Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 When using IPs instead of hostnames the Agent Sandbox is inaccessible. The problem seems to be that there's no brackets around the IP so it tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of http://[2001:41d0:1000:ab9::]:5051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516903#comment-15516903 ] Mao Geng commented on MESOS-5909: - [~kaysoky] Thanks for shepherding. Addressed your review comments in https://reviews.apache.org/r/52048/, can you please check? > Stout "OsTest.User" test can fail on some systems > - > > Key: MESOS-5909 > URL: https://issues.apache.org/jira/browse/MESOS-5909 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Kapil Arya >Assignee: Mao Geng > Labels: mesosphere > Attachments: MESOS-5909-fix.diff > > > Libc call {{getgrouplist}} doesn't return the {{gid}} list in a sorted manner > (in my case, it's returning "471 100") ... whereas {{id -G}} return a sorted > list ("100 471" in my case) causing the validation inside the loop to fail. > We should sort both lists before comparing the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516583#comment-15516583 ] Ian Babrou commented on MESOS-6118: --- Looks like Mesos is confused by the fact that my root is a parent of itself. This is probably because of the fact that system boots from the network and keeps / in ram. > Agent would crash with docker container tasks due to host mount table read. > --- > > Key: MESOS-6118 > URL: https://issues.apache.org/jira/browse/MESOS-6118 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 1.0.1 > Environment: Build: 2016-08-26 23:06:27 by centos > Version: 1.0.1 > Git tag: 1.0.1 > Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3 > systemd version `219` detected > Inializing systemd state > Created systemd slice: `/run/systemd/system/mesos_executors.slice` > Started systemd slice `mesos_executors.slice` > Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni > Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher > Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 > UTC 2016 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Jamie Briant >Assignee: Kevin Klues >Priority: Critical > Labels: linux, slave > Fix For: 1.1.0, 1.0.2 > > Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, > cycle6.log, slave-crash.log > > > I have a framework which schedules thousands of short running (a few seconds > to a few minutes) of tasks, over a period of several minutes. In 1.0.1, the > slave process will crash every few minutes (with systemd restarting it). > Crash is: > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678 1232 > fs.cpp:140] Check failed: !visitedParents.contains(parentId) > Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: > *** > Version 1.0.0 works without this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516371#comment-15516371 ] Ian Babrou commented on MESOS-6118: --- I had to rework your patch a bit to apply on top of master. I then build 1.0.1 with the resulting fs.cpp: {noformat} Sep 23 12:56:49 36com72 mesos-agent[15633]: Failed to perform recovery: Collect failed: Unable to unmount volumes for Docker container '5ec94354-f785-4d13-b3ef-fb1a37eac007': Failed to get mount table: Cycle found in mount table hierarchy through entry '1': 1 1 0:2 / / rw shared:1 - rootfs rootfs rw,size=65513288k,nr_inodes=16378322 Sep 23 12:56:49 36com72 mesos-agent[15633]: 17 1 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 18 1 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:7 - proc proc rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 19 1 0:6 / /dev rw,nosuid shared:8 - devtmpfs devtmpfs rw,size=65513304k,nr_inodes=16378326,mode=755 Sep 23 12:56:49 36com72 mesos-agent[15633]: 20 17 0:18 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 21 17 0:16 / /sys/fs/selinux rw,relatime shared:4 - selinuxfs selinuxfs rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 22 19 0:19 / /dev/shm rw,nosuid,nodev shared:9 - tmpfs tmpfs rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 23 19 0:13 / /dev/pts rw,nosuid,noexec,relatime shared:10 - devpts devpts rw,gid=5,mode=620,ptmxmode=000 Sep 23 12:56:49 36com72 mesos-agent[15633]: 24 1 0:20 / /run rw,nosuid,nodev shared:11 - tmpfs tmpfs rw,mode=755 Sep 23 12:56:49 36com72 mesos-agent[15633]: 25 24 0:21 / /run/lock rw,nosuid,nodev,noexec,relatime shared:12 - tmpfs tmpfs rw,size=5120k Sep 23 12:56:49 36com72 mesos-agent[15633]: 26 17 0:22 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:5 - tmpfs tmpfs ro,mode=755 Sep 23 12:56:49 36com72 mesos-agent[15633]: 27 26 0:23 / 
/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd Sep 23 12:56:49 36com72 mesos-agent[15633]: 28 26 0:24 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,cpuset Sep 23 12:56:49 36com72 mesos-agent[15633]: 29 26 0:25 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,cpu,cpuacct Sep 23 12:56:49 36com72 mesos-agent[15633]: 30 26 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,blkio Sep 23 12:56:49 36com72 mesos-agent[15633]: 31 26 0:27 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,memory Sep 23 12:56:49 36com72 mesos-agent[15633]: 32 26 0:28 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,devices Sep 23 12:56:49 36com72 mesos-agent[15633]: 33 26 0:29 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,freezer Sep 23 12:56:49 36com72 mesos-agent[15633]: 34 26 0:30 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,net_cls,net_prio Sep 23 12:56:49 36com72 mesos-agent[15633]: 35 26 0:31 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,perf_event Sep 23 12:56:49 36com72 mesos-agent[15633]: 36 26 0:32 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,hugetlb Sep 23 12:56:49 36com72 mesos-agent[15633]: 37 26 0:33 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,pids Sep 23 12:56:49 36com72 mesos-agent[15633]: 38 18 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 rw,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct Sep 23 12:56:49 36com72 mesos-agent[15633]: 39 19 0:35 / /dev/hugepages rw,relatime shared:24 - hugetlbfs hugetlbfs rw Sep 23 12:56:49 36com72 
mesos-agent[15633]: 40 17 0:8 / /sys/kernel/debug rw,relatime shared:25 - debugfs debugfs rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 41 19 0:15 / /dev/mqueue rw,relatime shared:26 - mqueue mqueue rw Sep 23 12:56:49 36com72 mesos-agent[15633]: 42 1 9:127 / /state rw,relatime shared:27 - ext4 /dev/md127 rw,stripe=384,data=ordered Sep 23 12:56:49 36com72 mesos-agent[15633]: 43 1 0:37 / /srv rw,relatime shared:28 - nfs4 10.36.14.18:/srv/hosts/36com72 rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18 Sep 23 12:56:49 36com72 mesos-agent[15633]: 44 1 0:37 / /srv-master rw,relatime shared:29 - nfs4 10.36.14.18:/srv rw,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.36.23.25,local_lock=none,addr=10.36.14.18 Sep 23 12:56:49 36com72 mesos-agent[15633]: 45 38 0:36 /
[jira] [Issue Comment Deleted] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Babrou updated MESOS-6118: -- Comment: was deleted (was: I also experience this issue: {noformat} Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763520 4995 slave.cpp:3211] Handling status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 from executor(1)@10.10.23.25:46833 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763664 4991 slave.cpp:6014] Terminating task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763825 5002 docker.cpp:972] Running docker -H unix:///var/run/docker.sock inspect mesos-dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2.c40cea8c-31a9-468f-a183-ed9851cd5aa8 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821267 4987 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821296 4987 status_update_manager.cpp:825] Checkpointing UPDATE for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844871 4987 status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to the agent Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844970 5009 slave.cpp:3604] Forwarding the update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to master@10.10.11.16:5050 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845062 5009 slave.cpp:3498] Status update manager successfully handled status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845074 5009 slave.cpp:3514] Sending acknowledgement for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to executor(1)@10.10.23.25:46833 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.864859 4987 slave.cpp:3686] Received ping from slave-observer(149)@10.10.11.16:5050 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.955936 4995 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.956001 4995 status_update_manager.cpp:825] Checkpointing ACK for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.982950 4995 status_update_manager.cpp:528] Cleaning up status update stream for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983119 4995 slave.cpp:2597] Status update manager successfully handled status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983131 4995 slave.cpp:6055] Completing task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667191 4981 process.cpp:3323] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/state' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667413 4983 http.cpp:270] HTTP GET for /slave(1)/state from 10.10.19.24:33570 with User-Agent='Go-http-client/1.1' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.669677 5012 process.cpp:3323] Handling HTTP event for process 'files' with path: '/files/download' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.670250 5005 process.cpp:1280] Sending file at
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516233#comment-15516233 ] Ian Babrou commented on MESOS-6118: --- I've tried it and it didn't fix the issue for me: {noformat} Sep 23 11:52:01 myhost mesos-agent[10627]: F0923 11:52:01.524873 10648 fs.cpp:140] Check failed: !visitedParents.contains(parentId) Sep 23 11:52:01 myhost mesos-agent[10627]: *** Check failure stack trace: *** Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a253d google::LogMessage::Fail() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a41bd google::LogMessage::SendToLog() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a2102 google::LogMessage::Flush() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a4ba9 google::LogMessageFatal::~LogMessageFatal() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb07183d _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb0717a5 _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb078c5a mesos::internal::fs::MountInfoTable::read() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae2c346 mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae48157 mesos::internal::slave::DockerContainerizerProcess::___destroy() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb546094 process::ProcessManager::resume() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5463b7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b9c20970 (unknown) Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b973f0a4 start_thread Sep 23 11:52:01 myhost mesos-agent[10627]: @ 
0x7ff3b947487d (unknown) {noformat} /proc/mounts: {noformat} rootfs / rootfs rw,size=65513288k,nr_inodes=16378322 0 0 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 devtmpfs /dev devtmpfs rw,nosuid,size=65513304k,nr_inodes=16378326,mode=755 0 0 securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0 selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0 tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0 tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0 cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0 cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0 cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0 cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0 systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0 hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0 debugfs /sys/kernel/debug debugfs rw,relatime 0 0 mqueue /dev/mqueue mqueue rw,relatime 0 0 /dev/md127 /state ext4 
rw,relatime,stripe=384,data=ordered 0 0 10.10.14.18:/srv/hosts/myhost /srv nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0 10.10.14.18:/srv /srv-master nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0 binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0 {noformat} Build procedure changes: {noformat} build: cd $(BUILDDIR)/mesos && \ + patch -p1 < $(TOP)/rb51620.patch && \ autoreconf -f -i -Wall,no-obsolete && \ ./bootstrap
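The failing check above, {{Check failed: !visitedParents.contains(parentId)}}, is a guard against a cyclic parent chain while walking {{/proc/self/mountinfo}} entries. The sketch below is a hypothetical standalone reconstruction of that guard (names `parentOf` and `hasParentCycle` are illustrative, not the Mesos `fs.cpp` code), shown only to clarify what condition trips the CHECK.

```cpp
#include <unordered_map>
#include <unordered_set>

// Returns true if following `id`'s parent chain through the mount table
// revisits a mount ID, i.e. the table contains a parent cycle. Mesos'
// MountInfoTable::read() CHECK-fails (and the agent aborts) in that case.
bool hasParentCycle(const std::unordered_map<int, int>& parentOf, int id)
{
  std::unordered_set<int> visitedParents;
  auto it = parentOf.find(id);
  while (it != parentOf.end()) {
    int parentId = it->second;
    if (visitedParents.count(parentId) > 0) {
      return true;  // corresponds to: CHECK(!visitedParents.contains(parentId))
    }
    visitedParents.insert(parentId);
    it = parentOf.find(parentId);
  }
  return false;  // reached a root; the chain is acyclic.
}
```

A well-formed mount table is a tree rooted at the root mount, so this should never fire; duplicate or inconsistent mount IDs (as some NFS/autofs setups produce) can make it fire and crash the agent.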
[jira] [Issue Comment Deleted] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Babrou updated MESOS-6118: -- Comment: was deleted (was: I've tried it and it didn't fix the issue for me: {noformat} Sep 23 11:52:01 myhost mesos-agent[10627]: F0923 11:52:01.524873 10648 fs.cpp:140] Check failed: !visitedParents.contains(parentId) Sep 23 11:52:01 myhost mesos-agent[10627]: *** Check failure stack trace: *** Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a253d google::LogMessage::Fail() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a41bd google::LogMessage::SendToLog() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a2102 google::LogMessage::Flush() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5a4ba9 google::LogMessageFatal::~LogMessageFatal() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb07183d _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb0717a5 _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb078c5a mesos::internal::fs::MountInfoTable::read() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae2c346 mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bae48157 mesos::internal::slave::DockerContainerizerProcess::___destroy() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb546094 process::ProcessManager::resume() Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3bb5463b7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b9c20970 (unknown) Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b973f0a4 start_thread Sep 23 11:52:01 myhost mesos-agent[10627]: @ 0x7ff3b947487d 
(unknown) {noformat} /proc/mounts: {noformat} rootfs / rootfs rw,size=65513288k,nr_inodes=16378322 0 0 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 devtmpfs /dev devtmpfs rw,nosuid,size=65513304k,nr_inodes=16378326,mode=755 0 0 securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0 selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0 tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0 tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0 cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0 cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0 cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0 cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0 systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0 hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0 debugfs /sys/kernel/debug debugfs rw,relatime 0 0 mqueue /dev/mqueue mqueue rw,relatime 0 0 /dev/md127 /state ext4 
rw,relatime,stripe=384,data=ordered 0 0 10.10.14.18:/srv/hosts/myhost /srv nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0 10.10.14.18:/srv /srv-master nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0 binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0 {noformat} Build procedure changes: {noformat} build: cd $(BUILDDIR)/mesos && \ + patch -p1 < $(TOP)/rb51620.patch && \ autoreconf -f -i -Wall,no-obsolete && \ ./bootstrap
[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516215#comment-15516215 ] Alexander Rukletsov commented on MESOS-6184: Once we transition to a general solution, there will be no more need to expose {{defaultClone}}. See https://reviews.apache.org/r/51636/. > Health checks should use a general mechanism to enter namespaces of the task. > - > > Key: MESOS-6184 > URL: https://issues.apache.org/jira/browse/MESOS-6184 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > Labels: health-check, mesosphere > Fix For: 1.1.0 > > > To perform health checks for tasks, we need to enter the corresponding > namespaces of the container. For now, the health check uses a custom clone to > implement this: > {code} > return process::defaultClone([=]() -> int { > if (taskPid.isSome()) { > foreach (const string& ns, namespaces) { > Try<Nothing> setns = ns::setns(taskPid.get(), ns); > if (setns.isError()) { > ... > } > } > } > return func(); > }); > {code} > After the childHooks patches are merged, we could change the health check to use > childHooks to call {{setns}} and make {{process::defaultClone}} private > again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
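The pattern in the snippet above — entering each of a task's namespaces before running a health-check function — can be sketched with raw Linux primitives. This is an illustrative standalone reconstruction, not the Mesos/stout `ns::setns` API; the names `nsPath` and `enterNamespaces` are hypothetical, and entering namespaces requires CAP_SYS_ADMIN, so the call is expected to fail for unprivileged users.

```cpp
#include <fcntl.h>   // open, O_RDONLY
#include <sched.h>   // setns (glibc; g++ defines _GNU_SOURCE by default)
#include <unistd.h>  // close, pid_t
#include <string>
#include <vector>

// Build the /proc path for a process's namespace handle,
// e.g. "/proc/1234/ns/net".
std::string nsPath(pid_t pid, const std::string& ns)
{
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Enter each namespace of `pid` in turn; returns false on the first
// failure (mirroring the `setns.isError()` check in the snippet above).
bool enterNamespaces(pid_t pid, const std::vector<std::string>& namespaces)
{
  for (const std::string& ns : namespaces) {
    int fd = ::open(nsPath(pid, ns).c_str(), O_RDONLY);
    if (fd < 0) {
      return false;
    }
    int result = ::setns(fd, 0);  // 0: don't restrict the namespace type.
    ::close(fd);
    if (result != 0) {
      return false;
    }
  }
  return true;
}
```

A SETNS child hook would run exactly this loop in the child between `fork()` and `exec()`, which is why the custom clone becomes unnecessary once child hooks land.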
[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6184: --- Summary: Health checks should use a general mechanism to enter namespaces of the task. (was: Change health check to use childHooks to enter the namespaces of the container) > Health checks should use a general mechanism to enter namespaces of the task. > - > > Key: MESOS-6184 > URL: https://issues.apache.org/jira/browse/MESOS-6184 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > Labels: health-check, mesosphere > Fix For: 1.1.0 > > > To perform health checks for tasks, we need to enter the corresponding > namespaces of the container. For now, the health check uses a custom clone to > implement this: > {code} > return process::defaultClone([=]() -> int { > if (taskPid.isSome()) { > foreach (const string& ns, namespaces) { > Try<Nothing> setns = ns::setns(taskPid.get(), ns); > if (setns.isError()) { > ... > } > } > } > return func(); > }); > {code} > After the childHooks patches are merged, we could change the health check to use > childHooks to call {{setns}} and make {{process::defaultClone}} private > again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6184) Change health check to use childHooks to enter the namespaces of the container
[ https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6184: --- Shepherd: Alexander Rukletsov Story Points: 3 Labels: health-check mesosphere (was: health-check) Fix Version/s: 1.1.0 > Change health check to use childHooks to enter the namespaces of the container > -- > > Key: MESOS-6184 > URL: https://issues.apache.org/jira/browse/MESOS-6184 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > Labels: health-check, mesosphere > Fix For: 1.1.0 > > > To perform health checks for tasks, we need to enter the corresponding > namespaces of the container. For now, the health check uses a custom clone to > implement this: > {code} > return process::defaultClone([=]() -> int { > if (taskPid.isSome()) { > foreach (const string& ns, namespaces) { > Try<Nothing> setns = ns::setns(taskPid.get(), ns); > if (setns.isError()) { > ... > } > } > } > return func(); > }); > {code} > After the childHooks patches are merged, we could change the health check to use > childHooks to call {{setns}} and make {{process::defaultClone}} private > again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.
[ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516181#comment-15516181 ] Ian Babrou commented on MESOS-6118: --- I also experience this issue: {noformat} Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763520 4995 slave.cpp:3211] Handling status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 from executor(1)@10.10.23.25:46833 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763664 4991 slave.cpp:6014] Terminating task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.763825 5002 docker.cpp:972] Running docker -H unix:///var/run/docker.sock inspect mesos-dfc1b04b-941b-4d93-adf4-c65ab307ee2c-S2.c40cea8c-31a9-468f-a183-ed9851cd5aa8 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821267 4987 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.821296 4987 status_update_manager.cpp:825] Checkpointing UPDATE for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844871 4987 status_update_manager.cpp:374] Forwarding update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to the agent Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.844970 5009 slave.cpp:3604] Forwarding the update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to master@10.10.11.16:5050 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845062 5009 slave.cpp:3498] Status update manager successfully handled status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.845074 5009 slave.cpp:3514] Sending acknowledgement for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 to executor(1)@10.10.23.25:46833 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.864859 4987 slave.cpp:3686] Received ping from slave-observer(149)@10.10.11.16:5050 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.955936 4995 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.956001 4995 status_update_manager.cpp:825] Checkpointing ACK for status update TASK_FAILED (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.982950 4995 status_update_manager.cpp:528] Cleaning up status update stream for task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983119 4995 slave.cpp:2597] Status update manager successfully handled status update acknowledgement (UUID: 084ace64-a1bf-495d-9769-ad831b53d1bf) for task 
pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 of framework 20150606-001827-252388362-5050-5982-0001 Sep 23 11:07:39 myhost mesos-agent[4980]: I0923 11:07:39.983131 4995 slave.cpp:6055] Completing task pdx_phoenix.e7b89f12-817d-11e6-9c3a-2c600cbc2dd1 Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667191 4981 process.cpp:3323] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/state' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.667413 4983 http.cpp:270] HTTP GET for /slave(1)/state from 10.10.19.24:33570 with User-Agent='Go-http-client/1.1' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.669677 5012 process.cpp:3323] Handling HTTP event for process 'files' with path: '/files/download' Sep 23 11:07:40 myhost mesos-agent[4980]: I0923 11:07:40.670250 5005 process.cpp:1280] Sending file at
[jira] [Updated] (MESOS-6236) Launch subprocesses associated with specified namespaces.
[ https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6236: --- Description: Currently there is no standard way in Mesos to launch a child process in a different namespace (e.g. {{net}}, {{mnt}}). A user may leverage {{Subprocess}} and provide their own {{clone}} callback, but this approach is error-prone. One possible solution is to implement a {{Subprocess}} child hook. In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have introduced a child hook framework in subprocess and implemented three child hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing another child hook {{SETNS}} so that other components (e.g., health check) can call it to enter the namespaces of a specific process. was:In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have introduced a child hook framework in subprocess and implemented three child hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to introduce another child hook {{SETNS}} so that other components (e.g., health check) can call it to enter the namespaces of a specific process. > Launch subprocesses associated with specified namespaces. > - > > Key: MESOS-6236 > URL: https://issues.apache.org/jira/browse/MESOS-6236 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > Labels: mesosphere > Fix For: 1.1.0 > > > Currently there is no standard way in Mesos to launch a child process in a > different namespace (e.g. {{net}}, {{mnt}}). A user may leverage > {{Subprocess}} and provide their own {{clone}} callback, but this approach is > error-prone. > One possible solution is to implement a > {{Subprocess}} child hook. In > [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have > introduced a child hook framework in subprocess and implemented three child > hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}.
We suggest introducing > another child hook {{SETNS}} so that other components (e.g., health check) > can call it to enter the namespaces of a specific process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
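The child-hook idea described above — small callbacks run in the child between {{fork()}} and {{exec()}} — can be sketched as follows. This is an illustrative standalone reconstruction under assumed names (`ChildHook`, `spawnWithHooks`); the real interface lives in libprocess's subprocess machinery, and a SETNS hook would call setns(2) at the marked point.

```cpp
#include <sys/wait.h>  // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>    // fork, execv, _exit, chdir
#include <functional>
#include <vector>

using ChildHook = std::function<int()>;  // returns 0 on success

// Fork, run each hook in the child, then exec the given program.
// Returns the child's exit status, or -1 on fork/wait failure.
int spawnWithHooks(const char* path, char* const argv[],
                   const std::vector<ChildHook>& hooks)
{
  pid_t pid = ::fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    // In the child: a SETNS hook would call setns(2) here, so the
    // exec'd program starts inside the target process's namespaces.
    for (const ChildHook& hook : hooks) {
      if (hook() != 0) {
        ::_exit(127);  // hook failed; abort before exec.
      }
    }
    ::execv(path, argv);
    ::_exit(127);  // exec failed.
  }
  int status = 0;
  if (::waitpid(pid, &status, 0) < 0) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running hooks in the child keeps the parent's state untouched on failure, which is what makes this safer than handing callers a raw {{clone}} callback.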
[jira] [Updated] (MESOS-6236) Launch subprocesses associated with specified namespaces.
[ https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6236: --- Summary: Launch subprocesses associated with specified namespaces. (was: Introduce SETNS child hook in subprocess) > Launch subprocesses associated with specified namespaces. > - > > Key: MESOS-6236 > URL: https://issues.apache.org/jira/browse/MESOS-6236 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > Labels: mesosphere > Fix For: 1.1.0 > > > In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have > introduced a child hook framework in subprocess and implemented three child > hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to > introduce another child hook {{SETNS}} so that other components (e.g., health > check) can call it to enter the namespaces of a specific process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6236) Introduce SETNS child hook in subprocess
[ https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6236: --- Story Points: 8 Labels: mesosphere (was: ) Fix Version/s: 1.1.0 > Introduce SETNS child hook in subprocess > > > Key: MESOS-6236 > URL: https://issues.apache.org/jira/browse/MESOS-6236 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Qian Zhang > Labels: mesosphere > Fix For: 1.1.0 > > > In [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we have > introduced a child hook framework in subprocess and implemented three child > hooks {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. In this ticket, we'd like to > introduce another child hook {{SETNS}} so that other components (e.g., health > check) can call it to enter the namespaces of a specific process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)