[jira] [Created] (MESOS-7924) Add a javascript linter to the webui.
Benjamin Mahler created MESOS-7924:

Summary: Add a javascript linter to the webui.
Key: MESOS-7924
URL: https://issues.apache.org/jira/browse/MESOS-7924
Project: Mesos
Issue Type: Improvement
Components: webui
Reporter: Benjamin Mahler

As far as I can tell, JavaScript linters (e.g. ESLint) also help catch some functional errors. For example, we've made "strict" mode mistakes a few times that ESLint can catch: MESOS-6624, MESOS-7912.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7912) Master WebUI not working in Chrome.
[ https://issues.apache.org/jira/browse/MESOS-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-7912:

Shepherd: Benjamin Mahler

> Master WebUI not working in Chrome.
>
> Key: MESOS-7912
> URL: https://issues.apache.org/jira/browse/MESOS-7912
> Project: Mesos
> Issue Type: Bug
> Components: webui
> Affects Versions: 1.3.0
> Environment: Mesos Master Version 1.3.0
> Chrome Windows Version 37.0.2062.102 m
> Reporter: Alastair Montgomery
> Assignee: Alastair Montgomery
> Priority: Critical
> Fix For: 1.3.2, 1.4.1
>
> Just getting "No master is currently leading ..." when browsing to the Mesos Master UI using Chrome,
> although it displays correctly in IE.
> The following is displayed in the Chrome console:
> {noformat}
> Uncaught SyntaxError: In strict mode code, functions can only be declared at top level or immediately within another function. controllers.js:848
> Error: [ng:areq] http://errors.angularjs.org/1.2.3/ng/areq?p0=MainCtrl&p1=not%20a%20function%2C%20got%20undefined
> at Error (native)
> at http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:6:449
> at tb (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:250)
> at Oa (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:337)
> at http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:62:96
> at http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:49:117
> at q (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:7:361)
> at Q (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:48:492)
> at f (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:24)
> at f (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:41)
> angular-1.2.3.min.js:84
> GET http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular.min.js.map 404 (Not Found) :5050/static/js/angular.min.js.map:1
> GET http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-route.min.js.map 404 (Not Found) :5050/static/js/angular-route.min.js.map:1
> GET http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/css/bootstrap.min.css.map 404 (Not Found)
> {noformat}
[jira] [Updated] (MESOS-7912) Master WebUI not working in Chrome.
[ https://issues.apache.org/jira/browse/MESOS-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7912: --- Fix Version/s: 1.3.2
[jira] [Updated] (MESOS-7912) Master WebUI not working in Chrome.
[ https://issues.apache.org/jira/browse/MESOS-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7912: --- Summary: Master WebUI not working in Chrome. (was: Master WebUI not working in Chrome) > Master WebUI not working in Chrome. > --- > > Key: MESOS-7912 > URL: https://issues.apache.org/jira/browse/MESOS-7912 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.3.0 > Environment: Mesos Master Version 1.3.0 > Chrome Windows Version Version 37.0.2062.102 m >Reporter: Alastair Montgomery >Assignee: Alastair Montgomery >Priority: Critical > Fix For: 1.4.1 > > > Just getting "No master is currently leading ..." when browsing to Mesos > Master UI using Chrome. > Although displays correctly on IE. > The following is displayed the Chrome console, > {noformat} > Uncaught SyntaxError: In strict mode code, functions can only be declared at > top level or immediately within another function. controllers.js:848 > Error: [ng:areq] > http://errors.angularjs.org/1.2.3/ng/areq?p0=MainCtrl&p1=not%20a%20function%2C%20got%20undefined > at Error (native) > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:6:449 > at tb > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:250) > at Oa > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:337) > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:62:96 > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:49:117 > at q > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:7:361) > at Q > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:48:492) > at f > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:24) > at f > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:41) > 
angular-1.2.3.min.js:84 > GET > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular.min.js.map > 404 (Not Found) :5050/static/js/angular.min.js.map:1 > GET > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-route.min.js.map > 404 (Not Found) :5050/static/js/angular-route.min.js.map:1 > GET > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/css/bootstrap.min.css.map > 404 (Not Found) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7912) Master WebUI not working in Chrome
[ https://issues.apache.org/jira/browse/MESOS-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-7912: -- Assignee: Alastair Montgomery > Master WebUI not working in Chrome > -- > > Key: MESOS-7912 > URL: https://issues.apache.org/jira/browse/MESOS-7912 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.3.0 > Environment: Mesos Master Version 1.3.0 > Chrome Windows Version Version 37.0.2062.102 m >Reporter: Alastair Montgomery >Assignee: Alastair Montgomery >Priority: Critical > > Just getting "No master is currently leading ..." when browsing to Mesos > Master UI using Chrome. > Although displays correctly on IE. > The following is displayed the Chrome console, > {noformat} > Uncaught SyntaxError: In strict mode code, functions can only be declared at > top level or immediately within another function. controllers.js:848 > Error: [ng:areq] > http://errors.angularjs.org/1.2.3/ng/areq?p0=MainCtrl&p1=not%20a%20function%2C%20got%20undefined > at Error (native) > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:6:449 > at tb > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:250) > at Oa > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:18:337) > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:62:96 > at > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:49:117 > at q > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:7:361) > at Q > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:48:492) > at f > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:24) > at f > (http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-1.2.3.min.js:43:41) > angular-1.2.3.min.js:84 > GET > 
http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular.min.js.map > 404 (Not Found) :5050/static/js/angular.min.js.map:1 > GET > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/js/angular-route.min.js.map > 404 (Not Found) :5050/static/js/angular-route.min.js.map:1 > GET > http://pp3xmes01mst001.pp3.williamhill.plc:5050/static/css/bootstrap.min.css.map > 404 (Not Found) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7922) Fix communication between old masters and new agents.
[ https://issues.apache.org/jira/browse/MESOS-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144546#comment-16144546 ]

Michael Park commented on MESOS-7922:

{noformat}
commit 30e2b2ad818e4e90c8df03b9802a4b1a431605c7
Author: Michael Park
Date: Mon Aug 28 15:19:31 2017 -0700

Fixed the communication between old masters and new agents.

For re-registration, 1.4 agents used to send the resources in tasks and executors to the master in the "post-reservation-refinement" format, which is incompatible for pre-1.4 masters. This patch changes the agent such that it always downgrades the resources to the "pre-reservation-refinement" format, and the master unconditionally upgrades the resources to "post-reservation-refinement" format.

Review: https://reviews.apache.org/r/61952/
{noformat}

> Fix communication between old masters and new agents.
>
> Key: MESOS-7922
> URL: https://issues.apache.org/jira/browse/MESOS-7922
> Project: Mesos
> Issue Type: Bug
> Components: agent, master
> Reporter: Michael Park
> Assignee: Michael Park
> Priority: Blocker
> Fix For: 1.4.0
>
> For re-registration, agents currently send the resources in tasks and executors to the master in the "post-reservation-refinement" format, which is incompatible with pre-1.4 masters.
> We should change the agent such that it always downgrades the resources to the "pre-reservation-refinement" format, and the master unconditionally upgrades the resources to the "post-reservation-refinement" format.
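To illustrate the downgrade-on-send / upgrade-on-receive idea described in the message above, here is a rough sketch in Python. It is not the Mesos implementation: resources are modeled as plain dicts, and the field names (`role`, `reservation`, `reservations`, `type`) and both helper functions are simplified assumptions made only for this example.

```python
def upgrade(resource):
    """Return the resource in the (assumed) post-refinement shape.

    Safe to call unconditionally: a resource that already carries a
    'reservations' list is returned as a copy.
    """
    if "reservations" in resource:
        return dict(resource)
    out = {k: v for k, v in resource.items() if k not in ("role", "reservation")}
    role = resource.get("role", "*")
    reservation = resource.get("reservation")
    if reservation is not None:
        # Hypothetical mapping: an explicit reservation becomes a dynamic entry.
        out["reservations"] = [{"type": "DYNAMIC", "role": role, **reservation}]
    elif role != "*":
        out["reservations"] = [{"type": "STATIC", "role": role}]
    else:
        out["reservations"] = []
    return out


def downgrade(resource):
    """Return the resource in the (assumed) pre-refinement shape.

    Only resources with at most one reservation can be expressed in the
    old shape; anything else is rejected.
    """
    reservations = resource.get("reservations", [])
    if len(reservations) > 1:
        raise ValueError("refined reservations do not fit the old format")
    out = {k: v for k, v in resource.items() if k != "reservations"}
    if not reservations:
        out["role"] = "*"
        return out
    entry = reservations[0]
    out["role"] = entry["role"]
    if entry["type"] == "DYNAMIC":
        out["reservation"] = {
            k: v for k, v in entry.items() if k not in ("type", "role")
        }
    return out
```

In this model the sender always calls `downgrade` before putting resources on the wire and the receiver always calls `upgrade`, so an old receiver sees only the old shape while a new receiver ends up with the new shape either way.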
[jira] [Updated] (MESOS-7921) process::EventQueue sometimes crashes
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-7921: -- Description: The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up in other tests and environments (with or without --enable-lock-free-event-queue) as well. {noformat: title=Configuration} ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck {noformat} {noformat:title=} *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date *** PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: *** @ 0x2b9e29d26330 (unknown) @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() @ 0x2b9e25800a40 process::ProcessManager::resume() @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()() @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() @ 0x2b9e29fe5a60 (unknown) @ 0x2b9e29d1e184 start_thread @ 0x2b9e2a851ffd (unknown) make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) {noformat} A bui...@mesos.apache.org query shows many such instances: https://lists.apache.org/list.html?bui...@mesos.apache.org:lte=1M:process%3A%3AEventQueue%3A%3AConsumer%3A%3Aempty was: The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] 
in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up in other tests and environments (with or without --enable-lock-free-event-queue) as well. {noformat: title=Configuration} ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck {noformat} {noformat:title=} *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date *** PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: *** @ 0x2b9e29d26330 (unknown) @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() @ 0x2b9e25800a40 process::ProcessManager::resume() @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()() @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() @ 0x2b9e29fe5a60 (unknown) @ 0x2b9e29d1e184 start_thread @ 0x2b9e2a851ffd (unknown) make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) {noformat} > process::EventQueue sometimes crashes > - > > Key: MESOS-7921 > URL: https://issues.apache.org/jira/browse/MESOS-7921 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 1.4.0 > Environment: autotools,gcc,--verbose,GLOG_v=1 > MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) > Note that --enable-lock-free-event-queue is not enabled. 
> Details: > https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ >Reporter: Yan Xu >Priority: Blocker > Attachments: > MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt > > > The following segfault is found on > [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] > in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky > and shows up in other tests and environments (with or without > --enable-lock-free-event-queue) as well. > {noformat: title=Configuration} > ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck > {noformat} > {noformat:title=} > *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are > using GNU date *** > PC: @ 0x2b9e2581caa0 process::EventQueue:
[jira] [Updated] (MESOS-7921) process::EventQueue sometimes crashes
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7921: --- Description: The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up in other tests and environments (with or without --enable-lock-free-event-queue) as well. {noformat: title=Configuration} ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck {noformat} {noformat:title=} *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date *** PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: *** @ 0x2b9e29d26330 (unknown) @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() @ 0x2b9e25800a40 process::ProcessManager::resume() @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()() @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() @ 0x2b9e29fe5a60 (unknown) @ 0x2b9e29d1e184 start_thread @ 0x2b9e2a851ffd (unknown) make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) {noformat} was: The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up in other tests and environments (with or without 
--enable-lock-free-event-queue) as well. {noformat:title=} *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date *** PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: *** @ 0x2b9e29d26330 (unknown) @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() @ 0x2b9e25800a40 process::ProcessManager::resume() @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()() @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() @ 0x2b9e29fe5a60 (unknown) @ 0x2b9e29d1e184 start_thread @ 0x2b9e2a851ffd (unknown) make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) {noformat} > process::EventQueue sometimes crashes > - > > Key: MESOS-7921 > URL: https://issues.apache.org/jira/browse/MESOS-7921 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 1.4.0 > Environment: autotools,gcc,--verbose,GLOG_v=1 > MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) > Note that --enable-lock-free-event-queue is not enabled. 
> Details: > https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ >Reporter: Yan Xu >Priority: Blocker > Attachments: > MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt > > > The following segfault is found on > [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] > in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky > and shows up in other tests and environments (with or without > --enable-lock-free-event-queue) as well. > {noformat: title=Configuration} > ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck > {noformat} > {noformat:title=} > *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are > using GNU date *** > PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack > trace: *** > @ 0x2b9e29d26330 (unknown) > @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > @ 0x2b9e25800a40 process::ProcessManage
[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery
[ https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144543#comment-16144543 ]

Gilbert Song commented on MESOS-7801:

[~xds2000], sorry for the delay on this optimization change. Jie and I are under heavy workloads and have not had a chance to get to this issue. We will try to prioritize this from our side. Any comments from you on those two patches are absolutely welcome. :)

> Retry logic for unsuccessful `docker rm` during agent recovery
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
> Issue Type: Improvement
> Components: docker
> Reporter: Chun-Hung Hsiao
> Assignee: Chun-Hung Hsiao
>
> In MESOS- we skip the failure when `docker rm` fails due to mount leakage during agent recovery.
> In order not to leave residual docker containers in the docker daemon, we could do a best-effort `docker rm` retry with an exponential backoff, since we cannot control when the leakage would be terminated.
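The best-effort retry with exponential backoff proposed in the issue above could look roughly like the following Python sketch. It is not taken from the Mesos patches: the function names, the default attempt count and the delay values are assumptions chosen for illustration, and `docker_rm` simply shells out to the `docker rm` command.

```python
import subprocess
import time


def retry_with_backoff(action, max_attempts=5, initial_delay=1.0,
                       factor=2.0, max_delay=60.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    The wait between attempts starts at initial_delay seconds and is
    multiplied by factor after each failure, capped at max_delay.
    Returns True on success and False if every attempt failed.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if action():
            return True
        if attempt == max_attempts:
            break
        sleep(delay)
        delay = min(delay * factor, max_delay)
    return False


def docker_rm(container_id):
    """Best-effort removal of one container; True if `docker rm` exited 0."""
    result = subprocess.run(["docker", "rm", container_id],
                            capture_output=True)
    return result.returncode == 0
```

A caller would use it as `retry_with_backoff(lambda: docker_rm(container_id))`. Because the retry is best effort, a final `False` would be logged rather than treated as a recovery failure.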
[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery
[ https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144522#comment-16144522 ]

Deshi Xiao commented on MESOS-7801:

MESOS- already resolved my case, and I have no other issue to report right now. I am just curious why the optimization patch is still pending; it leaves me confused about the general workflow.

> Retry logic for unsuccessful `docker rm` during agent recovery
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
> Issue Type: Improvement
> Components: docker
> Reporter: Chun-Hung Hsiao
> Assignee: Chun-Hung Hsiao
>
> In MESOS- we skip the failure when `docker rm` fails due to mount leakage during agent recovery.
> In order not to leave residual docker containers in the docker daemon, we could do a best-effort `docker rm` retry with an exponential backoff, since we cannot control when the leakage would be terminated.
[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned
[ https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144518#comment-16144518 ]

Deshi Xiao commented on MESOS-1871:

[~idownes] I have some cycles to work on this issue; could you please shepherd me? Where is the best place to start on a fix?

> Sending SIGTERM to a task command may render it orphaned
>
> Key: MESOS-1871
> URL: https://issues.apache.org/jira/browse/MESOS-1871
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Alexander Rukletsov
> Priority: Minor
>
> {{CommandExecutor}} launches tasks by wrapping them in {{sh -c}}. That means signals are sent to the top process, that is {{sh -c}}, and not to the task directly.
> Though {{SIGTERM}} is propagated by {{sh -c}} down the process tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates and reports success to the {{CommandExecutor}}, leaving the task detached from the parent process and still running.
> Because the {{CommandExecutor}} thinks the command terminated normally, its OS process exits normally and may not trigger the containerizer's escalation, which destroys cgroups.
> Here is the test related to the first part: [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
> Here is the test related to the second part: [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].
[jira] [Updated] (MESOS-7921) process::EventQueue sometimes crashes
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-7921: -- Target Version/s: 1.4.0 Priority: Blocker (was: Major) > process::EventQueue sometimes crashes > - > > Key: MESOS-7921 > URL: https://issues.apache.org/jira/browse/MESOS-7921 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 1.4.0 > Environment: autotools,gcc,--verbose,GLOG_v=1 > MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) > Note that --enable-lock-free-event-queue is not enabled. > Details: > https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ >Reporter: Yan Xu >Priority: Blocker > Attachments: > MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt > > > The following segfault is found on > [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] > in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky > and shows up in other tests and environments (with or without > --enable-lock-free-event-queue) as well. 
> {noformat:title=} > *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are > using GNU date *** > PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack > trace: *** > @ 0x2b9e29d26330 (unknown) > @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > @ 0x2b9e25800a40 process::ProcessManager::resume() > @ 0x2b9e2580f891 > process::ProcessManager::init_threads()::$_9::operator()() > @ 0x2b9e2580f7d5 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() > @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() > @ 0x2b9e29fe5a60 (unknown) > @ 0x2b9e29d1e184 start_thread > @ 0x2b9e2a851ffd (unknown) > make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7923) Make args optional in mesos port mapper plugin
Deepak Goel created MESOS-7923:

Summary: Make args optional in mesos port mapper plugin
Key: MESOS-7923
URL: https://issues.apache.org/jira/browse/MESOS-7923
Project: Mesos
Issue Type: Bug
Components: network
Reporter: Deepak Goel
Assignee: Deepak Goel

The current implementation of the mesos-port-mapper plugin fails if the args field is absent from the CNI config, which makes it very specific to Mesos. If args were optional instead, the plugin could be used in a more generic environment.
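As a small illustration of what treating args as optional means when reading a CNI network config, here is a Python sketch. The actual plugin is not written in Python, and the nested key names used here (`org.apache.mesos`, `network_info`, `port_mappings`) as well as the function name are assumptions made for the example rather than details confirmed by this issue.

```python
import json


def port_mappings_from_config(raw_config):
    """Extract port mappings from a CNI config given as a JSON string.

    A config with no "args" field (or with "args": null) is accepted and
    simply yields an empty list instead of raising an error.
    """
    config = json.loads(raw_config)
    args = config.get("args") or {}
    mesos_args = args.get("org.apache.mesos") or {}
    network_info = mesos_args.get("network_info") or {}
    return network_info.get("port_mappings") or []
```

With this shape, a generic CNI runtime that never sets args gets a plugin that does no port mapping, while a Mesos-supplied config keeps working as before.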
[jira] [Assigned] (MESOS-7223) Linux filesystem isolator cannot mount host volume /dev/log.
[ https://issues.apache.org/jira/browse/MESOS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-7223: - Assignee: Jie Yu (was: Gilbert Song) > Linux filesystem isolator cannot mount host volume /dev/log. > > > Key: MESOS-7223 > URL: https://issues.apache.org/jira/browse/MESOS-7223 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.2, 1.1.0, 1.2.0 >Reporter: Haralds Ulmanis >Assignee: Jie Yu > Labels: mesosphere, volumes > > I'm trying to mount /dev/log. > ls -l /dev/log > lrwxrwxrwx 1 root root 28 Mar 9 01:49 /dev/log -> > /run/systemd/journal/dev-log > # ls -l /run/systemd/journal/dev-log > srw-rw-rw- 1 root root 0 Mar 9 01:49 /run/systemd/journal/dev-log > I have tried mounting /dev/log and /run/systemd/journal/dev-log, both produce > same errors: > from stdout: > Executing pre-exec command > '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/lib\/mesos\/mesos-containerizer"}' > Executing pre-exec command > '{"arguments":["mount","-n","--rbind","\/data\/mesos-agent\/slaves\/9b7ad711-9381-4338-b3c0-dac86253701e-S93\/frameworks\/a872f621-d10f-4021-a886-c5d564df104e-\/executors\/services_dev-2_lb-6.b8202973-04b0-11e7-be02-0a2b9a5c33cf\/runs\/cfb170f0-6c69-4475-9dbe-bb9967e19b42","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/data\/mesos-agent\/sandbox"],"shell":false,"value":"mount"}' > Executing pre-exec command > '{"arguments":["mount","-n","--rbind","\/run\/systemd\/journal\/dev-log","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/dev\/log"],"shell":false,"value":"mount"}' > from stderr: > mount: mount(2) failed: > 
/data/mesos-agent/provisioner/containers/cfb170f0-6c69-4475-9dbe-bb9967e19b42/backends/overlay/rootfses/890a25e6-cb15-42e3-be9c-0aa3baf889f8/dev/log: > Not a directory > Failed to execute pre-exec command > '{"arguments":["mount","-n","--rbind","\/run\/systemd\/journal\/dev-log","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/dev\/log"],"shell":false,"value":"mount"}' > This particular job i start from marathon and have the following definition > (if I change MESOS to DOCKER - it works): > "container": { > "type": "MESOS", > "volumes": [ > { > "hostPath": "/run/systemd/journal/dev-log", > "containerPath": "/dev/log", > "mode": "RW" > } > ], > "docker": { > "image": "", > "credential": null, > "forcePullImage": true > } > }, -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7917) Docker statistics not reported on Windows.
[ https://issues.apache.org/jira/browse/MESOS-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov updated MESOS-7917:

Shepherd: Alexander Rukletsov
Story Points: 3

> Docker statistics not reported on Windows.
>
> Key: MESOS-7917
> URL: https://issues.apache.org/jira/browse/MESOS-7917
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Environment: Windows 10
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Labels: docker, microsoft, windows
>
> On Windows, the JSON returned by the agent's /container API does not contain the expected {{statistics}} object for Docker containers.
> This breaks the dcos-metrics tool, which is required for DC/OS integration on Windows.
[jira] [Updated] (MESOS-7922) Fix communication between old masters and new agents.
[ https://issues.apache.org/jira/browse/MESOS-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-7922: Target Version/s: 1.4.0 (was: 1.4.1) Priority: Blocker (was: Major) > Fix communication between old masters and new agents. > - > > Key: MESOS-7922 > URL: https://issues.apache.org/jira/browse/MESOS-7922 > Project: Mesos > Issue Type: Bug > Components: agent, master >Reporter: Michael Park >Assignee: Michael Park >Priority: Blocker > > For re-registration, agents currently send the resources in tasks > and executors to the master in the "post-reservation-refinement" format, > which is incompatible with pre-1.4 masters. We should change the agent > such that it always downgrades the resources to > the "pre-reservation-refinement" format, and the master unconditionally > upgrades the resources to the "post-reservation-refinement" format. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7922) Fix communication between old masters and new agents.
Michael Park created MESOS-7922: --- Summary: Fix communication between old masters and new agents. Key: MESOS-7922 URL: https://issues.apache.org/jira/browse/MESOS-7922 Project: Mesos Issue Type: Bug Components: agent, master Reporter: Michael Park Assignee: Michael Park For re-registration, agents currently send the resources in tasks and executors to the master in the "post-reservation-refinement" format, which is incompatible with pre-1.4 masters. We should change the agent such that it always downgrades the resources to the "pre-reservation-refinement" format, and the master unconditionally upgrades the resources to the "post-reservation-refinement" format. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7922) Fix communication between old masters and new agents.
[ https://issues.apache.org/jira/browse/MESOS-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144168#comment-16144168 ] Michael Park commented on MESOS-7922: - https://reviews.apache.org/r/61952 > Fix communication between old masters and new agents. > - > > Key: MESOS-7922 > URL: https://issues.apache.org/jira/browse/MESOS-7922 > Project: Mesos > Issue Type: Bug > Components: agent, master >Reporter: Michael Park >Assignee: Michael Park > > For re-registration, agents currently send the resources in tasks > and executors to the master in the "post-reservation-refinement" format, > which is incompatible with pre-1.4 masters. We should change the agent > such that it always downgrades the resources to > the "pre-reservation-refinement" format, and the master unconditionally > upgrades the resources to the "post-reservation-refinement" format. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
[ https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7643: -- Target Version/s: 1.5.0, 1.4.1 (was: 1.4.0) > The order of isolators provided in '--isolation' flag is not preserved and > instead sorted alphabetically > > > Key: MESOS-7643 > URL: https://issues.apache.org/jira/browse/MESOS-7643 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.1.2, 1.2.0, 1.3.0 >Reporter: Michael Cherny >Assignee: James Peach >Priority: Critical > Labels: isolation > > According to the documentation and comments in the code, the order of the entries in > the --isolation flag should specify the ordering of the isolators. > Specifically, the `create` and `prepare` calls for each isolator should run > serially in the order in which they appear in the --isolation flag, while the > `cleanup` call should be serialized in reverse order (with the exception of the > filesystem isolator, which is always first). > But in fact, the isolators provided in the '--isolation' flag are sorted > alphabetically. > That happens in [this line of > code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377]. > That line uses a 'set' (apparently instead of a list or > vector), and a set is a sorted container. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
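The effect described in MESOS-7643 can be reproduced outside Mesos. The following is a minimal sketch, not the code in containerizer.cpp: the helpers `tokenize`, `viaSet`, and `preservingOrder` are hypothetical names introduced here for illustration. It shows that iterating a `std::set<std::string>` yields the entries sorted, and one possible way to drop duplicates while keeping the order given on the command line.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not Mesos code): split a comma-separated flag value
// into tokens, keeping the order in which they were written.
std::vector<std::string> tokenize(const std::string& flag)
{
  std::vector<std::string> tokens;
  std::string current;
  for (char c : flag) {
    if (c == ',') {
      if (!current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}

// Storing the tokens in a std::set: iteration order is sorted, so the
// order from the flag is lost.
std::vector<std::string> viaSet(const std::string& flag)
{
  const std::vector<std::string> tokens = tokenize(flag);
  const std::set<std::string> sorted(tokens.begin(), tokens.end());
  return std::vector<std::string>(sorted.begin(), sorted.end());
}

// Order-preserving alternative: the set is used only to detect duplicates,
// while the result vector keeps first-seen order.
std::vector<std::string> preservingOrder(const std::string& flag)
{
  std::set<std::string> seen;
  std::vector<std::string> result;
  for (const std::string& token : tokenize(flag)) {
    if (seen.insert(token).second) result.push_back(token);
  }
  return result;
}
```

With the input `network/cni,cgroups/cpu`, `viaSet` returns `cgroups/cpu` first, while `preservingOrder` keeps `network/cni` first. The special handling of the filesystem isolator mentioned in the ticket is not modeled here.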
[jira] [Updated] (MESOS-7921) process::EventQueue sometimes crashes
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-7921: -- Attachment: MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt Attached the full log on ASF CI for this instance. > process::EventQueue sometimes crashes > - > > Key: MESOS-7921 > URL: https://issues.apache.org/jira/browse/MESOS-7921 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 1.4.0 > Environment: autotools,gcc,--verbose,GLOG_v=1 > MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) > Note that --enable-lock-free-event-queue is not enabled. > Details: > https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ >Reporter: Yan Xu > Attachments: > MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt > > > The following segfault is found on > [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] > in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky > and shows up in other tests and environments (with or without > --enable-lock-free-event-queue) as well. 
> {noformat:title=} > *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are > using GNU date *** > PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack > trace: *** > @ 0x2b9e29d26330 (unknown) > @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > @ 0x2b9e25800a40 process::ProcessManager::resume() > @ 0x2b9e2580f891 > process::ProcessManager::init_threads()::$_9::operator()() > @ 0x2b9e2580f7d5 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() > @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() > @ 0x2b9e29fe5a60 (unknown) > @ 0x2b9e29d1e184 start_thread > @ 0x2b9e2a851ffd (unknown) > make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7921) process::EventQueue sometimes crashes
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144134#comment-16144134 ] Yan Xu commented on MESOS-7921: --- [~benjaminhindman] [~bmahler] > process::EventQueue sometimes crashes > - > > Key: MESOS-7921 > URL: https://issues.apache.org/jira/browse/MESOS-7921 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 1.4.0 > Environment: autotools,gcc,--verbose,GLOG_v=1 > MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) > Note that --enable-lock-free-event-queue is not enabled. > Details: > https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ >Reporter: Yan Xu > > The following segfault is found on > [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] > in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky > and shows up in other tests and environments (with or without > --enable-lock-free-event-queue) as well. 
> {noformat:title=} > *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are > using GNU date *** > PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack > trace: *** > @ 0x2b9e29d26330 (unknown) > @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() > @ 0x2b9e25800a40 process::ProcessManager::resume() > @ 0x2b9e2580f891 > process::ProcessManager::init_threads()::$_9::operator()() > @ 0x2b9e2580f7d5 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() > @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() > @ 0x2b9e29fe5a60 (unknown) > @ 0x2b9e29d1e184 start_thread > @ 0x2b9e2a851ffd (unknown) > make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7921) process::EventQueue sometimes crashes
Yan Xu created MESOS-7921: - Summary: process::EventQueue sometimes crashes Key: MESOS-7921 URL: https://issues.apache.org/jira/browse/MESOS-7921 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 1.4.0 Environment: autotools,gcc,--verbose,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2) Note that --enable-lock-free-event-queue is not enabled. Details: https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/ Reporter: Yan Xu The following segfault is found on [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/] in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky and shows up in other tests and environments (with or without --enable-lock-free-event-queue) as well. 
{noformat:title=} *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date *** PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: *** @ 0x2b9e29d26330 (unknown) @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty() @ 0x2b9e25800a40 process::ProcessManager::resume() @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()() @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()() @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run() @ 0x2b9e29fe5a60 (unknown) @ 0x2b9e29d1e184 start_thread @ 0x2b9e2a851ffd (unknown) make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery
[ https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144116#comment-16144116 ] Chun-Hung Hsiao commented on MESOS-7801: Hi [~xds2000]. This patch is an optimization for MESOS-, which landed a while ago. I'd like to see this patch landed, but since it is just an optimization, it might not receive as high a priority as other pending issues. I was wondering if you have encountered any problem that cannot be resolved by the patch for MESOS-. Could you provide more information to help us understand the severity? Thanks! > Retry logic for unsuccessful `docker rm` during agent recovery > -- > > Key: MESOS-7801 > URL: https://issues.apache.org/jira/browse/MESOS-7801 > Project: Mesos > Issue Type: Improvement > Components: docker >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > > In MESOS- we skip the failure when `docker rm` fails due to mount leakage > during agent recovery. In order not to leave residual docker containers in > the docker daemon, we could do a best-effort `docker rm` retry with an > exponential backoff since we cannot control when the leakage would be > terminated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
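The best-effort retry with exponential backoff proposed in MESOS-7801 could look roughly like the sketch below. This is an illustration under stated assumptions, not the patch under review: `retryWithBackoff` is a hypothetical name, the operation is passed in as a callback instead of invoking `docker rm`, and the sketch blocks the calling thread with `sleep_for`, whereas the agent would presumably schedule retries asynchronously.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `attempt` until it returns true or
// `maxAttempts` calls have been made. The delay between calls starts at
// `initialDelay`, doubles after each failure, and is capped at `maxDelay`.
// Returns true if some attempt succeeded.
bool retryWithBackoff(
    const std::function<bool()>& attempt,
    int maxAttempts,
    std::chrono::milliseconds initialDelay,
    std::chrono::milliseconds maxDelay)
{
  std::chrono::milliseconds delay = initialDelay;
  for (int i = 0; i < maxAttempts; ++i) {
    if (attempt()) return true;

    // Do not sleep after the final failed attempt.
    if (i + 1 < maxAttempts) {
      std::this_thread::sleep_for(delay);
      delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
    }
  }
  return false;
}
```

Capping the delay keeps the retry interval bounded when the mount leakage lasts a long time, and the attempt limit keeps the retry best-effort instead of unbounded.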
[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned
[ https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144062#comment-16144062 ] Ian Downes commented on MESOS-1871: --- [~xds2000] Might be the same underlying problem but this looks to be a different manifestation. This ticket is strictly about correctness of killing all processes in the container. The linked ticket is related to graceful shutdown. > Sending SIGTERM to a task command may render it orphaned > > > Key: MESOS-1871 > URL: https://issues.apache.org/jira/browse/MESOS-1871 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Alexander Rukletsov >Priority: Minor > > {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means > signals are sent to the top process—that is {{sh -c}}—and not to the task > directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process > tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates > reporting success to the {{CommandExecutor}}, rendering the task detached > from the parent process and still running. Because the {{CommandExecutor}} > thinks the command terminated normally, its OS process exits normally and may > not trigger containerizer's escalation which destroys cgroups. > Here is the test related to the first part: > [https://gist.github.com/rukletsov/68259dfb02421813f9e6]. > Here is the test related to the second part: > [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
[ https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143936#comment-16143936 ] James Peach commented on MESOS-7643: No, it is still not preserving the order. > The order of isolators provided in '--isolation' flag is not preserved and > instead sorted alphabetically > > > Key: MESOS-7643 > URL: https://issues.apache.org/jira/browse/MESOS-7643 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.1.2, 1.2.0, 1.3.0 >Reporter: Michael Cherny >Assignee: James Peach >Priority: Critical > Labels: isolation > > According to the documentation and comments in the code, the order of the entries in > the --isolation flag should specify the ordering of the isolators. > Specifically, the `create` and `prepare` calls for each isolator should run > serially in the order in which they appear in the --isolation flag, while the > `cleanup` call should be serialized in reverse order (with the exception of the > filesystem isolator, which is always first). > But in fact, the isolators provided in the '--isolation' flag are sorted > alphabetically. > That happens in [this line of > code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377]. > That line uses a 'set' (apparently instead of a list or > vector), and a set is a sorted container. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-6428) Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe
[ https://issues.apache.org/jira/browse/MESOS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143878#comment-16143878 ] Andrei Budnik commented on MESOS-6428: -- https://reviews.apache.org/r/61800/ > Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe > > > Key: MESOS-6428 > URL: https://issues.apache.org/jira/browse/MESOS-6428 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.1.0 >Reporter: Benjamin Bannier >Assignee: Jing Chen > Labels: newbie, tech-debt > > In {{src/slave/containerizer/mesos/launch.cpp}} a helper function > {{signalSafeWriteStatus}} is defined. Its name seems to suggest that this > function is safe to call in e.g., signal handlers, and it is used in this > file's {{signalHandler}} for exactly that purpose. > Currently this function is not AS-Safe since it e.g., allocates memory via > construction of {{string}} instances, and might destructively modify > {{errno}}. > We should clean up this function to be in fact AS-Safe. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
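To illustrate what MESOS-6428 asks for, here is a minimal sketch of a status writer that avoids the two problems named in the ticket: it allocates no memory (no `string` construction) and it saves and restores `errno`. The functions `formatDecimal` and `signalSafeWriteInt` are hypothetical names introduced for this example; they are not the `signalSafeWriteStatus` helper in launch.cpp, and the output format is an assumption. Only `write(2)`, which POSIX lists as async-signal-safe, is called.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: formats `value` as decimal into `buffer` without
// allocating. Returns the number of characters written (no NUL
// terminator), or 0 if the buffer is too small.
std::size_t formatDecimal(int value, char* buffer, std::size_t size)
{
  char reversed[16];  // Enough for a 32-bit int including the sign.
  std::size_t length = 0;

  // Work on a non-positive value so that INT_MIN does not overflow.
  int remaining = value < 0 ? value : -value;
  do {
    reversed[length++] = static_cast<char>('0' - (remaining % 10));
    remaining /= 10;
  } while (remaining != 0);

  if (value < 0) reversed[length++] = '-';
  if (length > size) return 0;

  for (std::size_t i = 0; i < length; ++i) {
    buffer[i] = reversed[length - 1 - i];
  }
  return length;
}

// Hypothetical helper: writes the decimal form of `status` to `fd` using
// only write(2) and a stack buffer. errno is saved on entry and restored
// on exit so the interrupted code does not observe a changed value.
// Returns true when all bytes were written.
bool signalSafeWriteInt(int fd, int status)
{
  const int savedErrno = errno;

  char buffer[16];
  const std::size_t length = formatDecimal(status, buffer, sizeof(buffer));

  std::size_t offset = 0;
  bool ok = length > 0;
  while (ok && offset < length) {
    const ssize_t n = ::write(fd, buffer + offset, length - offset);
    if (n > 0) {
      offset += static_cast<std::size_t>(n);
    } else if (n < 0 && errno == EINTR) {
      continue;  // Interrupted before writing anything; try again.
    } else {
      ok = false;
    }
  }

  errno = savedErrno;
  return ok;
}
```

The fixed-size stack buffers replace heap-backed strings, which is the main design choice: anything that may call `malloc` is unsafe inside a signal handler.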
[jira] [Updated] (MESOS-7088) Support private registry credential per container.
[ https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7088: -- Fix Version/s: 1.4.0 > Support private registry credential per container. > -- > > Key: MESOS-7088 > URL: https://issues.apache.org/jira/browse/MESOS-7088 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7909) Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
[ https://issues.apache.org/jira/browse/MESOS-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7909: -- Fix Version/s: (was: 1.4.1) (was: 1.5.0) 1.4.0 > Ordering dependency between 'linux/capabilities' and 'docker/runtime' > isolator. > --- > > Key: MESOS-7909 > URL: https://issues.apache.org/jira/browse/MESOS-7909 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.2, 1.3.1, 1.4.0 >Reporter: Jie Yu >Assignee: Jie Yu > Fix For: 1.2.3, 1.3.2, 1.4.0 > > > There appears to be an unintentional ordering dependency between the > linux/capabilities isolator and the docker/runtime isolator. > This shows up in the command task case, since both isolators set > ContainerLaunchInfo.command. When merging ContainerLaunchInfo.command, the > docker/runtime isolator assumes its command comes before the linux/capabilities > isolator's command because 'mesos-execute' should be used as argv[0]. > We should try to eliminate this dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7863) Agent may drop pending kill task status updates.
[ https://issues.apache.org/jira/browse/MESOS-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7863: -- Fix Version/s: (was: 1.4.1) 1.4.0 > Agent may drop pending kill task status updates. > > > Key: MESOS-7863 > URL: https://issues.apache.org/jira/browse/MESOS-7863 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Benjamin Mahler >Assignee: Benjamin Mahler >Priority: Critical > Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0 > > > Currently there is an assumption that when a pending task is killed, the > framework will still be stored in the agent. However, this assumption can be > violated in two cases: > # Another pending task was killed and we removed the framework in > 'Slave::run' thinking it was idle, because pending tasks were empty (we > remove from pending tasks when processing the kill). (MESOS-7783 is an > example instance of this). > # The last executor terminated without tasks to send terminal updates for, or > the last terminated executor received its last acknowledgement. At this > point, we remove the framework thinking there were no pending tasks if the > task was killed (removed from pending). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
[ https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7744: -- Fix Version/s: (was: 1.4.1) 1.4.0 > Mesos Agent Sends TASK_KILL status update to Master, and still launches task > > > Key: MESOS-7744 > URL: https://issues.apache.org/jira/browse/MESOS-7744 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.1 >Reporter: Sargun Dhillon >Assignee: Benjamin Mahler >Priority: Critical > Labels: reliability > Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0 > > > We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a > TASK_STARTING back from the agent. Under certain conditions it can result in > Mesos losing track of the task. The chunk of the logs which is interesting is > here: > {code} > Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:26.951799 5171 slave.cpp:1495] Got assigned > task Titus-7590548-worker-0-4476 for framework TitusFramework > Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:26.952251 5171 slave.cpp:1614] Launching task > Titus-7590548-worker-0-4476 for framework TitusFramework > Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:37.484611 5171 slave.cpp:1853] Queuing task > ‘Titus-7590548-worker-0-4476’ for executor ‘docker-executor’ of framework > TitusFramework at executor(1)@100.66.11.10:17707 > Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:37.487876 5171 slave.cpp:2035] Asked to kill > task Titus-7590548-worker-0-4476 of framework TitusFramework > Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:37.488994 5171 slave.cpp:3211] Handling > status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for > task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0 > 
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c > mesos-slave[4290]: I0629 23:22:37.490603 5171 slave.cpp:2005] Sending queued > task ‘Titus-7590548-worker-0-4476’ to executor ‘docker-executor’ of framework > TitusFramework at executor(1)@100.66.11.10:17707 > {code} > In our executor, we see that the launch message arrives after the master has > already received the kill update. We then send non-terminal state updates to > the agent, and yet it doesn't forward these to our framework. We're using a > custom executor which is based on the older mesos-go bindings. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7783) Framework might not receive status update when a just launched task is killed immediately
[ https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7783: -- Fix Version/s: (was: 1.4.1) 1.4.0 > Framework might not receive status update when a just launched task is killed > immediately > - > > Key: MESOS-7783 > URL: https://issues.apache.org/jira/browse/MESOS-7783 > Project: Mesos > Issue Type: Bug > Components: agent >Affects Versions: 1.2.0 >Reporter: Benjamin Bannier >Assignee: Benjamin Mahler >Priority: Critical > Labels: reliability > Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0 > > Attachments: GroupDeployIntegrationTest.log.zip, logs > > > Our Marathon team is seeing issues in their integration test suite where > Marathon gets stuck in an infinite loop trying to kill a just launched task. > In their test, a task launch is immediately followed by a kill of that > task; the framework does not, e.g., wait for any task status update. > In this case the launch and kill messages arrive at the agent in the correct > order, but both the launch and kill paths in the agent do not reach the point > where a status update is sent to the framework. Since the framework has seen > no status update on the task, it re-triggers a kill, causing an infinite loop. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7865) Agent may process a kill task and still launch the task.
[ https://issues.apache.org/jira/browse/MESOS-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7865: -- Fix Version/s: (was: 1.4.1) 1.4.0 > Agent may process a kill task and still launch the task. > > > Key: MESOS-7865 > URL: https://issues.apache.org/jira/browse/MESOS-7865 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Benjamin Mahler >Assignee: Benjamin Mahler >Priority: Critical > Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0 > > > Based on the investigation of MESOS-7744, the agent has a race in which > "queued" tasks can still be launched after the agent has processed a kill > task for them. This race was introduced when {{Slave::statusUpdate}} was made > asynchronous: > (1) {{Slave::__run}} completes, task is now within {{Executor::queuedTasks}} > (2) {{Slave::killTask}} locates the executor based on the task ID residing in > queuedTasks, calls {{Slave::statusUpdate()}} with {{TASK_KILLED}} > (3) {{Slave::___run}} assumes that killed tasks have been removed from > {{Executor::queuedTasks}}, but this now occurs asynchronously in > {{Slave::_statusUpdate}}. So, the executor still sees the queued task and > delivers it and adds the task to {{Executor::launchedTasks}}. > (4) {{Slave::_statusUpdate}} runs, removes the task from > {{Executor::launchedTasks}} and adds it to {{Executor::terminatedTasks}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7586) Make use of cout/cerr and glog consistent.
[ https://issues.apache.org/jira/browse/MESOS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Budnik updated MESOS-7586: - Description: Some parts of mesos use glog before initialization of glog, hence messages via glog might not end up in a logdir: bq. WARNING: Logging before InitGoogleLogging() is written to STDERR The solution might be: {{cout/cerr}} should be used before logging initialization. {{glog}} should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. In addition, initialization of glog should be added, where it's necessary. was: Some parts of mesos use glog before initialization of glog. This leads to message like: bq. WARNING: Logging before InitGoogleLogging() is written to STDERR Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: {{cout/cerr}} should be used before logging initialization. {{glog}} should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. 
If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. In addition, initialization of glog should be added, where it's necessary. > Make use of cout/cerr and glog consistent. > -- > > Key: MESOS-7586 > URL: https://issues.apache.org/jira/browse/MESOS-7586 > Project: Mesos > Issue Type: Bug >Reporter: Andrei Budnik >Assignee: Armand Grillet >Priority: Minor > Labels: debugging, log, newbie > > Some parts of mesos use glog before initialization of glog, hence messages > via glog might not end up in a logdir: > bq. WARNING: Logging before InitGoogleLogging() is written to STDERR > The solution might be: > {{cout/cerr}} should be used before logging initialization. > {{glog}} should be used after logging initialization. > > Usually, main function has initialization pattern like: > # load = flags.load(argc, argv) // Load flags from command line. > # Check if flags are correct, otherwise print error message to cerr and then > exit. > # Check if user passed --help flag to print help message to cout and then > exit. > # Parsing and setup of environment variables. If this fails, EXIT macro is > used to print error message via glog. > # process::initialize() > # logging::initialize() > > Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information > generated by glog like current time, date and log level. > It would be preferable to move step 6 between steps 3 and 4 safely, because > {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. > In addition, initialization of glog should be added, where it's necessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
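The reordering proposed in MESOS-7586 can be sketched without glog. In the example below, `Logging` is a stand-in for the real logging subsystem and `runMain` is a hypothetical main body, both introduced only for illustration. Flag errors and `--help` output go to plain streams (steps 2 and 3), and logging is initialized right after them, before environment setup and `process::initialize()` (the proposed move of step 6 between steps 3 and 4).

```cpp
#include <ostream>
#include <sstream>  // std::ostringstream is convenient for capturing output.
#include <string>
#include <vector>

// Stand-in for the logging subsystem (an assumption; this is not glog).
struct Logging
{
  bool initialized = false;
  void initialize() { initialized = true; }
};

// Hypothetical main body following the proposed order. Returns the
// process exit code.
int runMain(
    const std::vector<std::string>& args,
    std::ostream& out,
    std::ostream& err,
    Logging& logging)
{
  // Steps 1-2: load flags; report bad input on `err`, not via logging,
  // because logging is not initialized yet.
  bool help = false;
  for (const std::string& arg : args) {
    if (arg == "--help") {
      help = true;
    } else if (arg.rfind("--", 0) != 0) {
      err << "Unexpected argument: " << arg << "\n";
      return 1;
    }
  }

  // Step 3: the help text goes to `out` without timestamps or log levels.
  if (help) {
    out << "Usage: example [--help]\n";
    return 0;
  }

  // Former step 6, moved up: initialize logging before anything that may
  // want to log.
  logging.initialize();

  // Steps 4-5 (environment setup, process initialization) would follow
  // here and could now report failures through the logging subsystem.
  return 0;
}
```

Passing the streams and the logging object as parameters is a choice made for this sketch so the ordering can be checked in isolation; a real `main` would use `std::cout`, `std::cerr`, and the actual logging initialization.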
[jira] [Updated] (MESOS-7586) Make use of cout/cerr and glog consistent.
[ https://issues.apache.org/jira/browse/MESOS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik updated MESOS-7586:
-
Description:
Some parts of mesos use glog before initialization of glog. This leads to message like:
bq. WARNING: Logging before InitGoogleLogging() is written to STDERR
Also, messages via glog before logging is initialized might not end up in a logdir.

The solution might be:
{{cout/cerr}} should be used before logging initialization.
{{glog}} should be used after logging initialization.

Usually, main function has initialization pattern like:
# load = flags.load(argc, argv) // Load flags from command line.
# Check if flags are correct, otherwise print error message to cerr and then exit.
# Check if user passed --help flag to print help message to cout and then exit.
# Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog.
# process::initialize()
# logging::initialize()

Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level.
It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}.
In addition, initialization of glog should be added, where it's necessary.

was:
Some parts of mesos use glog before initialization of glog. This leads to message like:
bq. WARNING: Logging before InitGoogleLogging() is written to STDERR
Also, messages via glog before logging is initialized might not end up in a logdir.

The solution might be:
{{cout/cerr}} should be used before logging initialization.
{{glog}} should be used after logging initialization.

Usually, main function has initialization pattern like:
# load = flags.load(argc, argv) // Load flags from command line.
# Check if flags are correct, otherwise print error message to cerr and then exit.
# Check if user passed --help flag to print help message to cout and then exit.
# Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog.
# process::initialize()
# logging::initialize()

Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level.
It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}.
In addition, initialization of glog should be added, where it necessary.

> Make use of cout/cerr and glog consistent.
> --
>
> Key: MESOS-7586
> URL: https://issues.apache.org/jira/browse/MESOS-7586
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Assignee: Armand Grillet
> Priority: Minor
> Labels: debugging, log, newbie
>
> Some parts of mesos use glog before initialization of glog. This leads to
> message like:
> bq. WARNING: Logging before InitGoogleLogging() is written to STDERR
> Also, messages via glog before logging is initialized might not end up in a
> logdir.
>
> The solution might be:
> {{cout/cerr}} should be used before logging initialization.
> {{glog}} should be used after logging initialization.
>
> Usually, main function has initialization pattern like:
> # load = flags.load(argc, argv) // Load flags from command line.
> # Check if flags are correct, otherwise print error message to cerr and then
> exit.
> # Check if user passed --help flag to print help message to cout and then
> exit.
> # Parsing and setup of environment variables. If this fails, EXIT macro is
> used to print error message via glog.
> # process::initialize()
> # logging::initialize()
>
> Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information
> generated by glog like current time, date and log level.
> It would be preferable to move step 6 between steps 3 and 4 safely, because
> {{logging::initialize()}} doesn’t depend on {{process::initialize()}}.
> In addition, initialization of glog should be added, where it's necessary.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
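The reordering proposed in the description above can be sketched roughly as follows. This is a minimal illustration, not Mesos code: `Outcome`, `runMain`, the flag handling and all message strings are hypothetical stand-ins, and the logger is modeled as a plain list of lines rather than glog. It only shows the intended order: flag errors and help text go to the plain streams, logging is initialized right after those checks, and only later steps log.

```cpp
#include <string>
#include <vector>

// Everything a run of the hypothetical main function produced.
struct Outcome {
  int exitCode = 0;
  std::string standardOutput;         // What would go to cout.
  std::string standardError;          // What would go to cerr.
  std::vector<std::string> logLines;  // What would go through the logger.
  bool loggingInitialized = false;
};

// Hypothetical stand-in for a main function; "--help" is the only known flag.
Outcome runMain(const std::vector<std::string>& args) {
  Outcome outcome;

  // Steps 1 and 2: load the flags; report a bad flag on the error stream only.
  bool help = false;
  for (const std::string& arg : args) {
    if (arg == "--help") {
      help = true;
    } else {
      outcome.standardError += "Unknown flag: " + arg + "\n";
      outcome.exitCode = 1;
      return outcome;
    }
  }

  // Step 3: the help text goes to the plain output stream.
  if (help) {
    outcome.standardOutput += "Usage: sketch [--help]\n";
    return outcome;
  }

  // Former step 6, moved up: initialize logging before anything logs.
  outcome.loggingInitialized = true;

  // Steps 4 and 5: environment setup and process initialization may log now.
  outcome.logLines.push_back("environment variables parsed");
  outcome.logLines.push_back("process initialized");
  return outcome;
}
```

In this sketch a call such as `runMain({"--bogus"})` returns with a message in `standardError` and no log lines, which is the behaviour the ticket asks for in steps 2 and 3.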
[jira] [Updated] (MESOS-7586) Make use of cout/cerr and glog consistent.
[ https://issues.apache.org/jira/browse/MESOS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Budnik updated MESOS-7586: - Description: Some parts of mesos use glog before initialization of glog. This leads to message like: bq. WARNING: Logging before InitGoogleLogging() is written to STDERR Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: {{cout/cerr}} should be used before logging initialization. {{glog}} should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. In addition, initialization of glog should be added, where it necessary. was: Some parts of mesos use glog before initialization of glog. This leads to message like: “WARNING: Logging before InitGoogleLogging() is written to STDERR” Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: {{cout/cerr}} should be used before logging initialization. {{glog}} should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. 
# Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. In addition, initialization of glog should be added, where it necessary. > Make use of cout/cerr and glog consistent. > -- > > Key: MESOS-7586 > URL: https://issues.apache.org/jira/browse/MESOS-7586 > Project: Mesos > Issue Type: Bug >Reporter: Andrei Budnik >Assignee: Armand Grillet >Priority: Minor > Labels: debugging, log, newbie > > Some parts of mesos use glog before initialization of glog. This leads to > message like: > bq. WARNING: Logging before InitGoogleLogging() is written to STDERR > Also, messages via glog before logging is initialized might not end up in a > logdir. > > The solution might be: > {{cout/cerr}} should be used before logging initialization. > {{glog}} should be used after logging initialization. > > Usually, main function has initialization pattern like: > # load = flags.load(argc, argv) // Load flags from command line. > # Check if flags are correct, otherwise print error message to cerr and then > exit. > # Check if user passed --help flag to print help message to cout and then > exit. > # Parsing and setup of environment variables. If this fails, EXIT macro is > used to print error message via glog. > # process::initialize() > # logging::initialize() > > Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information > generated by glog like current time, date and log level. 
> It would be preferable to move step 6 between steps 3 and 4 safely, because > {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. > In addition, initialization of glog should be added, where it necessary.
[jira] [Updated] (MESOS-7586) Make use of cout/cerr and glog consistent.
[ https://issues.apache.org/jira/browse/MESOS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Budnik updated MESOS-7586: - Description: Some parts of mesos use glog before initialization of glog. This leads to message like: “WARNING: Logging before InitGoogleLogging() is written to STDERR” Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: {{cout/cerr}} should be used before logging initialization. {{glog}} should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. In addition, initialization of glog should be added, where it necessary. was: Some parts of mesos use glog before initialization of glog. This leads to message like: “WARNING: Logging before InitGoogleLogging() is written to STDERR” Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: cout/cerr should be used before logging initialization. glog should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. 
# Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use cout/cerr to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on process::initialize(). Some parts of mesos don’t call logging::initialize(). This should also be fixed. > Make use of cout/cerr and glog consistent. > -- > > Key: MESOS-7586 > URL: https://issues.apache.org/jira/browse/MESOS-7586 > Project: Mesos > Issue Type: Bug >Reporter: Andrei Budnik >Assignee: Armand Grillet >Priority: Minor > Labels: debugging, log, newbie > > Some parts of mesos use glog before initialization of glog. This leads to > message like: > “WARNING: Logging before InitGoogleLogging() is written to STDERR” > Also, messages via glog before logging is initialized might not end up in a > logdir. > > The solution might be: > {{cout/cerr}} should be used before logging initialization. > {{glog}} should be used after logging initialization. > > Usually, main function has initialization pattern like: > # load = flags.load(argc, argv) // Load flags from command line. > # Check if flags are correct, otherwise print error message to cerr and then > exit. > # Check if user passed --help flag to print help message to cout and then > exit. > # Parsing and setup of environment variables. If this fails, EXIT macro is > used to print error message via glog. > # process::initialize() > # logging::initialize() > > Steps 2 and 3 should use {{cout/cerr}} to eliminate any extra information > generated by glog like current time, date and log level. > It would be preferable to move step 6 between steps 3 and 4 safely, because > {{logging::initialize()}} doesn’t depend on {{process::initialize()}}. > In addition, initialization of glog should be added, where it necessary. 
[jira] [Updated] (MESOS-7586) Make use of cout/cerr and glog consistent.
[ https://issues.apache.org/jira/browse/MESOS-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Budnik updated MESOS-7586: - Description: Some parts of mesos use glog before initialization of glog. This leads to message like: “WARNING: Logging before InitGoogleLogging() is written to STDERR” Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: cout/cerr should be used before logging initialization. glog should be used after logging initialization. Usually, main function has initialization pattern like: # load = flags.load(argc, argv) // Load flags from command line. # Check if flags are correct, otherwise print error message to cerr and then exit. # Check if user passed --help flag to print help message to cout and then exit. # Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. # process::initialize() # logging::initialize() Steps 2 and 3 should use cout/cerr to eliminate any extra information generated by glog like current time, date and log level. It would be preferable to move step 6 between steps 3 and 4 safely, because {{logging::initialize()}} doesn’t depend on process::initialize(). Some parts of mesos don’t call logging::initialize(). This should also be fixed. was: Some parts of mesos use glog before initialization of glog. This leads to message like: “WARNING: Logging before InitGoogleLogging() is written to STDERR” Also, messages via glog before logging is initialized might not end up in a logdir. The solution might be: cout/cerr should be used before logging initialization. glog should be used after logging initialization. Usually, main function has pattern like: 1. load = flags.load(argc, argv) // Load flags from command line. 2. Check if flags are correct, otherwise print error message to cerr and then exit. 3. Check if user passed --help flag to print help message to cout and then exit. 4. 
Parsing and setup of environment variables. If this fails, EXIT macro is used to print error message via glog. 5. process::initialize() 6. logging::initialize() 7. ... Steps 2 and 3 should use cout/cerr to eliminate any extra information generated by glog like current time, date and log level. It is possible to move step 6 between steps 3 and 4 safely, because logging::initialize() doesn’t depend on process::initialize(). Some parts of mesos don’t call logging::initialize(). This should also be fixed. > Make use of cout/cerr and glog consistent. > -- > > Key: MESOS-7586 > URL: https://issues.apache.org/jira/browse/MESOS-7586 > Project: Mesos > Issue Type: Bug >Reporter: Andrei Budnik >Assignee: Armand Grillet >Priority: Minor > Labels: debugging, log, newbie > > Some parts of mesos use glog before initialization of glog. This leads to > message like: > “WARNING: Logging before InitGoogleLogging() is written to STDERR” > Also, messages via glog before logging is initialized might not end up in a > logdir. > > The solution might be: > cout/cerr should be used before logging initialization. > glog should be used after logging initialization. > > Usually, main function has initialization pattern like: > # load = flags.load(argc, argv) // Load flags from command line. > # Check if flags are correct, otherwise print error message to cerr and then > exit. > # Check if user passed --help flag to print help message to cout and then > exit. > # Parsing and setup of environment variables. If this fails, EXIT macro is > used to print error message via glog. > # process::initialize() > # logging::initialize() > > Steps 2 and 3 should use cout/cerr to eliminate any extra information > generated by glog like current time, date and log level. > It would be preferable to move step 6 between steps 3 and 4 safely, because > {{logging::initialize()}} doesn’t depend on process::initialize(). > Some parts of mesos don’t call logging::initialize(). This should also be > fixed. 
[jira] [Commented] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
[ https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143589#comment-16143589 ]

Deshi Xiao commented on MESOS-7643:
---
Has this issue already been fixed? See https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L156

> The order of isolators provided in '--isolation' flag is not preserved and
> instead sorted alphabetically
>
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.1.2, 1.2.0, 1.3.0
> Reporter: Michael Cherny
> Assignee: James Peach
> Priority: Critical
> Labels: isolation
>
> According to documentation and comments in code the order of the entries in
> the --isolation flag should specify the ordering of the isolators.
> Specifically, the `create` and `prepare` calls for each isolator should run
> serially in the order in which they appear in the --isolation flag, while the
> `cleanup` call should be serialized in reverse order (with exception of
> filesystem isolator which is always first).
> But in fact, the isolators provided in '--isolation' flag are sorted
> alphabetically.
> That happens in [this line of
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377].
> In this line use of 'set' is done (apparently instead of list or
> vector) and set is a sorted container.
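The effect described in the report, a sorted container discarding the order given on the command line, can be reproduced with a small standalone sketch. This is not the Mesos containerizer code: `splitInOrder` and `splitViaSet` are hypothetical helpers, and the flag value used in the note below is only a sample.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated flag value, keeping tokens in the order given.
std::vector<std::string> splitInOrder(const std::string& value) {
  std::vector<std::string> tokens;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (!token.empty()) {
      tokens.push_back(token);
    }
  }
  return tokens;
}

// Same tokens, but routed through std::set, which iterates in sorted order
// and therefore loses the order from the flag value.
std::vector<std::string> splitViaSet(const std::string& value) {
  const std::vector<std::string> tokens = splitInOrder(value);
  const std::set<std::string> sorted(tokens.begin(), tokens.end());
  return std::vector<std::string>(sorted.begin(), sorted.end());
}
```

For a sample value such as `network/cni,cgroups/cpu,filesystem/linux`, `splitInOrder` keeps that sequence, while `splitViaSet` yields `cgroups/cpu`, `filesystem/linux`, `network/cni`. A sequence container such as `std::vector` is one way to preserve the order; whether duplicates also need to be rejected is a separate design question.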
[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery
[ https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143577#comment-16143577 ]

Deshi Xiao commented on MESOS-7801:
---
[~jieyu] can this patch be submitted?

> Retry logic for unsuccessful `docker rm` during agent recovery
> --
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
> Issue Type: Improvement
> Components: docker
> Reporter: Chun-Hung Hsiao
> Assignee: Chun-Hung Hsiao
>
> In MESOS- we skip the failure when `docker rm` fails due to mount leakage
> during agent recovery. In order not to leave residual docker containers in
> the docker daemon, we could do a best-effort `docker rm` retry with an
> exponential backoff since we cannot control when the leakage would be
> terminated.
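A best-effort retry with exponential backoff, as proposed in the ticket, might look roughly like the sketch below. It is not the patch under discussion: `retryWithBackoff` and its parameters are hypothetical, the operation (for example a `docker rm` invocation) and the waiting mechanism are passed in as callbacks, and the cap on the delay is an assumption added for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Calls `operation` until it succeeds or `maxAttempts` is reached, doubling
// the delay after each failure up to `maxDelay`.
// Returns the number of the successful attempt, or 0 if every attempt failed.
int retryWithBackoff(
    const std::function<bool()>& operation,
    const std::function<void(std::chrono::milliseconds)>& wait,
    std::chrono::milliseconds initialDelay,
    std::chrono::milliseconds maxDelay,
    int maxAttempts) {
  std::chrono::milliseconds delay = initialDelay;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    if (operation()) {
      return attempt;
    }
    if (attempt == maxAttempts) {
      break;  // Best effort: give up without waiting again.
    }
    wait(delay);
    delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
  }
  return 0;
}
```

Passing the wait step as a callback keeps the sketch independent of any particular event loop. In an asynchronous code base the blocking wait would presumably be replaced by a delayed callback, which this sketch does not attempt to model.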
[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned
[ https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143572#comment-16143572 ]

Deshi Xiao commented on MESOS-1871:
---
[~idownes] this ticket is duplicated by MESOS-6933. Should it be closed?

> Sending SIGTERM to a task command may render it orphaned
>
>
> Key: MESOS-1871
> URL: https://issues.apache.org/jira/browse/MESOS-1871
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Alexander Rukletsov
> Priority: Minor
>
> {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means
> signals are sent to the top process—that is {{sh -c}}—and not to the task
> directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process
> tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates
> reporting success to the {{CommandExecutor}}, rendering the task detached
> from the parent process and still running. Because the {{CommandExecutor}}
> thinks the command terminated normally, its OS process exits normally and may
> not trigger containerizer's escalation which destroys cgroups.
> Here is the test related to the first part:
> [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
> Here is the test related to the second part:
> [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].
[jira] [Commented] (MESOS-6615) Running mesos-slave in the docker that leave many zombie process
[ https://issues.apache.org/jira/browse/MESOS-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143516#comment-16143516 ]

Stefan Eder commented on MESOS-6615:

Maybe this helps: https://github.com/mesosphere/docker-containers/issues/9 (it's not a bug per se, you just need to run the agent docker container with --pid=host)

> Running mesos-slave in the docker that leave many zombie process
>
>
> Key: MESOS-6615
> URL: https://issues.apache.org/jira/browse/MESOS-6615
> Project: Mesos
> Issue Type: Bug
> Components: agent, containerization
> Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
> Reporter: Lei Xu
> Priority: Critical
>
> Here are some zombie process if I run mesos-slave in the docker.
> {code}
> root 10547 19464 0 Oct25 ? 00:00:00 [docker]
> root 14505 19464 0 Oct25 ? 00:00:00 [docker]
> root 16069 19464 0 Oct25 ? 00:00:00 [docker]
> root 19962 19464 0 Oct25 ? 00:00:00 [docker]
> root 23346 19464 0 Oct25 ? 00:00:00 [docker]
> root 24544 19464 0 Oct25 ? 00:00:00 [docker]
> {code}
> And I find the zombies come from {{mesos-slave}} process:
> {code}
> pstree -p -s 10547
> systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
> {code}
> The logs has been deleted by the cron job a few weeks ago, but I remember so
> many {{Failed to shutdown socket with fd xx: Transport endpoint is not
> connected}} in the log.