[jira] [Created] (MESOS-7920) Mesos Agents Crash If Loss Connection to ZK
Miguel Bernadin created MESOS-7920: -- Summary: Mesos Agents Crash If Loss Connection to ZK Key: MESOS-7920 URL: https://issues.apache.org/jira/browse/MESOS-7920 Project: Mesos Issue Type: Bug Components: agent Affects Versions: 1.3.1 Reporter: Miguel Bernadin Assignee: Vinod Kone There is an issue whereby Mesos agents die when they lose access to the ZooKeeper quorum: the {{dcos-mesos-slave.service}} main process exits with code=killed, status=6/ABRT. _*Mesos Agents Exiting with Loss of ZK Connectivity*_ {code:java} mesos-slave[12971]: 2017-08-09 20:12:44,698:12971(0x7fd2161d3700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=leader.mesos:2181 sessionTimeout=1 watcher=0x7fd22188d250 sessionId=0 sessionPasswd= con mesos-slave[12971]: 2017-08-09 20:12:44,713:12971(0x7fd2161d3700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such file or directory mesos-slave[12971]: F0809 20:12:44.713604 12988 zookeeper.cpp:132] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2] mesos-slave[12971]: *** Check failure stack trace: *** mesos-slave[12971]: @ 0x7fd22075a9fd google::LogMessage::Fail() mesos-slave[12971]: @ 0x7fd22075c89d google::LogMessage::SendToLog() mesos-slave[12971]: @ 0x7fd22075a5ec google::LogMessage::Flush() mesos-slave[12971]: @ 0x7fd22075a7f9 google::LogMessage::~LogMessage() mesos-slave[12971]: @ 0x7fd22075b76e google::ErrnoLogMessage::~ErrnoLogMessage() mesos-slave[12971]: @ 0x7fd22188daf3 ZooKeeperProcess::initialize() mesos-slave[12971]: @ 0x7fd221c424c1 process::ProcessManager::resume() mesos-slave[12971]: @ 0x7fd221c42777 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv mesos-slave[12971]: @ 0x7fd2201fad73 (unknown) mesos-slave[12971]: @ 0x7fd21fcfb52c (unknown) mesos-slave[12971]: @ 0x7fd21fa391dd (unknown) systemd[1]: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT 
systemd[1]: dcos-mesos-slave.service: Unit entered failed state. systemd[1]: dcos-mesos-slave.service: Failed with result 'signal'. systemd[1]: dcos-mesos-slave.service: Service hold-off time over, scheduling restart.{code} *NEXT STEP* Determine if we can change how Mesos responds to loss of access to ZK. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
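The log shows the abort is triggered by a transient name-resolution failure (getaddrinfo on leader.mesos), which zookeeper_init surfaces and the agent treats as fatal. One possible direction for the *NEXT STEP* above is to retry resolution with backoff instead of crashing. The sketch below is illustrative only; the function name and retry policy are assumptions, not Mesos code:

```python
import socket
import time

def resolve_with_retry(host, port, retries=5, delay=0.1,
                       resolver=socket.getaddrinfo):
    """Keep retrying name resolution with exponential backoff instead of
    treating the first getaddrinfo failure as fatal (hypothetical policy,
    not the Mesos implementation)."""
    attempt = 0
    while True:
        try:
            return resolver(host, port)
        except OSError:
            attempt += 1
            if attempt >= retries:
                raise  # give up only after exhausting the retry budget
            time.sleep(delay * (2 ** (attempt - 1)))  # 0.1s, 0.2s, 0.4s, ...
```

Under this policy a brief DNS outage for leader.mesos would delay the ZK connection rather than kill the agent; only a sustained outage would still fail.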
[jira] [Assigned] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-6369: -- Assignee: Miguel Bernadin > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. 
> We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}} s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
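The {{truncateMesosID}} referenced in the snippet above is a display filter in the Mesos web UI (AngularJS); its exact behavior is not shown here. As a rough illustration of the idea, shortening a long framework ID for display while keeping its distinctive tail, a sketch might look like this (behavior and name treatment are assumptions, not taken from the Mesos source):

```python
def truncate_mesos_id(full_id, keep=8):
    """Illustrative stand-in for the web UI's truncateMesosID display
    filter: keep only the trailing, most distinctive part of a long
    framework/task ID (hypothetical policy, not the actual filter)."""
    if len(full_id) <= keep:
        return full_id
    return "..." + full_id[-keep:]
```

For the IDs in the tables above, the 0000/0001 suffix is the part that actually distinguishes two frameworks registered by the same master, which is why truncating from the front is the sensible direction.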
[jira] [Commented] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623594#comment-15623594 ] Miguel Bernadin commented on MESOS-6369: [~kaysoky], Submitted https://reviews.apache.org/r/53324/ for review; the changes look like this with the modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. > We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}}s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623594#comment-15623594 ] Miguel Bernadin edited comment on MESOS-6369 at 10/31/16 10:21 PM: --- [~kaysoky], Submitted https://reviews.apache.org/r/53324/ for review; the changes look like this with the modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | was (Author: bernadinm): [~kaysoky], Sumbitted for review https://reviews.apache.org/r/53324/ to this that changes that look like this with modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. > We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}}s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6598) Broken Link Framework Development Page
Miguel Bernadin created MESOS-6598: -- Summary: Broken Link Framework Development Page Key: MESOS-6598 URL: https://issues.apache.org/jira/browse/MESOS-6598 Project: Mesos Issue Type: Bug Components: project website Reporter: Miguel Bernadin Priority: Trivial http://mesos.apache.org/documentation/latest/app-framework-development-guide/ A link on this page is broken: Create your Framework Scheduler If you are writing a scheduler against Mesos 1.0 or newer, it is recommended to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-6598: -- Assignee: Miguel Bernadin > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672212#comment-15672212 ] Miguel Bernadin commented on MESOS-6598: https://reviews.apache.org/r/53832/ > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6598: --- Assignee: Joseph Wu (was: Miguel Bernadin) > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Joseph Wu >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6841) Allow Dynamic CNI Configuration
Miguel Bernadin created MESOS-6841: -- Summary: Allow Dynamic CNI Configuration Key: MESOS-6841 URL: https://issues.apache.org/jira/browse/MESOS-6841 Project: Mesos Issue Type: Improvement Components: agent Reporter: Miguel Bernadin Priority: Minor Agents have the ability to set resources dynamically without an agent restart. Since CNI networks are configured only at agent startup, we could add dynamic CNI configuration support in the future. Creating this JIRA to track this effort. {quote} Note that the network/cni isolator learns all the available networks by looking at the CNI configuration in the --network_cni_config_dir at startup. This implies that if a new CNI network needs to be added after Agent startup, the Agent needs to be restarted. The network/cni isolator has been designed with recover capabilities and hence restarting the Agent (and therefore the network/cni isolator) will not affect container orchestration. {quote} Sourced from http://mesos.apache.org/documentation/latest/cni/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6898) Allow Master Node Maintenance Primitives
Miguel Bernadin created MESOS-6898: -- Summary: Allow Master Node Maintenance Primitives Key: MESOS-6898 URL: https://issues.apache.org/jira/browse/MESOS-6898 Project: Mesos Issue Type: Improvement Components: master Reporter: Miguel Bernadin Assignee: Gilbert Song Mesos maintenance primitives currently allow agents to exit the cluster, which is helpful for adding/removing agents dynamically when replacing nodes for compliance, upgrades, or other reasons. We see a benefit in supporting maintenance primitives for Mesos masters as well, so masters can be removed from a cluster programmatically. Creating this JIRA so we can track this work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6898) Allow Master Node Maintenance Primitives
[ https://issues.apache.org/jira/browse/MESOS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6898: --- Affects Version/s: 1.2.0 > Allow Master Node Maintenance Primitives > - > > Key: MESOS-6898 > URL: https://issues.apache.org/jira/browse/MESOS-6898 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 1.2.0 >Reporter: Miguel Bernadin >Assignee: Gilbert Song > > Mesos maintenance primitives currently allow agents to exit the cluster, > which is helpful for adding/removing agents dynamically when replacing nodes > for compliance, upgrades, or other reasons. We see a benefit in supporting > maintenance primitives for Mesos masters as well, so masters can be removed > from a cluster programmatically. > Creating this JIRA so we can track this work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems
Miguel Bernadin created MESOS-3586: -- Summary: Installing Mesos 0.24.0 on multiple systems Key: MESOS-3586 URL: https://issues.apache.org/jira/browse/MESOS-3586 Project: Mesos Issue Type: Bug Affects Versions: 0.24.0 Environment: Ubuntu 14.04, 3.13.0-32 generic Reporter: Miguel Bernadin I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and software configurations. After performing ../configure, make, and make check, some servers have completed successfully and others failed on the test [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. Is there something I should check in this test? PERFORMED MAKE CHECK NODE-001 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 20151005-143735-2393768202-35106-27900-S0 Registered executor on svdidac038.techlabs.accenture.com Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 Forked command at 38510 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' PERFORMED MAKE CHECK NODE-002 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 20151005-143857-2360213770-50427-26325-S0 Registered executor on svdidac039.techlabs.accenture.com Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' Forked command at 37028 ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure Expected: (usage.get().mem_medium_pressure_counter()) >= (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 2015-10-05 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server refused to accept the client [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
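For context on the failed expectation above, {{mem_medium_pressure_counter() >= mem_critical_pressure_counter()}} encodes the assumption that a critical cgroup memory-pressure event also counts as medium pressure, so the medium counter should never fall behind the critical one. A minimal sketch of that counting invariant (hypothetical model, not the Mesos implementation):

```python
class PressureCounters:
    """Counts cgroup memory-pressure events under the assumption that a
    'critical' event implies 'medium' pressure as well, which is why the
    test expects medium >= critical (illustrative model only)."""

    def __init__(self):
        self.medium = 0
        self.critical = 0

    def on_event(self, level):
        # A critical event bumps both counters; a medium event bumps one.
        if level in ("medium", "critical"):
            self.medium += 1
        if level == "critical":
            self.critical += 1
```

The failing run reported "actual: 5 vs 6", i.e. the invariant was violated on that node, which suggests events were observed out of step rather than a simple off-by-one in the test.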
[jira] [Updated] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-3586: --- Summary: Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (was: Installing Mesos 0.24.0 on multiple systems) > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check, some servers have > completed successfully and others failed on the test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > 
../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6098) Frameworks UI shows metrics for used plus offers
Miguel Bernadin created MESOS-6098: -- Summary: Frameworks UI shows metrics for used plus offers Key: MESOS-6098 URL: https://issues.apache.org/jira/browse/MESOS-6098 Project: Mesos Issue Type: Improvement Components: webui Affects Versions: 1.0.1 Reporter: Miguel Bernadin Assignee: Miguel Bernadin Priority: Minor When a framework is receiving many offers and it is denying them, the frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. From a Mesos perspective, those offers are given to the framework until the framework declines them, so depending on when the Mesos UI gets updated, it has combined all the used resources and offers (that have not been accepted) to the framework, and this is reflected on the framework UI. If a framework does not implement suppressOffers(), it will continue to deny offers from Mesos, which leads to the sporadic changes of metrics on the framework UI. From the operator's perspective, the user would expect to see used resources consumed by the framework. Any offered resources can be viewed in Mesos's Offers tab instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
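The aggregation described above can be sketched as follows: the per-framework numbers shown in the UI are effectively used resources plus outstanding (not-yet-declined) offers, which is why they fluctuate as offers churn. This is a hypothetical reconstruction for illustration, not the actual web UI code:

```python
def displayed_resources(used, offered):
    """Hypothetical model of the framework UI metric: used resources
    plus outstanding offers, summed per resource name."""
    keys = set(used) | set(offered)
    return {k: used.get(k, 0) + offered.get(k, 0) for k in keys}
```

For example, a framework actually using 2 CPUs that currently holds an undeclined offer of 1 CPU and 512 MB would briefly display 3 CPUs and 512 MB, then drop back once the offer is declined.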
[jira] [Updated] (MESOS-6098) Frameworks UI shows metrics for used resources plus offers
[ https://issues.apache.org/jira/browse/MESOS-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6098: --- Summary: Frameworks UI shows metrics for used resources plus offers (was: Frameworks UI shows metrics for used plus offers) > Frameworks UI shows metrics for used resources plus offers > -- > > Key: MESOS-6098 > URL: https://issues.apache.org/jira/browse/MESOS-6098 > Project: Mesos > Issue Type: Improvement > Components: webui >Affects Versions: 1.0.1 >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Minor > > When a framework is receiving many offers and it is denying them, the > frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. > From a mesos perspective, those offers are given to the framework until the > framework declines them, so depending on the time the mesos UI gets updated, > it has combined all the used resources and offers (that have not been > accepted) to the framework and is reflected on the framework UI. If a > framework does not implement suppressOffers(), it will continue to deny > offers from mesos, which leads to the sporadic changes of metrics on the > framework UI. > From the operator's perspective, the user would expect to see used resources > consumed by the framework. Any offered resources can be viewed instead by > Mesos's Offers tab. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6098) Frameworks UI shows metrics for used resources plus offers
[ https://issues.apache.org/jira/browse/MESOS-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6098: --- Assignee: Joseph Wu (was: Miguel Bernadin) > Frameworks UI shows metrics for used resources plus offers > -- > > Key: MESOS-6098 > URL: https://issues.apache.org/jira/browse/MESOS-6098 > Project: Mesos > Issue Type: Improvement > Components: webui >Affects Versions: 1.0.1 >Reporter: Miguel Bernadin >Assignee: Joseph Wu >Priority: Minor > > When a framework is receiving many offers and it is denying them, the > frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. > From a mesos perspective, those offers are given to the framework until the > framework declines them, so depending on the time the mesos UI gets updated, > it has combined all the used resources and offers (that have not been > accepted) to the framework and is reflected on the framework UI. If a > framework does not implement suppressOffers(), it will continue to deny > offers from mesos, which leads to the sporadic changes of metrics on the > framework UI. > From the operator's perspective, the user would expect to see used resources > consumed by the framework. Any offered resources can be viewed instead by > Mesos's Offers tab. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2982) Make Check Fails on RHEL 6
Miguel Bernadin created MESOS-2982: -- Summary: Make Check Fails on RHEL 6 Key: MESOS-2982 URL: https://issues.apache.org/jira/browse/MESOS-2982 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.22.1 Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 6.6 (Santiago) Reporter: Miguel Bernadin After downloading Mesos 0.22.1 and attempting to build it, I've encountered failures in the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611127#comment-14611127 ] Miguel Bernadin commented on MESOS-2982: After reading further, I've attempted to skip this Perf checking process since I've read that Perf is specific to the kernel version and different versions have different flags and output formats. Specifically, the code requires a kernel release >= 2.6.39 but I am running a 2.6.32 kernel: my version of perf is not currently supported and I should skip those tests. The only effect of this is that I cannot use the optional perf_event isolator. That said, I've attempted these commands below to try to skip it, but it seems to ignore my environment variables and flags: build]# GTEST_FILTER="-Perf*:-UserCgroupIsolatorTest*UserCgroup"; export GTEST_FILTER build]# make check GTEST_FILTER="$GTEST_FILTER" That failed above, then I tried this below: build]# ./bin/mesos-tests.sh --gtest_filter="-Perf*:-UserCgroupIsolatorTest*UserCgroup" > Make Check Fails on RHEL 6 > -- > > Key: MESOS-2982 > URL: https://issues.apache.org/jira/browse/MESOS-2982 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.22.1 > Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 > 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux > Red Hat Enterprise Linux Server release 6.6 (Santiago) >Reporter: Miguel Bernadin > > After downloading Mesos 0.22.1 and attempting to build it, I've encountered > failures in the build process below: > FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam > = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) > [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) > [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = > mesos::internal::slave::CgroupsCpushareIsolatorProcess > userdel: user 'mesos.test.unprivileged.user' does not exist > [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup > 
-bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or > directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
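The filter string above likely fails because, per the googletest documentation, {{--gtest_filter}} is split on the *first* '-' only: everything after it is the negative-pattern list, so the second leading '-' in {{-Perf*:-UserCgroupIsolatorTest*UserCgroup}} stays inside a pattern that matches no real test name, and the cgroup tests still run. A Python sketch of this matching behavior (my reading of the gtest docs, not its actual source):

```python
import fnmatch

def gtest_filter_match(name, filt):
    """Sketch of googletest's --gtest_filter semantics:
    'POS1:POS2-NEG1:NEG2'. Only the FIRST '-' separates positive from
    negative patterns; an empty positive section means '*'."""
    pos, dash, neg = filt.partition('-')
    pos_pats = [p for p in pos.split(':') if p] or ['*']
    neg_pats = [p for p in neg.split(':') if p] if dash else []
    return (any(fnmatch.fnmatchcase(name, p) for p in pos_pats) and
            not any(fnmatch.fnmatchcase(name, p) for p in neg_pats))
```

Under this reading, `--gtest_filter="-Perf*:UserCgroupIsolatorTest*UserCgroup"` (a single leading dash covering the whole colon-separated negative list) should skip both groups of tests when passed to `./bin/mesos-tests.sh`.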
[jira] [Updated] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-2982: --- Description: After downloading Mesos 0.22.1 and attempting to build it, I've encountered failures in the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory was: After downloading Mesos 22.1 and attemted to build it, I've encountered failrues on the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory > Make Check Fails on RHEL 6 > -- > > Key: MESOS-2982 > URL: https://issues.apache.org/jira/browse/MESOS-2982 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.22.1 > Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 > 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux > Red Hat Enterprise Linux Server release 6.6 (Santiago) >Reporter: Miguel Bernadin > > After downloading Mesos 0.22.1 and attempting to build it, I've encountered > failures in the build process below: > FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
TypeParam > = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) > [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) > [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = > mesos::internal::slave::CgroupsCpushareIsolatorProcess > userdel: user 'mesos.test.unprivileged.user' does not exist > [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup > -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or > directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611191#comment-14611191 ] Miguel Bernadin commented on MESOS-2982: Thanks a lot Ian! The build seems to have completed and returned the messages below. Do you know if any of them are of concern?

{code}
2015-07-01 16:39:23,342:90964(0x7f33f7fff700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52417] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
[ OK ] Strict/RegistrarTest.abort/1 (1021 ms)
[--] 16 tests from Strict/RegistrarTest (27443 ms total)
[--] Global test environment tear-down
[==] 601 tests from 101 test cases ran. (745589 ms total)
[ PASSED ] 570 tests.
[ FAILED ] 31 tests, listed below:
[ FAILED ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverUnregisteredExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverTerminatedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.CleanupExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.NonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.NonCheckpointingSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.KillTask, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.GCExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RegisterDisconnectedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileShutdownFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.SchedulerFailover, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.PartitionedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MasterFailover, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MultipleFrameworks, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MultipleSlaves, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
[ FAILED ] ExamplesTest.JavaFramework
31 FAILED TESTS
YOU HAVE 9 DISABLED TESTS
[root@build]#
{code}
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611216#comment-14611216 ] Miguel Bernadin commented on MESOS-2982: Thanks again. Yes, here is the test output below, as requested. I will be able to update the kernel to something more recent; I will do so and provide an update here.

{code}
MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter="SlaveRecoveryTest/0.RecoverSlaveState"
Source directory: /usr/local/mesos-0.22.1
Build directory: /usr/local/mesos-0.22.1/build
-
We cannot run any cgroups tests that require mounting hierarchies because you have the following hierarchies mounted: /cgroup/blkio, /cgroup/cpu, /cgroup/cpuacct, /cgroup/cpuset, /cgroup/devices, /cgroup/freezer, /cgroup/memory, /cgroup/net_cls, /tmp/mesos_test_cgroup/perf_event
We'll disable the CgroupsNoHierarchyTest test fixture for now.
-
Note: Google Test filter = SlaveRecoveryTest/0.RecoverSlaveState-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
../../src/tests/mesos.cpp:501: Failure
(cgroups::cleanup(hierarchy)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[ RUN ] SlaveRecoveryTest/0.RecoverSlaveState
Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverSlaveState_InpqgG'
../../src/tests/mesos.cpp:562: Failure
cgroups::mount(hierarchy, subsystem): 'freezer' is already attached to another hierarchy
-
We cannot run any cgroups tests that require a hierarchy with subsystem 'freezer' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/freezer'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*).
-
../../src/tests/mesos.cpp:598: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer (7 ms)
../../src/tests/mesos.cpp:519: Failure
(cgroups::cleanup(hierarchy)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[--] 1 test from SlaveRecoveryTest/0 (7 ms total)
[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (32 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer
1 FAILED TEST
YOU HAVE 9 DISABLED TESTS
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
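The repeated "Device or resource busy" failures above usually mean tasks are still attached somewhere under the test hierarchy, so the cgroup directory cannot be removed yet. As a rough illustration (not part of Mesos; the hierarchy path below is just the one from the log), a small Python sketch can enumerate the PIDs still listed in cgroup.procs files so they can be killed or moved before retrying the cleanup:

```python
import os

def attached_pids(hierarchy):
    """Collect PIDs still attached anywhere under a cgroup hierarchy.

    rmdir on a cgroup fails with EBUSY while any task is listed in a
    cgroup.procs file below it, so these PIDs must exit (or be moved
    to another cgroup) before cleanup can succeed.
    """
    pids = set()
    for root, _dirs, files in os.walk(hierarchy):
        if "cgroup.procs" in files:
            with open(os.path.join(root, "cgroup.procs")) as f:
                pids.update(int(line) for line in f if line.strip())
    return sorted(pids)

# Hypothetical usage against the hierarchy from the log:
#   attached_pids("/tmp/mesos_test_cgroup/perf_event/mesos_test")
```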
[jira] [Created] (MESOS-7170) Allow for custom filters on Mesos APIs
Miguel Bernadin created MESOS-7170: -- Summary: Allow for custom filters on Mesos APIs Key: MESOS-7170 URL: https://issues.apache.org/jira/browse/MESOS-7170 Project: Mesos Issue Type: Improvement Components: HTTP API Reporter: Miguel Bernadin Assignee: Gilbert Song Priority: Minor For the tasks.json API and others like state.json, the data that the Mesos master sends is quite lengthy on larger clusters. It would be good to provide filters in the API so that Mesos can send only the RUNNING tasks in the cluster and do less work. Creating this JIRA so we can have intelligent filters that pick what data to send on the server side, rather than filtering it out on the client side. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
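Until such server-side filters exist, clients have to download the full response and filter locally, which is exactly the work this ticket wants to push to the master. A minimal client-side sketch (hypothetical helper, assuming a parsed tasks.json-style response) of the RUNNING-only filter being requested:

```python
def running_tasks(state):
    """Return only tasks in TASK_RUNNING from a parsed tasks.json-style
    response. Today this filtering happens client-side, after the master
    has already serialized and sent every task in the cluster."""
    return [t for t in state.get("tasks", []) if t.get("state") == "TASK_RUNNING"]
```

A server-side equivalent (e.g. a hypothetical `?state=TASK_RUNNING` query parameter) would avoid serializing the non-running tasks at all.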
[jira] [Created] (MESOS-7171) Mesos Containerizer Change Size of SHM
Miguel Bernadin created MESOS-7171: -- Summary: Mesos Containerizer Change Size of SHM Key: MESOS-7171 URL: https://issues.apache.org/jira/browse/MESOS-7171 Project: Mesos Issue Type: Improvement Reporter: Miguel Bernadin Priority: Minor

I would like the ability to adjust the size of the shared memory device, just as this can be done on Docker. For example, on Docker you can specify how much space to allocate as a parameter in the app definition in Marathon:

{code}
"parameters": [
  {
    "key": "shm-size",
    "value": "256mb"
  }
]
{code}

As you can see below, here is an example of a running container and how much space is available on disk, reflecting this change.

Modified parameter container (app definition):
{code}
{
  "id": "/ubuntu-withshm",
  "cmd": "sleep 1000\n",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "ubuntu",
      "network": "HOST",
      "privileged": false,
      "parameters": [
        {
          "key": "shm-size",
          "value": "256mb"
        }
      ],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10005,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
{code}

Modified parameter container (df output):
{code}
core@ip-10-0-0-19 ~ $ docker exec -it a818cf2277a5 bash
root@ip-10-0-0-19:/# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
overlay      37G  2.0G    33G    6%  /
tmpfs       7.4G     0   7.4G    0%  /dev
tmpfs       7.4G     0   7.4G    0%  /sys/fs/cgroup
/dev/xvdb    37G  2.0G    33G    6%  /etc/hostname
shm         256M     0   256M    0%  /dev/shm
{code}

Standard container (app definition):
{code}
{
  "id": "/ubuntu-withoutshm",
  "cmd": "sleep 1",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "ubuntu",
      "network": "HOST",
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10006,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
{code}

Standard container (df output):
{code}
root@ip-10-0-0-19:/# exit
exit
core@ip-10-0-0-19 ~ $ docker exec -it c85433062e78 bash
root@ip-10-0-0-19:/# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
overlay      37G  2.0G    33G    6%  /
tmpfs       7.4G     0   7.4G    0%  /dev
tmpfs       7.4G     0   7.4G    0%  /sys/fs/cgroup
/dev/xvdb    37G  2.0G    33G    6%  /etc/hostname
shm          64M     0    64M    0%  /dev/shm
{code}

How can this be done with the Mesos containerizer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
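The parameters list shown above is plain JSON, so tooling can inject it into existing app definitions. A hedged Python sketch (hypothetical helper, not part of any Marathon API) that adds Docker's shm-size parameter to an app definition without mutating the original:

```python
import copy

def with_shm_size(app, size="256mb"):
    """Return a copy of a Marathon app definition with Docker's
    shm-size parameter appended (Docker containerizer only; the Mesos
    containerizer has no equivalent knob, which is what this ticket asks for)."""
    app = copy.deepcopy(app)  # leave the caller's definition untouched
    params = app["container"]["docker"].setdefault("parameters", [])
    params.append({"key": "shm-size", "value": size})
    return app
```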
[jira] [Assigned] (MESOS-7171) Mesos Containerizer Change Size of SHM
[ https://issues.apache.org/jira/browse/MESOS-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-7171: -- Assignee: Joseph Wu

> Mesos Containerizer Change Size of SHM
> --
>
> Key: MESOS-7171
> URL: https://issues.apache.org/jira/browse/MESOS-7171
> Project: Mesos
> Issue Type: Improvement
> Reporter: Miguel Bernadin
> Assignee: Joseph Wu
> Priority: Minor

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7474) Mesos Fetcher Cache Doesn't Retry when Missed
Miguel Bernadin created MESOS-7474: -- Summary: Mesos Fetcher Cache Doesn't Retry when Missed Key: MESOS-7474 URL: https://issues.apache.org/jira/browse/MESOS-7474 Project: Mesos Issue Type: Bug Components: fetcher Affects Versions: 1.2.0 Reporter: Miguel Bernadin Assignee: Joseph Wu

The Mesos fetcher doesn't retry when the cache is missed. It needs the ability to pull from the source when the cache fetch fails.

{code}
I0421 15:52:53.022902 32751 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/","items":[{"action":"RETRIEVE_FROM_CACHE","cache_filename":")","uri":{"cache":true,"executable":false,"extract":true,"value":"https:\/\/\/"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/"}
I0421 15:52:53.024926 32751 fetcher.cpp:409] Fetching URI '"https:\/\/\/"
I0421 15:52:53.024942 32751 fetcher.cpp:306] Fetching from cache
I0421 15:52:53.024958 32751 fetcher.cpp:84] Extracting with command: tar -C "\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/f3feeab8-a2fe-4ac1-afeb-ec7bd4ce7b0d-S29/c1-docker-hub.tar.gz'
tar: /"https:\/\/\/": Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Failed to fetch '"https:\/\/\/"': Failed to extract: command tar -C '"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/"' exited with status: 512
{code}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
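The requested behavior is a fallback: on a cache miss or cache failure, fetch from the original URI instead of failing the task. A minimal sketch of that control flow (hypothetical helper; the two fetch callables stand in for the real fetcher's cache and source paths, which this ticket asks the C++ fetcher to implement):

```python
def fetch_with_fallback(uri, fetch_from_cache, fetch_from_source):
    """Try the fetcher cache first; on any cache failure, fall back to
    fetching the original URI instead of aborting the task (the behavior
    requested in MESOS-7474). Both fetchers are caller-supplied callables."""
    try:
        return fetch_from_cache(uri)
    except Exception:
        # Cache miss or corrupt cache entry: go back to the source.
        return fetch_from_source(uri)
```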
[jira] [Created] (MESOS-9233) Mesos Fetcher fails for Windows with Linux tars
Miguel Bernadin created MESOS-9233: -- Summary: Mesos Fetcher fails for Windows with Linux tars Key: MESOS-9233 URL: https://issues.apache.org/jira/browse/MESOS-9233 Project: Mesos Issue Type: Bug Components: fetcher Reporter: Miguel Bernadin

When trying to extract a Linux tar on Windows, tar fails with the command below:

{code}
tar xf server-jre-8u162-linux-x64.tar
jdk1.8.0_162/man/ja: Can't create '?\\C:\\Users\\andrew\\Downloads\\jdk1.8.0_162\\man\\ja'
jdk1.8.0_162/jre/lib/amd64/server/libjsig.so: Can't create '?\\C:\\Users\\andrew\\Downloads\\jdk1.8.0_162\\jre\\lib\\amd64\\server\\libjsig.so'
tar.exe: Error exit delayed from previous errors.
{code}

[~andschwa] has found that someone has attempted to get this to work for Windows, which should resolve this problem for Mesos: https://github.com/libarchive/libarchive/pull/1030

Marathon app def to reproduce as well:

{code:java}
{
  "id": "/sleep",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "cmd": "powershell -c start-sleep 999",
  "container": {
    "type": "MESOS",
    "volumes": []
  },
  "cpus": 0.1,
  "disk": 0,
  "fetch": [
    {
      "uri": "https://downloads.mesosphere.com/java/server-jre-8u162-linux-x64.tar.gz",
      "extract": true,
      "executable": false,
      "cache": false
    }
  ],
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 128,
  "gpus": 0,
  "networks": [
    {
      "mode": "host"
    }
  ],
  "portDefinitions": [],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "constraints": []
}
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
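As a stopgap while the libarchive fix lands, an extraction path that does not depend on tar.exe's path handling avoids the "Can't create" errors shown above. A hedged sketch (not how the Mesos fetcher actually extracts) using Python's standard tarfile module, which resolves member names itself and works the same on Windows and Linux:

```python
import tarfile

def extract_tar(archive_path, dest):
    """Extract a (possibly Linux-built) tar archive portably.
    tarfile interprets member names in pure Python rather than shelling
    out to a platform tar, sidestepping the Windows path-translation
    issue reported against tar.exe/libarchive."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest)
```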