[jira] [Commented] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.
[ https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635400#comment-15635400 ]

Benjamin Mahler commented on MESOS-6544:
----------------------------------------

This will be fixed via MESOS-6545.

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> ----------------------------------------------------
>
>                 Key: MESOS-6544
>                 URL: https://issues.apache.org/jira/browse/MESOS-6544
>             Project: Mesos
>          Issue Type: Bug
>          Components: technical debt, test
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master master@172.17.0.2:58302
> I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 authenticatee
> I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
> I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL connection
> I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable resources {} from the resource estimator
> I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating slave(150)@172.17.0.2:58302
> I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL connection
> I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
> I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
> I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL authentication start
> I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires more steps
> I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL authentication step
> I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL authentication step
> I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
> I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
> I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
> I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
> I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
> I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated principal 'test-principal' at slave(150)@172.17.0.2:58302
> I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with master master@172.17.0.2:58302
> I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 12.590371ms if necessary
> I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
> I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 94467ns; attempting to update the registry
> I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 36.501523ms if necessary
> I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already in progress
> I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 48.099208ms
> I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at position 4
> I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
> I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 26.127711ms if necessary
> I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already in progress
[jira] [Assigned] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.
[ https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler reassigned MESOS-6544:
--------------------------------------

    Assignee: Benjamin Mahler

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> ----------------------------------------------------
>
>                 Key: MESOS-6544
>                 URL: https://issues.apache.org/jira/browse/MESOS-6544
>             Project: Mesos
>          Issue Type: Bug
>          Components: technical debt, test
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> {noformat}
[jira] [Assigned] (MESOS-6545) TestContainerizer is not thread-safe.
[ https://issues.apache.org/jira/browse/MESOS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler reassigned MESOS-6545:
--------------------------------------

    Assignee: Benjamin Mahler

> TestContainerizer is not thread-safe.
> -------------------------------------
>
>                 Key: MESOS-6545
>                 URL: https://issues.apache.org/jira/browse/MESOS-6545
>             Project: Mesos
>          Issue Type: Bug
>          Components: technical debt, test
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> The TestContainerizer is currently not backed by a Process and does not do
> any explicit synchronization, and so is not thread-safe.
> Most tests currently cannot trip the concurrency issues, but this surfaced
> recently in MESOS-6544.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6547) Update the mesos containerizer to launch per-container I/O switchboards
[ https://issues.apache.org/jira/browse/MESOS-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6547:
-------------------------------

    Summary: Update the mesos containerizer to launch per-container I/O switchboards  (was: Update the mesos containerizer to launch the per-container I/O switchboard)

> Update the mesos containerizer to launch per-container I/O switchboards
> -----------------------------------------------------------------------
>
>                 Key: MESOS-6547
>                 URL: https://issues.apache.org/jira/browse/MESOS-6547
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere
>
> With the introduction of the new per-container I/O switchboard component, we
> need to update the mesos containerizer to actually launch one for each
> container as well as maintain any checkpointed {{pid}} information so it can
> reattach to it on {{recovery()}}.
> As part of this, we will likely move the existing logger logic inside the I/O
> switchboard and have it own the logger going forward.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-6547) Update the mesos containerizer to launch the per-container I/O switchboard
Kevin Klues created MESOS-6547:
-------------------------------

             Summary: Update the mesos containerizer to launch the per-container I/O switchboard
                 Key: MESOS-6547
                 URL: https://issues.apache.org/jira/browse/MESOS-6547
             Project: Mesos
          Issue Type: Task
            Reporter: Kevin Klues
            Assignee: Kevin Klues

With the introduction of the new per-container I/O switchboard component, we need to update the mesos containerizer to actually launch one for each container as well as maintain any checkpointed {{pid}} information so it can reattach to it on {{recovery()}}.

As part of this, we will likely move the existing logger logic inside the I/O switchboard and have it own the logger going forward.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-6546) Update the Containerizer API to include attachInput and attachOutput calls.
Kevin Klues created MESOS-6546:
-------------------------------

             Summary: Update the Containerizer API to include attachInput and attachOutput calls.
                 Key: MESOS-6546
                 URL: https://issues.apache.org/jira/browse/MESOS-6546
             Project: Mesos
          Issue Type: Task
            Reporter: Kevin Klues
            Assignee: Kevin Klues

With the per-container I/O switchboard we are adding, the containerizer should be responsible for both launching the I/O switchboard process and allowing external components to interface with it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6472) Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
[ https://issues.apache.org/jira/browse/MESOS-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6472:
-------------------------------

    Assignee: Vinod Kone  (was: Anand Mazumdar)

> Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
> --------------------------------------------------------------------
>
>                 Key: MESOS-6472
>                 URL: https://issues.apache.org/jira/browse/MESOS-6472
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Vinod Kone
>              Labels: debugging, mesosphere
>
> Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote
> client to the input/output of the entrypoint of a container. All
> input/output data will be packed into I/O messages and interleaved with
> control messages sent between a client and the agent. A single chunked
> request will be used to stream messages to the agent over the input stream,
> and a single chunked response will be used to stream messages to the client
> over the output stream.
> This call will integrate with the I/O switchboard to stream data between the
> container and the HTTP stream.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-6469) Build an Attach Container Actor
[ https://issues.apache.org/jira/browse/MESOS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634825#comment-15634825 ]

Kevin Klues commented on MESOS-6469:
------------------------------------

We talked through exactly what needs to happen between the HTTP handlers and the I/O switchboard and figured out that we can split all of the logic that would have been in the {{AttachContainerActor}} between the containerizer and the HTTP handlers themselves. As designed previously, the HTTP handlers would have been trivial and would definitely not have taken 5 days to do. Now they will be a bit beefier (implementing the logic we thought would go in the {{AttachContainerActor}}), but we had already allocated ample time for them.

> Build an Attach Container Actor
> -------------------------------
>
>                 Key: MESOS-6469
>                 URL: https://issues.apache.org/jira/browse/MESOS-6469
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere
>
> The new agent API calls for ATTACH_CONTAINER_INPUT and
> ATTACH_CONTAINER_OUTPUT are intimately intertwined. That is, most attach
> operations will likely want to call both ATTACH_CONTAINER_INPUT and
> ATTACH_CONTAINER_OUTPUT in order to attach all three of stdin, stdout, and
> stderr to a local terminal.
> Moreover, we plan to allow multiple ATTACH_CONTAINER_OUTPUT calls to be made
> for the same container (i.e. from multiple clients), while only one
> ATTACH_CONTAINER_INPUT call will be allowed to connect at a time.
> In order to ensure that these calls are properly grouped (as well as to
> ensure that any state they need to share is properly confined), we will
> lazily launch a "per-container" actor to manage all ATTACH_CONTAINER_OUTPUT
> and ATTACH_CONTAINER_INPUT calls on behalf of a container.
> It will be the responsibility of this actor to:
> * Manage the read end of the pipe set up by the HTTP handler for the
> ATTACH_CONTAINER_INPUT call for a given container.
> * Manage the write end of the pipes set up by the HTTP handler for all
> ATTACH_CONTAINER_OUTPUT calls for a given container.
> * Establish a connection to a per-container "I/O switchboard" (discussed
> below) in order to forward data coming from the ATTACH_CONTAINER_INPUT pipe
> to the switchboard.
> * Establish a second connection to the per-container "I/O switchboard" to
> stream all stdout data coming from the switchboard to all
> ATTACH_CONTAINER_OUTPUT pipes.
> * Establish a third connection to the per-container "I/O switchboard" to
> stream all stderr data coming from the switchboard to all
> ATTACH_CONTAINER_OUTPUT pipes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6469) Build an Attach Container Actor
[ https://issues.apache.org/jira/browse/MESOS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6469:
-------------------------------

    Story Points: 0  (was: 8)

> Build an Attach Container Actor
> -------------------------------
>
>                 Key: MESOS-6469
>                 URL: https://issues.apache.org/jira/browse/MESOS-6469
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6472) Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
[ https://issues.apache.org/jira/browse/MESOS-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6472:
-------------------------------

    Description:
Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote client to the input/output of the entrypoint of a container. All input/output data will be packed into I/O messages and interleaved with control messages sent between a client and the agent. A single chunked request will be used to stream messages to the agent over the input stream, and a single chunked response will be used to stream messages to the client over the output stream.

This call will integrate with the I/O switchboard to stream data between the container and the HTTP stream.

  was:
Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote client to the input/output of the entrypoint of a container. All input/output data will be packed into I/O messages and interleaved with control messages sent between a client and the agent. A single chunked request will be used to stream messages to the agent over the input stream, and a single chunked response will be used to stream messages to the client over the output stream. This call will integrate with the Mesos internal support for "attaching" to an already running container through the new logger interfaces.

> Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
> --------------------------------------------------------------------
>
>                 Key: MESOS-6472
>                 URL: https://issues.apache.org/jira/browse/MESOS-6472
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Anand Mazumdar
>              Labels: debugging, mesosphere

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6473) Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos
[ https://issues.apache.org/jira/browse/MESOS-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6473:
-------------------------------

    Description:
Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote client to the input/output of the entrypoint of a container. All input/output data will be packed into I/O messages and interleaved with control messages sent between a client and the agent. A single chunked request will be used to stream messages to the agent over the input stream, and a single chunked response will be used to stream messages to the client over the output stream.

This call will integrate with the I/O switchboard to stream data between the container and the HTTP stream.

  was:
Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote client to the input/output of the entrypoint of a container. All input/output data will be packed into I/O messages and interleaved with control messages sent between a client and the agent. A single chunked request will be used to stream messages to the agent over the input stream, and a single chunked response will be used to stream messages to the client over the output stream. This call will integrate with the Mesos internal support for "attaching" to an already running container through the new logger interfaces.

> Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6473
>                 URL: https://issues.apache.org/jira/browse/MESOS-6473
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6471) Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in Mesos
[ https://issues.apache.org/jira/browse/MESOS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6471:
-------------------------------

    Description:
This HTTP API call will launch a nested container whose life-cycle is tied to the lifetime of the connection used to make this call. Once the agent receives the request, it will hold onto it until the container runs to completion or there is an error. As it holds onto it, it will stream the {{stdout}} and {{stderr}} of the container over the HTTP connection in a streaming response body. This response will mimic the response body returned by a call to ATTACH_NESTED_CONTAINER.

Upon success, the status code of the response will be 200. On error, an appropriate 400 error will be returned. If the connection is ever broken by either the client or the agent, the container will be destroyed.

  was:
This HTTP API call will launch a nested container whose life-cycle is tied to the lifetime of the connection used to make this call. Once the agent receives the request, it will hold onto it until the container runs to completion or there is an error. Upon success, a 200 response will be initiated with an "infinite" chunked response (but no data will ever be sent over this connection). On error, an appropriate 400 error will be returned. If the connection is ever broken by the client, the container will be destroyed.

This will likely involve modifications to some existing protobuf messages. It will also involve changes to {{launch.cpp}} to satisfy the new namespace requirements. We will create subtickets as we figure out the details for this.

> Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in
> Mesos
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-6471
>                 URL: https://issues.apache.org/jira/browse/MESOS-6471
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: debugging, mesosphere

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6528) Container status of a task in a pod is not correct.
[ https://issues.apache.org/jira/browse/MESOS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-6528:
-------------------------------

    Target Version/s: 1.2.0
          Issue Type: Task  (was: Bug)

> Container status of a task in a pod is not correct.
> ---------------------------------------------------
>
>                 Key: MESOS-6528
>                 URL: https://issues.apache.org/jira/browse/MESOS-6528
>             Project: Mesos
>          Issue Type: Task
>          Components: containerization, slave
>    Affects Versions: 1.1.0
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>              Labels: mesosphere
>
> Currently, the container status is for the top level executor container. This
> is not ideal. Ideally, we should get the container status for the
> corresponding nested container and report that with the task status update.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MESOS-6545) TestContainerizer is not thread-safe.
Benjamin Mahler created MESOS-6545:
-----------------------------------

             Summary: TestContainerizer is not thread-safe.
                 Key: MESOS-6545
                 URL: https://issues.apache.org/jira/browse/MESOS-6545
             Project: Mesos
          Issue Type: Bug
          Components: technical debt, test
            Reporter: Benjamin Mahler

The TestContainerizer is currently not backed by a Process and does not do any explicit synchronization, and so is not thread-safe.

Most tests currently cannot trip the concurrency issues, but this surfaced recently in MESOS-6544.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.
[ https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-6544:
-----------------------------------

    Description:
This test can crash when launching two executors concurrently because the test containerizer is not thread-safe! (see MESOS-6545).

{noformat}
[...truncated 78174 lines...]
{noformat}
[jira] [Created] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.
Benjamin Mahler created MESOS-6544: -- Summary: MasterMaintenanceTest.InverseOffersFilters is flaky. Key: MESOS-6544 URL: https://issues.apache.org/jira/browse/MESOS-6544 Project: Mesos Issue Type: Bug Components: technical debt, test Reporter: Benjamin Mahler This test can crash when launching two executors concurrently because the test containerizer is not thread-safe! (see MESOS-). {noformat} [...truncated 78174 lines...] I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master master@172.17.0.2:58302 I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 authenticatee I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL connection I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable resources {} from the resource estimator I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating slave(150)@172.17.0.2:58302 I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(357)@172.17.0.2:58302 I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL connection I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL authentication start I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires more steps I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL authentication step I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL authentication step I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1103 01:40:55.532179 29101 
auxprop.cpp:181] Looking up auxiliary property '*userPassword' I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated principal 'test-principal' at slave(150)@172.17.0.2:58302 I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(357)@172.17.0.2:58302 I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with master master@172.17.0.2:58302 I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 12.590371ms if necessary I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1 I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 94467ns; attempting to update the registry I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 36.501523ms if necessary I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already in progress I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 48.099208ms I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
position 4 I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0 I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 26.127711ms if necessary I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already in progress I1103 01:40:55.609695 29097 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 42.905697ms I1103 01:40:55.609860 29097 leveldb.cpp:399] Deleting ~2 keys from leveldb took 96623ns I1103 01:40:55.609899 29097 replica.cpp:708] Persisted action TRUNCATE at position 4 I1103 01:40:55.611063 29106 log.cpp:577] Attempting to append 513 bytes to the log I1103 01:40:55.611229 29097 coordinator.cpp:348] Coordinator attempting to write APPEND acti
[jira] [Assigned] (MESOS-6520) Make errno an explicit argument for ErrnoError.
[ https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-6520: -- Assignee: James Peach > Make errno an explicit argument for ErrnoError. > --- > > Key: MESOS-6520 > URL: https://issues.apache.org/jira/browse/MESOS-6520 > Project: Mesos > Issue Type: Bug > Components: technical debt >Reporter: James Peach >Assignee: James Peach >Priority: Minor > > Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the > constructor to {{ErrnoError}} references {{errno}} directly, which makes it > awkward to pass a custom {{errno}} value (you have to set {{errno}} globally). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2437) SlaveTest.CommandExecutorWithOverride segfault on OSX
[ https://issues.apache.org/jira/browse/MESOS-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634717#comment-15634717 ] Benjamin Mahler commented on MESOS-2437: [~kaysoky] can this be closed now? > SlaveTest.CommandExecutorWithOverride segfault on OSX > - > > Key: MESOS-2437 > URL: https://issues.apache.org/jira/browse/MESOS-2437 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Jie Yu > > OSX 10.8.5 > gcc-4.8 > {noformat} > [ RUN ] SlaveTest.CommandExecutorWithOverride > 2015-03-03 > 13:55:27,650:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server > refused to accept the client > 2015-03-03 > 13:55:30,983:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server > refused to accept the client > 2015-03-03 > 13:55:34,318:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server > refused to accept the client > 2015-03-03 > 13:55:37,651:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server > refused to accept the client > 2015-03-03 > 13:55:40,985:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server > refused to accept the client > ../../../../mesos/src/tests/slave_tests.cpp:367: Failure > Failed to wait 15secs for status1 > ../../../../mesos/src/tests/slave_tests.cpp:352: Failure > Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(_, > _))... 
> Expected: to be called twice >Actual: never called - unsatisfied and active > F0303 13:55:42.214505 2103935360 logging.cpp:57] RAW: Pure virtual method > called > *** Aborted at 1425419742 (unix time) try "date -d @1425419742" if you are > using GNU date *** > PC: @ 0xfde4858b097e (unknown) > @0x10a84cdbb google::LogMessage::Fail() > *** SIGSEGV (@0xfde4858b097e) received by PID 33696 (TID 0x110e1) > stack trace: *** > @0x10a852598 google::RawLog__() > @0x111629f87 os::Bsd::chained_handler() > @0x109f02507 __cxa_pure_virtual > @0x11162d422 JVM_handle_bsd_signal > @0x106663cee mesos::internal::tests::Cluster::Slaves::shutdown() > @ 0x7fff8ffe990a _sigtramp > @0x0 (unknown) > @0x106b8cc6e mesos::internal::tests::MesosTest::ShutdownSlaves() > @0x10a1801e3 > _ZZN7process8dispatchIN5mesos8internal5slave22ResourceMonitorProcessERKNS1_11ContainerIDERK8DurationS5_S8_EEvRKNS_3PIDIT_EEMSC_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESN_ > @0x106b8cc32 mesos::internal::tests::MesosTest::Shutdown() > @0x10a18a5bc > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave22ResourceMonitorProcessERKNS5_11ContainerIDERK8DurationS9_SC_EEvRKNS0_3PIDIT_EEMSG_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @0x106b8a259 mesos::internal::tests::MesosTest::TearDown() > @0x10a7d1177 std::function<>::operator()() > @0x106e668b0 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x10a7b70f1 process::ProcessBase::visit() > @0x106e619dc > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x10a7bbf76 process::DispatchEvent::visit() > @0x106e4927e testing::Test::Run() > @0x10696bd6e process::ProcessBase::serve() > @0x106e49c58 testing::TestInfo::Run() > @0x10a7b3987 process::ProcessManager::resume() > @0x106e4a3d8 testing::TestCase::Run() > @0x10a7a8b68 process::schedule() > @0x106e4f462 testing::internal::UnitTestImpl::RunAllTests() > @ 0x7fff8fffb772 _pthread_start > @0x106e67721 > 
testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x7fff8ffe81a1 thread_start > Killed: 9 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.
[ https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-6142: --- Target Version/s: 0.28.3, 1.1.1, 1.2.0, 1.0.3 (was: 1.2.0) > Frameworks may RESERVE for an arbitrary role. > - > > Key: MESOS-6142 > URL: https://issues.apache.org/jira/browse/MESOS-6142 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Affects Versions: 1.0.0, 1.1.0 >Reporter: Alexander Rukletsov >Assignee: Gastón Kleiman >Priority: Critical > Labels: mesosphere, reservations > > The master does not validate that resources from a reservation request have > the same role the framework is registered with. As a result, frameworks may > reserve resources for arbitrary roles. > I've modified the role in [the {{ReserveThenUnreserve}} > test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117] > to "yoyo" and observed the following in the test's log: > {noformat} > I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for > offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- > (default) at > scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 > I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal > 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; > mem(yoyo, test-principal):512' > I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for > resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from > framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at > scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) > I0908 18:35:43.379767 2138112 
master.cpp:7341] Sending checkpointed resources > cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) > I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources > from to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 > I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of > framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; > disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, > test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE > operation > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6542) Pull the current "init" process for a container out of the container.
[ https://issues.apache.org/jira/browse/MESOS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634572#comment-15634572 ] Yan Xu commented on MESOS-6542: --- Should this be optional? i.e., can {{containerizer launch}} serve as a default "init" to ease the burden of users/framework developers? > Pull the current "init" process for a container out of the container. > - > > Key: MESOS-6542 > URL: https://issues.apache.org/jira/browse/MESOS-6542 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > > Currently the mesos agent is in control of the "init" process launched inside > of a container. However, in order to properly support things like > systemd-in-a-container, we need to allow users to control the init process > that ultimately gets launched. > We will still need to fork a process equivalent to the current "init" > process, but it shouldn't be placed inside the container itself (instead, it > should be the parent process of whatever init process it is directed to > launch). > In order to do this properly, we will need to rework some of the logic in > {{launcher->fork()}} to allow this new parent process to do the namespace > entering / cloning instead of {{launcher->fork()}} itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6364) Support the rest of the cgroups subsystems
[ https://issues.apache.org/jira/browse/MESOS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-6364: -- Summary: Support the rest of the cgroups subsystems (was: Support rest cgroups subsystems) > Support the rest of the cgroups subsystems > -- > > Key: MESOS-6364 > URL: https://issues.apache.org/jira/browse/MESOS-6364 > Project: Mesos > Issue Type: Epic >Reporter: haosdent > > This is a follow up epic to MESOS-4697 to capture further improvements and > changes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6364) Support rest cgroups subsystems
[ https://issues.apache.org/jira/browse/MESOS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6364: Summary: Support rest cgroups subsystems (was: Improvements to cgroups isolator) > Support rest cgroups subsystems > --- > > Key: MESOS-6364 > URL: https://issues.apache.org/jira/browse/MESOS-6364 > Project: Mesos > Issue Type: Epic >Reporter: haosdent > > This is a follow up epic to MESOS-4697 to capture further improvements and > changes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6077) Added a default (task group) executor.
[ https://issues.apache.org/jira/browse/MESOS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6077: --- Summary: Added a default (task group) executor. (was: Implement a basic default pod executor.) > Added a default (task group) executor. > -- > > Key: MESOS-6077 > URL: https://issues.apache.org/jira/browse/MESOS-6077 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 1.1.0 > > > We would like to build a basic default pod executor that upon receiving a > {{LAUNCH_GROUP}} event from the agent, sends a {{TASK_RUNNING}} status > update. This would be a good building block for getting to a fully functional > pod based default command executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6014) Added port mapping CNI plugin.
[ https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6014: --- Summary: Added port mapping CNI plugin. (was: Create a CNI plugin that provides port mapping functionality for various CNI plugins.) > Added port mapping CNI plugin. > -- > > Key: MESOS-6014 > URL: https://issues.apache.org/jira/browse/MESOS-6014 > Project: Mesos > Issue Type: Epic > Components: containerization > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan >Priority: Blocker > Labels: mesosphere > Fix For: 1.1.0 > > > Currently there is no CNI plugin that supports port mapping. Given that the > unified containerizer is starting to become the de-facto container run time, > having a CNI plugin that provides port mapping is a must have. This is > primarily required for support BRIDGE networking mode, similar to docker > bridge networking that users expect to have when using docker containers. > While the most obvious use case is that of using the port-mapper plugin with > the bridge plugin, the port-mapping functionality itself is generic and > should be usable with any CNI plugin that needs it. > Keeping port-mapping as a CNI plugin gives operators the ability to use the > default port-mapper (CNI plugin) that Mesos provides, or use their own plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5788) Added JAVA API adapter for seamless transition to new scheduler API.
[ https://issues.apache.org/jira/browse/MESOS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5788: --- Summary: Added JAVA API adapter for seamless transition to new scheduler API. (was: Consider adding a Java Scheduler Shim/Adapter for the new/old API.) > Added JAVA API adapter for seamless transition to new scheduler API. > > > Key: MESOS-5788 > URL: https://issues.apache.org/jira/browse/MESOS-5788 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 1.1.0 > > > Currently, for existing JAVA based frameworks, moving to try out the new API > can be cumbersome. This change intends to introduce a shim/adapter interface > that makes this easier by allowing to toggle between the old/new API > (driver/new scheduler library) implementation via an environment variable. > This would allow framework developers to transition their older frameworks to > the new API rather seamlessly. > This would look similar to the work done for the executor shim for C++ > (command/docker executor). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4364) Add roles validation code to master
[ https://issues.apache.org/jira/browse/MESOS-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4364: - Fix Version/s: (was: 0.28.3) > Add roles validation code to master > --- > > Key: MESOS-4364 > URL: https://issues.apache.org/jira/browse/MESOS-4364 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Bannier >Assignee: Qian Zhang > Labels: mesosphere > > A {{FrameworkInfo}} can only have one of role or roles. A natural location > for this appears to be under {{validation::operation::validate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5763) Task stuck in fetching is not cleaned up after --executor_registration_timeout.
[ https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5763: -- Target Version/s: 0.28.3 Fix Version/s: (was: 0.27.4) (was: 0.28.3) > Task stuck in fetching is not cleaned up after > --executor_registration_timeout. > --- > > Key: MESOS-5763 > URL: https://issues.apache.org/jira/browse/MESOS-5763 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 0.28.0, 1.0.0 >Reporter: Yan Xu >Assignee: Yan Xu >Priority: Blocker > Fix For: 1.0.0 > > > When the fetching process hangs forever due to reasons such as HDFS issues, > Mesos containerizer would attempt to destroy the container and kill the > executor after {{--executor_registration_timeout}}. However this reliably > fails for us: the executor would be killed by the launcher destroy and the > container would be destroyed but the agent would never find out that the > executor is terminated thus leaving the task in the STAGING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6541) Mesos test should mount cgroups_root
[ https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634334#comment-15634334 ] Jason Lai commented on MESOS-6541: -- It would be good to call `unshare(2)` with `CLONE_NEWNS` to fork a new mount namespace for the test suite process. In this case, you can do whatever you want with the process' mount table without affecting other processes' view of their root FS. However, during the test runs, it is important to make sure that minimum changes are made to cgroup hierarchies or the changes should be idempotent, since such changes do have side effects across mount namespaces. > Mesos test should mount cgroups_root > > > Key: MESOS-6541 > URL: https://issues.apache.org/jira/browse/MESOS-6541 > Project: Mesos > Issue Type: Bug > Components: cgroups, test >Reporter: Yan Xu > > Currently on hosts without prior cgroups setup and sysfs is mounted at /sys, > mesos tests would fail like this: > {noformat:title=} > [ RUN ] HTTPCommandExecutorTest.TerminateWithACK > F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] > CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux > launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/fr > eezer': Failed to create directory '/sys/fs/cgroup/freezer': No such file or > directory > {noformat} > This is because the agent chooses to use {{LinuxLauncher}} based on > availability of the {{freezer}} subsystem alone. However for it to work, one > needs to do the following > {noformat:title=} > mount -t tmpfs cgroup_root /sys/fs/cgroup > {noformat} > in order to make {{/sys/fs/cgroup}} writable. > I have always run the command manually in the past when this failure happens > but this could be baffling especially to new developers. Mesos tests should > just mount it if it's not already done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.
[ https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6391: -- Target Version/s: 1.0.2, 1.1.0 (was: 0.28.3, 1.0.2, 1.1.0) Removing the target version from 0.28.3 since it's not a trivial backport. cc: [~jieyu] > Command task's sandbox should not be owned by root if it uses container image. > -- > > Key: MESOS-6391 > URL: https://issues.apache.org/jira/browse/MESOS-6391 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.2, 1.0.1 >Reporter: Jie Yu >Assignee: Jie Yu >Priority: Blocker > Fix For: 1.0.2, 1.1.0 > > > Currently, if the task defines a container image, the command executor will > be run under root because it needs to perform pivot_root. > That means if the task wants to run under an unprivileged user, the sandbox > of that task will not be writable because it's owned by root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6525) Add API protos for managing debug containers
[ https://issues.apache.org/jira/browse/MESOS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6525: --- Labels: debugging mesosphere (was: ) > Add API protos for managing debug containers > > > Key: MESOS-6525 > URL: https://issues.apache.org/jira/browse/MESOS-6525 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Vinod Kone > Labels: debugging, mesosphere > > The API calls that we should add are: > LAUNCH_NESTED_CONTAINER_SESSION > ATTACH_CONTAINER_INPUT > ATTACH_CONTAINER_OUTPUT -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6543) Add special case for entering the "mount" namespace of a parent container
Kevin Klues created MESOS-6543: -- Summary: Add special case for entering the "mount" namespace of a parent container Key: MESOS-6543 URL: https://issues.apache.org/jira/browse/MESOS-6543 Project: Mesos Issue Type: Task Reporter: Kevin Klues Assignee: Kevin Klues Currently, tasks launched with the command executor have a hierarchy of processes inside their container that looks as follows: {noformat}
| - mesos-containerizer launch
| | - mesos-executor
| | | - task process
{noformat} However, the only pid from this hierarchy of processes that the agent is aware of is the pid for the top-level {{mesos-containerizer launch}} binary. If all of these binaries were part of the same set of namespaces, then this would be sufficient to discover the namespaces of the {{task process}} (we could simply inspect the namespaces of the {{mesos-containerizer launch}} pid and know they were the same for the {{task process}}). This is true for most of the namespaces that each of these processes exists in. However, the {{mnt}} namespace of the two may differ. That is, the {{mesos-containerizer launch}} binary is always in the same {{mnt}} namespace as the host, while the {{task process}} binary may be in its own {{mnt}} namespace if file system isolation is turned on and it has a new rootfs provisioned for it (e.g. a docker image was provided for it). This has not been a problem until now because we never wanted to simply _enter_ the {{mnt}} namespace of a container before. Even with nested containers for pods, we always create a new {{mnt}} namespace branched off the host {{mnt}} namespace (in order to support the injection of host-mounted volumes). However, with the new debugging support we are adding, we need a way of entering the {{mnt}} namespace of a parent container instead of cloning a new one. Since we only have access to the {{pid}} of the container's init process, we can simply enter all namespaces associated with that pid except the {{mnt}} namespace. 
For the {{mnt}} namespace, we need to special case it to walk the process hierarchy until we find the first process in a different {{mnt}} namespace and enter that one instead. If none are found, simply enter the {{mnt}} namespace of the "init" process. This is a dirty dirty hack, but should be sufficient for now. Eventually we want to completely eliminate the command executor in favor of the "pod" (i.e. "default") executor, which doesn't have this problem at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6540: --- Description: Right now the agent only knows about the pid of the "init" process forked by {{launcher->fork()}}. However, in order to properly enter the namespaces of a task for a nested container, we actually need the pid of the process that gets launched by the {{containerizer launch}} binary. Using this pid, isolators can properly enter the namespaces of the actual *task* or *executor* launched by the {{containerizer launch}} binary instead of just the namespaces of the "init" process (which may be different). In order to do this properly, we should pull the "init" process out of the container and update was: Right now the agent only knows about the pid of the "init" process forked by {{launcher->fork()}}. However, in order to properly enter the namespaces of a task for a nested container, we actually need the pid of the process that gets launched by the {{containerizer launch}} binary. Using this pid, isolators can properly enter the namespaces of the actual *task* or *executor* launched by the {{containerizer launch}} binary instead of just the namespaces of the "init" process (which may be different). This will involve opening a domain socket with the {{containerizer launch}} binary and passing the translated pid from the forked process back to the agent. We can achieve this by opening the socket on the agent and passing the path to it using {{launchFlags}}. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. 
However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > In order to do this properly, we should pull the "init" process out of the > container and update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6541) Mesos test should mount cgroups_root
[ https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6541: -- Component/s: cgroups > Mesos test should mount cgroups_root > > > Key: MESOS-6541 > URL: https://issues.apache.org/jira/browse/MESOS-6541 > Project: Mesos > Issue Type: Bug > Components: cgroups, test >Reporter: Yan Xu > > Currently, on hosts without a prior cgroups setup where sysfs is mounted at /sys, > Mesos tests fail like this: > {noformat:title=} > [ RUN ] HTTPCommandExecutorTest.TerminateWithACK > F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] > CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux > launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': > Failed to create directory '/sys/fs/cgroup/freezer': No such file or > directory > {noformat} > This is because the agent chooses to use {{LinuxLauncher}} based on > availability of the {{freezer}} subsystem alone. However, for it to work, one > needs to run the following > {noformat:title=} > mount -t tmpfs cgroup_root /sys/fs/cgroup > {noformat} > in order to make {{/sys/fs/cgroup}} writable. > I have always run the command manually in the past when this failure happens, > but this can be baffling, especially to new developers. Mesos tests should > just mount it if it's not already done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6542) Pull the current "init" process for a container out of the container.
Kevin Klues created MESOS-6542: -- Summary: Pull the current "init" process for a container out of the container. Key: MESOS-6542 URL: https://issues.apache.org/jira/browse/MESOS-6542 Project: Mesos Issue Type: Task Reporter: Kevin Klues Currently the mesos agent is in control of the "init" process launched inside of a container. However, in order to properly support things like systemd-in-a-container, we need to allow users to control the init process that ultimately gets launched. We will still need to fork a process equivalent to the current "init" process, but it shouldn't be placed inside the container itself (instead, it should be the parent process of whatever init process it is directed to launch). In order to do this properly, we will need to rework some of the logic in {{launcher->fork()}} to allow this new parent process to do the namespace entering / cloning instead of {{launcher->fork()}} itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6541) Mesos test should mount cgroups_root
Yan Xu created MESOS-6541: - Summary: Mesos test should mount cgroups_root Key: MESOS-6541 URL: https://issues.apache.org/jira/browse/MESOS-6541 Project: Mesos Issue Type: Bug Components: test Reporter: Yan Xu Currently, on hosts without a prior cgroups setup where sysfs is mounted at /sys, Mesos tests fail like this: {noformat:title=} [ RUN ] HTTPCommandExecutorTest.TerminateWithACK F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': Failed to create directory '/sys/fs/cgroup/freezer': No such file or directory {noformat} This is because the agent chooses to use {{LinuxLauncher}} based on availability of the {{freezer}} subsystem alone. However, for it to work, one needs to run the following {noformat:title=} mount -t tmpfs cgroup_root /sys/fs/cgroup {noformat} in order to make {{/sys/fs/cgroup}} writable. I have always run the command manually in the past when this failure happens, but this can be baffling, especially to new developers. Mesos tests should just mount it if it's not already done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634165#comment-15634165 ] Kevin Klues commented on MESOS-6540: I'm pretty sure that's not what we said at the end yesterday, but I'm fine with that so long as you are. I'll take this ticket out of this EPIC and create a new one to deal with that. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634135#comment-15634135 ] Jie Yu commented on MESOS-6540: --- Yeah, I thought we agreed to walk the process tree. I'd like to avoid doing another domain socket to translate the pid because it'll be deprecated eventually. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
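The process-tree walk suggested above (find the first descendant of the container's "init" process that entered a different mnt namespace) can be sketched as pure logic over a snapshot of the process tree. This is a hypothetical illustration, not Mesos code: the function name and the snapshot maps are invented for the sketch, and on a real Linux system the namespace ids would come from `readlink("/proc/<pid>/ns/mnt")` and the children from `/proc/<pid>/task/<pid>/children`.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

using Pid = int;

// Breadth-first search from `root`: return the first descendant whose
// mnt-namespace id differs from the root's, or -1 if none exists.
//
// `children` maps a pid to its child pids; `ns` maps a pid to its
// namespace id (e.g. the target of the /proc/<pid>/ns/mnt symlink).
Pid findFirstInDifferentMntNamespace(
    Pid root,
    const std::map<Pid, std::vector<Pid>>& children,
    const std::map<Pid, std::string>& ns)
{
  const std::string& rootNs = ns.at(root);

  std::deque<Pid> frontier{root};
  while (!frontier.empty()) {
    Pid pid = frontier.front();
    frontier.pop_front();

    if (ns.at(pid) != rootNs) {
      return pid;  // First process that entered a new mnt namespace.
    }

    auto it = children.find(pid);
    if (it != children.end()) {
      frontier.insert(frontier.end(), it->second.begin(), it->second.end());
    }
  }

  return -1;  // No descendant in a different namespace.
}
```

For the old-style command task discussed here, the first such descendant would be the process to enter for debugging.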
[jira] [Commented] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634080#comment-15634080 ] QIHANG CHEN commented on MESOS-2115: Thanks! I actually made it work by using https://hub.docker.com/r/mesoscloud/mesos-slave/ with `--docker_mesos_image`, since that docker image has the docker client embedded. Thanks for your solution! > Improve recovering Docker containers when slave is contained > > > Key: MESOS-2115 > URL: https://issues.apache.org/jira/browse/MESOS-2115 > Project: Mesos > Issue Type: Epic > Components: docker >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: docker > Fix For: 0.23.0 > > > Currently when the docker containerizer is recovering, it checks the checkpointed > executor pids to recover which containers are still running, and removes the > rest of the containers from docker ps that aren't recognized. > This is problematic when the slave itself was in a docker container, as when > the slave container dies all the forked processes are removed as well, so the > checkpointed executor pids are no longer valid. > We have to assume the docker containers might be still running even though > the checkpointed executor pids are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
[ https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634059#comment-15634059 ] Alexander Rukletsov commented on MESOS-6357: Just saw that in the internal CI: {noformat} [17:41:36] : [Step 10/10] [ RUN ] NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit [17:41:36] : [Step 10/10] I1103 17:41:36.569166 23098 containerizer.cpp:201] Using isolation: cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image [17:41:36] : [Step 10/10] I1103 17:41:36.572278 23098 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [17:41:36] : [Step 10/10] I1103 17:41:36.577702 23118 containerizer.cpp:557] Recovering containerizer [17:41:36] : [Step 10/10] I1103 17:41:36.578866 23118 provisioner.cpp:253] Provisioner recovery complete [17:41:36] : [Step 10/10] I1103 17:41:36.579076 23114 containerizer.cpp:940] Starting container 0aedeff2-9704-42a1-a245-dff493c1ae73 for executor 'executor' of framework [17:41:36] : [Step 10/10] I1103 17:41:36.579404 23115 cgroups.cpp:405] Creating cgroup at '/sys/fs/cgroup/cpu,cpuacct/mesos_test_c7d3570c-5d28-43a2-88cc-6a8877d9d527/0aedeff2-9704-42a1-a245-dff493c1ae73' for container 0aedeff2-9704-42a1-a245-dff493c1ae73 [17:41:36] : [Step 10/10] I1103 17:41:36.580543 23112 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 1) for container 0aedeff2-9704-42a1-a245-dff493c1ae73 [17:41:36] : [Step 10/10] I1103 17:41:36.581063 23115 containerizer.cpp:1469] Launching 'mesos-containerizer' with flags '--command="{"arguments":["bash","-c","read key <&29"],"shell":false,"value":"\/bin\/bash"}" --environment="{"MESOS_SANDBOX":"\/mnt\/teamcity\/temp\/buildTmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8"}" --help="false" --pipe_read="29" --pipe_write="32" 
--pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_kML9EY/containers/0aedeff2-9704-42a1-a245-dff493c1ae73" --unshare_namespace_mnt="false" --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8"' [17:41:36] : [Step 10/10] I1103 17:41:36.581171 23112 linux_launcher.cpp:421] Launching container 0aedeff2-9704-42a1-a245-dff493c1ae73 and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID [17:41:36] : [Step 10/10] I1103 17:41:36.584450 23115 containerizer.cpp:1506] Checkpointing container's forked pid 17644 to '/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_bajwqO/meta/slaves/frameworks/executors/executor/runs/0aedeff2-9704-42a1-a245-dff493c1ae73/pids/forked.pid' [17:41:36] : [Step 10/10] I1103 17:41:36.585805 23115 fetcher.cpp:345] Starting to fetch URIs for container: 0aedeff2-9704-42a1-a245-dff493c1ae73, directory: /mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8 [17:41:36] : [Step 10/10] I1103 17:41:36.586761 23118 containerizer.cpp:1674] Starting nested container 0aedeff2-9704-42a1-a245-dff493c1ae73.37cb3ef8-056a-438e-a3ed-f852e51e4197 [17:41:36] : [Step 10/10] I1103 17:41:36.587481 23119 containerizer.cpp:1469] Launching 'mesos-containerizer' with flags '--command="{"shell":true,"value":"sleep 1000"}" --environment="{"MESOS_SANDBOX":"\/mnt\/teamcity\/temp\/buildTmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8\/containers\/37cb3ef8-056a-438e-a3ed-f852e51e4197"}" --help="false" --pipe_read="29" --pipe_write="32" 
--pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_kML9EY/containers/0aedeff2-9704-42a1-a245-dff493c1ae73/containers/37cb3ef8-056a-438e-a3ed-f852e51e4197" --unshare_namespace_mnt="false" --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8/containers/37cb3ef8-056a-438e-a3ed-f852e51e4197"' [17:41:36] : [Step 10/10] I1103 17:41:36.587601 23113 linux_launcher.cpp:421] Launching nested container 0aedeff2-9704-42a1-a245-dff493c1ae73.37cb3ef8-056a-438e-a3ed-f852e51e4197 and cloning with namespaces CLONE_NEWNS | CLONE_NEWPID [17:41:36] : [Step 10/10] Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rs
[jira] [Commented] (MESOS-6496) Support construction of Shared and Owned from managed Derived*
[ https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634058#comment-15634058 ] Benjamin Bannier commented on MESOS-6496: - Currently a number of tests use raw pointers to objects of mock classes and construct {{Shared}}/{{Owned}} wrappers to inject, e.g., {code} MockProvisioner* provisioner = new MockProvisioner(); // ... EXPECT_CALL(*provisioner, destroy(_)) .WillOnce(Return(true)); // ... MesosContainerizerProcess* process = new MesosContainerizerProcess( // ... Shared<Provisioner>(provisioner), // ...); {code} (see e.g., https://reviews.apache.org/r/53387/#comment224436, https://reviews.apache.org/r/53387/#comment224429, https://reviews.apache.org/r/53387/#comment224430, or probably many other instances in the code base). This code can leak if test expectations fail before {{provisioner}} is wrapped into a smart ptr. What we would really like to do here is to construct a managed ptr, {{Shared<MockProvisioner> provisioner(new MockProvisioner())}}, access the mocked functions with {{EXPECT_CALL}} (which expects a mock class), and then implicitly convert to a {{Shared<Provisioner>}} when injecting into the {{MesosContainerizerProcess}} ctor. > Support construction of Shared and Owned from managed Derived* > -- > > Key: MESOS-6496 > URL: https://issues.apache.org/jira/browse/MESOS-6496 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere, tech-debt > > It should be possible to pass a {{Shared<Derived>}} value to an object that > takes a parameter of type {{Shared<Base>}}. Similarly for {{Owned}}. In > general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} > iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, this works > because they define the appropriate conversion constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
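What the ticket asks of libprocess can be illustrated with a minimal stand-in. This sketch is not the libprocess implementation (the class below only borrows its name); it just shows the C++11 conversion-constructor pattern, as used by {{std::shared_ptr}}, that makes {{Shared<Derived>}} implicitly convertible to {{Shared<Base>}} exactly when {{Derived*}} converts to {{Base*}}.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Toy Shared<T> demonstrating the converting constructor pattern.
template <typename T>
class Shared
{
public:
  explicit Shared(T* t) : data(t) {}

  // Conversion constructor: participates in overload resolution only
  // if U* is implicitly convertible to T* (e.g. Derived* -> Base*).
  template <typename U,
            typename = typename std::enable_if<
                std::is_convertible<U*, T*>::value>::type>
  Shared(const Shared<U>& that) : data(that.data) {}

  T* get() const { return data.get(); }

  // Needed so Shared<T> can read Shared<U>'s private member.
  template <typename U> friend class Shared;

private:
  std::shared_ptr<T> data;
};

struct Provisioner { virtual ~Provisioner() {} };
struct MockProvisioner : Provisioner {};

// Construct a managed mock, then convert implicitly on injection,
// mirroring the test pattern described in the comment above.
inline bool conversionWorks()
{
  Shared<MockProvisioner> mock(new MockProvisioner());
  Shared<Provisioner> provisioner = mock;  // Implicit up-conversion.
  return provisioner.get() == mock.get();
}
```

Because the constructor is SFINAE-guarded, unrelated types still fail to convert at compile time, so type safety is preserved.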
[jira] [Updated] (MESOS-6496) Support construction of Shared and Owned from managed Derived*
[ https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-6496: Summary: Support construction of Shared and Owned from managed Derived* (was: Support up-casting of Shared and Owned) > Support construction of Shared and Owned from managed Derived* > -- > > Key: MESOS-6496 > URL: https://issues.apache.org/jira/browse/MESOS-6496 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere, tech-debt > > It should be possible to pass a {{Shared<Derived>}} value to an object that > takes a parameter of type {{Shared<Base>}}. Similarly for {{Owned}}. In > general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} > iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, this works > because they define the appropriate conversion constructor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633975#comment-15633975 ] Kevin Klues commented on MESOS-6540: I agree with everything you said. For 1, though, I thought we agreed that the "short term workaround" would be to pass the pid of the forked process back to the agent somehow. If not, what other solution are you proposing? The one where we just walk the process tree from the init process and find the first child in a different mount namespace and enter that one? > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6465) Add a task_id -> container_id mapping in state.json
[ https://issues.apache.org/jira/browse/MESOS-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633921#comment-15633921 ] Kevin Klues commented on MESOS-6465: Jie, can you comment on how you plan to do this now? > Add a task_id -> container_id mapping in state.json > --- > > Key: MESOS-6465 > URL: https://issues.apache.org/jira/browse/MESOS-6465 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Jie Yu > Labels: debugging, mesosphere > > Currently, there is no way to get the {{container-id}} of a task from hitting > the mesos master alone. You must first hit the master to get the {{task_id > -> agent_id}} and {{task_id -> executor_id}} mappings, then hit the > corresponding agent with {{agent_id}} to get the {{executor_id -> > container_id}} mapping. > It would simplify things a lot if the {{container_id}} information was > immediately available in the {{/tasks}} and {{/state}} endpoints of the > master itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633801#comment-15633801 ] Jie Yu commented on MESOS-6540: --- I think there are two problems we are trying to solve here: 1) Solving the issue for old-style command tasks so that we can find the proper namespace to enter for debugging support. 2) Moving {{containerizer launch}} to the host namespaces and letting the process it execs be PID 1. For 1), {{containerizer launch}} and the executor are in the same namespaces (except for the mnt namespace); I think we can use a short-term workaround to solve that because we will eventually deprecate the old-style command task. For 2), it's a broader discussion. That probably means we need to move the {{ns::clone}} call from the Linux launcher to {{containerizer launch}}. That means the Launcher interface should hide the details about how user processes are created. It should take a ContainerLaunchInfo in fork and return a pid that the containerizer will checkpoint. The pid will be the pid of the actual user process. The mesos-containerizer launch helper will be a detail of the Launcher. If {{ns::clone}} is in {{containerizer launch}}, then it'll properly return the translated pid to {{containerizer launch}}, which can communicate this pid back to the agent using a simple pipe. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. 
> Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
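The pipe-based handoff described above can be sketched as follows. This is a hedged, simplified illustration, not the actual Mesos implementation, and the function name is invented. In particular, there is no pid namespace here, so the child can report its own {{getpid()}}; in the real design, the process that calls {{ns::clone}} would instead write the clone-returned (outer-namespace, i.e. "translated") pid into a pipe inherited from the agent.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that reports a pid through a pipe; return true if the
// pid read from the pipe matches the pid the parent got from fork().
// The child stands in for the launch helper, the parent for the agent.
inline bool demoPipePidHandoff()
{
  int fds[2];
  if (::pipe(fds) != 0) {
    return false;
  }

  pid_t child = ::fork();
  if (child == -1) {
    return false;
  }

  if (child == 0) {
    // Child: write the pid to checkpoint into the inherited pipe.
    // (In the real design this would be the pid returned by ns::clone.)
    ::close(fds[0]);
    pid_t self = ::getpid();
    ::write(fds[1], &self, sizeof(self));
    ::close(fds[1]);
    ::_exit(0);
  }

  // Parent ("agent"): read the reported pid, then reap the child.
  ::close(fds[1]);
  pid_t reported = -1;
  ssize_t n = ::read(fds[0], &reported, sizeof(reported));
  ::close(fds[0]);
  ::waitpid(child, nullptr, 0);

  return n == sizeof(reported) && reported == child;
}
```

A pipe suffices here precisely because the helper doing the clone sees the pid in the agent's pid namespace, which is why the thread converges on a pipe rather than a domain socket.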
[jira] [Issue Comment Deleted] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6540: --- Comment: was deleted (was: Are you sure? I thought about this a bit more after our conversation yesterday, and it's not clear to me how we could do it with just a pipe. The {{containerizer launch}} binary is already cloned into the new pid namespace, so if we just passed the pid back that it forks, it will be the wrong pid from the perspective of the agent. On Thursday, 3 November 2016, Jie Yu (JIRA) wrote: [ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633681#comment-15633681 ] Jie Yu commented on MESOS-6540: --- aha, ic. It can just be a pipe (rather than a domain socket). ) > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. 
However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633713#comment-15633713 ] Kevin Klues commented on MESOS-6540: Are you sure? I thought about this a bit more after our conversation yesterday, and it's not clear to me how we could do it with just a pipe. The {{containerizer launch}} binary is already cloned into the new pid namespace, so if we just passed the pid back that it forks, it will be the wrong pid from the perspective of the agent. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633681#comment-15633681 ] Jie Yu commented on MESOS-6540: --- aha, ic. It can just be a pipe (rather than a domain socket). > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633677#comment-15633677 ] Kevin Klues commented on MESOS-6540: This is what we talked about at lunch yesterday so we can handle entering the proper mount namespace for tasks launched via the command executor. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6540: --- Description: Right now the agent only knows about the pid of the "init" process forked by {{launcher->fork()}}. However, in order to properly enter the namespaces of a task for a nested container, we actually need the pid of the process that gets launched by the {{containerizer launch}} binary. Using this pid, isolators can properly enter the namespaces of the actual *task* or *executor* launched by the {{containerizer launch}} binary instead of just the namespaces of the "init" process (which may be different). This will involve opening a domain socket with the {{containerizer launch}} binary and passing the translated pid from the forked process back to the agent. We can achieve this by opening the socket on the agent and passing the path to it using {{launchFlags}}. was: Right now the agent only knows about the pid of the "init" process forked by {{launcher->fork()}}. However, in order to properly enter the namespaces of a task for a nested container, we actually need the pid of the process that gets launched by the {{containerizer launch}} binary. Using this pid, isolators can properly enter the namespaces of the actual *task* or *executor* launched by the {{containerizer launch}} binary instead of just the namespaces of the "init" process (which may be different). > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. 
However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > This will involve opening a domain socket with the {{containerizer launch}} > binary and passing the translated pid from the forked process back to the > agent. We can achieve this by opening the socket on the agent and passing the > path to it using {{launchFlags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633661#comment-15633661 ] Jie Yu commented on MESOS-6540: --- Currently, containerizer launch is in the same namespace as the task it launches (except for the old style command task). What's the context of this ticket? > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
Kevin Klues created MESOS-6540: -- Summary: Pass the forked pid from `containerizer launch` to the agent and checkpoint it. Key: MESOS-6540 URL: https://issues.apache.org/jira/browse/MESOS-6540 Project: Mesos Issue Type: Task Reporter: Kevin Klues Assignee: Kevin Klues Right now the agent only knows about the pid of the "init" process forked by {{launcher->fork()}}. However, in order to properly enter the namespaces of a task for a nested container, we actually need the pid of the process that gets launched by the {{containerizer launch}} binary. Using this pid, isolators can properly enter the namespaces of the actual *task* or *executor* launched by the {{containerizer launch}} binary instead of just the namespaces of the "init" process (which may be different). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633471#comment-15633471 ] Marc Villacorta edited comment on MESOS-2115 at 11/3/16 5:07 PM: - [~SailC] The docker image you specify in {{--docker_mesos_image}} must have a docker client embedded (not bind-mounted); this image will be used to run the mesos executor. I personally use the same image for the mesos-agent and for the executor. In this [commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5] I switch from docker to rocket and it might be of interest to you because it shows how this can be achieved with both container runtimes. was (Author: h0tbird): [~SailC] The docker image you specify in {{--docker_mesos_image}} must have a docker client embedded (not bind-mounted) this image will be used to run the mesos executor. I personally use the same image for the mesos-agent and for the executor. In this [commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5] I switch from docker to rocker and it might be of interest to you because it shows how this can be achieved with both container runtimes. > Improve recovering Docker containers when slave is contained > > > Key: MESOS-2115 > URL: https://issues.apache.org/jira/browse/MESOS-2115 > Project: Mesos > Issue Type: Epic > Components: docker >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: docker > Fix For: 0.23.0 > > > Currently when docker containerizer is recovering it checks the checkpointed > executor pids to recover which containers are still running, and remove the > rest of the containers from docker ps that isn't recognized. > This is problematic when the slave itself was in a docker container, as when > the slave container dies all the forked processes are removed as well, so the > checkpointed executor pids are no longer valid. 
> We have to assume the docker containers might be still running even though > the checkpointed executor pids are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2115) Improve recovering Docker containers when slave is contained
[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633471#comment-15633471 ] Marc Villacorta commented on MESOS-2115: [~SailC] The docker image you specify in {{--docker_mesos_image}} must have a docker client embedded (not bind-mounted); this image will be used to run the mesos executor. I personally use the same image for the mesos-agent and for the executor. In this [commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5] I switch from docker to rocket and it might be of interest to you because it shows how this can be achieved with both container runtimes. > Improve recovering Docker containers when slave is contained > > > Key: MESOS-2115 > URL: https://issues.apache.org/jira/browse/MESOS-2115 > Project: Mesos > Issue Type: Epic > Components: docker >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: docker > Fix For: 0.23.0 > > > Currently when docker containerizer is recovering it checks the checkpointed > executor pids to recover which containers are still running, and remove the > rest of the containers from docker ps that isn't recognized. > This is problematic when the slave itself was in a docker container, as when > the slave container dies all the forked processes are removed as well, so the > checkpointed executor pids are no longer valid. > We have to assume the docker containers might be still running even though > the checkpointed executor pids are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6539) Compile warning in GMock: "binding dereferenced null pointer to reference"
Neil Conway created MESOS-6539: -- Summary: Compile warning in GMock: "binding dereferenced null pointer to reference" Key: MESOS-6539 URL: https://issues.apache.org/jira/browse/MESOS-6539 Project: Mesos Issue Type: Bug Components: technical debt Reporter: Neil Conway {noformat} In file included from ../gmock-1.7.0/include/gmock/gmock-actions.h:46: ../gmock-1.7.0/include/gmock/internal/gmock-internal-utils.h:371:7: warning: binding dereferenced null pointer to reference has undefined behavior [-Wnull-dereference] *static_cast::type*>(NULL)); ^~~~ ../gmock-1.7.0/include/gmock/gmock-actions.h:78:22: note: in instantiation of function template specialization 'testing::internal::Invalid >' requested here return internal::Invalid(); ^ ../gmock-1.7.0/include/gmock/gmock-actions.h:190:43: note: in instantiation of member function 'testing::internal::BuiltInDefaultValue >::Get' requested here internal::BuiltInDefaultValue::Get() : *value_; ^ ../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1460:34: note: in instantiation of member function 'testing::DefaultValue >::Get' requested here return DefaultValue::Get(); ^ ../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1350:22: note: in instantiation of member function 'testing::internal::FunctionMockerBase (bool)>::PerformDefaultAction' requested here func_mocker->PerformDefaultAction(args, call_description)); ^ ../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1473:26: note: in instantiation of function template specialization 'testing::internal::ActionResultHolder >::PerformDefaultAction (bool)>' requested here return ResultHolder::PerformDefaultAction(this, args, call_description); ^ ../../../mesos/3rdparty/libprocess/src/tests/process_tests.cpp:152:7: note: in instantiation of member function 'testing::internal::FunctionMockerBase (bool)>::UntypedPerformDefaultAction' requested here class DispatchProcess : public Process ^ In file included from ../../../mesos/3rdparty/libprocess/src/tests/process_tests.cpp:20: 
{noformat} The code in question has changed in upstream GMock: https://github.com/google/googletest/blob/master/googlemock/include/gmock/internal/gmock-internal-utils.h#L377 So the easiest fix is probably to vendor GMock 1.8.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6390) Ensure Python support scripts are linted
[ https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633248#comment-15633248 ] Kevin Klues commented on MESOS-6390: I would hold off on doing this until after https://reviews.apache.org/r/53074/ lands. Additionally, keep in mind that in order to do this properly, we will have to build a common virtualenv for use by the entire mesos source tree instead of just creating one inside the {{src/new_cli}} directory. > Ensure Python support scripts are linted > > > Key: MESOS-6390 > URL: https://issues.apache.org/jira/browse/MESOS-6390 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Bannier >Assignee: Manuwela Kanade > Labels: newbie, python > > Currently {{support/mesos-style.py}} does not lint files under {{support/}}. > This is mostly due to the fact that these scripts are so inconsistent > style-wise that they wouldn't even pass the linter now. > We should clean up all Python scripts under {{support/}} so they pass the > Python linter, and activate that directory in the linter for future > additions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4709) Enable compiler optimization by default
[ https://issues.apache.org/jira/browse/MESOS-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633204#comment-15633204 ] Neil Conway commented on MESOS-4709: See also this mailing list thread: https://lists.apache.org/thread.html/ded657aebe9e3013114a3af584a064bae39246c14e72338c978aa92f@1455755092@%3Cdev.mesos.apache.org%3E > Enable compiler optimization by default > --- > > Key: MESOS-4709 > URL: https://issues.apache.org/jira/browse/MESOS-4709 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Neil Conway >Assignee: Neil Conway > Labels: autoconf, configure, mesosphere > > At present, Mesos defaults to compiling with "-O0"; to enable compiler > optimizations, the user needs to specify "--enable-optimize" when running > {{configure}}. > We should change the default for the following reasons: > (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally, > I think most software packages compile with a reasonable level of > optimizations enabled by default. > (2) I think we should make the default configure flags appropriate for > end-users (rather than Mesos developers): developers will be familiar > enough with Mesos to tune the configure flags according to their own > preferences. > (3) The performance consequences of not enabling compiler > optimizations can be pretty severe: 5x in a benchmark I just ran, and > we've seen between 2x and 30x (!) performance differences for some > real-world workloads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error
[ https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-6532: -- Affects Version/s: 1.0.1 > I use mesos container type, I set CommandInfo command set shell cmd, eg: > python a.py "xx xxx", but get error > - > > Key: MESOS-6532 > URL: https://issues.apache.org/jira/browse/MESOS-6532 > Project: Mesos > Issue Type: Bug > Components: c++ api >Affects Versions: 1.0.1 >Reporter: yongyu >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error
[ https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632762#comment-15632762 ] Till Toenshoff commented on MESOS-6532: --- Could you please provide the complete app definition as well as the complete resulting task stderr? Mind that currently it is cut off at {{"value":"bash': syntax error at line 1 near:"}} > I use mesos container type, I set CommandInfo command set shell cmd, eg: > python a.py "xx xxx", but get error > - > > Key: MESOS-6532 > URL: https://issues.apache.org/jira/browse/MESOS-6532 > Project: Mesos > Issue Type: Bug > Components: c++ api >Reporter: yongyu >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error
[ https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629152#comment-15629152 ] Till Toenshoff edited comment on MESOS-6532 at 11/3/16 1:44 PM: task's stderr is: {noformat} I1102 20:56:44.266408 13134 exec.cpp:161] Version: 1.0.1 I1102 20:56:44.270975 13129 exec.cpp:236] Executor registered on agent 52245f11-e42d-4d67-b470-80bc9c4b10c2-S0 Failed to parse the flags: Failed to load flag 'command': Failed to load value '{"environment":{"variables":[{"name":"MARATHON_APP_VERSION","value":"2016-11-02T12:44:14.934Z"},{"name":"HOST","value":"10.191.154.105"},{"name":"MARATHON_APP_RESOURCE_CPUS","value":"1.0"},{"name":"MARATHON_APP_RESOURCE_GPUS","value":"0"},{"name":"PORT_10001","value":"31279"},{"name":"MESOS_TASK_ID","value":"2.dbe78f32-a0fb-11e6-afab-d48564cf107d"},{"name":"PORT","value":"31279"},{"name":"MARATHON_APP_RESOURCE_MEM","value":"128.0"},{"name":"PORTS","value":"31279"},{"name":"MARATHON_APP_RESOURCE_DISK","value":"0.0"},{"name":"MARATHON_APP_LABELS","value":""},{"name":"MARATHON_APP_ID","value":"\/2"},{"name":"PORT0","value":"31279"}]},"shell":true,"value":"bash': syntax error at line 1 near: {noformat} was (Author: 2507697...@qq.com): task'stderr is : I1102 20:56:44.266408 13134 exec.cpp:161] Version: 1.0.1 I1102 20:56:44.270975 13129 exec.cpp:236] Executor registered on agent 52245f11-e42d-4d67-b470-80bc9c4b10c2-S0 Failed to parse the flags: Failed to load flag 'command': Failed to load value 
'{"environment":{"variables":[{"name":"MARATHON_APP_VERSION","value":"2016-11-02T12:44:14.934Z"},{"name":"HOST","value":"10.191.154.105"},{"name":"MARATHON_APP_RESOURCE_CPUS","value":"1.0"},{"name":"MARATHON_APP_RESOURCE_GPUS","value":"0"},{"name":"PORT_10001","value":"31279"},{"name":"MESOS_TASK_ID","value":"2.dbe78f32-a0fb-11e6-afab-d48564cf107d"},{"name":"PORT","value":"31279"},{"name":"MARATHON_APP_RESOURCE_MEM","value":"128.0"},{"name":"PORTS","value":"31279"},{"name":"MARATHON_APP_RESOURCE_DISK","value":"0.0"},{"name":"MARATHON_APP_LABELS","value":""},{"name":"MARATHON_APP_ID","value":"\/2"},{"name":"PORT0","value":"31279"}]},"shell":true,"value":"bash': syntax error at line 1 near: > I use mesos container type, I set CommandInfo command set shell cmd, eg: > python a.py "xx xxx", but get error > - > > Key: MESOS-6532 > URL: https://issues.apache.org/jira/browse/MESOS-6532 > Project: Mesos > Issue Type: Bug > Components: c++ api >Reporter: yongyu >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5533) Agent fails to start on CentOS 6 due to missing cgroup hierarchy.
[ https://issues.apache.org/jira/browse/MESOS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5533: --- Priority: Major (was: Critical) > Agent fails to start on CentOS 6 due to missing cgroup hierarchy. > - > > Key: MESOS-5533 > URL: https://issues.apache.org/jira/browse/MESOS-5533 > Project: Mesos > Issue Type: Bug > Components: build, isolation >Reporter: Kapil Arya >Assignee: Jie Yu > Labels: mesosphere > > With the network CNI isolator, agent now _requires_ cgroups to be installed > on the system. Can we add some check(s) to either automatically disable CNI > module if cgroup hierarchies are not available or ask the user to > install/enable cgroup hierarchies. > On CentOS 6, cgroup tools aren't installed by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5533) Agent fails to start on CentOS 6 due to missing cgroup hierarchy.
[ https://issues.apache.org/jira/browse/MESOS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5533: --- Target Version/s: 1.2.0 > Agent fails to start on CentOS 6 due to missing cgroup hierarchy. > - > > Key: MESOS-5533 > URL: https://issues.apache.org/jira/browse/MESOS-5533 > Project: Mesos > Issue Type: Bug > Components: build, isolation >Reporter: Kapil Arya >Assignee: Jie Yu > Labels: mesosphere > > With the network CNI isolator, agent now _requires_ cgroups to be installed > on the system. Can we add some check(s) to either automatically disable CNI > module if cgroup hierarchies are not available or ask the user to > install/enable cgroup hierarchies. > On CentOS 6, cgroup tools aren't installed by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.
[ https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6142: --- Priority: Critical (was: Major) > Frameworks may RESERVE for an arbitrary role. > - > > Key: MESOS-6142 > URL: https://issues.apache.org/jira/browse/MESOS-6142 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Affects Versions: 1.0.0, 1.1.0 >Reporter: Alexander Rukletsov >Assignee: Gastón Kleiman >Priority: Critical > Labels: mesosphere, reservations > > The master does not validate that resources from a reservation request have > the same role the framework is registered with. As a result, frameworks may > reserve resources for arbitrary roles. > I've modified the role in [the {{ReserveThenUnreserve}} > test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117] > to "yoyo" and observed the following in the test's log: > {noformat} > I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for > offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- > (default) at > scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 > I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal > 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; > mem(yoyo, test-principal):512' > I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for > resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from > framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at > scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) > I0908 18:35:43.379767 2138112 master.cpp:7341] Sending 
checkpointed resources > cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 > (alexr.railnet.train) > I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources > from to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 > I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of > framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent > dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; > disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, > test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE > operation > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6117) TCP health checks are not supported on Windows.
[ https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632627#comment-15632627 ] Alexander Rukletsov commented on MESOS-6117: {noformat} Commit: eb225062a40556250e8825d90d2b16b470d1ec4a [eb22506] Author: Alexander Rukletsov al...@apache.org Date: 2 September 2016 at 18:25:38 GMT+2 Commit Date: 3 November 2016 at 11:34:29 GMT+1 Extracted "curl" binary into HTTP_CHECK_COMMAND constant. Review: https://reviews.apache.org/r/51608 {noformat} > TCP health checks are not supported on Windows. > --- > > Key: MESOS-6117 > URL: https://issues.apache.org/jira/browse/MESOS-6117 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov > Labels: health-check, mesosphere > > Currently, TCP health check is only available on Linux. Windows support > should be added to maintain feature parity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error
[ https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongyu updated MESOS-6532: -- Priority: Critical (was: Major) > I use mesos container type, I set CommandInfo command set shell cmd, eg: > python a.py "xx xxx", but get error > - > > Key: MESOS-6532 > URL: https://issues.apache.org/jira/browse/MESOS-6532 > Project: Mesos > Issue Type: Bug > Components: c++ api >Reporter: yongyu >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
[ https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629370#comment-15629370 ] Gastón Kleiman edited comment on MESOS-6457 at 11/3/16 9:21 AM: Patches: https://reviews.apache.org/r/53378/ https://reviews.apache.org/r/53406/ https://reviews.apache.org/r/53407/ https://reviews.apache.org/r/53385/ was (Author: gkleiman): Patches: https://reviews.apache.org/r/53378/ https://reviews.apache.org/r/53385/ > Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING. > - > > Key: MESOS-6457 > URL: https://issues.apache.org/jira/browse/MESOS-6457 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.2, 1.0.1 >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman >Priority: Blocker > > A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if > for example it starts/stops passing a health check once it got into the > {{TASK_KILLING}} state. > I think that this behaviour is counterintuitive. It also makes the life of > framework/tools developers harder, since they have to keep track of the > complete task status history in order to know if a task is being killed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6390) Ensure Python support scripts are linted
[ https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manuwela Kanade reassigned MESOS-6390: -- Assignee: Manuwela Kanade > Ensure Python support scripts are linted > > > Key: MESOS-6390 > URL: https://issues.apache.org/jira/browse/MESOS-6390 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Bannier >Assignee: Manuwela Kanade > Labels: newbie, python > > Currently {{support/mesos-style.py}} does not lint files under {{support/}}. > This is mostly due to the fact that these scripts are so inconsistent > style-wise that they wouldn't even pass the linter now. > We should clean up all Python scripts under {{support/}} so they pass the > Python linter, and activate that directory in the linter for future > additions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6390) Ensure Python support scripts are linted
[ https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632101#comment-15632101 ] Manuwela Kanade commented on MESOS-6390: Hi [~bbannier]: I would like to work on this issue. I'll assign it to myself for now. Thanks > Ensure Python support scripts are linted > > > Key: MESOS-6390 > URL: https://issues.apache.org/jira/browse/MESOS-6390 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Bannier > Labels: newbie, python > > Currently {{support/mesos-style.py}} does not lint files under {{support/}}. > This is mostly due to the fact that these scripts are so inconsistent > style-wise that they wouldn't even pass the linter now. > We should clean up all Python scripts under {{support/}} so they pass the > Python linter, and activate that directory in the linter for future > additions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)