[jira] [Commented] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-03 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635400#comment-15635400
 ] 

Benjamin Mahler commented on MESOS-6544:


This will be fixed via MESOS-6545.

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> 
>
> Key: MESOS-6544
> URL: https://issues.apache.org/jira/browse/MESOS-6544
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the 
> test containerizer is not thread-safe! (see MESOS-6545).
> {noformat}
> [...truncated 78174 lines...]
> I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
> master@172.17.0.2:58302
> I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 
> authenticatee
> I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
> I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
> connection
> I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable 
> resources {} from the resource estimator
> I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
> slave(150)@172.17.0.2:58302
> I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
> session for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
> connection
> I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL 
> authentication start
> I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires 
> more steps
> I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL 
> authentication step
> I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL 
> authentication step
> I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false
> I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*userPassword'
> I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
> user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true
> I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
> I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
> I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
> principal 'test-principal' at slave(150)@172.17.0.2:58302
> I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
> cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
> I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
> master master@172.17.0.2:58302
> I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
> 12.590371ms if necessary
> I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
> slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
> 3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
> I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 
> 94467ns; attempting to update the registry
> I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
> 36.501523ms if necessary
> I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress
> I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 48.099208ms
> I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
> position 4
> I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
> for position 4 from @0.0.0.0:0
> I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
> 26.127711ms if necessary
> I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
> from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
> in progress

[jira] [Assigned] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-03 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6544:
--

Assignee: Benjamin Mahler

> MasterMaintenanceTest.InverseOffersFilters is flaky.
> 
>
> Key: MESOS-6544
> URL: https://issues.apache.org/jira/browse/MESOS-6544
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> This test can crash when launching two executors concurrently because the 
> test containerizer is not thread-safe! (see MESOS-6545).

[jira] [Assigned] (MESOS-6545) TestContainerizer is not thread-safe.

2016-11-03 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-6545:
--

Assignee: Benjamin Mahler

> TestContainerizer is not thread-safe.
> -
>
> Key: MESOS-6545
> URL: https://issues.apache.org/jira/browse/MESOS-6545
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>
> The TestContainerizer is currently not backed by a Process and does no 
> explicit synchronization, so it is not thread-safe.
> Most tests currently cannot trip the concurrency issues, but this surfaced 
> recently in MESOS-6544.
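A note on the fix: the simplest way to remove this kind of race is to serialize
every public method behind a mutex. The sketch below is an illustrative
stand-in, not the actual TestContainerizer API; the real fix in MESOS-6545
would more likely back the class with a libprocess Process, which serializes
calls by queueing dispatches onto a single actor.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a test containerizer whose state can be
// touched from multiple threads. Every public method takes the lock,
// so two concurrent launch() calls can no longer corrupt the map.
class ThreadSafeTestContainerizer {
public:
  bool launch(const std::string& containerId) {
    std::lock_guard<std::mutex> lock(mutex_);
    // insert() is a no-op if the key exists, so a duplicate launch fails.
    return containers_.insert({containerId, true}).second;
  }

  bool destroy(const std::string& containerId) {
    std::lock_guard<std::mutex> lock(mutex_);
    return containers_.erase(containerId) > 0;
  }

  size_t count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return containers_.size();
  }

private:
  std::mutex mutex_;
  std::map<std::string, bool> containers_;
};
```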



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6547) Update the mesos containerizer to launch per-container I/O switchboards

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6547:
---
Summary: Update the mesos containerizer to launch per-container I/O 
switchboards  (was: Update the mesos containerizer to launch the per-container 
I/O switchboard)

> Update the mesos containerizer to launch per-container I/O switchboards
> ---
>
> Key: MESOS-6547
> URL: https://issues.apache.org/jira/browse/MESOS-6547
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> With the introduction of the new per-container I/O switchboard component, we 
> need to update the mesos containerizer to actually launch one for each 
> container as well as maintain any checkpointed {{pid}} information so it can 
> reattach to it on {{recovery()}}.
> As part of this, we will likely move the existing logger logic inside the I/O 
> switchboard and have it own the logger going forward.
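The checkpoint-and-reattach flow described above can be sketched with plain
POSIX primitives. The helper names and the pid-file format below are
hypothetical illustrations, not the actual Mesos checkpointing code: persist
the switchboard's pid at launch, read it back during recovery, and probe
whether that process still exists.

```cpp
#include <fstream>
#include <string>
#include <signal.h>
#include <sys/types.h>

// Persist the switchboard pid so a restarted agent can find it again.
void checkpointPid(const std::string& path, pid_t pid) {
  std::ofstream out(path);
  out << pid;
}

// Read the checkpointed pid back on recovery; -1 if unreadable.
pid_t recoverPid(const std::string& path) {
  std::ifstream in(path);
  pid_t pid = -1;
  in >> pid;
  return pid;
}

// kill() with signal 0 performs existence/permission checks only,
// without delivering a signal.
bool isAlive(pid_t pid) {
  return pid > 0 && ::kill(pid, 0) == 0;
}
```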





[jira] [Created] (MESOS-6547) Update the mesos containerizer to launch the per-container I/O switchboard

2016-11-03 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6547:
--

 Summary: Update the mesos containerizer to launch the 
per-container I/O switchboard
 Key: MESOS-6547
 URL: https://issues.apache.org/jira/browse/MESOS-6547
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues


With the introduction of the new per-container I/O switchboard component, we 
need to update the mesos containerizer to actually launch one for each 
container as well as maintain any checkpointed {{pid}} information so it can 
reattach to it on {{recovery()}}.

As part of this, we will likely move the existing logger logic inside the I/O 
switchboard and have it own the logger going forward.





[jira] [Created] (MESOS-6546) Update the Containerizer API to include attachInput and attachOutput calls.

2016-11-03 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6546:
--

 Summary: Update the Containerizer API to include attachInput and 
attachOutput calls.
 Key: MESOS-6546
 URL: https://issues.apache.org/jira/browse/MESOS-6546
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues


With the per-container I/O switchboard we are adding, the containerizer should 
be responsible for both launching the I/O switchboard process, as well as 
allowing external components to interface with it.







[jira] [Updated] (MESOS-6472) Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6472:
---
Assignee: Vinod Kone  (was: Anand Mazumdar)

> Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
> 
>
> Key: MESOS-6472
> URL: https://issues.apache.org/jira/browse/MESOS-6472
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Vinod Kone
>  Labels: debugging, mesosphere
>
> Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote 
> client to the input/output of the entrypoint of a container. All 
> input/output data will be packed into I/O messages and interleaved with 
> control messages sent between a client and the agent. A single chunked 
> request will be used to stream messages to the agent over the input stream, 
> and a single chunked response will be used to stream messages to the client 
> over the output stream.
> This call will integrate with the I/O switchboard to stream data between the 
> container and the HTTP stream.
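The interleaving of I/O and control messages on one stream can be illustrated
with a tagged, length-prefixed framing. This is an assumption for illustration
only: the real agent API encodes protobuf records, and the `Kind`/`Message`
names below are invented.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Two message kinds share one byte stream: control messages (e.g. an
// attach handshake) and raw I/O data.
enum class Kind : uint8_t { CONTROL = 0, IO = 1 };

struct Message {
  Kind kind;
  std::string payload;
};

// Frame: 1 tag byte, 4-byte little-endian length, then the payload.
std::string encode(const Message& m) {
  std::string out;
  out.push_back(static_cast<char>(m.kind));
  uint32_t len = m.payload.size();
  for (int i = 0; i < 4; i++) out.push_back(char((len >> (8 * i)) & 0xff));
  out += m.payload;
  return out;
}

// Walk the stream frame by frame, recovering the interleaved messages.
std::vector<Message> decode(const std::string& stream) {
  std::vector<Message> messages;
  size_t pos = 0;
  while (pos + 5 <= stream.size()) {
    Kind kind = static_cast<Kind>(stream[pos]);
    uint32_t len = 0;
    for (int i = 0; i < 4; i++)
      len |= uint32_t(uint8_t(stream[pos + 1 + i])) << (8 * i);
    pos += 5;
    messages.push_back({kind, stream.substr(pos, len)});
    pos += len;
  }
  return messages;
}
```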





[jira] [Commented] (MESOS-6469) Build an Attach Container Actor

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634825#comment-15634825
 ] 

Kevin Klues commented on MESOS-6469:


We talked through exactly what needs to happen between the HTTP handlers and 
the I/O switchboard and figured out we can split all of the logic that would 
have been in the {{AttachContainerActor}} between the containerizer and the 
HTTP handlers themselves. As previously designed, the HTTP handlers would have 
been trivial and would not have taken 5 days. Now they will be a bit beefier 
(implementing the logic we thought would go in the {{AttachContainerActor}}), 
but we had already allocated ample time for them.

> Build an Attach Container Actor
> ---
>
> Key: MESOS-6469
> URL: https://issues.apache.org/jira/browse/MESOS-6469
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> The new agent API calls for ATTACH_CONTAINER_INPUT and 
> ATTACH_CONTAINER_OUTPUT are intimately intertwined. That is, most attach 
> operations will likely want to call both ATTACH_CONTAINER_INPUT and 
> ATTACH_CONTAINER_OUTPUT in order to attach all three of stdin, stdout and 
> stderr to a local terminal.
> Moreover, we plan to allow multiple ATTACH_CONTAINER_OUTPUT calls to be made 
> for the same container (i.e. from multiple clients), while only one 
> ATTACH_CONTAINER_INPUT call will be allowed to connect at a time.
> In order to ensure that these calls are properly grouped (as well as to 
> ensure that any state they need to share is properly confined), we will 
> lazily launch a “per-container” actor to manage all ATTACH_CONTAINER_OUTPUT 
> and ATTACH_CONTAINER_INPUT calls on behalf of a container.
> It will be the responsibility of this actor to:
>  * Manage the read end of the pipe set up by the HTTP handler for the 
> ATTACH_CONTAINER_INPUT call for a given container.
>  * Manage the write end of the pipes set up by the HTTP handler for all 
> ATTACH_CONTAINER_OUTPUT calls for a given container.
>  * Establish a connection to a per-container “I/O switchboard” (discussed 
> below) in order to forward data coming from the ATTACH_CONTAINER_INPUT pipe 
> to the switchboard.
>  * Establish a second connection to the per-container “I/O switchboard” to 
> stream all stdout data coming from the switchboard to all 
> ATTACH_CONTAINER_OUTPUT pipes.
>  * Establish a third connection to the per-container “I/O switchboard” to 
> stream all stderr data coming from the switchboard to all 
> ATTACH_CONTAINER_OUTPUT pipes.
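The fan-out responsibility in the stdout/stderr bullets above can be sketched
as follows. The class name is hypothetical and the sinks are plain callbacks
standing in for the write ends of the ATTACH_CONTAINER_OUTPUT pipes.

```cpp
#include <functional>
#include <string>
#include <vector>

// One source (the switchboard's stdout or stderr connection) feeding
// every attached ATTACH_CONTAINER_OUTPUT sink.
class OutputFanout {
public:
  using Sink = std::function<void(const std::string&)>;

  // A new ATTACH_CONTAINER_OUTPUT client registers a sink.
  void attach(Sink sink) { sinks_.push_back(std::move(sink)); }

  // Forward one chunk coming from the switchboard to all attached sinks.
  void forward(const std::string& chunk) {
    for (auto& sink : sinks_) sink(chunk);
  }

private:
  std::vector<Sink> sinks_;
};
```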





[jira] [Updated] (MESOS-6469) Build an Attach Container Actor

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6469:
---
Story Points: 0  (was: 8)

> Build an Attach Container Actor
> ---
>
> Key: MESOS-6469
> URL: https://issues.apache.org/jira/browse/MESOS-6469
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>





[jira] [Updated] (MESOS-6472) Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6472:
---
Description: 
Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote 
client to the input/output of the entrypoint of a container. All 
input/output data will be packed into I/O messages and interleaved with control 
messages sent between a client and the agent. A single chunked request will be 
used to stream messages to the agent over the input stream, and a single 
chunked response will be used to stream messages to the client over the output 
stream.

This call will integrate with the I/O switchboard to stream data between the 
container and the HTTP stream.

  was:
Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote 
client to the input/output of the entrypoint of a container. All 
input/output data will be packed into I/O messages and interleaved with control 
messages sent between a client and the agent. A single chunked request will be 
used to stream messages to the agent over the input stream, and a single 
chunked response will be used to stream messages to the client over the output 
stream.

This call will integrate with the Mesos internal support for "attaching" to an 
already running container through the new logger interfaces.


> Build support for ATTACH_CONTAINER_INPUT into the Agent API in Mesos
> 
>
> Key: MESOS-6472
> URL: https://issues.apache.org/jira/browse/MESOS-6472
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Anand Mazumdar
>  Labels: debugging, mesosphere
>
> Coupled with the ATTACH_CONTAINER_OUTPUT call, this call will attach a remote 
> client to the input/output of the entrypoint of a container. All 
> input/output data will be packed into I/O messages and interleaved with 
> control messages sent between a client and the agent. A single chunked 
> request will be used to stream messages to the agent over the input stream, 
> and a single chunked response will be used to stream messages to the client 
> over the output stream.
> This call will integrate with the I/O switchboard to stream data between the 
> container and the HTTP stream.





[jira] [Updated] (MESOS-6473) Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6473:
---
Description: 
Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote 
client to the input/output of the entrypoint of a container. All 
input/output data will be packed into I/O messages and interleaved with control 
messages sent between a client and the agent. A single chunked request will be 
used to stream messages to the agent over the input stream, and a single 
chunked response will be used to stream messages to the client over the output 
stream.

This call will integrate with the I/O switchboard to stream data between the 
container and the HTTP stream.

  was:
Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote 
client to the input/output of the entrypoint of a container. All 
input/output data will be packed into I/O messages and interleaved with control 
messages sent between a client and the agent. A single chunked request will be 
used to stream messages to the agent over the input stream, and a single 
chunked response will be used to stream messages to the client over the output 
stream.

This call will integrate with the Mesos internal support for "attaching" to an 
already running container through the new logger interfaces.


> Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos
> -
>
> Key: MESOS-6473
> URL: https://issues.apache.org/jira/browse/MESOS-6473
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote 
> client to the input/output of the entrypoint of a container. All 
> input/output data will be packed into I/O messages and interleaved with 
> control messages sent between a client and the agent. A single chunked 
> request will be used to stream messages to the agent over the input stream, 
> and a single chunked response will be used to stream messages to the client 
> over the output stream.
> This call will integrate with the I/O switchboard to stream data between the 
> container and the HTTP stream.





[jira] [Updated] (MESOS-6471) Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in Mesos

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6471:
---
Description: 
This HTTP API call will launch a nested container whose life-cycle is tied to 
the lifetime of the connection used to make this call. Once the agent receives 
the request, it will hold onto it until the container runs to completion or 
there is an error. As it holds onto it, it will stream the {{stdout}} and 
{{stderr}} of the container over the HTTP connection in a streaming response 
body. This response will mimic the response body returned by a call to 
ATTACH_NESTED_CONTAINER.

Upon success, the error code of the response will be 200. On error, an 
appropriate 400 error will be returned. If the connection is ever broken by 
either the client or the agent, the container will be destroyed. 


  was:
This HTTP API call will launch a nested container whose life-cycle is tied to 
the lifetime of the connection used to make this call. Once the agent receives 
the request, it will hold onto it until the container runs to completion or 
there is an error. Upon success, a 200 response will be initiated with an 
“infinite” chunked response (but no data will ever be sent over this 
connection). On error, an appropriate 400 error will be returned. If the 
connection is ever broken by the client, the container will be destroyed. 

This will likely involve modifications to some existing protobuf 
messages. It will also involve changes to {{launch.cpp}} to satisfy the new 
namespace requirements.

We will create subtickets as we figure out the details for this.



> Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in 
> Mesos
> --
>
> Key: MESOS-6471
> URL: https://issues.apache.org/jira/browse/MESOS-6471
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> This HTTP API call will launch a nested container whose life-cycle is tied to 
> the lifetime of the connection used to make this call. Once the agent 
> receives the request, it will hold onto it until the container runs to 
> completion or there is an error. As it holds onto it, it will stream the 
> {{stdout}} and {{stderr}} of the container over the HTTP connection in a 
> streaming response body. This response will mimic the response body returned 
> by a call to ATTACH_NESTED_CONTAINER.
> Upon success, the error code of the response will be 200. On error, an 
> appropriate 400 error will be returned. If the connection is ever broken by 
> either the client or the agent, the container will be destroyed. 
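Tying a container's lifetime to the connection, as described, maps naturally
onto RAII: when the connection object is torn down (client disconnect or agent
error), its destructor destroys the container. The sketch below is written
under that assumption; the names are illustrative, not the agent's actual
implementation.

```cpp
#include <functional>
#include <utility>

// Holds a callback that destroys the associated nested container.
// Whatever tears down the connection (broken pipe, client close, agent
// error) runs the destructor, which destroys the container exactly once.
class SessionConnection {
public:
  explicit SessionConnection(std::function<void()> destroyContainer)
    : destroyContainer_(std::move(destroyContainer)) {}

  ~SessionConnection() {
    if (destroyContainer_) destroyContainer_();
  }

private:
  std::function<void()> destroyContainer_;
};
```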





[jira] [Updated] (MESOS-6471) Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in Mesos

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6471:
---
Description: 
This HTTP API call will launch a nested container whose life-cycle is tied to 
the lifetime of the connection used to make this call. Once the agent receives 
the request, it will hold onto it until the container runs to completion or 
there is an error. As it holds onto it, it will stream the {{stdout}} and 
{{stderr}} of the container over the HTTP connection in a streaming response 
body. This response will mimic the response body returned by a call to 
ATTACH_NESTED_CONTAINER.

Upon success, the error code of the response will be 200. On error, an 
appropriate 400 error will be returned. If the connection is ever broken by 
either the client or the agent, the container will be destroyed.


  was:
This HTTP API call will launch a nested container whose life-cycle is tied to 
the lifetime of the connection used to make this call. Once the agent receives 
the request, it will hold onto it until the container runs to completion or 
there is an error. As it holds onto it, it will stream the {{stdout}} and 
{{stderr}} of the container over the HTTP connection in a streaming response 
body. This response will mimic the response body returned by a call to 
ATTACH_NESTED_CONTAINER.

Upon success, the error code of the response will be 200. On error, an 
appropriate 400 error will be returned. If the connection is ever broken by 
either the client or the agent, the container will be destroyed. 



> Build support for LAUNCH_NESTED_CONTAINER_SESSION call into the Agent API in 
> Mesos
> --
>
> Key: MESOS-6471
> URL: https://issues.apache.org/jira/browse/MESOS-6471
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> This HTTP API call will launch a nested container whose life-cycle is tied to 
> the lifetime of the connection used to make this call. Once the agent 
> receives the request, it will hold onto it until the container runs to 
> completion or there is an error. As it holds onto it, it will stream the 
> {{stdout}} and {{stderr}} of the container over the HTTP connection in a 
> streaming response body. This response will mimic the response body returned 
> by a call to ATTACH_NESTED_CONTAINER.
> Upon success, the error code of the response will be 200. On error, an 
> appropriate 400 error will be returned. If the connection is ever broken by 
> either the client or the agent, the container will be destroyed.





[jira] [Updated] (MESOS-6528) Container status of a task in a pod is not correct.

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6528:
---
Target Version/s: 1.2.0
  Issue Type: Task  (was: Bug)

> Container status of a task in a pod is not correct.
> ---
>
> Key: MESOS-6528
> URL: https://issues.apache.org/jira/browse/MESOS-6528
> Project: Mesos
>  Issue Type: Task
>  Components: containerization, slave
>Affects Versions: 1.1.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
>
> Currently, the container status is for the top level executor container. This 
> is not ideal. Ideally, we should get the container status for the 
> corresponding nested container and report that with the task status update.





[jira] [Created] (MESOS-6545) TestContainerizer is not thread-safe.

2016-11-03 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6545:
--

 Summary: TestContainerizer is not thread-safe.
 Key: MESOS-6545
 URL: https://issues.apache.org/jira/browse/MESOS-6545
 Project: Mesos
  Issue Type: Bug
  Components: technical debt, test
Reporter: Benjamin Mahler


The TestContainerizer is currently not backed by a Process and does no 
explicit synchronization, so it is not thread-safe.

Most tests currently cannot trip the concurrency issues, but this surfaced 
recently in MESOS-6544.





[jira] [Updated] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-03 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6544:
---
Description: 
This test can crash when launching two executors concurrently because the test 
containerizer is not thread-safe! (see MESOS-6545).

{noformat}
[...truncated 78174 lines...]
I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
master@172.17.0.2:58302
I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 authenticatee
I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
connection
I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable resources 
{} from the resource estimator
I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
slave(150)@172.17.0.2:58302
I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(357)@172.17.0.2:58302
I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
connection
I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL authentication 
start
I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires more 
steps
I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL authentication 
step
I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL authentication 
step
I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false
I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true
I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
principal 'test-principal' at slave(150)@172.17.0.2:58302
I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
master master@172.17.0.2:58302
I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
12.590371ms if necessary
I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 94467ns; 
attempting to update the registry
I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
36.501523ms if necessary
I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
in progress
I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 48.099208ms
I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
position 4
I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
for position 4 from @0.0.0.0:0
I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
26.127711ms if necessary
I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
in progress
I1103 01:40:55.609695 29097 leveldb.cpp:341] Persisting action (18 bytes) to 
leveldb took 42.905697ms
I1103 01:40:55.609860 29097 leveldb.cpp:399] Deleting ~2 keys from leveldb took 
96623ns
I1103 01:40:55.609899 29097 replica.cpp:708] Persisted action TRUNCATE at 
position 4
I1103 01:40:55.611063 29106 log.cpp:577] Attempting to append 513 bytes to the 
log
I1103 01:40:55.611229 29097 coordinator.cpp:348] Coordinator attempting to 
write APPEND action at position 5
I1103 01:40:55.611498 29100 slave.cpp:1483] Will retry registration in 
85.55417ms if necessary
I1103 01:40:55.612069 29105 master.cpp:5139] Ignoring registe

[jira] [Created] (MESOS-6544) MasterMaintenanceTest.InverseOffersFilters is flaky.

2016-11-03 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6544:
--

 Summary: MasterMaintenanceTest.InverseOffersFilters is flaky.
 Key: MESOS-6544
 URL: https://issues.apache.org/jira/browse/MESOS-6544
 Project: Mesos
  Issue Type: Bug
  Components: technical debt, test
Reporter: Benjamin Mahler


This test can crash when launching two executors concurrently because the test 
containerizer is not thread-safe! (see MESOS-).

{noformat}
[...truncated 78174 lines...]
I1103 01:40:55.530350 29098 slave.cpp:974] Authenticating with master 
master@172.17.0.2:58302
I1103 01:40:55.530432 29098 slave.cpp:985] Using default CRAM-MD5 authenticatee
I1103 01:40:55.530627 29098 slave.cpp:947] Detecting new master
I1103 01:40:55.530675 29108 authenticatee.cpp:121] Creating new client SASL 
connection
I1103 01:40:55.530743 29098 slave.cpp:5587] Received oversubscribable resources 
{} from the resource estimator
I1103 01:40:55.530961 29099 master.cpp:6742] Authenticating 
slave(150)@172.17.0.2:58302
I1103 01:40:55.531070 29112 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(357)@172.17.0.2:58302
I1103 01:40:55.531328 29106 authenticator.cpp:98] Creating new server SASL 
connection
I1103 01:40:55.531561 29108 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I1103 01:40:55.531604 29108 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I1103 01:40:55.531713 29101 authenticator.cpp:204] Received SASL authentication 
start
I1103 01:40:55.531805 29101 authenticator.cpp:326] Authentication requires more 
steps
I1103 01:40:55.531921 29108 authenticatee.cpp:259] Received SASL authentication 
step
I1103 01:40:55.532120 29101 authenticator.cpp:232] Received SASL authentication 
step
I1103 01:40:55.532155 29101 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false
I1103 01:40:55.532179 29101 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
I1103 01:40:55.532233 29101 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I1103 01:40:55.532266 29101 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '3a1c598ce334' server FQDN: '3a1c598ce334' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true
I1103 01:40:55.532289 29101 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I1103 01:40:55.532305 29101 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1103 01:40:55.532335 29101 authenticator.cpp:318] Authentication success
I1103 01:40:55.532413 29110 authenticatee.cpp:299] Authentication success
I1103 01:40:55.532467 29108 master.cpp:6772] Successfully authenticated 
principal 'test-principal' at slave(150)@172.17.0.2:58302
I1103 01:40:55.532536 29111 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(357)@172.17.0.2:58302
I1103 01:40:55.532755 29098 slave.cpp:1069] Successfully authenticated with 
master master@172.17.0.2:58302
I1103 01:40:55.532997 29098 slave.cpp:1483] Will retry registration in 
12.590371ms if necessary
I1103 01:40:55.533179 29108 master.cpp:5151] Registering agent at 
slave(150)@172.17.0.2:58302 (maintenance-host-2) with id 
3167a687-904b-4b57-bc0f-91b67dc7e41d-S1
I1103 01:40:55.533572 29112 registrar.cpp:461] Applied 1 operations in 94467ns; 
attempting to update the registry
I1103 01:40:55.546341 29107 slave.cpp:1483] Will retry registration in 
36.501523ms if necessary
I1103 01:40:55.546461 29099 master.cpp:5139] Ignoring register agent message 
from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
in progress
I1103 01:40:55.565403 29097 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 48.099208ms
I1103 01:40:55.565495 29097 replica.cpp:708] Persisted action TRUNCATE at 
position 4
I1103 01:40:55.566788 29097 replica.cpp:691] Replica received learned notice 
for position 4 from @0.0.0.0:0
I1103 01:40:55.583937 29101 slave.cpp:1483] Will retry registration in 
26.127711ms if necessary
I1103 01:40:55.584123 29112 master.cpp:5139] Ignoring register agent message 
from slave(150)@172.17.0.2:58302 (maintenance-host-2) as admission is already 
in progress
I1103 01:40:55.609695 29097 leveldb.cpp:341] Persisting action (18 bytes) to 
leveldb took 42.905697ms
I1103 01:40:55.609860 29097 leveldb.cpp:399] Deleting ~2 keys from leveldb took 
96623ns
I1103 01:40:55.609899 29097 replica.cpp:708] Persisted action TRUNCATE at 
position 4
I1103 01:40:55.611063 29106 log.cpp:577] Attempting to append 513 bytes to the 
log
I1103 01:40:55.611229 29097 coordinator.cpp:348] Coordinator attempting to 
write APPEND acti

[jira] [Assigned] (MESOS-6520) Make errno an explicit argument for ErrnoError.

2016-11-03 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6520:
--

Assignee: James Peach

> Make errno an explicit argument for ErrnoError.
> ---
>
> Key: MESOS-6520
> URL: https://issues.apache.org/jira/browse/MESOS-6520
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Make {{errno}} an explicit argument to {{ErrnoError}}. Right now, the 
> constructor to {{ErrnoError}} references {{errno}} directly, which makes it 
> awkward to pass a custom {{errno}} value (you have to set {{errno}} globally).
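A minimal sketch of what an explicit-errno constructor could look like (this is illustrative, not the actual stout `ErrnoError` implementation): the errno value becomes a constructor argument that still defaults to the global `errno`, so existing call sites keep working while callers can pass a custom value.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical sketch: errno is an explicit argument defaulting to the
// global errno, evaluated at the call site.
class ErrnoError
{
public:
  explicit ErrnoError(int code_ = errno)
    : code(code_), message(std::strerror(code_)) {}

  ErrnoError(const std::string& prefix, int code_ = errno)
    : code(code_), message(prefix + ": " + std::strerror(code_)) {}

  const int code;
  const std::string message;
};
```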





[jira] [Commented] (MESOS-2437) SlaveTest.CommandExecutorWithOverride segfault on OSX

2016-11-03 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634717#comment-15634717
 ] 

Benjamin Mahler commented on MESOS-2437:


[~kaysoky] can this be closed now?

> SlaveTest.CommandExecutorWithOverride segfault on OSX
> -
>
> Key: MESOS-2437
> URL: https://issues.apache.org/jira/browse/MESOS-2437
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Jie Yu
>
> OSX 10.8.5
> gcc-4.8
> {noformat}
> [ RUN  ] SlaveTest.CommandExecutorWithOverride
> 2015-03-03 
> 13:55:27,650:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server 
> refused to accept the client
> 2015-03-03 
> 13:55:30,983:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server 
> refused to accept the client
> 2015-03-03 
> 13:55:34,318:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server 
> refused to accept the client
> 2015-03-03 
> 13:55:37,651:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server 
> refused to accept the client
> 2015-03-03 
> 13:55:40,985:33696(0x11cc42000):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:63419] zk retcode=-4, errno=61(Connection refused): server 
> refused to accept the client
> ../../../../mesos/src/tests/slave_tests.cpp:367: Failure
> Failed to wait 15secs for status1
> ../../../../mesos/src/tests/slave_tests.cpp:352: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(_, 
> _))...
>  Expected: to be called twice
>Actual: never called - unsatisfied and active
> F0303 13:55:42.214505 2103935360 logging.cpp:57] RAW: Pure virtual method 
> called
> *** Aborted at 1425419742 (unix time) try "date -d @1425419742" if you are 
> using GNU date ***
> PC: @ 0xfde4858b097e (unknown)
> @0x10a84cdbb  google::LogMessage::Fail()
> *** SIGSEGV (@0xfde4858b097e) received by PID 33696 (TID 0x110e1) 
> stack trace: ***
> @0x10a852598  google::RawLog__()
> @0x111629f87 os::Bsd::chained_handler()
> @0x109f02507  __cxa_pure_virtual
> @0x11162d422 JVM_handle_bsd_signal
> @0x106663cee  mesos::internal::tests::Cluster::Slaves::shutdown()
> @ 0x7fff8ffe990a _sigtramp
> @0x0 (unknown)
> @0x106b8cc6e  mesos::internal::tests::MesosTest::ShutdownSlaves()
> @0x10a1801e3 
> _ZZN7process8dispatchIN5mesos8internal5slave22ResourceMonitorProcessERKNS1_11ContainerIDERK8DurationS5_S8_EEvRKNS_3PIDIT_EEMSC_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESN_
> @0x106b8cc32  mesos::internal::tests::MesosTest::Shutdown()
> @0x10a18a5bc 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave22ResourceMonitorProcessERKNS5_11ContainerIDERK8DurationS9_SC_EEvRKNS0_3PIDIT_EEMSG_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @0x106b8a259  mesos::internal::tests::MesosTest::TearDown()
> @0x10a7d1177 std::function<>::operator()()
> @0x106e668b0  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x10a7b70f1 process::ProcessBase::visit()
> @0x106e619dc  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10a7bbf76 process::DispatchEvent::visit()
> @0x106e4927e  testing::Test::Run()
> @0x10696bd6e process::ProcessBase::serve()
> @0x106e49c58  testing::TestInfo::Run()
> @0x10a7b3987 process::ProcessManager::resume()
> @0x106e4a3d8  testing::TestCase::Run()
> @0x10a7a8b68 process::schedule()
> @0x106e4f462  testing::internal::UnitTestImpl::RunAllTests()
> @ 0x7fff8fffb772 _pthread_start
> @0x106e67721  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x7fff8ffe81a1 thread_start
> Killed: 9
> {noformat}





[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-11-03 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6142:
---
Target Version/s: 0.28.3, 1.1.1, 1.2.0, 1.0.3  (was: 1.2.0)

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}
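The missing check described above could look roughly like the following sketch (types and names are simplified stand-ins for the Mesos protobufs and validation helpers, not the actual fix):

```cpp
#include <string>
#include <vector>

// Simplified stand-in for mesos::Resource.
struct Resource { std::string name; std::string role; };

// Returns an error message, or "" if validation passes -- loosely mirroring
// the Option<Error> style of Mesos validation helpers. Every resource in a
// RESERVE operation must carry the role the framework registered with.
std::string validateReserve(
    const std::vector<Resource>& resources,
    const std::string& frameworkRole)
{
  for (const Resource& r : resources) {
    if (r.role != frameworkRole) {
      return "Resource '" + r.name + "' is reserved for role '" + r.role +
             "', but the framework is registered with role '" +
             frameworkRole + "'";
    }
  }
  return "";
}
```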





[jira] [Commented] (MESOS-6542) Pull the current "init" process for a container out of the container.

2016-11-03 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634572#comment-15634572
 ] 

Yan Xu commented on MESOS-6542:
---

Should this be optional? i.e., can {{containerizer launch}} serve as a default 
"init" to ease the burden of users/framework developers?

> Pull the current "init" process for a container out of the container.
> -
>
> Key: MESOS-6542
> URL: https://issues.apache.org/jira/browse/MESOS-6542
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>
> Currently the mesos agent is in control of the "init" process launched inside 
> of a container. However, in order to properly support things like 
> systemd-in-a-container, we need to allow users to control the init process 
> that ultimately gets launched.
> We will still need to fork a process equivalent to the current "init" 
> process, but it shouldn't be placed inside the container itself (instead, it 
> should be the parent process of whatever init process it is directed to 
> launch).
> In order to do this properly, we will need to rework some of the logic in 
> {{launcher->fork()}} to allow this new parent process to do the namespace 
> entering / cloning instead of {{launcher->fork()}} itself.





[jira] [Updated] (MESOS-6364) Support the rest of the cgroups subsystems

2016-11-03 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6364:
--
Summary: Support the rest of the cgroups subsystems  (was: Support rest 
cgroups subsystems)

> Support the rest of the cgroups subsystems
> --
>
> Key: MESOS-6364
> URL: https://issues.apache.org/jira/browse/MESOS-6364
> Project: Mesos
>  Issue Type: Epic
>Reporter: haosdent
>
> This is a follow-up epic to MESOS-4697 to capture further improvements and 
> changes.





[jira] [Updated] (MESOS-6364) Support rest cgroups subsystems

2016-11-03 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6364:

Summary: Support rest cgroups subsystems  (was: Improvements to cgroups 
isolator)

> Support rest cgroups subsystems
> ---
>
> Key: MESOS-6364
> URL: https://issues.apache.org/jira/browse/MESOS-6364
> Project: Mesos
>  Issue Type: Epic
>Reporter: haosdent
>
> This is a follow-up epic to MESOS-4697 to capture further improvements and 
> changes.





[jira] [Updated] (MESOS-6077) Added a default (task group) executor.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6077:
---
Summary: Added a default (task group) executor.  (was: Implement a basic 
default pod executor.)

> Added a default (task group) executor.
> --
>
> Key: MESOS-6077
> URL: https://issues.apache.org/jira/browse/MESOS-6077
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> We would like to build a basic default pod executor that upon receiving a 
> {{LAUNCH_GROUP}} event from the agent, sends a {{TASK_RUNNING}} status 
> update. This would be a good building block for getting to a fully functional 
> pod based default command executor.





[jira] [Updated] (MESOS-6014) Added port mapping CNI plugin.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6014:
---
Summary: Added port mapping CNI plugin.  (was: Create a CNI plugin that 
provides port mapping functionality for various CNI plugins.)

> Added port mapping CNI plugin.
> --
>
> Key: MESOS-6014
> URL: https://issues.apache.org/jira/browse/MESOS-6014
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently there is no CNI plugin that supports port mapping. Given that the 
> unified containerizer is starting to become the de-facto container runtime, 
> having a CNI plugin that provides port mapping is a must-have. This is 
> primarily required to support BRIDGE networking mode, similar to the docker 
> bridge networking that users expect when using docker containers. 
> While the most obvious use case is that of using the port-mapper plugin with 
> the bridge plugin, the port-mapping functionality itself is generic and 
> should be usable with any CNI plugin that needs it.
> Keeping port-mapping as a CNI plugin gives operators the ability to use the 
> default port-mapper (CNI plugin) that Mesos provides, or use their own plugin.





[jira] [Updated] (MESOS-5788) Added JAVA API adapter for seamless transition to new scheduler API.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5788:
---
Summary: Added JAVA API adapter for seamless transition to new scheduler 
API.  (was: Consider adding a Java Scheduler Shim/Adapter for the new/old API.)

> Added JAVA API adapter for seamless transition to new scheduler API.
> 
>
> Key: MESOS-5788
> URL: https://issues.apache.org/jira/browse/MESOS-5788
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently, for existing Java-based frameworks, trying out the new API can be 
> cumbersome. This change intends to introduce a shim/adapter interface that 
> makes this easier by allowing developers to toggle between the old/new API 
> (driver/new scheduler library) implementations via an environment variable. 
> This would allow framework developers to transition their older frameworks to 
> the new API rather seamlessly.
> This would look similar to the work done for the executor shim for C++ 
> (command/docker executor). 





[jira] [Updated] (MESOS-4364) Add roles validation code to master

2016-11-03 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4364:
-
Fix Version/s: (was: 0.28.3)

> Add roles validation code to master
> ---
>
> Key: MESOS-4364
> URL: https://issues.apache.org/jira/browse/MESOS-4364
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> A {{FrameworkInfo}} can only have one of {{role}} or {{roles}} set. A natural 
> location for this validation appears to be under 
> {{validation::operation::validate}}.





[jira] [Updated] (MESOS-5763) Task stuck in fetching is not cleaned up after --executor_registration_timeout.

2016-11-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5763:
--
Target Version/s: 0.28.3
   Fix Version/s: (was: 0.27.4)
  (was: 0.28.3)

> Task stuck in fetching is not cleaned up after 
> --executor_registration_timeout.
> ---
>
> Key: MESOS-5763
> URL: https://issues.apache.org/jira/browse/MESOS-5763
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0, 1.0.0
>Reporter: Yan Xu
>Assignee: Yan Xu
>Priority: Blocker
> Fix For: 1.0.0
>
>
> When the fetching process hangs forever due to reasons such as HDFS issues, 
> Mesos containerizer would attempt to destroy the container and kill the 
> executor after {{--executor_registration_timeout}}. However this reliably 
> fails for us: the executor would be killed by the launcher destroy and the 
> container would be destroyed but the agent would never find out that the 
> executor is terminated thus leaving the task in the STAGING state forever.





[jira] [Commented] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-03 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634334#comment-15634334
 ] 

Jason Lai commented on MESOS-6541:
--

It would be good to call `unshare(2)` with `CLONE_NEWNS` to move the test 
suite process into its own mount namespace. That way, you can do whatever you 
want with the process's mount table without affecting other processes' view of 
their root FS.

However, during the test runs, it is important to make sure that minimum 
changes are made to cgroup hierarchies or the changes should be idempotent, 
since such changes do have side effects across mount namespaces.

> Mesos test should mount cgroups_root
> 
>
> Key: MESOS-6541
> URL: https://issues.apache.org/jira/browse/MESOS-6541
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, test
>Reporter: Yan Xu
>
> Currently on hosts without prior cgroups setup and sysfs is mounted at /sys, 
> mesos tests would fail like this:
> {noformat:title=}
> [ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
> F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
> CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/fr
> eezer': Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
> directory
> {noformat}
> This is because the agent chooses to use {{LinuxLauncher}} based on 
> availability of the {{freezer}} subsystem alone. However for it to work, one 
> needs to do the following
> {noformat:title=}
> mount -t tmpfs cgroup_root /sys/fs/cgroup
> {noformat}
> in order to make {{/sys/fs/cgroup}} writable. 
> I have always run the command manually in the past when this failure happens 
> but this could be baffling especially to new developers. Mesos tests should 
> just mount it if it's not already done.





[jira] [Updated] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-11-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6391:
--
Target Version/s: 1.0.2, 1.1.0  (was: 0.28.3, 1.0.2, 1.1.0)

Removing the target version from 0.28.3 since it's not a trivial backport. cc: 
[~jieyu]

> Command task's sandbox should not be owned by root if it uses container image.
> --
>
> Key: MESOS-6391
> URL: https://issues.apache.org/jira/browse/MESOS-6391
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0
>
>
> Currently, if the task defines a container image, the command executor will 
> be run under root because it needs to perform pivot_root.
> That means if the task wants to run under an unprivileged user, the sandbox 
> of that task will not be writable because it's owned by root.





[jira] [Updated] (MESOS-6525) Add API protos for managing debug containers

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6525:
---
Labels: debugging mesosphere  (was: )

> Add API protos for managing debug containers
> 
>
> Key: MESOS-6525
> URL: https://issues.apache.org/jira/browse/MESOS-6525
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: debugging, mesosphere
>
> The API calls that we should add are:
> LAUNCH_NESTED_CONTAINER_SESSION
> ATTACH_CONTAINER_INPUT
> ATTACH_CONTAINER_OUTPUT





[jira] [Created] (MESOS-6543) Add special case for entering the "mount" namespace of a parent container

2016-11-03 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6543:
--

 Summary: Add special case for entering the "mount" namespace of a 
parent container
 Key: MESOS-6543
 URL: https://issues.apache.org/jira/browse/MESOS-6543
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues


Currently, tasks launched with the command executor have a hierarchy of 
processes inside their container that looks as follows:

{noformat}
| - mesos-containerizer launch
|   | - mesos-executor
|   |   | - task process
{noformat}

However, the only pid from this hierarchy of processes that the agent is aware 
of is the pid of the top-level {{mesos-containerizer launch}} binary.

If all of these binaries were part of the same set of namespaces, then this 
would be sufficient to discover the namespaces of the {{task process}} (we 
could simply inspect the namespaces of the {{mesos-containerizer launch}} pid 
and know they were the same for the {{task process}}).

This is true for most of the namespaces that each of these processes exist in. 
However, the {{mnt}} namespace of the two may differ. That is, the 
{{mesos-containerizer launch}} binary is always in the same {{mnt}} namespace 
as the host, while the {{task process}} binary may be in its own {{mnt}} 
namespace if file system isolation is turned on and it has a new rootfs 
provisioned for it (e.g. a docker image was provided for it).

This has not been a problem until now because we never wanted to simply _enter_ 
the {{mnt}} namespace of a container before. Even with nested containers for 
pods, we always create a new {{mnt}} namespace branched off the host {{mnt}} 
namespace (in order to support the injection of host-mounted volumes).

However, with the new debugging support we are adding, we need a way of 
entering the {{mnt}} namespace of a parent container instead of cloning a new 
one.

Since we only have access to the {{pid}} of the container's init process, we 
can simply enter all namespaces associated with that pid except the {{mnt}} 
namespace. For the {{mnt}} namespace, we need to special case it to walk the 
process hierarchy until we find the first process in a different {{mnt}} 
namespace and enter that one instead. If none are found, simply enter the 
{{mnt}} namespace of the "init" process.

This is a dirty dirty hack, but should be sufficient for now.
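The walk described above can be sketched with the traversal logic separated out from /proc access (all names here are illustrative, not Mesos code; a real implementation would get each namespace id by stat()-ing /proc/<pid>/ns/mnt and the child relation from /proc):

```cpp
#include <sys/types.h>

#include <map>
#include <queue>
#include <vector>

// Breadth-first walk of a process tree: return the first descendant of
// 'root' whose mnt namespace id differs from root's; if none is found,
// fall back to 'root' (the "init" process) itself. The namespace ids and
// the child relation are passed in so the logic stands alone; in Mesos
// they would come from /proc.
pid_t firstInDifferentMntNamespace(
    pid_t root,
    const std::map<pid_t, ino_t>& mntNamespace,
    const std::map<pid_t, std::vector<pid_t>>& children)
{
  const ino_t rootNs = mntNamespace.at(root);

  std::queue<pid_t> frontier;
  frontier.push(root);

  while (!frontier.empty()) {
    pid_t pid = frontier.front();
    frontier.pop();

    if (mntNamespace.at(pid) != rootNs) {
      return pid;  // First process in a different mnt namespace.
    }

    auto it = children.find(pid);
    if (it != children.end()) {
      for (pid_t child : it->second) {
        frontier.push(child);
      }
    }
  }

  return root;  // Nothing differs: enter the "init" process's namespace.
}
```

For the command executor hierarchy above, this would skip {{mesos-containerizer launch}} and {{mesos-executor}} (both in the host {{mnt}} namespace) and return the task process once filesystem isolation gives it its own namespace.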

Eventually we want to completely eliminate the command executor in favor of the 
"pod" (i.e. "default") executor, which doesn't have this problem at all.





[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6540:
---
Description: 
Right now the agent only knows about the pid of the "init" process forked by 
{{launcher->fork()}}. However, in order to properly enter the namespaces of a 
task for a nested container, we actually need the pid of the process that gets 
launched by the {{containerizer launch}} binary.

Using this pid, isolators can properly enter the namespaces of the actual 
*task* or *executor* launched by the {{containerizer launch}} binary instead of 
just the namespaces of the "init" process (which may be different).

In order to do this properly, we should pull the "init" process out of the 
container and update 

  was:
Right now the agent only knows about the pid of the "init" process forked by 
{{launcher->fork()}}. However, in order to properly enter the namespaces of a 
task for a nested container, we actually need the pid of the process that gets 
launched by the {{containerizer launch}} binary.

Using this pid, isolators can properly enter the namespaces of the actual 
*task* or *executor* launched by the {{containerizer launch}} binary instead of 
just the namespaces of the "init" process (which may be different).

This will involve opening a domain socket with the {{containerizer launch}} 
binary and passing the translated pid from the forked process back to the 
agent. We can achieve this by opening the socket on the agent and passing the 
path to it using {{launchFlags}}.


> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> In order to do this properly, we should pull the "init" process out of the 
> container and update 





[jira] [Updated] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-03 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6541:
--
Component/s: cgroups

> Mesos test should mount cgroups_root
> 
>
> Key: MESOS-6541
> URL: https://issues.apache.org/jira/browse/MESOS-6541
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, test
>Reporter: Yan Xu
>
> Currently, on hosts without a prior cgroups setup where sysfs is mounted at 
> /sys, mesos tests would fail like this:
> {noformat:title=}
> [ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
> F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
> CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/fr
> eezer': Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
> directory
> {noformat}
> This is because the agent chooses to use {{LinuxLauncher}} based on the 
> availability of the {{freezer}} subsystem alone. However, for it to work, one 
> needs to run the following
> {noformat:title=}
> mount -t tmpfs cgroup_root /sys/fs/cgroup
> {noformat}
> in order to make  {{/sys/fs/cgroup}} writable. 
> I have always run the command manually in the past when this failure happens 
> but this could be baffling especially to new developers. Mesos tests should 
> just mount it if it's not already done.





[jira] [Created] (MESOS-6542) Pull the current "init" process for a container out of the container.

2016-11-03 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6542:
--

 Summary: Pull the current "init" process for a container out of 
the container.
 Key: MESOS-6542
 URL: https://issues.apache.org/jira/browse/MESOS-6542
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues


Currently the mesos agent is in control of the "init" process launched inside 
of a container. However, in order to properly support things like 
systemd-in-a-container, we need to allow users to control the init process that 
ultimately gets launched.

We will still need to fork a process equivalent to the current "init" process, 
but it shouldn't be placed inside the container itself (instead, it should be 
the parent process of whatever init process it is directed to launch).

In order to do this properly, we will need to rework some of the logic in 
{{launcher->fork()}} to allow this new parent process to do the namespace 
entering / cloning instead of {{launcher->fork()}} itself.





[jira] [Created] (MESOS-6541) Mesos test should mount cgroups_root

2016-11-03 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6541:
-

 Summary: Mesos test should mount cgroups_root
 Key: MESOS-6541
 URL: https://issues.apache.org/jira/browse/MESOS-6541
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Yan Xu


Currently, on hosts without a prior cgroups setup where sysfs is mounted at 
/sys, mesos tests would fail like this:

{noformat:title=}
[ RUN  ] HTTPCommandExecutorTest.TerminateWithACK
F1103 19:54:40.807538 439804 command_executor_tests.cpp:236] 
CHECK_SOME(_containerizer): Failed to create launcher: Failed to create Linux 
launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/fr
eezer': Failed to create directory '/sys/fs/cgroup/freezer': No such file or 
directory
{noformat}

This is because the agent chooses to use {{LinuxLauncher}} based on the 
availability of the {{freezer}} subsystem alone. However, for it to work, one 
needs to run the following

{noformat:title=}
mount -t tmpfs cgroup_root /sys/fs/cgroup
{noformat}

in order to make  {{/sys/fs/cgroup}} writable. 

I have always run the command manually in the past when this failure happens 
but this could be baffling especially to new developers. Mesos tests should 
just mount it if it's not already done.
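The missing check could look roughly like this (a hypothetical helper scanning a /proc/mounts-style listing; if nothing is mounted at /sys/fs/cgroup, the test harness would perform the mount shown above):

```cpp
#include <sstream>
#include <string>

// Return true if any entry in a /proc/mounts-style listing already has
// /sys/fs/cgroup as its mount target. Each line looks like:
//   <source> <target> <fstype> <options> <dump> <pass>
bool cgroupRootMounted(const std::string& mounts)
{
  std::istringstream in(mounts);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string source, target;
    if (fields >> source >> target && target == "/sys/fs/cgroup") {
      return true;
    }
  }
  return false;
}
```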





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634165#comment-15634165
 ] 

Kevin Klues commented on MESOS-6540:


I'm pretty sure that's not what we said at the end yesterday, but I'm fine with 
that so long as you are. I'll take this ticket out of this EPIC and create a 
new one to deal with that.

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent. We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634135#comment-15634135
 ] 

Jie Yu commented on MESOS-6540:
---

Yeah, I thought we agreed to walk the process tree. I'd like to avoid doing 
another domain socket to translate the pid because it'll be deprecated 
eventually.

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent. We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Commented] (MESOS-2115) Improve recovering Docker containers when slave is contained

2016-11-03 Thread QIHANG CHEN (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634080#comment-15634080
 ] 

QIHANG CHEN commented on MESOS-2115:


Thanks, I actually made it work by using 
https://hub.docker.com/r/mesoscloud/mesos-slave/ with `--docker_mesos_image`, 
since that docker image has the docker client embedded. Thanks for your solution!

> Improve recovering Docker containers when slave is contained
> 
>
> Key: MESOS-2115
> URL: https://issues.apache.org/jira/browse/MESOS-2115
> Project: Mesos
>  Issue Type: Epic
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
> Fix For: 0.23.0
>
>
> Currently when the docker containerizer is recovering, it checks the 
> checkpointed executor pids to recover which containers are still running, and 
> removes the rest of the containers from docker ps that aren't recognized.
> This is problematic when the slave itself was in a docker container, as when 
> the slave container dies all the forked processes are removed as well, so the 
> checkpointed executor pids are no longer valid.
> We have to assume the docker containers might be still running even though 
> the checkpointed executor pids are not.





[jira] [Commented] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.

2016-11-03 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634059#comment-15634059
 ] 

Alexander Rukletsov commented on MESOS-6357:


Just saw that in the internal CI:
{noformat}
[17:41:36] : [Step 10/10] [ RUN  ] 
NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit
[17:41:36] : [Step 10/10] I1103 17:41:36.569166 23098 
containerizer.cpp:201] Using isolation: 
cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
[17:41:36] : [Step 10/10] I1103 17:41:36.572278 23098 
linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[17:41:36] : [Step 10/10] I1103 17:41:36.577702 23118 
containerizer.cpp:557] Recovering containerizer
[17:41:36] : [Step 10/10] I1103 17:41:36.578866 23118 provisioner.cpp:253] 
Provisioner recovery complete
[17:41:36] : [Step 10/10] I1103 17:41:36.579076 23114 
containerizer.cpp:940] Starting container 0aedeff2-9704-42a1-a245-dff493c1ae73 
for executor 'executor' of framework 
[17:41:36] : [Step 10/10] I1103 17:41:36.579404 23115 cgroups.cpp:405] 
Creating cgroup at 
'/sys/fs/cgroup/cpu,cpuacct/mesos_test_c7d3570c-5d28-43a2-88cc-6a8877d9d527/0aedeff2-9704-42a1-a245-dff493c1ae73'
 for container 0aedeff2-9704-42a1-a245-dff493c1ae73
[17:41:36] : [Step 10/10] I1103 17:41:36.580543 23112 cpu.cpp:101] Updated 
'cpu.shares' to 1024 (cpus 1) for container 0aedeff2-9704-42a1-a245-dff493c1ae73
[17:41:36] : [Step 10/10] I1103 17:41:36.581063 23115 
containerizer.cpp:1469] Launching 'mesos-containerizer' with flags 
'--command="{"arguments":["bash","-c","read key 
<&29"],"shell":false,"value":"\/bin\/bash"}" 
--environment="{"MESOS_SANDBOX":"\/mnt\/teamcity\/temp\/buildTmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8"}"
 --help="false" --pipe_read="29" --pipe_write="32" 
--pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
 -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
--runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_kML9EY/containers/0aedeff2-9704-42a1-a245-dff493c1ae73"
 --unshare_namespace_mnt="false" 
--working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8"'
[17:41:36] : [Step 10/10] I1103 17:41:36.581171 23112 
linux_launcher.cpp:421] Launching container 
0aedeff2-9704-42a1-a245-dff493c1ae73 and cloning with namespaces CLONE_NEWNS | 
CLONE_NEWPID
[17:41:36] : [Step 10/10] I1103 17:41:36.584450 23115 
containerizer.cpp:1506] Checkpointing container's forked pid 17644 to 
'/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_bajwqO/meta/slaves/frameworks/executors/executor/runs/0aedeff2-9704-42a1-a245-dff493c1ae73/pids/forked.pid'
[17:41:36] : [Step 10/10] I1103 17:41:36.585805 23115 fetcher.cpp:345] 
Starting to fetch URIs for container: 0aedeff2-9704-42a1-a245-dff493c1ae73, 
directory: 
/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8
[17:41:36] : [Step 10/10] I1103 17:41:36.586761 23118 
containerizer.cpp:1674] Starting nested container 
0aedeff2-9704-42a1-a245-dff493c1ae73.37cb3ef8-056a-438e-a3ed-f852e51e4197
[17:41:36] : [Step 10/10] I1103 17:41:36.587481 23119 
containerizer.cpp:1469] Launching 'mesos-containerizer' with flags 
'--command="{"shell":true,"value":"sleep 1000"}" 
--environment="{"MESOS_SANDBOX":"\/mnt\/teamcity\/temp\/buildTmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8\/containers\/37cb3ef8-056a-438e-a3ed-f852e51e4197"}"
 --help="false" --pipe_read="29" --pipe_write="32" 
--pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
 -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
--runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_kML9EY/containers/0aedeff2-9704-42a1-a245-dff493c1ae73/containers/37cb3ef8-056a-438e-a3ed-f852e51e4197"
 --unshare_namespace_mnt="false" 
--working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_0X7GV8/containers/37cb3ef8-056a-438e-a3ed-f852e51e4197"'
[17:41:36] : [Step 10/10] I1103 17:41:36.587601 23113 
linux_launcher.cpp:421] Launching nested container 
0aedeff2-9704-42a1-a245-dff493c1ae73.37cb3ef8-056a-438e-a3ed-f852e51e4197 and 
cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
[17:41:36] : [Step 10/10] Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rs

[jira] [Commented] (MESOS-6496) Support construction of Shared and Owned from managed Derived*

2016-11-03 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634058#comment-15634058
 ] 

Benjamin Bannier commented on MESOS-6496:
-

Currently a number of tests use raw pointers to objects of mock classes and 
construct {{Shared}}/{{Owned}} to inject, e.g.,

{code}
MockProvisioner* provisioner = new MockProvisioner();
// ...
EXPECT_CALL(*provisioner, destroy(_))
  .WillOnce(Return(true));
// ...
MesosContainerizerProcess* process = new MesosContainerizerProcess(
    // ...
    Shared<Provisioner>(provisioner),
    // ...);
{code}

(see e.g., https://reviews.apache.org/r/53387/#comment224436, 
https://reviews.apache.org/r/53387/#comment224429, 
https://reviews.apache.org/r/53387/#comment224430, or probably many other 
instances in the code base). This code can leak if test expectations fail 
before {{provisioner}} is wrapped into a smart ptr.

What we would really like to do here is to construct a managed ptr, 
{{Shared<MockProvisioner> provisioner(new MockProvisioner())}}, access the 
mocked functions with {{EXPECT_CALL}} (which expects a mock class), and then 
implicitly convert to a {{Shared<Provisioner>}} when injecting into the 
{{MesosContainerizerProcess}} ctor.
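The desired behavior mirrors the converting constructor that {{std::shared_ptr}} already provides; a minimal sketch of the idea (this is not libprocess's actual {{Shared}}):

```cpp
#include <memory>
#include <type_traits>

// A Shared-like wrapper whose converting constructor is enabled exactly
// when U* is implicitly convertible to T*, so that (for example) a
// Shared<MockProvisioner> converts to a Shared<Provisioner> the same way
// std::shared_ptr would.
template <typename T>
class Shared
{
public:
  explicit Shared(T* t) : data(t) {}

  // Converting constructor, participating in overload resolution only
  // when the underlying raw pointer conversion is valid.
  template <typename U,
            typename std::enable_if<
                std::is_convertible<U*, T*>::value, int>::type = 0>
  Shared(const Shared<U>& that) : data(that.data) {}

  T* get() const { return data.get(); }

private:
  template <typename V> friend class Shared;

  std::shared_ptr<T> data;
};
```

With something like this in place, a test could construct the smart pointer immediately and pass it wherever the base-typed {{Shared}} is expected, leaving no window for a leak.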

> Support construction of Shared and Owned from managed Derived*
> --
>
> Key: MESOS-6496
> URL: https://issues.apache.org/jira/browse/MESOS-6496
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, tech-debt
>
> It should be possible to pass a {{Shared<Derived>}} value to an object that 
> takes a parameter of type {{Shared<Base>}}. Similarly for {{Owned}}. In 
> general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} 
> iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, this works 
> because they define the appropriate conversion constructor.





[jira] [Updated] (MESOS-6496) Support construction of Shared and Owned from managed Derived*

2016-11-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6496:

Summary: Support construction of Shared and Owned from managed 
Derived*  (was: Support up-casting of Shared and Owned)

> Support construction of Shared and Owned from managed Derived*
> --
>
> Key: MESOS-6496
> URL: https://issues.apache.org/jira/browse/MESOS-6496
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, tech-debt
>
> It should be possible to pass a {{Shared<Derived>}} value to an object that 
> takes a parameter of type {{Shared<Base>}}. Similarly for {{Owned}}. In 
> general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} 
> iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, this works 
> because they define the appropriate conversion constructor.





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633975#comment-15633975
 ] 

Kevin Klues commented on MESOS-6540:


I agree with everything you said. 

For 1, though, I thought we agreed that the "short term workaround" would be to 
pass the pid of the forked process back to the agent somehow.  If not, what 
other solution are you proposing? The one where we just walk the process tree 
from the init process and find the first child in a different mount namespace 
and enter that one?

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent. We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Commented] (MESOS-6465) Add a task_id -> container_id mapping in state.json

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633921#comment-15633921
 ] 

Kevin Klues commented on MESOS-6465:


Jie can you comment on how you plan to do this now?

> Add a task_id -> container_id mapping in state.json
> ---
>
> Key: MESOS-6465
> URL: https://issues.apache.org/jira/browse/MESOS-6465
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: debugging, mesosphere
>
> Currently, there is no way to get the {{container-id}} of a task from hitting 
> the mesos master alone.  You must first hit the master to get the {{task_id 
> -> agent_id}} and {{task_id -> executor_id}} mappings, then hit the 
> corresponding agent with {{agent_id}} to get the {{executor_id -> 
> container_id}} mapping.
> It would simplify things a lot if the {{container_id}} information was 
> immediately available in the {{/tasks}} and {{/state}} endpoints of the 
> master itself.





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633801#comment-15633801
 ] 

Jie Yu commented on MESOS-6540:
---

I think there are two problems here that we are trying to solve:

1) Solving the issue for old style command tasks so that we can find the proper 
namespace to enter for debugging support.
2) Moving containerizer launch to the host namespaces and letting the process 
it execs be PID 1.

For 1), containerizer launch and the executor are in the same namespaces 
(except for the mnt namespace), so I think we can use a short term workaround 
to solve that, because we will eventually deprecate the old style command task.
For 2), it's a broader discussion. That probably means we need to move 
ns::clone from the linux launcher into containerizer launch. That means the 
Launcher interface should hide the details about how the user process is 
created. It should take a ContainerLaunchInfo in fork and return a pid that 
the containerizer will checkpoint. The pid will be the pid of the actual user 
process. The mesos-containerizer launch helper will be a detail of the 
Launcher. If ns::clone is in containerizer launch, the translated pid will 
properly be returned to containerizer launch, which can communicate this pid 
back to the agent using a simple pipe.
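The pipe mechanism sketched in 2) can be demonstrated in isolation (names are illustrative only; in Mesos the writer would be the {{containerizer launch}} helper reporting the pid of the process it execs, and, per the discussion in this thread, a pid generated inside a new pid namespace would first have to be translated to be meaningful to the agent):

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that reports a pid back over an inherited pipe, standing in
// for 'containerizer launch' reporting a launched pid back to the agent.
pid_t launchAndReportPid()
{
  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }

  pid_t child = fork();  // Stand-in for launcher->fork() / ns::clone.
  if (child == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  if (child == 0) {
    close(fds[0]);
    // In 'containerizer launch' this would be the pid of the launched
    // task/executor; here the child simply reports its own pid.
    pid_t launched = getpid();
    write(fds[1], &launched, sizeof(launched));
    close(fds[1]);
    _exit(0);
  }

  close(fds[1]);
  pid_t reported = -1;
  read(fds[0], &reported, sizeof(reported));
  close(fds[0]);
  waitpid(child, nullptr, 0);

  return reported;  // The agent would checkpoint this pid.
}
```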

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent. We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Issue Comment Deleted] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6540:
---
Comment: was deleted

(was: Are you sure? I thought about this a bit more after our conversation
yesterday, and it's not clear to me how we could do it with just a pipe.
The {{containerizer launch}} binary is already cloned into the new pid
namespace, so if we just passed the pid back that it forks, it will be the
wrong pid from the perspective of the agent.

On Thursday, November 3, 2016, Jie Yu (JIRA) wrote:


[
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633681#comment-15633681
]

Jie Yu commented on MESOS-6540:
---

aha, ic. It can just be a pipe (rather than a domain socket).

checkpoint it.
---
by {{launcher->fork()}}. However, in order to properly enter the namespaces
of a task for a nested container, we actually need the pid of the process
that gets launched by the {{containerizer launch}} binary.
*task* or *executor* launched by the {{containerizer launch}} binary
instead of just the namespaces of the "init" process (which may be
different).
launch}} binary and passing the translated pid from the forked process back
to the agent. We can achieve this by opening the socket on the agent and
passing the path to it using {{launchFlags}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
)

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent. We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633713#comment-15633713
 ] 

Kevin Klues commented on MESOS-6540:


Are you sure? I thought about this a bit more after our conversation yesterday, 
and it's not clear to me how we could do it with just a pipe. The 
{{containerizer launch}} binary is already cloned into the new pid namespace, 
so if we just passed the pid back that it forks, it will be the wrong pid from 
the perspective of the agent. 

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent.  We can chieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.





[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633715#comment-15633715
 ] 

Kevin Klues commented on MESOS-6540:


Are you sure? I thought about this a bit more after our conversation
yesterday, and it's not clear to me how we could do it with just a pipe.
The {{containerizer launch}} binary is already cloned into the new pid
namespace, so if we just passed the pid back that it forks, it will be the
wrong pid from the perspective of the agent.

On Thursday, November 3, 2016, Jie Yu (JIRA) wrote:


[
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633681#comment-15633681
]

Jie Yu commented on MESOS-6540:
---

Ah, I see. It can just be a pipe (rather than a domain socket).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent.  We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633681#comment-15633681
 ] 

Jie Yu commented on MESOS-6540:
---

Ah, I see. It can just be a pipe (rather than a domain socket).

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent.  We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633677#comment-15633677
 ] 

Kevin Klues commented on MESOS-6540:


This is what we talked about at lunch yesterday so we can handle entering the 
proper mount namespace for tasks launched via the command executor.

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent.  We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6540:
---
Description: 
Right now the agent only knows about the pid of the "init" process forked by 
{{launcher->fork()}}. However, in order to properly enter the namespaces of a 
task for a nested container, we actually need the pid of the process that gets 
launched by the {{containerizer launch}} binary.

Using this pid, isolators can properly enter the namespaces of the actual 
*task* or *executor* launched by the {{containerizer launch}} binary instead of 
just the namespaces of the "init" process (which may be different).

This will involve opening a domain socket with the {{containerizer launch}} 
binary and passing the translated pid from the forked process back to the 
agent.  We can achieve this by opening the socket on the agent and passing the 
path to it using {{launchFlags}}.

  was:
Right now the agent only knows about the pid of the "init" process forked by 
{{launcher->fork()}}. However, in order to properly enter the namespaces of a 
task for a nested container, we actually need the pid of the process that gets 
launched by the {{containerizer launch}} binary.

Using this pid, isolators can properly enter the namespaces of the actual 
*task* or *executor* launched by the {{containerizer launch}} binary instead of 
just the namespaces of the "init" process (which may be different).


> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> This will involve opening a domain socket with the {{containerizer launch}} 
> binary and passing the translated pid from the forked process back to the 
> agent.  We can achieve this by opening the socket on the agent and passing the 
> path to it using {{launchFlags}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633661#comment-15633661
 ] 

Jie Yu commented on MESOS-6540:
---

Currently, {{containerizer launch}} runs in the same namespace as the task it 
launches (except for the old-style command task). What is the context of this 
ticket?

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2016-11-03 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6540:
--

 Summary: Pass the forked pid from `containerizer launch` to the 
agent and checkpoint it.
 Key: MESOS-6540
 URL: https://issues.apache.org/jira/browse/MESOS-6540
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues


Right now the agent only knows about the pid of the "init" process forked by 
{{launcher->fork()}}. However, in order to properly enter the namespaces of a 
task for a nested container, we actually need the pid of the process that gets 
launched by the {{containerizer launch}} binary.

Using this pid, isolators can properly enter the namespaces of the actual 
*task* or *executor* launched by the {{containerizer launch}} binary instead of 
just the namespaces of the "init" process (which may be different).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2115) Improve recovering Docker containers when slave is contained

2016-11-03 Thread Marc Villacorta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633471#comment-15633471
 ] 

Marc Villacorta edited comment on MESOS-2115 at 11/3/16 5:07 PM:
-

[~SailC] The docker image you specify in {{--docker_mesos_image}} must have a 
docker client embedded (not bind-mounted); this image will be used to run the 
mesos executor. I personally use the same image for the mesos-agent and for the 
executor. In this 
[commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5]
 I switch from docker to rocket, and it might be of interest to you because it 
shows how this can be achieved with both container runtimes.



was (Author: h0tbird):
[~SailC] The docker image you specify in {{--docker_mesos_image}} must have a 
docker client embedded (not bind-mounted) this image will be used to run the 
mesos executor. I personally use the same image for the mesos-agent and for the 
executor. In this 
[commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5]
 I switch from docker to rocker and it might be of interest to you because it 
shows how this can be achieved with both container runtimes.


> Improve recovering Docker containers when slave is contained
> 
>
> Key: MESOS-2115
> URL: https://issues.apache.org/jira/browse/MESOS-2115
> Project: Mesos
>  Issue Type: Epic
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
> Fix For: 0.23.0
>
>
> Currently when docker containerizer is recovering it checks the checkpointed 
> executor pids to recover which containers are still running, and remove the 
> rest of the containers from docker ps that isn't recognized.
> This is problematic when the slave itself was in a docker container, as when 
> the slave container dies all the forked processes are removed as well, so the 
> checkpointed executor pids are no longer valid.
> We have to assume the docker containers might be still running even though 
> the checkpointed executor pids are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2115) Improve recovering Docker containers when slave is contained

2016-11-03 Thread Marc Villacorta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633471#comment-15633471
 ] 

Marc Villacorta commented on MESOS-2115:


[~SailC] The docker image you specify in {{--docker_mesos_image}} must have a 
docker client embedded (not bind-mounted); this image will be used to run the 
mesos executor. I personally use the same image for the mesos-agent and for the 
executor. In this 
[commit|https://github.com/katosys/kato/commit/50b7a82d8c63373b53072be33943cb6ff56a20b5]
 I switch from docker to rocket, and it might be of interest to you because it 
shows how this can be achieved with both container runtimes.


> Improve recovering Docker containers when slave is contained
> 
>
> Key: MESOS-2115
> URL: https://issues.apache.org/jira/browse/MESOS-2115
> Project: Mesos
>  Issue Type: Epic
>  Components: docker
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: docker
> Fix For: 0.23.0
>
>
> Currently when docker containerizer is recovering it checks the checkpointed 
> executor pids to recover which containers are still running, and remove the 
> rest of the containers from docker ps that isn't recognized.
> This is problematic when the slave itself was in a docker container, as when 
> the slave container dies all the forked processes are removed as well, so the 
> checkpointed executor pids are no longer valid.
> We have to assume the docker containers might be still running even though 
> the checkpointed executor pids are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6539) Compile warning in GMock: "binding dereferenced null pointer to reference"

2016-11-03 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6539:
--

 Summary: Compile warning in GMock: "binding dereferenced null 
pointer to reference"
 Key: MESOS-6539
 URL: https://issues.apache.org/jira/browse/MESOS-6539
 Project: Mesos
  Issue Type: Bug
  Components: technical debt
Reporter: Neil Conway


{noformat}
In file included from ../gmock-1.7.0/include/gmock/gmock-actions.h:46:
../gmock-1.7.0/include/gmock/internal/gmock-internal-utils.h:371:7: warning: 
binding dereferenced null pointer to reference has undefined behavior 
[-Wnull-dereference]
  *static_cast::type*>(NULL));
  ^~~~
../gmock-1.7.0/include/gmock/gmock-actions.h:78:22: note: in instantiation of 
function template specialization 
'testing::internal::Invalid >' requested here
return internal::Invalid();
 ^
../gmock-1.7.0/include/gmock/gmock-actions.h:190:43: note: in instantiation of 
member function 'testing::internal::BuiltInDefaultValue 
>::Get' requested here
internal::BuiltInDefaultValue::Get() : *value_;
  ^
../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1460:34: note: in 
instantiation of member function 'testing::DefaultValue 
>::Get' requested here
return DefaultValue::Get();
 ^
../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1350:22: note: in 
instantiation of member function 
'testing::internal::FunctionMockerBase 
(bool)>::PerformDefaultAction' requested here
func_mocker->PerformDefaultAction(args, call_description));
 ^
../gmock-1.7.0/include/gmock/gmock-spec-builders.h:1473:26: note: in 
instantiation of function template specialization 
'testing::internal::ActionResultHolder 
>::PerformDefaultAction (bool)>' requested here
return ResultHolder::PerformDefaultAction(this, args, call_description);
 ^
../../../mesos/3rdparty/libprocess/src/tests/process_tests.cpp:152:7: note: in 
instantiation of member function 
'testing::internal::FunctionMockerBase 
(bool)>::UntypedPerformDefaultAction' requested here
class DispatchProcess : public Process
  ^
In file included from 
../../../mesos/3rdparty/libprocess/src/tests/process_tests.cpp:20:
{noformat}

The code in question has changed in upstream GMock: 
https://github.com/google/googletest/blob/master/googlemock/include/gmock/internal/gmock-internal-utils.h#L377

So the easiest fix is probably to vendor GMock 1.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6390) Ensure Python support scripts are linted

2016-11-03 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633248#comment-15633248
 ] 

Kevin Klues commented on MESOS-6390:


I would hold off on doing this until after https://reviews.apache.org/r/53074/ 
lands.

Additionally, keep in mind that in order to do this properly, we will have to 
build a common virtualenv for use by the entire mesos source tree instead of 
just creating one inside the {{src/new_cli}} directory.

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Manuwela Kanade
>  Labels: newbie, python
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}. 
> This is mostly because these scripts are so inconsistent style-wise that 
> they wouldn't even pass the linter now.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4709) Enable compiler optimization by default

2016-11-03 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633204#comment-15633204
 ] 

Neil Conway commented on MESOS-4709:


See also this mailing list thread: 
https://lists.apache.org/thread.html/ded657aebe9e3013114a3af584a064bae39246c14e72338c978aa92f@1455755092@%3Cdev.mesos.apache.org%3E

> Enable compiler optimization by default
> ---
>
> Key: MESOS-4709
> URL: https://issues.apache.org/jira/browse/MESOS-4709
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: autoconf, configure, mesosphere
>
> At present, Mesos defaults to compiling with "-O0"; to enable compiler
> optimizations, the user needs to specify "--enable-optimize" when running 
> {{configure}}.
> We should change the default for the following reasons:
> (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally,
> I think most software packages compile with a reasonable level of
> optimizations enabled by default.
> (2) I think we should make the default configure flags appropriate for
> end-users (rather than Mesos developers): developers will be familiar
> enough with Mesos to tune the configure flags according to their own
> preferences.
> (3) The performance consequences of not enabling compiler
> optimizations can be pretty severe: 5x in a benchmark I just ran, and
> we've seen between 2x and 30x (!) performance differences for some
> real-world workloads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-03 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6532:
--
Affects Version/s: 1.0.1

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Affects Versions: 1.0.1
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632762#comment-15632762
 ] 

Till Toenshoff commented on MESOS-6532:
---

Could you please provide the complete app definition as well as the complete 
resulting task stderr?

Note that it is currently cut off at {{"value":"bash': syntax error at line 1 
near:"}}

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-03 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629152#comment-15629152
 ] 

Till Toenshoff edited comment on MESOS-6532 at 11/3/16 1:44 PM:


The task's stderr is:
{noformat}
I1102 20:56:44.266408 13134 exec.cpp:161] Version: 1.0.1
I1102 20:56:44.270975 13129 exec.cpp:236] Executor registered on agent 
52245f11-e42d-4d67-b470-80bc9c4b10c2-S0
Failed to parse the flags: Failed to load flag 'command': Failed to load value 
'{"environment":{"variables":[{"name":"MARATHON_APP_VERSION","value":"2016-11-02T12:44:14.934Z"},{"name":"HOST","value":"10.191.154.105"},{"name":"MARATHON_APP_RESOURCE_CPUS","value":"1.0"},{"name":"MARATHON_APP_RESOURCE_GPUS","value":"0"},{"name":"PORT_10001","value":"31279"},{"name":"MESOS_TASK_ID","value":"2.dbe78f32-a0fb-11e6-afab-d48564cf107d"},{"name":"PORT","value":"31279"},{"name":"MARATHON_APP_RESOURCE_MEM","value":"128.0"},{"name":"PORTS","value":"31279"},{"name":"MARATHON_APP_RESOURCE_DISK","value":"0.0"},{"name":"MARATHON_APP_LABELS","value":""},{"name":"MARATHON_APP_ID","value":"\/2"},{"name":"PORT0","value":"31279"}]},"shell":true,"value":"bash':
 syntax error at line 1 near: 
{noformat}



was (Author: 2507697...@qq.com):
task'stderr is :
I1102 20:56:44.266408 13134 exec.cpp:161] Version: 1.0.1
I1102 20:56:44.270975 13129 exec.cpp:236] Executor registered on agent 
52245f11-e42d-4d67-b470-80bc9c4b10c2-S0
Failed to parse the flags: Failed to load flag 'command': Failed to load value 
'{"environment":{"variables":[{"name":"MARATHON_APP_VERSION","value":"2016-11-02T12:44:14.934Z"},{"name":"HOST","value":"10.191.154.105"},{"name":"MARATHON_APP_RESOURCE_CPUS","value":"1.0"},{"name":"MARATHON_APP_RESOURCE_GPUS","value":"0"},{"name":"PORT_10001","value":"31279"},{"name":"MESOS_TASK_ID","value":"2.dbe78f32-a0fb-11e6-afab-d48564cf107d"},{"name":"PORT","value":"31279"},{"name":"MARATHON_APP_RESOURCE_MEM","value":"128.0"},{"name":"PORTS","value":"31279"},{"name":"MARATHON_APP_RESOURCE_DISK","value":"0.0"},{"name":"MARATHON_APP_LABELS","value":""},{"name":"MARATHON_APP_ID","value":"\/2"},{"name":"PORT0","value":"31279"}]},"shell":true,"value":"bash':
 syntax error at line 1 near: 

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5533) Agent fails to start on CentOS 6 due to missing cgroup hierarchy.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5533:
---
Priority: Major  (was: Critical)

> Agent fails to start on CentOS 6 due to missing cgroup hierarchy.
> -
>
> Key: MESOS-5533
> URL: https://issues.apache.org/jira/browse/MESOS-5533
> Project: Mesos
>  Issue Type: Bug
>  Components: build, isolation
>Reporter: Kapil Arya
>Assignee: Jie Yu
>  Labels: mesosphere
>
> With the network CNI isolator, agent now _requires_ cgroups to be installed 
> on the system. Can we add some check(s) to either automatically disable CNI 
> module if cgroup hierarchies are not available or ask the user to 
> install/enable cgroup hierarchies.
> On CentOS 6, cgroup tools aren't installed by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5533) Agent fails to start on CentOS 6 due to missing cgroup hierarchy.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5533:
---
Target Version/s: 1.2.0

> Agent fails to start on CentOS 6 due to missing cgroup hierarchy.
> -
>
> Key: MESOS-5533
> URL: https://issues.apache.org/jira/browse/MESOS-5533
> Project: Mesos
>  Issue Type: Bug
>  Components: build, isolation
>Reporter: Kapil Arya
>Assignee: Jie Yu
>  Labels: mesosphere
>
> With the network CNI isolator, agent now _requires_ cgroups to be installed 
> on the system. Can we add some check(s) to either automatically disable CNI 
> module if cgroup hierarchies are not available or ask the user to 
> install/enable cgroup hierarchies.
> On CentOS 6, cgroup tools aren't installed by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-11-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6142:
---
Priority: Critical  (was: Major)

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6117) TCP health checks are not supported on Windows.

2016-11-03 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632627#comment-15632627
 ] 

Alexander Rukletsov commented on MESOS-6117:


{noformat}
Commit: eb225062a40556250e8825d90d2b16b470d1ec4a [eb22506]
Author: Alexander Rukletsov al...@apache.org
Date: 2 September 2016 at 18:25:38 GMT+2
Commit Date: 3 November 2016 at 11:34:29 GMT+1

Extracted "curl" binary into HTTP_CHECK_COMMAND constant.

Review: https://reviews.apache.org/r/51608
{noformat}

> TCP health checks are not supported on Windows.
> ---
>
> Key: MESOS-6117
> URL: https://issues.apache.org/jira/browse/MESOS-6117
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, TCP health check is only available on Linux. Windows support 
> should be added to maintain feature parity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6532) I use mesos container type, I set CommandInfo command set shell cmd, eg: python a.py "xx xxx", but get error

2016-11-03 Thread yongyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongyu updated MESOS-6532:
--
Priority: Critical  (was: Major)

> I use mesos container type, I set CommandInfo command set shell cmd, eg: 
> python a.py "xx  xxx", but get error
> -
>
> Key: MESOS-6532
> URL: https://issues.apache.org/jira/browse/MESOS-6532
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: yongyu
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.

2016-11-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629370#comment-15629370
 ] 

Gastón Kleiman edited comment on MESOS-6457 at 11/3/16 9:21 AM:


Patches:

https://reviews.apache.org/r/53378/
https://reviews.apache.org/r/53406/
https://reviews.apache.org/r/53407/
https://reviews.apache.org/r/53385/


was (Author: gkleiman):
Patches:

https://reviews.apache.org/r/53378/
https://reviews.apache.org/r/53385/

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING.
> -
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}} if, 
> for example, it starts or stops passing a health check after entering the 
> {{TASK_KILLING}} state.
> This behaviour is counterintuitive. It also makes life harder for framework 
> and tooling developers, who must track the complete task status history to 
> know whether a task is being killed.
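The bookkeeping burden described above can be sketched as a small tracker that frameworks effectively have to implement today: once a kill has been observed, a later TASK_RUNNING update (e.g. from a flapping health check) must not make the task look alive again. This is an illustrative sketch, not the Mesos framework API:

```python
class TaskStateTracker:
    """Track a task's state, refusing to regress from TASK_KILLING.

    Illustrates the workaround frameworks need while the bug exists:
    after a kill is in flight, later TASK_RUNNING updates are ignored.
    """
    def __init__(self):
        self.state = "TASK_STAGING"

    def update(self, new_state):
        # Suppress a TASK_RUNNING that arrives after TASK_KILLING.
        if self.state == "TASK_KILLING" and new_state == "TASK_RUNNING":
            return self.state
        self.state = new_state
        return self.state

tracker = TaskStateTracker()
tracker.update("TASK_RUNNING")
tracker.update("TASK_KILLING")
print(tracker.update("TASK_RUNNING"))  # TASK_KILLING: regression suppressed
```

Fixing the transition in Mesos itself would let this logic live in one place instead of in every framework.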





[jira] [Assigned] (MESOS-6390) Ensure Python support scripts are linted

2016-11-03 Thread Manuwela Kanade (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuwela Kanade reassigned MESOS-6390:
--

Assignee: Manuwela Kanade

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Manuwela Kanade
>  Labels: newbie, python
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}, 
> mostly because these scripts are so stylistically inconsistent that they 
> would not pass the linter today.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions.
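One plausible shape for "activating a directory in the linter" is walking it for Python files and running a check over each. The sketch below uses a plain syntax check via ast.parse as a stand-in for the real style checker; the function name and the ast-based check are assumptions, not mesos-style.py internals:

```python
import ast
import tempfile
from pathlib import Path

def lint_directory(root):
    """Collect Python files under `root` and run a check on each.

    Returns a list of (path, message) pairs for files that fail.
    Here the "lint" is only a syntax check; a real style script
    would invoke its configured linter instead.
    """
    failures = []
    for path in Path(root).rglob("*.py"):
        try:
            ast.parse(path.read_text(), filename=str(path))
        except SyntaxError as exc:
            failures.append((str(path), exc.msg))
    return failures

# Demonstrate on a temporary tree with one good and one bad file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.py").write_text("x = 1\n")
    Path(tmp, "bad.py").write_text("def broken(:\n")
    print(len(lint_directory(tmp)))  # 1
```

Cleaning up the existing scripts first, then flipping the directory on, keeps the linter green at every step.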





[jira] [Commented] (MESOS-6390) Ensure Python support scripts are linted

2016-11-03 Thread Manuwela Kanade (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632101#comment-15632101
 ] 

Manuwela Kanade commented on MESOS-6390:


Hi [~bbannier], I would like to work on this issue, so I will assign it to 
myself for now. Thanks!

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>  Labels: newbie, python
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}, 
> mostly because these scripts are so stylistically inconsistent that they 
> would not pass the linter today.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions.


