[jira] [Commented] (MESOS-4604) ROOT_DOCKER_DockerHealthyTask is flaky.

2017-01-06 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806984#comment-15806984
 ] 

Avinash Sridharan commented on MESOS-4604:
--

After the refactoring of the tests, is this still an issue?

> ROOT_DOCKER_DockerHealthyTask is flaky.
> ---
>
> Key: MESOS-4604
> URL: https://issues.apache.org/jira/browse/MESOS-4604
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: CentOS 6/7, Ubuntu 15.04 on AWS.
>Reporter: Jan Schlicht
>Assignee: Joseph Wu
>  Labels: flaky-test, health-check, mesosphere, test
>
> Log from Teamcity that is running {{sudo ./bin/mesos-tests.sh}} on AWS EC2 
> instances:
> {noformat}
> [18:27:14][Step 8/8] [--] 8 tests from HealthCheckTest
> [18:27:14][Step 8/8] [ RUN  ] HealthCheckTest.HealthyTask
> [18:27:17][Step 8/8] [   OK ] HealthCheckTest.HealthyTask ( ms)
> [18:27:17][Step 8/8] [ RUN  ] 
> HealthCheckTest.ROOT_DOCKER_DockerHealthyTask
> [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure
> [18:27:36][Step 8/8] Failed to wait 15secs for termination
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure 
> virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c  google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f  google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9  __cxa_pure_virtual
> [18:27:36][Step 8/8] @   0xa9423c  
> mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @  0x1074e45  
> mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @  0x1074de4  
> mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @  0x1070ec7  
> mesos::internal::tests::MesosTest::TearDown()
> [18:27:36][Step 8/8] @  0x16eb7b2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8] @  0x16e61a9  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8] @  0x16c56aa  testing::Test::Run()
> [18:27:36][Step 8/8] @  0x16c5e89  testing::TestInfo::Run()
> [18:27:36][Step 8/8] @  0x16c650a  testing::TestCase::Run()
> [18:27:36][Step 8/8] @  0x16cd1f6  
> testing::internal::UnitTestImpl::RunAllTests()
> [18:27:36][Step 8/8] @  0x16ec513  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8] @  0x16e6df1  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8] @  0x16cbe26  testing::UnitTest::Run()
> [18:27:36][Step 8/8] @   0xe54c84  RUN_ALL_TESTS()
> [18:27:36][Step 8/8] @   0xe54867  main
> [18:27:36][Step 8/8] @ 0x7f7071560a40  (unknown)
> [18:27:36][Step 8/8] @   0x9b52d9  _start
> [18:27:36][Step 8/8] Aborted (core dumped)
> [18:27:36][Step 8/8] Process exited with code 134
> {noformat}
> Happens with Ubuntu 15.04, CentOS 6, CentOS 7 _quite_ often. 





[jira] [Commented] (MESOS-6598) Broken Link Framework Development Page

2017-01-06 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806968#comment-15806968
 ] 

Avinash Sridharan commented on MESOS-6598:
--

[~kaysoky] should this be in the current sprint?

> Broken Link Framework Development Page
> --
>
> Key: MESOS-6598
> URL: https://issues.apache.org/jira/browse/MESOS-6598
> Project: Mesos
>  Issue Type: Bug
>  Components: project website
>Reporter: Miguel Bernadin
>Assignee: Joseph Wu
>Priority: Trivial
>
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/
> The link to this page is broken: 
> Create your Framework Scheduler
> If you are writing a scheduler against Mesos 1.0 or newer, it is recommended 
> to use the new HTTP API (BROKEN LINK) to talk to Mesos.





[jira] [Commented] (MESOS-6504) Use 'geteuid()' for the root privileges check.

2017-01-06 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806960#comment-15806960
 ] 

Avinash Sridharan commented on MESOS-6504:
--

[~gilbert] [~jieyu] will you be able to finish this in the coming sprint? If 
not, we should move it out of the sprint.

> Use 'geteuid()' for the root privileges check.
> --
>
> Key: MESOS-6504
> URL: https://issues.apache.org/jira/browse/MESOS-6504
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, isolator, mesosphere, user
>
> Currently, parts of the Mesos code check for root privileges by comparing 
> os::user() to "root", which is not sufficient, since it compares the real 
> user. When the mesos binary is made 'setuid root', the real user stays 
> unchanged, so the check can deny privileges the process actually has.
> We should check the effective user id (geteuid()) instead in our code. 
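
For illustration, a minimal sketch of the difference (plain POSIX, not the 
actual Mesos patch): the real-UID check fails for a 'setuid root' binary run 
by a non-root user, while the effective-UID check succeeds.

{code}
#include <unistd.h>

// Roughly what comparing os::user() to "root" amounts to: it reflects
// the *real* UID, which stays non-root for a 'setuid root' binary.
bool isRootRealUser()
{
  return getuid() == 0;
}

// The proposed check: the *effective* UID, which is 0 whenever the
// process actually has root privileges, including via 'setuid root'.
bool isRootEffectiveUser()
{
  return geteuid() == 0;
}
{code}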





[jira] [Comment Edited] (MESOS-5578) Support static address allocation in CNI

2017-01-06 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556755#comment-15556755
 ] 

Avinash Sridharan edited comment on MESOS-5578 at 1/7/17 6:37 AM:
--

Raised the issue of static IP addresses in the CNI community.


was (Author: avin...@mesosphere.io):
Raises the issue of static IP addresses in the CNI community.

> Support static address allocation in CNI
> 
>
> Key: MESOS-5578
> URL: https://issues.apache.org/jira/browse/MESOS-5578
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, a framework can't specify a static IP address for the container 
> when using the network/cni isolator.
> The `ipaddress` field in the `NetworkInfo` protobuf was designed for this 
> specific purpose, but since the CNI spec does not specify a means to allocate 
> an IP address to the container, the `network/cni` isolator cannot honor this 
> field even when it is filled in by the framework.
> Creating this ticket to act as a placeholder to track this limitation. As 
> and when the CNI spec allows us to specify a static IP address for the 
> container, we can resolve this ticket. 
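
For reference, a sketch of how a framework would fill in the field today, 
assuming the v1 API protobufs (the network name and address below are 
hypothetical); the `network/cni` isolator currently ignores the address even 
when it is set.

{code}
#include <mesos/v1/mesos.hpp>

// Builds a NetworkInfo requesting a static address. Until the CNI spec
// supports static allocation, the `network/cni` isolator cannot honor it.
mesos::v1::NetworkInfo createNetworkInfo()
{
  mesos::v1::NetworkInfo network;
  network.set_name("my-cni-network");  // Hypothetical CNI network name.

  mesos::v1::NetworkInfo::IPAddress* ip = network.add_ip_addresses();
  ip->set_ip_address("10.0.4.2");      // The static IP being requested.

  return network;
}
{code}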





[jira] [Assigned] (MESOS-5646) Build `network/cni` isolator with `libnl` support

2017-01-06 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-5646:


Assignee: Avinash Sridharan  (was: Qian Zhang)

> Build `network/cni` isolator with `libnl` support
> -
>
> Key: MESOS-5646
> URL: https://issues.apache.org/jira/browse/MESOS-5646
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, the `network/cni` isolator does not have the ability to collect 
> network statistics for containers launched on a CNI network. We need to give 
> the `network/cni` isolator the ability to query interfaces, route tables and 
> statistics in the container's network namespace. To achieve this, the 
> `network/cni` isolator will need to speak `netlink`.
> To enable the `netlink` API, we need the `network/cni` isolator to be built 
> with `libnl` support. 
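
As a rough sketch (not from the ticket), this is the kind of libnl-route-3 
query the isolator would issue once inside the container's network namespace; 
error handling is elided and the interface name `eth0` is an assumption.

{code}
#include <netlink/netlink.h>
#include <netlink/route/link.h>

#include <cinttypes>
#include <cstdio>

int main()
{
  // Open an rtnetlink socket.
  nl_sock* sock = nl_socket_alloc();
  nl_connect(sock, NETLINK_ROUTE);

  // Look up an interface by name (assumed to be 'eth0' here).
  rtnl_link* link = nullptr;
  rtnl_link_get_kernel(sock, 0, "eth0", &link);

  // Read per-interface counters.
  uint64_t rxBytes = rtnl_link_get_stat(link, RTNL_LINK_RX_BYTES);
  uint64_t txBytes = rtnl_link_get_stat(link, RTNL_LINK_TX_BYTES);
  printf("rx: %" PRIu64 " tx: %" PRIu64 "\n", rxBytes, txBytes);

  rtnl_link_put(link);
  nl_socket_free(sock);
  return 0;
}
{code}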





[jira] [Created] (MESOS-6891) Test that only one ATTACH_CONTAINER_INPUT call is allowed

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6891:
-

 Summary: Test that only one ATTACH_CONTAINER_INPUT call is allowed
 Key: MESOS-6891
 URL: https://issues.apache.org/jira/browse/MESOS-6891
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Commented] (MESOS-6884) Add a test to verify that scheduler can launch a TTY container

2017-01-06 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806520#comment-15806520
 ] 

Anand Mazumdar commented on MESOS-6884:
---

I don't think so. A related test {{AttachContainerInput}} launches a TTY 
container as a nested sub-container. But we don't yet have a test that tries 
to launch a root-level TTY container using the Scheduler API directly (e.g., 
launch vim etc.).
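
For context, a sketch of what such a test would exercise (illustrative only, 
not the test code): a root-level container whose `ContainerInfo` requests a 
TTY.

{code}
#include <mesos/v1/mesos.hpp>

// A ContainerInfo for a root-level mesos container that requests a TTY;
// an interactive program like `vim` then sees a terminal on stdin/stdout.
mesos::v1::ContainerInfo createTTYContainerInfo()
{
  mesos::v1::ContainerInfo container;
  container.set_type(mesos::v1::ContainerInfo::MESOS);

  // The presence of the TTYInfo message is what requests the TTY.
  container.mutable_tty_info();

  return container;
}
{code}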

> Add a test to verify that scheduler can launch a TTY container
> --
>
> Key: MESOS-6884
> URL: https://issues.apache.org/jira/browse/MESOS-6884
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> [~anandmazumdar] Is this already done?





[jira] [Created] (MESOS-6890) Test that multiple ATTACH_CONTAINER_OUTPUT calls are allowed

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6890:
-

 Summary: Test that multiple ATTACH_CONTAINER_OUTPUT calls are 
allowed
 Key: MESOS-6890
 URL: https://issues.apache.org/jira/browse/MESOS-6890
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6889) Test container launch failure and IOSwitchBoard succeeds case

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6889:
-

 Summary: Test container launch failure and IOSwitchBoard succeeds 
case
 Key: MESOS-6889
 URL: https://issues.apache.org/jira/browse/MESOS-6889
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone


When this happens everything should be cleaned up properly.





[jira] [Created] (MESOS-6888) Add TTY resizing tests for ATTACH_CONTAINER_INPUT call

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6888:
-

 Summary: Add TTY resizing tests for ATTACH_CONTAINER_INPUT call
 Key: MESOS-6888
 URL: https://issues.apache.org/jira/browse/MESOS-6888
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6886) Add authorization tests for debug API handlers

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6886:
-

 Summary: Add authorization tests for debug API handlers
 Key: MESOS-6886
 URL: https://issues.apache.org/jira/browse/MESOS-6886
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Alexander Rojas


Should test authz of all 3 debug calls.





[jira] [Created] (MESOS-6887) Add validation tests for debug API calls

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6887:
-

 Summary: Add validation tests for debug API calls
 Key: MESOS-6887
 URL: https://issues.apache.org/jira/browse/MESOS-6887
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Created] (MESOS-6885) Test that the client gets EOF when the attached container exits

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6885:
-

 Summary: Test that the client gets EOF when the attached container 
exits
 Key: MESOS-6885
 URL: https://issues.apache.org/jira/browse/MESOS-6885
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone


This should work for container with and without TTY, for normal and debug 
containers.





[jira] [Created] (MESOS-6884) Add a test to verify that scheduler can launch a TTY container

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6884:
-

 Summary: Add a test to verify that scheduler can launch a TTY 
container
 Key: MESOS-6884
 URL: https://issues.apache.org/jira/browse/MESOS-6884
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone








[jira] [Updated] (MESOS-6884) Add a test to verify that scheduler can launch a TTY container

2017-01-06 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6884:
--
   Assignee: Anand Mazumdar
Description: [~anandmazumdar] Is this already done?

> Add a test to verify that scheduler can launch a TTY container
> --
>
> Key: MESOS-6884
> URL: https://issues.apache.org/jira/browse/MESOS-6884
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> [~anandmazumdar] Is this already done?





[jira] [Created] (MESOS-6883) Update HttpProxy to use `http::Server`

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6883:
-

 Summary: Update HttpProxy to use `http::Server`
 Key: MESOS-6883
 URL: https://issues.apache.org/jira/browse/MESOS-6883
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Benjamin Hindman


Once `http::Server` is implemented in MESOS-6882, we need to update HttpProxy 
to use it.

As part of testing this, it would be ideal to add benchmark tests.





[jira] [Created] (MESOS-6882) Add `http::server` abstraction that does `http::serve`

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6882:
-

 Summary: Add `http::server` abstraction that does `http::serve`
 Key: MESOS-6882
 URL: https://issues.apache.org/jira/browse/MESOS-6882
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Benjamin Hindman


Instead of the application-level code directly calling `http::serve`, it would 
be nice to use an `http::Server`.





[jira] [Created] (MESOS-6881) GC the sandbox of debug containers

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6881:
-

 Summary: GC the sandbox of debug containers
 Key: MESOS-6881
 URL: https://issues.apache.org/jira/browse/MESOS-6881
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone


Right now the sandbox directory of a debug container outlives the life of the 
debug container. It probably makes more sense for the directory to be cleaned 
up immediately after the debug container exits?





[jira] [Created] (MESOS-6880) Write user doc for debugging support

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6880:
-

 Summary: Write user doc for debugging support
 Key: MESOS-6880
 URL: https://issues.apache.org/jira/browse/MESOS-6880
 Project: Mesos
  Issue Type: Documentation
Reporter: Vinod Kone
Assignee: Kevin Klues








[jira] [Updated] (MESOS-6879) IOSwitchboard should wait for stdin to be closed and drained before exiting

2017-01-06 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6879:
---
Description: 
Currently, the IOSwitchboard assumes that whenever a container's {{stdout}} and 
{{stderr}} have been closed, then it is OK to exit the switchboard process. We 
assume this because {{stdout}} and {{stderr}} will only be closed after both 
the read end of the {{stdout}} stream and the read end of the {{stderr}} stream 
have been drained. Since draining these {{fds}} represents having read 
everything possible from a container's {{stdout}} and {{stderr}}, this is 
likely a sufficient termination criterion. However, there's a non-zero chance 
that *some* containers may decide to close their {{stdout}} and {{stderr}} 
while expecting to continue reading from {{stdin}}. For now we don't support 
containers with this behavior and we will exit out of the switchboard process 
early.

The reason we don't support this currently is that {{libevent}} and {{libev}} 
don't provide a nice way of asynchronously detecting when an {{fd}} has been 
closed. If they did, we could easily leverage this to asynchronously wait until 
{{stdin}} was closed before killing the IOSwitchboard process.

Once these libraries support this (or we find a workaround) we should update 
the IOSwitchboard appropriately.
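
(One common workaround, not from the ticket: poll the descriptor directly, 
since the kernel reports {{POLLHUP}}/{{POLLERR}} even when no events were 
requested. A minimal sketch:)

{code}
#include <poll.h>

// Returns true once the peer end of `fd` has gone away. POLLHUP and
// POLLERR are reported regardless of the requested events, so we can
// ask for nothing and still learn about closure.
bool isClosed(int fd)
{
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = 0;

  if (poll(&pfd, 1, 0) <= 0) {
    return false;  // No status change (or poll itself failed).
  }

  return (pfd.revents & (POLLHUP | POLLERR)) != 0;
}
{code}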



  was:[~klueska] can you fill the description for this?


> IOSwitchboard should wait for stdin to be closed and drained before exiting
> ---
>
> Key: MESOS-6879
> URL: https://issues.apache.org/jira/browse/MESOS-6879
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Kevin Klues
>
> Currently, the IOSwitchboard assumes that whenever a container's {{stdout}} 
> and {{stderr}} have been closed, then it is OK to exit the switchboard 
> process. We assume this because {{stdout}} and {{stderr}} will only be closed 
> after both the read end of the {{stdout}} stream and the read end of the 
> {{stderr}} stream have been drained. Since draining these {{fds}} represents 
> having read everything possible from a container's {{stdout}} and {{stderr}}, 
> this is likely a sufficient termination criterion. However, there's a non-zero 
> chance that *some* containers may decide to close their {{stdout}} and 
> {{stderr}} while expecting to continue reading from {{stdin}}. For now we 
> don't support containers with this behavior and we will exit out of the 
> switchboard process early.
> The reason we don't support this currently is that {{libevent}} and {{libev}} 
> don't provide a nice way of asynchronously detecting when an {{fd}} has been 
> closed. If they did, we could easily leverage this to asynchronously wait 
> until {{stdin}} was closed before killing the IOSwitchboard process.
> Once these libraries support this (or we find a workaround) we should update 
> the IOSwitchboard appropriately.





[jira] [Commented] (MESOS-6782) Inherit Environment from Parent containers image spec when launching DEBUG container

2017-01-06 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806433#comment-15806433
 ] 

Kevin Klues commented on MESOS-6782:


This should not be a blocker. We should set up a meeting to discuss whether 
these are the semantics we want or not. Relatedly, we should discuss what 
directory we enter by default when `execing` into a container (should it be '/' 
or the sandbox).

> Inherit Environment from Parent containers image spec when launching DEBUG 
> container
> 
>
> Key: MESOS-6782
> URL: https://issues.apache.org/jira/browse/MESOS-6782
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Jie Yu
>  Labels: debugging, mesosphere
> Fix For: 1.2.0
>
>
> Right now, whenever we enter a DEBUG container we have a fresh environment. 
> For a better user experience, we should have the DEBUG container inherit the 
> environment set up in its parent container's image spec (if there is one). 





[jira] [Created] (MESOS-6878) Figure out the right CWD for debug container

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6878:
-

 Summary: Figure out the right CWD for debug container
 Key: MESOS-6878
 URL: https://issues.apache.org/jira/browse/MESOS-6878
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone


Right now the debug container lands in the root of the filesystem of the task 
it is debugging. Alternatively, we can make it land in the sandbox directory 
of the task instead. 

This ticket is to track the decision regarding keeping the current behavior or 
changing it.





[jira] [Updated] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-01-06 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6804:
--
Assignee: Kevin Klues
Priority: Blocker  (was: Major)

> Running 'tty' inside a debug container that has a tty reports "Not a tty"
> -
>
> Key: MESOS-6804
> URL: https://issues.apache.org/jira/browse/MESOS-6804
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> We need to inject `/dev/console` into the container and map it to the slave 
> end of the TTY we are attached to.
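
A rough sketch of those mechanics, assuming the usual pty dance: allocate a 
master/slave pair, then bind-mount the slave device over the container's 
/dev/console. The helper below is hypothetical and assumes 
{{rootfs}}/dev/console already exists as a file to mount over.

{code}
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mount.h>

#include <string>

// Allocates a pty and bind-mounts its slave end over /dev/console in
// the container's rootfs. Returns the master fd, or -1 on failure.
int attachConsole(const std::string& rootfs)
{
  int master = posix_openpt(O_RDWR | O_NOCTTY);
  if (master < 0 || grantpt(master) != 0 || unlockpt(master) != 0) {
    return -1;
  }

  const char* slave = ptsname(master);  // e.g. "/dev/pts/3".

  const std::string console = rootfs + "/dev/console";
  if (mount(slave, console.c_str(), nullptr, MS_BIND, nullptr) != 0) {
    return -1;
  }

  return master;
}
{code}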





[jira] [Created] (MESOS-6877) Mount /dev/console into containers that request tty

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6877:
-

 Summary: Mount /dev/console into containers that request tty
 Key: MESOS-6877
 URL: https://issues.apache.org/jira/browse/MESOS-6877
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Kevin Klues
Priority: Blocker


We should auto-mount /dev/console into containers when a tty is requested, 
because that is the expected behavior.





[jira] [Created] (MESOS-6876) Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT should be streaming type

2017-01-06 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6876:
-

 Summary: Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION 
and ATTACH_CONTAINER_OUTPUT should be streaming type
 Key: MESOS-6876
 URL: https://issues.apache.org/jira/browse/MESOS-6876
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Anand Mazumdar
Priority: Blocker


Right now the default "Accept" type in the HTTP response to 
LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT is 
"application/json". This should be instead "application/json+recordio" or 
whatever we decide the streaming type should be in MESOS-3601.





[jira] [Updated] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2017-01-06 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3601:
--
Assignee: Anand Mazumdar
Target Version/s: 1.2.0
Priority: Blocker  (was: Major)

> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: api, http, mesosphere, wireprotocol
>
> From an HTTP standpoint the current set of headers returned when connecting 
> to the HTTP scheduler API are insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from mesos is intended to function as a stream 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to be able to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is expressed will come down to the semantics of what is actually 
> "Returned" as the response from {{POST /api/v1/scheduler}}.
> h4. Proposal
> One approach would be to leverage http as much as possible, having a client 
> specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
> that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} 
> messages.  (This approach allows for things like gzip to be woven in fairly 
> easily in the future)
> For this approach I would expect the following:
> {code:title=Request}
> POST /api/v1/scheduler HTTP/1.1
> Host: localhost:5050
> Accept: application/x-protobuf
> Accept-Encoding: recordio
> Content-Type: application/x-protobuf
> Content-Length: 35
> User-Agent: RxNetty Client
> {code}
> {code:title=Response}
> HTTP/1.1 200 OK
> Connection: keep-alive
> Transfer-Encoding: chunked
> Content-Type: application/x-protobuf
> Content-Encoding: recordio
> Cache-Control: no-transform
> {code}
> When Content-Encoding is used it is recommended to set {{Cache-Control: 
> no-transform}} to signal to any proxies that no transformation should be 
> applied to the content encoding [Section 14.11 RFC 
> 2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].





[jira] [Commented] (MESOS-6596) Dynamic reservation endpoint returns 409s

2017-01-06 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806354#comment-15806354
 ] 

Michael Park commented on MESOS-6596:
-

[~zhitao] What is the {{allocation_interval}} for the cluster, and how many 
frameworks are in play?
I think [~kaysoky] is right in that you are indeed running into the 
{{allocate}} vs {{updateAvailable}} race.

We initially tried to "practically" get around the issue with this piece of 
code: 
https://github.com/apache/mesos/blob/1.1.0/src%2Fmaster%2Fhttp.cpp#L4599-L4606
which was a hack to begin with, and it seems it's not good enough in practice 
because the {{Filter}} is only applied to the specific framework.

There have been thoughts about making the master/allocator have a much closer 
relationship, but I think that's a much bigger undertaking.
Meanwhile, I think we could consider something like adding a call to the 
allocator to request leaving room for specified resources,
so that the batch {{allocate}} doesn't flush all of the resources before the 
{{updateAvailable}} call gets processed by the allocator.

> Dynamic reservation endpoint returns 409s
> -
>
> Key: MESOS-6596
> URL: https://issues.apache.org/jira/browse/MESOS-6596
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Kunal Thakar
>
> The operation to dynamically reserve a host for a framework mostly 
> fails, but occasionally succeeds.
> We are calling the /reserve endpoint on the master with the same payload and 
> it mostly returns 409, with the occasional success. Pasting the output of two 
> consecutive /reserve calls:
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> *   Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 409 Conflict
> HTTP/1.1 409 Conflict
> < Date: Tue, 15 Nov 2016 23:07:10 GMT
> Date: Tue, 15 Nov 2016 23:07:10 GMT
> < Content-Type: text/plain; charset=utf-8
> Content-Type: text/plain; charset=utf-8
> < Content-Length: 58
> Content-Length: 58
> * HTTP error before end of send, stop sending
> <
> * Closing connection #0
> Invalid RESERVE Operation:  does not contain mem(*):120621
> {code}
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> *   Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j 
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 202 Accepted
> HTTP/1.1 202 Accepted
> < Date: Tue, 15 Nov 2016 23:07:16 GMT
> Date: Tue, 15 Nov 2016 23:07:16 GMT
> < Content-Length: 0
> Content-Length: 0
> {code}





[jira] [Commented] (MESOS-6845) All /master endpoint documentation lack request details

2017-01-06 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806335#comment-15806335
 ] 

Benjamin Mahler commented on MESOS-6845:


I don't know if we're quite at the point yet where we've declared the existing 
http endpoints deprecated; there are some pieces we still need to make the 
replacement usable. For example, you can't use it from the browser: MESOS-6773.

> All /master endpoint documentation lack request details
> ---
>
> Key: MESOS-6845
> URL: https://issues.apache.org/jira/browse/MESOS-6845
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Tobias Mueller
>
> When I was trying to use the /master/teardown endpoint 
> (http://mesos.apache.org/documentation/latest/endpoints/master/teardown/), I 
> was unsuccessful in tearing down my specific framework, because the docs say 
> nothing about request details, such as
> * HTTP method
> * Request body contents
> This makes the usage rather hard, if not impossible. 
> I'd suggest adding call examples like those for the Scheduler HTTP API at 
> http://mesos.apache.org/documentation/latest/scheduler-http-api/
> E.g.
> TEARDOWN Request (JSON):
> POST /master/teardown  HTTP/1.1
> Host: masterhost:5050
> Content-Type: application/json
> frameworkId=12220-3440-12532-2345
> TEARDOWN Response:
> HTTP/1.1 200 Ok





[jira] [Updated] (MESOS-6286) Master does not remove an agent if it is responsive but not registered

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6286:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Master does not remove an agent if it is responsive but not registered
> --
>
> Key: MESOS-6286
> URL: https://issues.apache.org/jira/browse/MESOS-6286
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> As part of MESOS-6285, we observed an agent stuck in the recovery phase.  The 
> agent would do the following in a loop:
> # Systemd starts the agent.
> # The agent detects the master, but does not connect yet.  The agent needs to 
> recover first.
> # The agent responds to {{PingSlaveMessage}} from the master, but it is 
> stalled in recovery.
> # The agent is OOM-killed by the kernel before recovery finishes.  Repeat 
> (1-4).
> The consequences of this:
> * Frameworks will never get a TASK_LOST or terminal status update for tasks 
> on this agent.
> * Executors on the agent can connect to the agent, but will not be able to 
> register.
> We should consider adding some timeout/intervention in the master for 
> responsive, but non-recoverable agents.





[jira] [Updated] (MESOS-6664) Force cleanup of IOSwitchboard server if it does not terminate after the container terminates.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6664:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Force cleanup of IOSwitchboard server if it does not terminate after the 
> container terminates.
> --
>
> Key: MESOS-6664
> URL: https://issues.apache.org/jira/browse/MESOS-6664
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Kevin Klues
>
> In the normal case, the IOSwitchboard server will terminate after the 
> container terminates. However, we should be more defensive and always clean 
> up the IOSwitchboard server if it does not terminate within a reasonable 
> grace period. 
> The reason for the grace period is to allow the IOSwitchboard server to 
> finish redirecting the stdout/stderr to the logger.





[jira] [Updated] (MESOS-6419) The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6419:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> The 'master/teardown' endpoint should support tearing down 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6419
> URL: https://issues.apache.org/jira/browse/MESOS-6419
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.26.2, 0.27.3, 0.28.2, 1.0.1
>Reporter: Gilbert Song
>Assignee: Neil Conway
>Priority: Critical
>  Labels: endpoint, master
>
> This issue is exposed from 
> [MESOS-6400](https://issues.apache.org/jira/browse/MESOS-6400). When a user 
> is trying to tear down an 'unregistered_framework' from the 'master/teardown' 
> endpoint, a bad request will be returned: `No framework found with specified 
> ID`.
> Ideally, we should support tearing down an unregistered framework, since 
> such frameworks may occur due to a network partition, in which case all the 
> orphan tasks still occupy resources. It would be a nightmare if a user had to 
> wait on the unregistered framework in order to get those resources back.
> This may be the initial implementation: 
> https://github.com/apache/mesos/commit/bb8375975e92ee722befb478ddc3b2541d1ccaa9





[jira] [Updated] (MESOS-6602) Shutdown completed frameworks when unreachable agent re-registers

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6602:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Shutdown completed frameworks when unreachable agent re-registers
> -
>
> Key: MESOS-6602
> URL: https://issues.apache.org/jira/browse/MESOS-6602
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> We currently shut down completed frameworks when an agent re-registers with a 
> master that it is already registered with (MESOS-633). We should also 
> shut down completed frameworks when an unreachable agent re-registers.
> This is distinct from the more difficult problem of shutting down completed 
> frameworks after master failover (MESOS-4659).





[jira] [Updated] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6040:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 48, 
Mesosphere Sprint 49  (was: Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 48)

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake build to produce the port-mapper binary as well. 





[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6001:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Aufs backend cannot support the image with numerous layers.
> ---
>
> Key: MESOS-6001
> URL: https://issues.apache.org/jira/browse/MESOS-6001
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any other os with aufs module
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: aufs, backend, containerizer
>
> This issue was exposed in this unit test 
> `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually 
> specifying the `bind` backend. Most likely, mounting aufs with specific 
> options is limited by the length of the mount options string.
> {noformat}
> [20:13:07] :   [Step 10/10] [ RUN  ] 
> DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] 
> Opened db in 8.148813ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] 
> Compacted db in 3.126629ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] 
> Created db iterator in 4410ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] 
> Seeked to beginning of db in 763ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 491ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] 
> Starting replica recovery
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] 
> Replica is in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5852)@172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] 
> Updating replica status to STARTING
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] 
> Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) 
> started on 172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs"
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/OZHDIQ/credentials'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629114 23432 

[jira] [Updated] (MESOS-6388) Report new PARTITION_AWARE task statuses in HTTP endpoints

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6388:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Report new PARTITION_AWARE task statuses in HTTP endpoints
> --
>
> Key: MESOS-6388
> URL: https://issues.apache.org/jira/browse/MESOS-6388
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> At a minimum, the {{/state-summary}} endpoint needs to be updated.





[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend failed to mount rootfs.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6654:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Duplicate image layer ids may make the backend failed to mount rootfs.
> --
>
> Key: MESOS-6654
> URL: https://issues.apache.org/jira/browse/MESOS-6654
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: aufs, backend, containerizer
>
> Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the 
> manifest, which may leave some backends unable to mount the rootfs (e.g., the 
> 'aufs' backend). We should make sure that each layer path returned in 
> 'ImageInfo' is unique.
> Here is an example manifest from 'mesosphere/inky':
> {noformat}
> [20:13:08]W:   [Step 10/10]"name": "mesosphere/inky",
> [20:13:08]W:   [Step 10/10]"tag": "latest",
> [20:13:08]W:   [Step 10/10]"architecture": "amd64",
> [20:13:08]W:   [Step 10/10]"fsLayers": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   }
> [20:13:08]W:   [Step 10/10]],
> [20:13:08]W:   [Step 10/10]"history": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop)
>  ENTRYPOINT 
> [echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> 
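
A minimal sketch of the dedup the description calls for (illustrative; the 
real fix lives in the provisioner): keep layer order but drop repeated ids 
before handing paths to the backend.

{code}
#include <string>
#include <unordered_set>
#include <vector>

// Returns the layer ids with duplicates removed, preserving the order of
// first occurrence, so each layer path mounted by the backend is unique.
std::vector<std::string> uniqueLayers(const std::vector<std::string>& layerIds)
{
  std::unordered_set<std::string> seen;
  std::vector<std::string> result;

  for (const std::string& id : layerIds) {
    if (seen.insert(id).second) {
      result.push_back(id);
    }
  }

  return result;
}
{code}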

[jira] [Updated] (MESOS-6805) Check unreachable task cache for task ID collisions on launch

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6805:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Check unreachable task cache for task ID collisions on launch
> -
>
> Key: MESOS-6805
> URL: https://issues.apache.org/jira/browse/MESOS-6805
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> As discussed in MESOS-6785, it is possible to crash the master by launching a 
> task that reuses the ID of an unreachable/partitioned task. A complete 
> solution to this problem will be quite involved, but an incremental 
> improvement is easy: when we see a task launch operation, reject the launch 
> attempt if the task ID collides with an ID in the per-framework 
> {{unreachableTasks}} cache. This doesn't catch all situations in which IDs 
> are reused, but it is better than nothing.
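
A sketch of that incremental check (the names follow the master's data 
structures loosely; this is not the actual patch):

{code}
#include <mesos/mesos.hpp>

#include <stout/error.hpp>
#include <stout/none.hpp>
#include <stout/option.hpp>
#include <stout/stringify.hpp>

// Rejects a launch whose task ID collides with a cached unreachable task.
// `Framework` stands in for the master's per-framework struct.
Option<Error> validateUnreachableTaskID(
    const mesos::TaskInfo& task,
    const Framework& framework)
{
  if (framework.unreachableTasks.contains(task.task_id())) {
    return Error(
        "Task ID '" + stringify(task.task_id()) +
        "' collides with an unreachable task of this framework");
  }

  return None();
}
{code}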





[jira] [Updated] (MESOS-6475) Mesos Container Attach/Exec Unit Tests

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6475:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Mesos Container Attach/Exec Unit Tests
> --
>
> Key: MESOS-6475
> URL: https://issues.apache.org/jira/browse/MESOS-6475
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Ideally, all unit tests should be written as the individual tasks that make 
> up this Epic are completed. However, this doesn't always happen as 
> planned. 
> This ticket should not be closed and the Epic should not be considered 
> complete until all unit tests for all components have been written.





[jira] [Updated] (MESOS-6719) Unify "active" and "state"/"connected" fields in Master::Framework

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6719:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Unify "active" and "state"/"connected" fields in Master::Framework
> --
>
> Key: MESOS-6719
> URL: https://issues.apache.org/jira/browse/MESOS-6719
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Rather than tracking whether a framework is "active" separately from whether 
> it is "connected", we should consider using a single "state" variable to 
> track the current state of the framework (connected-and-active, 
> connected-and-inactive, disconnected, etc.)





[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6653:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, containerizer, overlayfs
>
> Depending on MESOS-6000, we use a symlink to shorten the overlayfs mounting 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), the symlink .../backends/overlay/links would fail to be created 
> since it already exists.
> Here is a simple log when we hard code overlayfs as our default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 provisioner.cpp:318] 
> Provisioning image rootfs 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7'
>  for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618408  2936 overlay.cpp:175] 
> Created symlink 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links'
>  -> '/tmp/DQ3blT'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618472  2936 overlay.cpp:203] 
> Provisioning image rootfs with overlayfs: 
> 

[jira] [Updated] (MESOS-6619) Duplicate elements in "completed_tasks"

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6619:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Duplicate elements in "completed_tasks"
> ---
>
> Key: MESOS-6619
> URL: https://issues.apache.org/jira/browse/MESOS-6619
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Scenario:
> # Framework starts non-partition-aware task T on agent A
> # Agent A is partitioned. Task T is marked as a "completed task" in the 
> {{Framework}} struct of the master, as part of {{Framework::removeTask}}.
> # Agent A re-registers with the master. The tasks running on A are re-added 
> to their respective frameworks on the master as running tasks.
> # In {{Master::\_reregisterSlave}}, the master sends a 
> {{ShutdownFrameworkMessage}} for all non-partition-aware frameworks running 
> on the agent. The master then does {{removeTask}} for each task managed by 
> one of these frameworks, which results in calling {{Framework::removeTask}}, 
> which adds _another_ task to {{completed_tasks}}. Note that 
> {{completed_tasks}} does not attempt to detect/suppress duplicates, so this 
> results in two elements in the {{completed_tasks}} collection.
> Similar problems occur when a partition-aware task is running on a 
> partitioned agent that re-registers: the result is a task in the {{tasks}} 
> list _and_ a task in the {{completed_tasks}} list.
> Possible fixes/changes:
> * Adding a task to the {{completed_tasks}} list when an agent becomes 
> partitioned is debatable; certainly for partition-aware tasks, the task is 
> not "completed". We might consider adding an "{{unreachable_tasks}}" list to 
> the HTTP endpoints.
> * Regardless of whether we continue to use {{completed_tasks}} or add a new 
> collection, we should ensure the consistency of that data structure after 
> agent re-registration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6504) Use 'geteuid()' for the root privileges check.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6504:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Use 'geteuid()' for the root privileges check.
> --
>
> Key: MESOS-6504
> URL: https://issues.apache.org/jira/browse/MESOS-6504
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, isolator, mesosphere, user
>
> Currently, parts of code in Mesos check the root privileges using os::user() 
> to compare to "root", which is not sufficient, since it compares the real 
> user. When people change the mesos binary by 'setuid root', the process may 
> not have the right permission to execute.
> We should check the effective user id instead in our code. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
> Fix For: 1.2.0
>
>
> Committed some basic documentation. So moving this to pods-improvements epic 
> and targeting this for 1.2.0. I would like this to track the more 
> comprehensive documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5931:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 47, 
Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 47, Mesosphere Sprint 48)

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: backend, containerizer, mesosphere
>
> Currently in Unified Containerizer, copy backend will be selected by default. 
> This is not ideal, especially for production environment. It would take a 
> long time to prepare an huge container image to copy it from the store to 
> provisioner.
> Ideally, we should support `auto backend`, which would 
> automatically/intelligently select the best/optimal backend for image 
> provisioner if user does not specify one from the agent flag.
> We should have a logic design first in this ticket, to determine how we want 
> to choose the right backend (e.g., overlayfs or aufs should be preferred if 
> available from the kernel).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that all works for both top level 
> container and nested container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6857) Mesos master UI resources per role

2017-01-06 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806302#comment-15806302
 ] 

Charles Allen commented on MESOS-6857:
--

[~bmahler] cool thanks!

> Mesos master UI resources per role
> --
>
> Key: MESOS-6857
> URL: https://issues.apache.org/jira/browse/MESOS-6857
> Project: Mesos
>  Issue Type: Wish
>  Components: master, webui
>Reporter: Charles Allen
>
> Currently when viewing resources in the mesos master ui all resources are 
> jumbled together. This makes it challenging for operators to determine how 
> different roles are utilizing the cluster resources. This ask is that the 
> mesos master web ui have a per-role view of resources, similar in function to 
> the current global resource view.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6857) Mesos master UI resources per role

2017-01-06 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806297#comment-15806297
 ] 

Benjamin Mahler commented on MESOS-6857:


[~drcrallen] We're planning to address this in the multi-role framework support 
by providing a top-level 'roles' tab that will give a role-oriented view. 
[~guoger] will send out a proposed design so keep and eye out and share any 
feedback you have.

> Mesos master UI resources per role
> --
>
> Key: MESOS-6857
> URL: https://issues.apache.org/jira/browse/MESOS-6857
> Project: Mesos
>  Issue Type: Wish
>  Components: master, webui
>Reporter: Charles Allen
>
> Currently when viewing resources in the mesos master ui all resources are 
> jumbled together. This makes it challenging for operators to determine how 
> different roles are utilizing the cluster resources. This ask is that the 
> mesos master web ui have a per-role view of resources, similar in function to 
> the current global resource view.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6836) Immediately failing tasks show incomplete logs in the sandbox

2017-01-06 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-6836:
--

Assignee: Kevin Klues

> Immediately failing tasks show incomplete logs in the sandbox
> -
>
> Key: MESOS-6836
> URL: https://issues.apache.org/jira/browse/MESOS-6836
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Joseph Wu
>Assignee: Kevin Klues
>  Labels: mesosphere
>
> I started a master with default settings:
> {code}
> src/mesos-master --work_dir=/tmp/master
> {code}
> And an agent with default settings (on OSX and CentOS 7)
> {code}
> sudo src/mesos-agent --work_dir=/tmp/agent --master=...
> {code}
> Then I ran a task which I expect to fail immediately:
> {code}
> src/mesos-execute --master=... --name=fail --command=asdf
> {code}
> When I look inside the sandbox, I see a {{stderr}} like this:
> {code}
> @   0x4156be _Abort()
> @   0x4156fc _Abort()
> {code}
> The stack trace is apparently clipped.  I have a hunch (insubstantiated) that 
> this output clipping is due to the IO Switchboard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6874) Agent silently ignores FS isolation when protobuf is malformed

2017-01-06 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-6874:
---

Assignee: Gilbert Song

> Agent silently ignores FS isolation when protobuf is malformed
> --
>
> Key: MESOS-6874
> URL: https://issues.apache.org/jira/browse/MESOS-6874
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Michael Gummelt
>Assignee: Gilbert Song
>  Labels: newbie
>
> cc [~vinodkone]
> I accidentally set my Mesos ContainerInfo to include a DockerInfo instead of 
> a MesosInfo:
> {code}
> executorInfoBuilder.setContainer(
>  Protos.ContainerInfo.newBuilder()
>  .setType(Protos.ContainerInfo.Type.MESOS)
>  .setDocker(Protos.ContainerInfo.DockerInfo.newBuilder()
>  
> .setImage(podSpec.getContainer().get().getImageName()))
> {code}
> I would have expected a validation error before or during containerization, 
> but instead, the agent silently decided to ignore filesystem isolation 
> altogether, and launch my executor on the host filesystem. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6875) Copy backend fails to copy container

2017-01-06 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-6875:
---

Assignee: Gilbert Song

> Copy backend fails to copy container
> 
>
> Key: MESOS-6875
> URL: https://issues.apache.org/jira/browse/MESOS-6875
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Affects Versions: 1.1.0
>Reporter: Michael Gummelt
>Assignee: Gilbert Song
>
> cc [~gilbert]
> I get the following error when trying to launch a custom executor in 
> mgummelt/couchbase:latest (which is just ubuntu:14.04 with {{erl}} installed).
> {code}
> E0106 19:43:18.759450  3597 slave.cpp:4562] Container 
> 'c1958040-3ca0-4d46-ab32-0c307919be9b' for executor 
> 'server__5cebe7d5-28c3-465c-a442-0ecd49364e62' of framework 
> dbf21cd6-e559-45cf-a159-704aa10d2482-0002 failed to start: Collect failed: 
> Failed to copy layer: cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Lusaka':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Mbabane':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/America/Curacao':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Katmandu':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Kuwait':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Thimphu':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Urumqi':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Atlantic/St_Helena':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Lord_Howe':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/North':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Sydney':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Tasmania':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Easter':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Saipan':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Zulu':
>  Too many levels of symbolic links
> cp: cannot stat 
> '/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/Africa/Lusaka':
>  Too many levels of symbolic links
> cp: cannot stat 
> 

[jira] [Updated] (MESOS-6874) Agent silently ignores FS isolation when protobuf is malformed

2017-01-06 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6874:
--
 Labels: newbie  (was: )
Component/s: containerization

> Agent silently ignores FS isolation when protobuf is malformed
> --
>
> Key: MESOS-6874
> URL: https://issues.apache.org/jira/browse/MESOS-6874
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Michael Gummelt
>  Labels: newbie
>
> cc [~vinodkone]
> I accidentally set my Mesos ContainerInfo to include a DockerInfo instead of 
> a MesosInfo:
> {code}
> executorInfoBuilder.setContainer(
>  Protos.ContainerInfo.newBuilder()
>  .setType(Protos.ContainerInfo.Type.MESOS)
>  .setDocker(Protos.ContainerInfo.DockerInfo.newBuilder()
>  
> .setImage(podSpec.getContainer().get().getImageName()))
> {code}
> I would have expected a validation error before or during containerization, 
> but instead, the agent silently decided to ignore filesystem isolation 
> altogether, and launch my executor on the host filesystem. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6875) Copy backend fails to copy container

2017-01-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6875:
--

 Summary: Copy backend fails to copy container
 Key: MESOS-6875
 URL: https://issues.apache.org/jira/browse/MESOS-6875
 Project: Mesos
  Issue Type: Bug
  Components: agent, containerization
Affects Versions: 1.1.0
Reporter: Michael Gummelt


cc [~gilbert]

I get the following error when trying to launch a custom executor in 
mgummelt/couchbase:latest (which is just ubuntu:14.04 with {{erl}} installed).

{code}
E0106 19:43:18.759450  3597 slave.cpp:4562] Container 
'c1958040-3ca0-4d46-ab32-0c307919be9b' for executor 
'server__5cebe7d5-28c3-465c-a442-0ecd49364e62' of framework 
dbf21cd6-e559-45cf-a159-704aa10d2482-0002 failed to start: Collect failed: 
Failed to copy layer: cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Lusaka':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Mbabane':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/America/Curacao':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Katmandu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Kuwait':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Thimphu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Urumqi':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Atlantic/St_Helena':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Lord_Howe':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/North':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Sydney':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Tasmania':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Easter':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Saipan':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Zulu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/Africa/Lusaka':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/Africa/Mbabane':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/America/Curacao':
 Too many levels of symbolic 

[jira] [Created] (MESOS-6874) Agent silently ignores FS isolation when protobuf is malformed

2017-01-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6874:
--

 Summary: Agent silently ignores FS isolation when protobuf is 
malformed
 Key: MESOS-6874
 URL: https://issues.apache.org/jira/browse/MESOS-6874
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Michael Gummelt


cc [~vinodkone]

I accidentally set my Mesos ContainerInfo to include a DockerInfo instead of a 
MesosInfo:

{code}
executorInfoBuilder.setContainer(
 Protos.ContainerInfo.newBuilder()
 .setType(Protos.ContainerInfo.Type.MESOS)
 .setDocker(Protos.ContainerInfo.DockerInfo.newBuilder()
 .setImage(podSpec.getContainer().get().getImageName()))
{code}

I would have expected a validation error before or during containerization, but 
instead, the agent silently decided to ignore filesystem isolation altogether, 
and launch my executor on the host filesystem. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6631) Disallow frameworks from modifying FrameworkInfo.roles.

2017-01-06 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6631:

Sprint: Mesosphere Sprint 49

> Disallow frameworks from modifying FrameworkInfo.roles.
> ---
>
> Key: MESOS-6631
> URL: https://issues.apache.org/jira/browse/MESOS-6631
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>
> In "phase 1" of the multi-role framework support, we want to preserve the 
> existing behavior of single-role framework support in that we disallow 
> frameworks from modifying their role.
> With multi-role framework support, we will initially disallow frameworks from 
> modifying the roles field. Note that in the case that the master has failed 
> over but the framework hasn't re-registered yet, we will use the framework 
> info from the agents to disallow changes to the roles field. We will treat 
> {{FrameworkInfo.roles}} as a set rather than a list, so ordering does not 
> matter for equality.
> One difference between {{role}} and {{roles}} is that for {{role}} 
> modification, we ignore it. But, with {{roles}} modification, since this is a 
> new feature, we can disallow it by rejecting the framework subscription.
> Later, in phase 2, we will allow frameworks to modify their roles, see 
> MESOS-6627.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6866) Mesos agent not checking IDs before using them as part of the paths

2017-01-06 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805307#comment-15805307
 ] 

Yan Xu commented on MESOS-6866:
---

So to do this I need move things around a bit: pull some of the validation 
that's not strictly "master only" to "common/validation.hpp|cpp" so it can be 
used by the agent as well.

> Mesos agent not checking IDs before using them as part of the paths
> ---
>
> Key: MESOS-6866
> URL: https://issues.apache.org/jira/browse/MESOS-6866
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Yan Xu
>Assignee: Yan Xu
>
> Various IDs are used in Mesos, some assigned by the master (AgentID, 
> FrameworkID, etc) and some created by the frameworks (TaskID, ExecutorID etc).
> The master does sufficient validation on the IDs supplied by the frameworks 
> and the agent currently just trusts that the IDs are valid because they have 
> been validated. 
> The problem is that currently any entity can spoof as the master to inject 
> certain actions on the agent which can be executed as "root" and inflict harm 
> on the system. The "right" long term fix is of course to prevent this from 
> happening but as a short-term defensive measure we can insert some hard 
> CHECKs on the validity of the IDs in the agent code paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6003) Add logging module for logging to an external program

2017-01-06 Thread Joel Wilsson (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805126#comment-15805126
 ] 

Joel Wilsson commented on MESOS-6003:
-

Looks good to me as an outsider, but most of the code changes ended up in the 
documentation review request.

> Add logging module for logging to an external program
> -
>
> Key: MESOS-6003
> URL: https://issues.apache.org/jira/browse/MESOS-6003
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Will Rouesnel
>Assignee: Will Rouesnel
>Priority: Minor
>
> In the vein of the logrotate module for logging, there should be a similar 
> module which provides support for logging to an arbitrary log handling 
> program, with suitable task metadata provided by environment variables or 
> command line arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6860) Some tests use CHECK instead of ASSERT

2017-01-06 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6860:

Sprint: Mesosphere Sprint 49
Labels: mesosphere newbie  (was: newbie)

> Some tests use CHECK instead of ASSERT
> --
>
> Key: MESOS-6860
> URL: https://issues.apache.org/jira/browse/MESOS-6860
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, newbie
>
> Some tests check preconditions with {{CHECK}} instead of e.g., 
> {{ASSERT_TRUE}}. When such a check fails it leads to a undesirable complete 
> abort of the test run, potentially dumping core. We should make sure tests 
> check preconditions in a proper way, e.g., with {{ASSERT_TRUE}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6860) Some tests use CHECK instead of ASSERT

2017-01-06 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6860:

Assignee: Benjamin Bannier

Reviews:

https://reviews.apache.org/r/55268/
https://reviews.apache.org/r/55269/

> Some tests use CHECK instead of ASSERT
> --
>
> Key: MESOS-6860
> URL: https://issues.apache.org/jira/browse/MESOS-6860
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: newbie
>
> Some tests check preconditions with {{CHECK}} instead of e.g., 
> {{ASSERT_TRUE}}. When such a check fails it leads to a undesirable complete 
> abort of the test run, potentially dumping core. We should make sure tests 
> check preconditions in a proper way, e.g., with {{ASSERT_TRUE}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6873) Replace actor queue size gauge with pair of counters

2017-01-06 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-6873:
-
Description: 
This is necessary to ensure that when the corresponding actor is overloaded, 
`/metrics/snapshot` still finishes and not blocked infinitely, or seeing 
corresponding metrics timeout.

Quoting [~bmahler] from email:

{quote}
There are instances of Gauges that might be better represented as
Counters. For example, we expose the actor queue sizes using a gauge (known
to be unfortunate!), when instead we could expose two counters for
"enqueued" and "dequeued" messages and infer size from these. We can also
add the ability for callers to manually increment and decrement their
Gauges rather than go through a dispatch.
{quote}

  was:
This is necessary to ensure that when the corresponding actor is overloaded, 
`/metrics/snapshot` still finishes and not blocked infinitely, or seeing 
corresponding metrics timeout.

Quoting [~bmahler] from email:

bq.  There are instances of Gauges that might be better represented as
Counters. For example, we expose the actor queue sizes using a gauge (known
to be unfortunate!), when instead we could expose two counters for
"enqueued" and "dequeued" messages and infer size from these. We can also
add the ability for callers to manually increment and decrement their
Gauges rather than go through a dispatch.


> Replace actor queue size gauge with pair of counters
> 
>
> Key: MESOS-6873
> URL: https://issues.apache.org/jira/browse/MESOS-6873
> Project: Mesos
>  Issue Type: Bug
>Reporter: Zhitao Li
>
> This is necessary to ensure that when the corresponding actor is overloaded, 
> `/metrics/snapshot` still finishes and not blocked infinitely, or seeing 
> corresponding metrics timeout.
> Quoting [~bmahler] from email:
> {quote}
> There are instances of Gauges that might be better represented as
> Counters. For example, we expose the actor queue sizes using a gauge (known
> to be unfortunate!), when instead we could expose two counters for
> "enqueued" and "dequeued" messages and infer size from these. We can also
> add the ability for callers to manually increment and decrement their
> Gauges rather than go through a dispatch.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6873) Replace actor queue size gauge with pair of counters

2017-01-06 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6873:


 Summary: Replace actor queue size gauge with pair of counters
 Key: MESOS-6873
 URL: https://issues.apache.org/jira/browse/MESOS-6873
 Project: Mesos
  Issue Type: Bug
Reporter: Zhitao Li


This is necessary to ensure that when the corresponding actor is overloaded, 
`/metrics/snapshot` still finishes and not blocked infinitely, or seeing 
corresponding metrics timeout.

Quoting [~bmahler] from email:

bq.  There are instances of Gauges that might be better represented as
Counters. For example, we expose the actor queue sizes using a gauge (known
to be unfortunate!), when instead we could expose two counters for
"enqueued" and "dequeued" messages and infer size from these. We can also
add the ability for callers to manually increment and decrement their
Gauges rather than go through a dispatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6872) Document timeout and gauge usage for `/metrics/snapshot`

2017-01-06 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6872:


 Summary: Document timeout and gauge usage for `/metrics/snapshot`
 Key: MESOS-6872
 URL: https://issues.apache.org/jira/browse/MESOS-6872
 Project: Mesos
  Issue Type: Documentation
Reporter: Zhitao Li
Assignee: Zhitao Li
Priority: Minor


Quoting [~bmahler]:

/quote
The /metrics endpoint exposes a timeout parameter if you want to receive a
response with all of the metrics that were available within the timeout,
e.g. /metrics/snapshot.json?timeout=10secs
/quote

We should document this clearly in `metrics/snapshot` endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6320) Implement clang-tidy check to catch incorrect flags hierarchies

2017-01-06 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6320:

Shepherd: Michael Park

> Implement clang-tidy check to catch incorrect flags hierarchies
> ---
>
> Key: MESOS-6320
> URL: https://issues.apache.org/jira/browse/MESOS-6320
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: clang-tidy, mesosphere
>
> Classes need to always use {{virtual}} inheritance when being derived from 
> {{FlagsBase}}. Also, in order to compose such derived flags they should be 
> inherited virtually again.
> Some examples:
> {code}
> struct A : virtual FlagsBase {}; // OK
> struct B : FlagsBase {}; // ERROR
> struct C : A {}; // ERROR
> {code}
> We should implement a clang-tidy checker to catch such wrong inheritance 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6824) mesos-this-capture clang-tidy check has false positives

2017-01-06 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6824:

Shepherd: Michael Park

> mesos-this-capture clang-tidy check has false positives
> ---
>
> Key: MESOS-6824
> URL: https://issues.apache.org/jira/browse/MESOS-6824
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: clang-tidy
>
> The {{mesos-this-capture}} clang-tidy checks incorrectly triggers on the code 
> here,
>   
> https://github.com/apache/mesos/blob/d2117362349ab4c383045720f77d42b2d9fd6871/src/slave/containerizer/mesos/io/switchboard.cpp#L1487
> We should tighten the matcher to avoid triggering on such constructs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6871) Scheme handling in libprocess URL::parse()

2017-01-06 Thread Ilya Pronin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Pronin updated MESOS-6871:
---
Description: 
{{process::http::URL::parse()}} can mistake the host part for the scheme 
because of unsuitable use of {{std::string::find_first_of()}} method which 
looks for the first character equal to one of the characters in the given 
sequence.

E.g. {{URL::parse("http/abcdef")}} will construct a {{URL}} that represents 
{{http://cdef/}}.

Review request: https://reviews.apache.org/r/55177/

  was:
{{process::http::URL::parse()}} can mistake the host part for the scheme 
because of unsuitable use of {{std::string::find_first_of()}} method which 
looks for the first character equal to one of the characters in the given 
sequence.

E.g. {{URL::parse("http/abcdef")}} will construct an {{URL}} that represents 
{{http://cdef/}}.

Review request: https://reviews.apache.org/r/55177/


> Scheme handling in libprocess URL::parse()
> --
>
> Key: MESOS-6871
> URL: https://issues.apache.org/jira/browse/MESOS-6871
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.1.0
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>
> {{process::http::URL::parse()}} can mistake the host part for the scheme 
> because of unsuitable use of {{std::string::find_first_of()}} method which 
> looks for the first character equal to one of the characters in the given 
> sequence.
> E.g. {{URL::parse("http/abcdef")}} will construct a {{URL}} that represents 
> {{http://cdef/}}.
> Review request: https://reviews.apache.org/r/55177/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6871) Scheme handling in libprocess URL::parse()

2017-01-06 Thread Ilya Pronin (JIRA)
Ilya Pronin created MESOS-6871:
--

 Summary: Scheme handling in libprocess URL::parse()
 Key: MESOS-6871
 URL: https://issues.apache.org/jira/browse/MESOS-6871
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 1.1.0
Reporter: Ilya Pronin
Assignee: Ilya Pronin


{{process::http::URL::parse()}} can mistake the host part for the scheme 
because of unsuitable use of {{std::string::find_first_of()}} method which 
looks for the first character equal to one of the characters in the given 
sequence.

E.g. {{URL::parse("http/abcdef")}} will construct an {{URL}} that represents 
{{http://cdef/}}.

Review request: https://reviews.apache.org/r/55177/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)