[jira] [Updated] (MESOS-6304) Add authentication support to the default executor

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6304:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Add authentication support to the default executor
> --
>
> Key: MESOS-6304
> URL: https://issues.apache.org/jira/browse/MESOS-6304
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor, modules, security
>Reporter: Galen Pewtherer
>Assignee: Greg Mann
>  Labels: executor, mesosphere, module, security
>
> The V1 executors should be updated to authenticate with the agent when HTTP 
> executor authentication is enabled. This will be hard-coded into the executor 
> library for the MVP, and it can be refactored into an {{HttpAuthenticatee}} 
> module later. The executor must:
> * load a JWT from its environment, if present
> * decorate its requests with an {{Authorization}} header containing the JWT
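The two bullets above amount to a small piece of request plumbing. A minimal sketch in Python follows; the environment variable name and the {{Bearer}} scheme are assumptions for illustration, not taken from this ticket:

```python
import os

# Hypothetical variable name -- the actual name the agent uses to pass
# the JWT to the executor is not specified in this ticket.
TOKEN_ENV_VAR = "MESOS_EXECUTOR_AUTHENTICATION_TOKEN"

def auth_headers(extra=None):
    """Build request headers, decorating them with an Authorization
    header if a JWT is present in the environment."""
    headers = dict(extra or {})
    token = os.environ.get(TOKEN_ENV_VAR)
    if token:
        headers["Authorization"] = "Bearer " + token
    return headers
```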



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6280) Task group executor should support command health checks.

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6280:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50, Mesosphere Sprint 51, 
Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 49, 
Mesosphere Sprint 50, Mesosphere Sprint 51, Mesosphere Sprint 52)

> Task group executor should support command health checks.
> -
>
> Key: MESOS-6280
> URL: https://issues.apache.org/jira/browse/MESOS-6280
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: health-check, mesosphere
>
> Currently, the default (aka pod) executor supports only HTTP and TCP health 
> checks. We should support command health checks as well.





[jira] [Updated] (MESOS-7197) Requesting tiny amount of CPU crashes master

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7197:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Requesting tiny amount of CPU crashes master
> 
>
> Key: MESOS-7197
> URL: https://issues.apache.org/jira/browse/MESOS-7197
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.1.0, 1.2.0
> Environment: Ubuntu 14.04, using Mesosphere PPA to install Mesos
>Reporter: Bruce Merry
>Assignee: Neil Conway
>Priority: Critical
>
> If a task is submitted with a tiny CPU request (e.g., 0.0004), the master 
> crashes with a CHECK failure when the task completes:
> {noformat}
> F0302 10:48:26.654909 15391 sorter.cpp:291] Check failed: 
> allocations[name].resources[slaveId].contains(resources) 
> {noformat}
> I can reproduce this with the following command:
> {noformat}
> mesos-execute --command='sleep 5' --master=$MASTER --name=crashtest 
> --resources='cpus:0.0004;mem:128'
> {noformat}
> If I replace 0.0004 with 0.001 the issue no longer occurs.
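A plausible explanation for the 0.001 threshold (an assumption, not confirmed in this thread) is that scalar resources are handled at a fixed granularity of three decimal digits, so a sub-millicpu value can round away to nothing during internal bookkeeping:

```python
def to_fixed(value, granularity=0.001):
    # Round a scalar resource to the allocator's granularity.
    # This only illustrates the precision issue; the actual crash
    # path is in the sorter's allocation accounting.
    return round(value / granularity) * granularity

# 0.0004 cpus rounds away entirely, while 0.001 survives -- matching
# the observation that 0.001 does not trigger the crash.
assert to_fixed(0.0004) == 0.0
assert to_fixed(0.001) == 0.001
```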





[jira] [Updated] (MESOS-6892) Reconsider process creation primitives on Windows

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6892:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Reconsider process creation primitives on Windows
> -
>
> Key: MESOS-6892
> URL: https://issues.apache.org/jira/browse/MESOS-6892
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft
>
> Windows does not have the same notions of process hierarchies as Unix, and so 
> killing groups of processes requires us to make sure all processes are 
> contained in a job object, which acts something like a cgroup. This is 
> particularly important when we decide to kill a task, as there is no way to 
> reliably do this unless all the processes you'd like to kill are in the job 
> object.
> This causes us a number of issues; it is a big reason we needed to fork the 
> command executor, and it is the reason tasks are currently unkillable in the 
> default executor.
> As we clean this issue up, we need to think carefully about the process 
> governance semantics of Mesos, and how we can map them to a reliable, simple 
> Windows implementation.





[jira] [Updated] (MESOS-7047) Update agent for hierarchical roles.

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7047:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Update agent for hierarchical roles.
> 
>
> Key: MESOS-7047
> URL: https://issues.apache.org/jira/browse/MESOS-7047
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Neil Conway
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Agents use the role name in the file system path for persistent volumes: a 
> persistent volume is written to 
> {{work_dir/volumes/roles/<role-name>/<persistence-id>}}. When using 
> hierarchical roles, {{role-name}} might contain slashes. It seems like there 
> are three options here:
> # When converting the role name into the file system path, escape any slashes 
> that appear.
> # Hash the role name before using it in the file system path.
> # Create a directory hierarchy that corresponds to the nesting in the role 
> name. So a volume for role {{a/b/c/d}} would be stored in 
> {{roles/a/b/c/d/}}.
> If we adopt #3, we'd probably also want to clean up the file system when a 
> volume is removed.
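Option #1 could look like the following sketch. The escape scheme is illustrative only; percent-encoding is one reversible choice:

```python
def role_to_path_component(role):
    # Escape '%' first so the mapping stays reversible, then replace
    # the slashes that would otherwise create extra directories.
    return role.replace("%", "%25").replace("/", "%2F")

def path_component_to_role(component):
    # Invert in the opposite order.
    return component.replace("%2F", "/").replace("%25", "%")
```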





[jira] [Updated] (MESOS-6998) Add authentication support to agent's '/v1/executor' endpoint

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6998:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Add authentication support to agent's '/v1/executor' endpoint
> -
>
> Key: MESOS-6998
> URL: https://issues.apache.org/jira/browse/MESOS-6998
> Project: Mesos
>  Issue Type: Task
>  Components: agent, security
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: agent, executor, mesosphere, security
>
> The new agent flag {{--authenticate_http_executors}} must be added. When set, 
> it will require that requests received on the {{/v1/executor}} endpoint be 
> authenticated, and the default JWT authenticator will be loaded. Note that 
> this will require the addition of a new authentication realm for that 
> endpoint.





[jira] [Updated] (MESOS-7216) Delayed executor termination leads to test failures

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7216:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Delayed executor termination leads to test failures
> ---
>
> Key: MESOS-7216
> URL: https://issues.apache.org/jira/browse/MESOS-7216
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: debugging, mesosphere
>
> This bug came up during the development of a test for the new COMMAND health 
> checks that use nested containers. The test can be found here: 
> https://reviews.apache.org/r/55901/.
> The test setup was explained in MESOS-7050:
> 1) Start the scheduler driver
> 2) Launch a task group with the default executor that includes a single long 
> running task with a COMMAND health check
> 3) Wait for the task to return a status of HEALTHY one time
> 4) Stop the scheduler driver without explicitly waiting for any of the tasks 
> to complete
> 5) Wait for all the containers to complete
> 6) Exit the test
> With this setup, all of the {{ASSERT}}s in the test itself pass, but the 
> test fails because processes remain after the test exits (following a 
> timeout of 15 seconds):
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HealthCheckTest
> [ RUN  ] HealthCheckTest.DefaultExecutorCommandHealthCheck
> I0228 14:29:19.078368 3475919808 cluster.cpp:160] Creating default 'local' 
> authorizer
> I0228 14:29:19.084883 238907392 master.cpp:383] Master 
> 98c48dab-fd2b-404e-85dc-4ec5dd0d635c (172.18.8.139) started on 
> 172.18.8.139:55836
> I0228 14:29:19.084915 238907392 master.cpp:385] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/Users/gaston/mesos/master/share/mesos/webui" 
> --work_dir="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/master"
>  --zk_session_timeout="10secs"
> I0228 14:29:19.086030 238907392 master.cpp:435] Master only allowing 
> authenticated frameworks to register
> I0228 14:29:19.086041 238907392 master.cpp:449] Master only allowing 
> authenticated agents to register
> I0228 14:29:19.086046 238907392 master.cpp:462] Master only allowing 
> authenticated HTTP frameworks to register
> I0228 14:29:19.086050 238907392 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/credentials'
> I0228 14:29:19.086334 238907392 master.cpp:507] Using default 'crammd5' 
> authenticator
> I0228 14:29:19.086369 238907392 authenticator.cpp:519] Initializing server 
> SASL
> I0228 14:29:19.100981 238907392 auxprop.cpp:73] Initialized in-memory 
> auxiliary property plugin
> I0228 14:29:19.101080 238907392 http.cpp:933] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0228 14:29:19.101274 238907392 http.cpp:933] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0228 14:29:19.101414 238907392 http.cpp:933] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0228 14:29:19.101528 238907392 master.cpp:587] Authorization enabled
> I0228 14:29:19.101702 240517120 hierarchical.cpp:161] Initialized 
> hierarchical allocator process
> I0228 14:29:19.101740 239443968 whitelist_watcher.cpp:77] No whitelist given
> I0228 14:29:19.105717 240517120 master.cpp:2122] Elected as the leading 
> master!
> I0228 14:29:19.105738 240517120 master.cpp:1646] Recovering from registrar
> 

[jira] [Updated] (MESOS-7106) Test ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1 segfaults

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7106:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Test ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1 segfaults
> 
>
> Key: MESOS-7106
> URL: https://issues.apache.org/jira/browse/MESOS-7106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: centos7, SSL build
>Reporter: Benjamin Bannier
>Assignee: Joseph Wu
>  Labels: flaky-test, mesosphere, test
>
> {{ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1}} segfaulted 
> in our internal CI:
> {noformat}
> [ RUN  ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1
> W0210 03:08:05.018744  1020 process.cpp:3029] Attempted to spawn a process 
> (__http_connection__(1079)@10.168.212.35:42363) after finalizing libprocess!
> *** Aborted at 1486696085 (unix time) try "date -d @1486696085" if you are 
> using GNU date ***
> I0210 03:08:05.023609  6019 process.cpp:1246] libprocess is initialized on 
> 10.168.212.35:44850 with 8 worker threads
> I0210 03:08:05.024163  6019 cluster.cpp:160] Creating default 'local' 
> authorizer
> I0210 03:08:05.025065  1025 master.cpp:383] Master 
> 7adcbe15-38a9-4512-aa9c-8d5f7538e4ee (ip-10-168-212-35.ec2.internal) started 
> on 10.168.212.35:44850
> I0210 03:08:05.025089  1025 master.cpp:385] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/5DRa8u/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/5DRa8u/master" 
> --zk_session_timeout="10secs"
> I0210 03:08:05.025264  1025 master.cpp:435] Master only allowing 
> authenticated frameworks to register
> I0210 03:08:05.025276  1025 master.cpp:449] Master only allowing 
> authenticated agents to register
> I0210 03:08:05.025285  1025 master.cpp:462] Master only allowing 
> authenticated HTTP frameworks to register
> I0210 03:08:05.025293  1025 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/5DRa8u/credentials'
> I0210 03:08:05.025387  1025 master.cpp:507] Using default 'crammd5' 
> authenticator
> I0210 03:08:05.025441  1025 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0210 03:08:05.025512  1025 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0210 03:08:05.025560  1025 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0210 03:08:05.025619  1025 master.cpp:587] Authorization enabled
> I0210 03:08:05.025728  1023 hierarchical.cpp:161] Initialized hierarchical 
> allocator process
> I0210 03:08:05.025754  1027 whitelist_watcher.cpp:77] No whitelist given
> PC: @ 0x7f69d2296012 process::ProcessManager::spawn()
> *** SIGSEGV (@0x0) received by PID 6019 (TID 0x7f69c46d5700) from PID 0; 
> stack trace: ***
> @ 0x7f69c2408725 (unknown)
> I0210 03:08:05.026340  1023 master.cpp:2124] Elected as the leading master!
> I0210 03:08:05.026357  1023 master.cpp:1646] Recovering from registrar
> I0210 03:08:05.026406  1025 registrar.cpp:329] Recovering registrar
> @ 0x7f69c240d2f1 (unknown)
> @ 0x7f69c24011e8 (unknown)
> I0210 03:08:05.027294  1024 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 865024ns
> I0210 03:08:05.027330  1024 registrar.cpp:461] Applied 1 operations in 
> 2848ns; attempting to update the registry
> @ 0x7f69d027b370 (unknown)
> I0210 03:08:05.028261  1028 registrar.cpp:506] Successfully updated the 
> registry in 916992ns
> I0210 03:08:05.028313  1028 

[jira] [Updated] (MESOS-6999) Add agent support for generating and passing executor secrets

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6999:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Add agent support for generating and passing executor secrets
> -
>
> Key: MESOS-6999
> URL: https://issues.apache.org/jira/browse/MESOS-6999
> Project: Mesos
>  Issue Type: Task
>  Components: agent, security
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: agent, executor, flags, mesosphere, security
>
> The agent must generate and pass executor secrets to all executors using the 
> V1 API. For the MVP, the agent will have this behavior by default when 
> compiled with SSL support. To accomplish this, the agent must:
> * load the default {{SecretGenerator}} module
> * call the secret generator when launching an executor
> * pass the generated secret into the executor's environment
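As a sketch of what the generate-and-pass flow could produce, here is a minimal HS256 JWT built with the standard library. The claim names and the environment variable are assumptions; the real {{SecretGenerator}} is a pluggable module:

```python
import base64
import hashlib
import hmac
import json

def b64url(data):
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def generate_executor_secret(key, claims):
    """Produce a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

# The agent would then place the token in the executor's environment,
# e.g. env["MESOS_EXECUTOR_AUTHENTICATION_TOKEN"] = token (name assumed).
```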





[jira] [Updated] (MESOS-7225) Tasks launched via the default executor cannot access disk resource volumes.

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7225:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Tasks launched via the default executor cannot access disk resource volumes.
> 
>
> Key: MESOS-7225
> URL: https://issues.apache.org/jira/browse/MESOS-7225
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, when a task in a task group tries to access a volume specified in 
> its disk resources (e.g., persistent volumes), it cannot, because the volume 
> is mounted in the root container (executor). This happens because there is 
> no mechanism to specify resources for child containers yet; by default, any 
> resources (e.g., disk) are added to the root container.
> A possible solution is for the default executor to set up the mapping 
> manually using the {{SANDBOX_PATH}} volume source type, giving child 
> containers access to the volume mounted in the parent container. This is at 
> best a workaround; the ideal solution would be tackled as part of 
> MESOS-7207.
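The workaround would rely on a volume whose source points into the parent container's sandbox. A hedged sketch of such a volume, written as the JSON shape of the protobuf (field names follow the {{Volume.Source.SandboxPath}} message; treat the exact layout and paths as assumptions):

```python
# JSON shape of a Volume using a SANDBOX_PATH source of type PARENT,
# mapping a directory of the parent (executor) sandbox into the child
# container. Both paths below are illustrative.
volume = {
    "mode": "RW",
    "container_path": "volume",          # mount point inside the child
    "source": {
        "type": "SANDBOX_PATH",
        "sandbox_path": {
            "type": "PARENT",            # resolve relative to the parent
            "path": "volume",            # path within the parent sandbox
        },
    },
}
```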





[jira] [Updated] (MESOS-6919) Libprocess reinit code leaks SSL server socket FD

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6919:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Libprocess reinit code leaks SSL server socket FD
> -
>
> Key: MESOS-6919
> URL: https://issues.apache.org/jira/browse/MESOS-6919
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.2.0
>Reporter: Greg Mann
>Assignee: Joseph Wu
>  Labels: libprocess, mesosphere, ssl
>
> After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was 
> discovered that tests which use {{process::reinitialize}} to switch between 
> SSL and non-SSL modes will leak the file descriptor associated with the 
> server socket {{\_\_s\_\_}}. This can be reproduced by running the following 
> trivial test in repetition:
> {code}
> diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp
> index 1ff423f..d5fd575 100644
> --- a/src/tests/scheduler_tests.cpp
> +++ b/src/tests/scheduler_tests.cpp
> @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P(
>  #endif // USE_SSL_SOCKET
> +TEST_P(SchedulerSSLTest, LeakTest)
> +{
> +  ::sleep(1);
> +}
> +
> +
>  // Tests that a scheduler can subscribe, run a task, and then tear itself 
> down.
>  TEST_P(SchedulerSSLTest, RunTaskAndTeardown)
>  {
> {code}





[jira] [Updated] (MESOS-7011) Add a '--executor_secret_key' flag to the agent

2017-03-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7011:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53  (was: Mesosphere Sprint 
52)

> Add a '--executor_secret_key' flag to the agent
> ---
>
> Key: MESOS-7011
> URL: https://issues.apache.org/jira/browse/MESOS-7011
> Project: Mesos
>  Issue Type: Task
>  Components: agent, security
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: agent, flags, mesosphere, security
>
> A new {{\-\-executor_secret_key}} flag should be added to the agent to allow 
> the operator to specify a secret file to be loaded into the default executor 
> JWT authenticator and SecretGenerator modules. This secret will be used to 
> generate default executor secrets when {{\-\-generate_executor_secrets}} is 
> set, and will be used to verify those secrets when 
> {{\-\-authenticate_http_executors}} is set.





[jira] [Updated] (MESOS-7120) Add an Agent API call to cleanup nested container artifacts

2017-03-02 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7120:
-
Shepherd: Alexander Rukletsov

> Add an Agent API call to cleanup nested container artifacts
> ---
>
> Key: MESOS-7120
> URL: https://issues.apache.org/jira/browse/MESOS-7120
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, containerization
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: debugging, health-check, mesosphere
>
> Executors and operators should be able to ask the agent to clean up all the 
> artifacts (i.e., the sandbox, runtime dirs, checkpointed info, etc.) related 
> to a nested container.





[jira] [Updated] (MESOS-7004) Enable multiple HTTP authenticator modules

2017-03-02 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-7004:
-
Sprint: Mesosphere Sprint 50, Mesosphere Sprint 51  (was: Mesosphere Sprint 
50, Mesosphere Sprint 51, Mesosphere Sprint 52)

> Enable multiple HTTP authenticator modules
> --
>
> Key: MESOS-7004
> URL: https://issues.apache.org/jira/browse/MESOS-7004
> Project: Mesos
>  Issue Type: Task
>  Components: modules, security
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: libprocess, module, security
>
> To accommodate executor authentication, we will add support for loading 
> multiple authenticator modules. The {{--http_authenticators}} flag is 
> already set up for this, but we must relax the constraint in Mesos that 
> enforces a single authenticator, and libprocess must implement the 
> underlying infrastructure.
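Once the single-authenticator constraint is relaxed, the dispatch could look like the following sketch: a plain first-match chain. The actual combination semantics are not settled in this ticket:

```python
def authenticate(request, authenticators):
    """Try each configured HTTP authenticator in order; the first one
    to return a principal wins. None means 401 Unauthorized."""
    for authenticator in authenticators:
        principal = authenticator(request)
        if principal is not None:
            return principal
    return None
```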



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6280) Task group executor should support command health checks.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6280:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Task group executor should support command health checks.
> -
>
> Key: MESOS-6280
> URL: https://issues.apache.org/jira/browse/MESOS-6280
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: health-check, mesosphere
>
> Currently, the default (aka pod) executor supports only HTTP and TCP health 
> checks. We should support command health checks as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6991) Change `Environment.Variable.Value` from required to optional

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6991:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Change `Environment.Variable.Value` from required to optional
> -
>
> Key: MESOS-6991
> URL: https://issues.apache.org/jira/browse/MESOS-6991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>
> To prepare for future work which will enable the modular fetching of secrets, 
> we should change the {{Environment.Variable.Value}} field from {{required}} 
> to {{optional}}. This way, the field can be left empty and filled in by a 
> secret fetching module.





[jira] [Updated] (MESOS-6432) Roles with quota assigned can "game" the system to receive excessive resources.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6432:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Roles with quota assigned can "game" the system to receive excessive 
> resources.
> ---
>
> Key: MESOS-6432
> URL: https://issues.apache.org/jira/browse/MESOS-6432
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>Priority: Critical
>
> The current implementation of quota allocation attempts to satisfy each 
> resource quota for a role, but in doing so can far exceed the quota assigned 
> to the role.
> For example, if a role has quota for {{\[30,20,10\]}}, it can consume up to 
> {{\[∞, ∞, 10\]}} or {{\[∞, 20, ∞\]}} or {{\[30, ∞, ∞\]}}, since we stop 
> allocating an agent's resources to the role only once every resource in the 
> quota vector is satisfied!
> As a first step for preventing gaming, we could consider quota satisfied once 
> any of the resources in the vector has quota satisfied. This approach works 
> reasonably well for resources that are required and are present on every 
> agent (cpus, mem, disk). However, it doesn't work well for resources that are 
> optional / only present on some agents (e.g. gpus) (a.k.a. non-ubiquitous / 
> scarce resources). For this we would need to determine which agents have 
> resources that can satisfy the quota prior to performing the allocation.
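The gaming potential is easy to see numerically. Under the current rule, allocation continues until every dimension of the quota vector is satisfied, so a scarce resource drags the ubiquitous ones far past their quota. The agent profile below is chosen purely for illustration:

```python
quota = [30, 20, 10]   # e.g. cpus, mem, gpus
agent = [10, 10, 1]    # identical agents; the third resource is scarce

def allocate_until(quota, agent, done):
    # Allocate whole agents to the role until `done` says to stop.
    allocated = [0, 0, 0]
    while not done(quota, allocated):
        allocated = [a + g for a, g in zip(allocated, agent)]
    return allocated

# Current rule: stop only when ALL dimensions are satisfied.
current = allocate_until(
    quota, agent, lambda q, a: all(x >= y for x, y in zip(a, q)))
assert current == [100, 100, 10]   # quotas of 30 and 20 overshot to 100

# Proposed first step: stop as soon as ANY dimension is satisfied.
proposed = allocate_until(
    quota, agent, lambda q, a: any(x >= y for x, y in zip(a, q)))
assert proposed == [20, 20, 2]
```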





[jira] [Updated] (MESOS-6936) Add support for media types needed for streaming request/responses.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6936:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Add support for media types needed for streaming request/responses.
> ---
>
> Key: MESOS-6936
> URL: https://issues.apache.org/jira/browse/MESOS-6936
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> As per the design document created as part of MESOS-3601, we need to add 
> support for the additional media types proposed to our API handlers for 
> supporting request streaming. These headers would also be used by the server 
> in the future for streaming responses.
> The following media types need to be added:
> {{RecordIO-Accept}}: Enables the client to perform content negotiation for 
> the contents of the stream. The supported values for this header would be 
> {{application/json}} and {{application/x-protobuf}}.
> {{RecordIO-Content-Type}}: The content type of the RecordIO stream sent by 
> the server. The supported values for this header would be 
> {{application/json}} and {{application/x-protobuf}}.
> The {{Content-Type}} for the response would be {{application/recordio}}. For 
> more details/examples see the alternate proposal section of the design doc:
> https://docs.google.com/document/d/1OV1D5uUmWNvTaX3qEO9fZGo4FRlCSqrx0IHq5GuLAk8/edit#
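For reference, RecordIO framing prefixes each record with its length in bytes, written as base-10 ASCII and followed by a newline. A minimal encoder (a sketch consistent with that framing, not code from Mesos):

```python
def recordio_encode(records):
    # Each record is prefixed by its byte length in base-10 ASCII,
    # followed by a newline, then the record bytes themselves.
    out = bytearray()
    for record in records:
        out += str(len(record)).encode("ascii") + b"\n" + record
    return bytes(out)
```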





[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5931:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 47, 
Mesosphere Sprint 48, Mesosphere Sprint 49, Mesosphere Sprint 50  (was: 
Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 47, Mesosphere 
Sprint 48, Mesosphere Sprint 49)

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: backend, containerizer, mesosphere
>
> Currently in the Unified Containerizer, the copy backend is selected by 
> default. This is not ideal, especially for production environments: copying 
> a huge container image from the store to the provisioner can take a long 
> time.
> Ideally, we should support an `auto` backend, which would automatically 
> select the optimal backend for the image provisioner if the user does not 
> specify one via the agent flag.
> We should have a logic design first in this ticket, to determine how we want 
> to choose the right backend (e.g., overlayfs or aufs should be preferred if 
> available from the kernel).





[jira] [Updated] (MESOS-6355) Improvements to task group support.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6355:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Improvements to task group support.
> ---
>
> Key: MESOS-6355
> URL: https://issues.apache.org/jira/browse/MESOS-6355
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>  Labels: mesosphere
>
> This is a follow up epic to MESOS-2249 to capture further improvements and 
> changes that need to be made to the MVP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6900) Add test for framework upgrading to multi-role capability.

2017-01-27 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6900:
-
Sprint: Mesosphere Sprint 49, Mesosphere Sprint 50  (was: Mesosphere Sprint 
49)

> Add test for framework upgrading to multi-role capability.
> --
>
> Key: MESOS-6900
> URL: https://issues.apache.org/jira/browse/MESOS-6900
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> Frameworks can upgrade to the multi-role capability as long as the 
> framework's role remains the same.
> We consider the framework's roles unchanged if
> * a framework which previously didn't specify a {{role}} now has 
> {{roles=()}}, or
> * a framework which previously had {{role=A}} now has {{roles=(A)}}.
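That equivalence rule can be expressed as a small predicate. This is a sketch with a hypothetical helper name, not the actual Mesos test code: an unset `role` must map to `roles=()`, and `role=A` must map to `roles=(A)`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the "roles unchanged" rule for a framework
// upgrading to the MULTI_ROLE capability.
bool rolesUnchanged(const std::optional<std::string>& oldRole,
                    const std::vector<std::string>& newRoles) {
  if (!oldRole.has_value()) {
    return newRoles.empty();  // no `role` before => `roles=()` now
  }
  return newRoles.size() == 1 && newRoles.front() == *oldRole;
}
```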



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6286) Master does not remove an agent if it is responsive but not registered

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6286:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Master does not remove an agent if it is responsive but not registered
> --
>
> Key: MESOS-6286
> URL: https://issues.apache.org/jira/browse/MESOS-6286
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> As part of MESOS-6285, we observed an agent stuck in the recovery phase.  The 
> agent would do the following in a loop:
> # Systemd starts the agent.
> # The agent detects the master, but does not connect yet.  The agent needs to 
> recover first.
> # The agent responds to {{PingSlaveMessage}} from the master, but it is 
> stalled in recovery.
> # The agent is OOM-killed by the kernel before recovery finishes.  Repeat 
> (1-4).
> The consequences of this:
> * Frameworks will never get a TASK_LOST or terminal status update for tasks 
> on this agent.
> * Executors on the agent can connect to the agent, but will not be able to 
> register.
> We should consider adding some timeout/intervention in the master for 
> responsive, but non-recoverable agents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6664) Force cleanup of IOSwitchboard server if it does not terminate after the container terminates.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6664:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Force cleanup of IOSwitchboard server if it does not terminate after the 
> container terminates.
> --
>
> Key: MESOS-6664
> URL: https://issues.apache.org/jira/browse/MESOS-6664
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Kevin Klues
>
> In the normal case, the IOSwitchboard server terminates after the container 
> terminates. However, we should be more defensive and always clean up the 
> IOSwitchboard server if it does not terminate within a reasonable grace 
> period.
> The reason for the grace period is to allow the IOSwitchboard server to 
> finish redirecting the container's stdout/stderr to the logger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6419) The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6419:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> The 'master/teardown' endpoint should support tearing down 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6419
> URL: https://issues.apache.org/jira/browse/MESOS-6419
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.26.2, 0.27.3, 0.28.2, 1.0.1
>Reporter: Gilbert Song
>Assignee: Neil Conway
>Priority: Critical
>  Labels: endpoint, master
>
> This issue is exposed from 
> [MESOS-6400](https://issues.apache.org/jira/browse/MESOS-6400). When a user 
> tries to tear down an 'unregistered_framework' via the 'master/teardown' 
> endpoint, a bad request is returned: `No framework found with specified ID`.
> Ideally, we should support tearing down an unregistered framework: such 
> frameworks may occur due to a network partition, and their orphan tasks 
> still occupy resources. It would be a nightmare if a user had to wait for 
> the unregistered framework to be cleaned up before getting those resources 
> back.
> This may be the initial implementation: 
> https://github.com/apache/mesos/commit/bb8375975e92ee722befb478ddc3b2541d1ccaa9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6602) Shutdown completed frameworks when unreachable agent re-registers

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6602:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Shutdown completed frameworks when unreachable agent re-registers
> -
>
> Key: MESOS-6602
> URL: https://issues.apache.org/jira/browse/MESOS-6602
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> We currently shut down completed frameworks when an agent re-registers with 
> a master that it is already registered with (MESOS-633). We should also 
> shut down completed frameworks when an unreachable agent re-registers.
> This is distinct from the more difficult problem of shutting down completed 
> frameworks after master failover (MESOS-4659).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6040:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 48, 
Mesosphere Sprint 49  (was: Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 48)

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake build to produce the port-mapper binary as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6001:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Aufs backend cannot support the image with numerous layers.
> ---
>
> Key: MESOS-6001
> URL: https://issues.apache.org/jira/browse/MESOS-6001
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any other os with aufs module
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: aufs, backend, containerizer
>
> This issue was exposed in this unit test 
> `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually 
> specifying the `bind` backend. Most likely, mounting aufs with the 
> layer-specific options is limited by the length of the mount options string.
> {noformat}
> [20:13:07] :   [Step 10/10] [ RUN  ] 
> DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] 
> Opened db in 8.148813ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] 
> Compacted db in 3.126629ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] 
> Created db iterator in 4410ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] 
> Seeked to beginning of db in 763ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 491ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] 
> Starting replica recovery
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] 
> Replica is in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5852)@172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] 
> Updating replica status to STARTING
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] 
> Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) 
> started on 172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs"
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/OZHDIQ/credentials'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629114 23432 

[jira] [Updated] (MESOS-6388) Report new PARTITION_AWARE task statuses in HTTP endpoints

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6388:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Report new PARTITION_AWARE task statuses in HTTP endpoints
> --
>
> Key: MESOS-6388
> URL: https://issues.apache.org/jira/browse/MESOS-6388
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> At a minimum, the {{/state-summary}} endpoint needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend fail to mount the rootfs.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6654:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Duplicate image layer ids may make the backend fail to mount the rootfs.
> --
>
> Key: MESOS-6654
> URL: https://issues.apache.org/jira/browse/MESOS-6654
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: aufs, backend, containerizer
>
> Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the 
> manifest, which may leave some backends (e.g., the 'aufs' backend) unable to 
> mount the rootfs. We should make sure that each layer path returned in 
> 'ImageInfo' is unique.
> Here is an example manifest from 'mesosphere/inky':
> {noformat}
> [20:13:08]W:   [Step 10/10]"name": "mesosphere/inky",
> [20:13:08]W:   [Step 10/10]"tag": "latest",
> [20:13:08]W:   [Step 10/10]"architecture": "amd64",
> [20:13:08]W:   [Step 10/10]"fsLayers": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   }
> [20:13:08]W:   [Step 10/10]],
> [20:13:08]W:   [Step 10/10]"history": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop)
>  ENTRYPOINT 
> [echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> 

[jira] [Updated] (MESOS-6805) Check unreachable task cache for task ID collisions on launch

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6805:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Check unreachable task cache for task ID collisions on launch
> -
>
> Key: MESOS-6805
> URL: https://issues.apache.org/jira/browse/MESOS-6805
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> As discussed in MESOS-6785, it is possible to crash the master by launching a 
> task that reuses the ID of an unreachable/partitioned task. A complete 
> solution to this problem will be quite involved, but an incremental 
> improvement is easy: when we see a task launch operation, reject the launch 
> attempt if the task ID collides with an ID in the per-framework 
> {{unreachableTasks}} cache. This doesn't catch all situations in which IDs 
> are reused, but it is better than nothing.
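The incremental check described above amounts to a simple validation at launch time. This is a hypothetical sketch (names like `unreachableTasks` follow the ticket's description of the per-framework cache, not the actual master code):

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical sketch: on a task launch operation, reject the task if
// its ID collides with an entry in the per-framework unreachableTasks
// cache, instead of letting the collision crash the master later.
bool isLaunchAllowed(
    const std::string& taskId,
    const std::unordered_set<std::string>& unreachableTasks) {
  return unreachableTasks.count(taskId) == 0;
}
```

As the ticket notes, this does not catch every ID reuse (the cache is bounded), but it rejects the known crashing case cheaply.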



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6475) Mesos Container Attach/Exec Unit Tests

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6475:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Mesos Container Attach/Exec Unit Tests
> --
>
> Key: MESOS-6475
> URL: https://issues.apache.org/jira/browse/MESOS-6475
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Ideally, all unit tests should be written as the individual tasks that make 
> up this Epic are completed. However, this doesn't always happen as 
> planned. 
> This ticket should not be closed and the Epic should not be considered 
> complete until all unit tests for all components have been written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6719) Unify "active" and "state"/"connected" fields in Master::Framework

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6719:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Unify "active" and "state"/"connected" fields in Master::Framework
> --
>
> Key: MESOS-6719
> URL: https://issues.apache.org/jira/browse/MESOS-6719
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Rather than tracking whether a framework is "active" separately from whether 
> it is "connected", we should consider using a single "state" variable to 
> track the current state of the framework (connected-and-active, 
> connected-and-inactive, disconnected, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6653:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, containerizer, overlayfs
>
> Building on MESOS-6000, we use a symlink to shorten the overlayfs mount 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), creating the symlink .../backends/overlay/links would fail 
> since it already exists.
> Here is a simple log when we hard code overlayfs as our default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 provisioner.cpp:318] 
> Provisioning image rootfs 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7'
>  for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618408  2936 overlay.cpp:175] 
> Created symlink 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links'
>  -> '/tmp/DQ3blT'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618472  2936 overlay.cpp:203] 
> Provisioning image rootfs with overlayfs: 
> 

[jira] [Updated] (MESOS-6619) Duplicate elements in "completed_tasks"

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6619:
-
Sprint: Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 
48)

> Duplicate elements in "completed_tasks"
> ---
>
> Key: MESOS-6619
> URL: https://issues.apache.org/jira/browse/MESOS-6619
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Scenario:
> # Framework starts non-partition-aware task T on agent A
> # Agent A is partitioned. Task T is marked as a "completed task" in the 
> {{Framework}} struct of the master, as part of {{Framework::removeTask}}.
> # Agent A re-registers with the master. The tasks running on A are re-added 
> to their respective frameworks on the master as running tasks.
> # In {{Master::\_reregisterSlave}}, the master sends a 
> {{ShutdownFrameworkMessage}} for all non-partition-aware frameworks running 
> on the agent. The master then does {{removeTask}} for each task managed by 
> one of these frameworks, which results in calling {{Framework::removeTask}}, 
> which adds _another_ task to {{completed_tasks}}. Note that 
> {{completed_tasks}} does not attempt to detect/suppress duplicates, so this 
> results in two elements in the {{completed_tasks}} collection.
> Similar problems occur when a partition-aware task is running on a 
> partitioned agent that re-registers: the result is a task in the {{tasks}} 
> list _and_ a task in the {{completed_tasks}} list.
> Possible fixes/changes:
> * Adding a task to the {{completed_tasks}} list when an agent becomes 
> partitioned is debatable; certainly for partition-aware tasks, the task is 
> not "completed". We might consider adding an "{{unreachable_tasks}}" list to 
> the HTTP endpoints.
> * Regardless of whether we continue to use {{completed_tasks}} or add a new 
> collection, we should ensure the consistency of that data structure after 
> agent re-registration.
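One way to "ensure the consistency of that data structure", as a sketch only (the actual fix may differ and the real struct stores full task records, not IDs): suppress duplicates by task ID when appending to the completed-task history, so a second {{Framework::removeTask}} after re-registration adds nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>

// Hypothetical sketch of duplicate suppression for completed_tasks:
// an append whose task ID is already recorded is ignored.
struct CompletedTasks {
  std::deque<std::string> ids;

  void add(const std::string& taskId) {
    if (std::find(ids.begin(), ids.end(), taskId) == ids.end()) {
      ids.push_back(taskId);
    }
  }
};
```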



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6504) Use 'geteuid()' for the root privileges check.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6504:
-
Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48)

> Use 'geteuid()' for the root privileges check.
> --
>
> Key: MESOS-6504
> URL: https://issues.apache.org/jira/browse/MESOS-6504
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, isolator, mesosphere, user
>
> Currently, parts of the Mesos code check for root privileges by comparing 
> os::user() to "root", which is not sufficient, since that compares the real 
> user. When the mesos binary is made setuid root, the check can fail even 
> though the process actually runs with root privileges.
> We should check the effective user id instead in our code. 
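The distinction can be illustrated as follows (a sketch, not the Mesos code): `getuid()` reports the real user, which setuid root does not change, while `geteuid()` reports the effective user, which is what actually governs permissions.

```cpp
#include <cassert>
#include <unistd.h>

// Illustrative sketch: a setuid-root binary started by a non-root user
// keeps its real UID, so a real-UID check would wrongly report
// "not root" even though the effective UID is 0.
bool isRealRoot() { return getuid() == 0; }        // what the old check sees
bool isEffectiveRoot() { return geteuid() == 0; }  // what should be checked
```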



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
> Fix For: 1.2.0
>
>
> Committed some basic documentation. So moving this to pods-improvements epic 
> and targeting this for 1.2.0. I would like this to track the more 
> comprehensive documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>






[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5931:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 47, 
Mesosphere Sprint 48, Mesosphere Sprint 49  (was: Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 47, Mesosphere Sprint 48)

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: backend, containerizer, mesosphere
>
> Currently in the Unified Containerizer, the copy backend is selected by 
> default. This is not ideal, especially for production environments: copying 
> a huge container image from the store to the provisioner can take a long 
> time.
> Ideally, we should support an `auto` backend, which would 
> automatically/intelligently select the best/optimal backend for the image 
> provisioner if the user does not specify one via the agent flag.
> We should settle on a logic design first in this ticket, to determine how we 
> want to choose the right backend (e.g., overlayfs or aufs should be 
> preferred if available from the kernel).
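One possible shape for that selection logic, as a hedged sketch (the function and its input are illustrative, not the eventual Mesos implementation; in practice the supported set would be probed from /proc/filesystems):

```cpp
#include <set>
#include <string>

// Illustrative "auto" backend selection: prefer a union filesystem the
// running kernel supports, and fall back to the always-available (but
// slow for huge images) copy backend.
std::string selectBackend(const std::set<std::string>& kernelFilesystems)
{
  if (kernelFilesystems.count("overlay") > 0) {
    return "overlay";
  }

  if (kernelFilesystems.count("aufs") > 0) {
    return "aufs";
  }

  return "copy";
}
```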





[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2017-01-06 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49  (was: 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47, Mesosphere Sprint 48)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that they all work for both top-level 
> and nested containers.





[jira] [Updated] (MESOS-6475) Mesos Container Attach/Exec Unit Tests

2017-01-05 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6475:
-
Shepherd: Vinod Kone

> Mesos Container Attach/Exec Unit Tests
> --
>
> Key: MESOS-6475
> URL: https://issues.apache.org/jira/browse/MESOS-6475
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Ideally, all unit tests should be written as the individual tasks that make 
> up this Epic are completed. However, this doesn't always happen as planned. 
> This ticket should not be closed, and the Epic should not be considered 
> complete, until unit tests for all components have been written.





[jira] [Commented] (MESOS-6640) mesos-local doesn't handle --work_dir correctly.

2016-11-28 Thread Artem Harutyunyan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702918#comment-15702918
 ] 

Artem Harutyunyan commented on MESOS-6640:
--

Thanks [~haosd...@gmail.com]!

> mesos-local doesn't handle --work_dir correctly.
> ---
>
> Key: MESOS-6640
> URL: https://issues.apache.org/jira/browse/MESOS-6640
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>  Labels: beginner, newbie
> Fix For: 1.2.0
>
>
> After {{work_dir}} became a required command-line flag for {{mesos-agent}}, 
> it's only possible to launch {{mesos-local}} if the {{MESOS_WORK_DIR}} 
> environment variable is set. Using the {{work_dir}} flag that 
> {{mesos-local}} presumably allows one to set does not work:
> {code}
> ~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
> I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
> I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
> I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
> I1124 13:26:42.617058 1064960 master.cpp:380] Master 
> 73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
> 10.204.3.193:5050
> I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="false" --authenticate_frameworks="false" 
> --authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticators="crammd5" 
> --authorizers="local" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="20secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" 
> --webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
> --work_dir="/tmp/foo" --zk_session_timeout="10secs"
> I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response 
> from a replica in EMPTY status
> I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
> frameworks to register
> I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
> agents to register
> I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
> to register without authentication
> I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
> authenticator
> W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
> authentication requests will be refused
> I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
> Failed to start a local cluster while loading agent flags from the 
> environment: Flag 'work_dir' is required, but it was not provided
> ~/src/mesos-install  $
> {code}
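A workaround consistent with the behavior above (assuming, as the error message suggests, that {{mesos-local}} loads the agent's flags only from the environment) is to pass the work directory through {{MESOS_WORK_DIR}}:

```shell
# Hypothetical workaround: supply the agent's work_dir via the
# environment instead of the --work_dir flag.
MESOS_WORK_DIR=/tmp/foo ./bin/mesos-local
```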





[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command line flag for {{mesos-agent}} it's 
only possible to launch {{mesos-local}} if MESOS_WORK_DIR environment variable 
is set.  Using {{work_dir}} that {{mesos-local}} presumably allows to set does 
not work:

{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{code}

  was:
After {{work_dir}} became a required command line flag for {{mesos-agent}} it's 
only possible to launch {{mesos-local}} if MESOS_WORK_DIR environment variable 
is set.  Using {{work_dir}} that {{mesos-local}} presumably allows to set does 
not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command line flag for {{mesos-agent}} it's 
only possible to launch {{mesos-local}} if MESOS_WORK_DIR environment variable 
is set.  Using {{work_dir}} that {{mesos-local}} presumably allows to set does 
not work:



{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{code}

  was:
After {{work_dir}} became a required command line flag for {{mesos-agent}} it's 
only possible to launch {{mesos-local}} if MESOS_WORK_DIR environment variable 
is set.  Using {{work_dir}} that {{mesos-local}} presumably allows to set does 
not work:

{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command line flag for {{mesos-agent}} it's 
only possible to launch {{mesos-local}} if MESOS_WORK_DIR environment variable 
is set.  Using {{work_dir}} that {{mesos-local}} presumably allows to set does 
not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{quote}

  was:
After {{work_dir}} was made required for {{mesos-agent}} it's only possible to 
launch {{mesos-local}} if MESOS_WORK_DIR environment variable is set. 

Using {{work_dir}} does not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} was made required for {{mesos-agent}} it's only possible to 
launch {{mesos-local}} if MESOS_WORK_DIR environment variable is set. 

Using {{work_dir}} does not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{quote}

  was:
After {{{--work_dir}}} was made required for {mesos-agent} it's only possible 
to launch {mesos-local} if MESOS_WORK_DIR environment variable is set. 

Using {--work_dir} does not work:

{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{{--work_dir}}} was made required for {mesos-agent} it's only possible 
to launch {mesos-local} if MESOS_WORK_DIR environment variable is set. 

Using {--work_dir} does not work:

{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
}}

  was:
After {{{--work_dir}}} was made required for {{{mesos-agent}}} it's only 
possible to launch {{{mesos-local}}} if MESOS_WORK_DIR environment variable is 
set. 

Using {{{--work_dir}}} does not work:

{{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 

[jira] [Created] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)
Artem Harutyunyan created MESOS-6640:


 Summary: mesos-local doesn't handle --work_dir correctly
 Key: MESOS-6640
 URL: https://issues.apache.org/jira/browse/MESOS-6640
 Project: Mesos
  Issue Type: Bug
Reporter: Artem Harutyunyan
 Fix For: 1.2.0


After {{--work_dir}} was made required for {{mesos-agent}}, it's only possible 
to launch {{mesos-local}} if the MESOS_WORK_DIR environment variable is set.

Using {{--work_dir}} does not work:

{{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
}}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5966:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, Mesosphere 
Sprint 47  (was: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 
42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.





[jira] [Updated] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6395:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> HealthChecker sends updates to executor via libprocess messaging.
> -
>
> Key: MESOS-6395
> URL: https://issues.apache.org/jira/browse/MESOS-6395
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, {{HealthChecker}} sends status updates via libprocess messaging to 
> the executor's UPID. This seems unnecessary after refactoring the health 
> checker into a library: a simple callback will do. Moreover, not requiring 
> the executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}.





[jira] [Updated] (MESOS-6477) Build a standalone python client for connecting to our Mock HTTP Server that implements the new Debug APIs

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6477:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Build a standalone python client for connecting to our Mock HTTP Server that 
> implements the new Debug APIs
> --
>
> Key: MESOS-6477
> URL: https://issues.apache.org/jira/browse/MESOS-6477
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Steven Locke
>  Labels: debugging, mesosphere
>
> This client prototype should have a similar CLI to what we eventually want to 
> build into the Mesos or DC/OS CLI.
> {noformat}
> Streaming HTTP Client
> Usage:
>   client task exec [--tty] [--interactive]   [...]
>   client task attach [--tty] [--interactive] 
> Options:
>   --tty  Allocate a tty on the server before
>  attaching to the container.
>   --interactive  Connect the stdin of the client to
>  the stdin of the container.
> {noformat}





[jira] [Updated] (MESOS-6366) Design doc for executor authentication

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Design doc for executor authentication
> --
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-6476) Build a Mock HTTP Server that implements the new Debugging API calls

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6476:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Build a Mock HTTP Server that implements the new Debugging API calls
> 
>
> Key: MESOS-6476
> URL: https://issues.apache.org/jira/browse/MESOS-6476
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Steven Locke
>  Labels: debugging, mesosphere
>
> The mock server should simply launch a process to run whatever command is 
> passed to it, rather than attempt to launch an actual nested container in 
> mesos. However, it should do everything necessary to deal with attaching a 
> {{pty}}  / redirecting {{stdin/stdout/stderr}} properly.





[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
>
> Committed some basic documentation. So moving this to pods-improvements epic 
> and targeting this for 1.2.0. I would like this to track the more 
> comprehensive documentation.





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere 
Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 39, Mesosphere Sprint 
40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 44, 
Mesosphere Sprint 45, Mesosphere Sprint 46)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certificates,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Updated] (MESOS-5900) Support Unix domain socket connections in libprocess

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5900:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Support Unix domain socket connections in libprocess
> 
>
> Key: MESOS-5900
> URL: https://issues.apache.org/jira/browse/MESOS-5900
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Benjamin Hindman
>  Labels: mesosphere
>
> We should consider allowing two programs on the same host using libprocess to 
> communicate via Unix domain sockets rather than TCP. This has a few 
> advantages:
> * Security: remote hosts cannot connect to the Unix socket. Domain sockets 
> also offer additional support for 
> [authentication|https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Authentication-UNIX_Domain.html].
> * Performance: domain sockets are marginally faster than localhost TCP.





[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>






[jira] [Updated] (MESOS-6466) Add support for streaming HTTP requests in Mesos

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6466:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Add support for streaming HTTP requests in Mesos
> 
>
> Key: MESOS-6466
> URL: https://issues.apache.org/jira/browse/MESOS-6466
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Anand Mazumdar
>  Labels: debugging, mesosphere
>
> We already have support for streaming HTTP responses in Mesos. We now also 
> need to add support for streaming HTTP requests.





[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that all works for both top level 
> container and nested container.





[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.





[jira] [Updated] (MESOS-5597) Document Mesos "health check" feature.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5597:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Document Mesos "health check" feature.
> --
>
> Key: MESOS-5597
> URL: https://issues.apache.org/jira/browse/MESOS-5597
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: documentation, health-check, mesosphere
>
> We don't talk about this feature at all.





[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5963:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> HealthChecker should not decide when to kill tasks and when to stop 
> performing health checks.
> -
>
> Key: MESOS-5963
> URL: https://issues.apache.org/jira/browse/MESOS-5963
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, the {{HealthChecker}} library decides when a task should be killed 
> based on its health status. Moreover, it stops checking the task's health 
> after that. This seems unfortunate, because it's up to the executor and/or 
> framework to decide both when to kill tasks and when to health check them. 





[jira] [Updated] (MESOS-5856) Logrotate ContainerLogger module does not rotate logs when run as root with `--switch_user`.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5856:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Logrotate ContainerLogger module does not rotate logs when run as root with 
> `--switch_user`.
> 
>
> Key: MESOS-5856
> URL: https://issues.apache.org/jira/browse/MESOS-5856
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0, 0.28.0, 1.0.0
>Reporter: Joseph Wu
>Assignee: Sivaram Kannan
>Priority: Critical
>  Labels: logger, mesosphere, newbie
>
> The logrotate ContainerLogger module runs as the agent's user.  In most 
> cases, this is {{root}}.
> When {{logrotate}} is run as root, there is an additional check the 
> configuration files must pass (because a root {{logrotate}} needs to be 
> secured against non-root modifications to the configuration):
> https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815
> Log rotation will fail under the following scenario:
> 1) The agent is run with {{--switch_user}} (default: true)
> 2) A task is launched with a non-root user specified
> 3) The logrotate module spawns a few companion processes (as root) and this 
> creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and 
> {{stderr.logrotate.conf}} files (as root).  This step races with the next 
> step.
> 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to 
> the non-root user.  Including the files just created.
> 5) When {{logrotate}} is run, it will skip any non-root configuration files.  
> This means the files are not rotated.
> 
> Fix: The logrotate module's companion processes should call {{setuid}} and 
> {{setgid}}.





[jira] [Updated] (MESOS-6494) Clean up the flags parsing in the executors.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6494:
-
Sprint: Mesosphere Sprint 46, Mesosphere Sprint 47  (was: Mesosphere Sprint 
46)

> Clean up the flags parsing in the executors.
> 
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of {{stout::flags}} 
> and {{os::getenv}} to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.





[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6184:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 46, Mesosphere Sprint 47  
(was: Mesosphere Sprint 44, Mesosphere Sprint 46)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Blocker
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, the health check uses a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.  





[jira] [Updated] (MESOS-6348) Allow `network/cni` isolator unit-tests to run with CNI plugins

2016-11-15 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6348:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46, 
Mesosphere Sprint 47  (was: Mesosphere Sprint 44, Mesosphere Sprint 45, 
Mesosphere Sprint 46)

> Allow `network/cni` isolator unit-tests to run with CNI plugins 
> 
>
> Key: MESOS-6348
> URL: https://issues.apache.org/jira/browse/MESOS-6348
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, we don't have any infrastructure to allow for CNI plugins to be 
> used in `network/cni` isolator unit-tests. This forces us to mock CNI plugins 
> that don't use new network namespaces, leading to a very restricted form of 
> unit tests. 
> Especially for the port-mapper plugin: in order to test its DNAT 
> functionality, it would be very useful to run the containers in a separate 
> network namespace, which requires an actual CNI plugin.
> The proposal is to introduce a test filter called CNIPLUGIN that gets set 
> when the CNI_PATH env var is set. Tests using the CNIPLUGIN filter can then 
> use actual CNI plugins.





[jira] [Updated] (MESOS-6348) Allow `network/cni` isolator unit-tests to run with CNI plugins

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6348:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Allow `network/cni` isolator unit-tests to run with CNI plugins 
> 
>
> Key: MESOS-6348
> URL: https://issues.apache.org/jira/browse/MESOS-6348
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, we don't have any infrastructure to allow for CNI plugins to be 
> used in `network/cni` isolator unit-tests. This forces us to mock CNI plugins 
> that don't use new network namespaces, leading to a very restricted form of 
> unit tests. 
> Especially for the port-mapper plugin: in order to test its DNAT 
> functionality, it would be very useful to run the containers in a separate 
> network namespace, which requires an actual CNI plugin.
> The proposal is to introduce a test filter called CNIPLUGIN that gets set 
> when the CNI_PATH env var is set. Tests using the CNIPLUGIN filter can then 
> use actual CNI plugins.





[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that all works for both top level 
> container and nested container.





[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6142:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}





[jira] [Updated] (MESOS-6493) Add test cases for the HTTPS health checks.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6493:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add test cases for the HTTPS health checks.
> ---
>
> Key: MESOS-6493
> URL: https://issues.apache.org/jira/browse/MESOS-6493
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>






[jira] [Updated] (MESOS-6405) Benchmark call ingestion path on the Mesos master.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6405:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Benchmark call ingestion path on the Mesos master.
> --
>
> Key: MESOS-6405
> URL: https://issues.apache.org/jira/browse/MESOS-6405
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> [~drexin] reported on the user mailing 
> [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7--A315-A6558753E08B%40apple.com%3E]
>  that there seems to be a significant regression in performance on the call 
> ingestion path on the Mesos master with respect to the scheduler driver (v0 API). 
> We should create a benchmark to first get a sense of the numbers and then go 
> about fixing the performance issues. 





[jira] [Updated] (MESOS-6464) Add fine grained control of which namespaces / cgroups a nested container should inherit (or not).

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6464:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Add fine grained control of which namespaces / cgroups a nested container 
> should inherit (or not).
> --
>
> Key: MESOS-6464
> URL: https://issues.apache.org/jira/browse/MESOS-6464
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> We need finer grained control of which namespaces / cgroups a nested 
> container should inherit or not.
> Right now, there are some implicit assumptions about which cgroups we enter 
> and which namespaces we inherit when we launch a nested container. For 
> example, under the current semantics, a nested container will always get a 
> new pid namespace but inherit the network namespace from its parent. 
> Moreover, nested containers will always inherit all of the cgroups from their 
> parent (except the freezer cgroup), with no possibility of choosing any 
> different configuration.
> My current thinking is to pass the set of isolators to 
> {{containerizer->launch()}} that we would like to have invoked as part of 
> launching a new container. Only if that isolator is enabled (via the agent 
> flags) AND it is passed in via {{launch()}}, will it be used to isolate the 
> new container (note that both cgroup isolation and namespace 
> membership are also implemented using isolators).  This is a sort of whitelist 
> approach, where we have to know the full set of isolators we want our 
> container launched with ahead of time.
> Alternatively, we could consider passing in the set of isolators that we 
> would like *disabled* instead.  This way we could blacklist certain isolators 
> from kicking in, even if they have been enabled via the agent flags.
> In both approaches, one major caveat of this is that it will have to become 
> part of the top-level containerizer API, but it is specific only to the 
> universal containerizer. Maybe this is OK as we phase out the docker 
> containerizer anyway.
> I am leaning towards the blacklist approach at the moment...
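The blacklist idea described above amounts to a set difference over isolator names. A minimal sketch (illustrative Python for the concept only; the real interface would be part of the C++ containerizer API, and the isolator names below are assumptions):

```python
# Illustrative sketch of the blacklist approach: start from the isolators
# the agent has enabled and subtract those the caller disables for a
# particular nested container. Isolator names below are examples only.

def select_isolators(enabled_isolators, disabled_isolators):
    """Return the isolators that should act on a nested container."""
    disabled = set(disabled_isolators)
    return [i for i in enabled_isolators if i not in disabled]

# A nested container that should share its parent's pid namespace could
# blacklist the pid-namespace isolator while keeping cgroup isolation.
agent_enabled = ["cgroups/cpu", "cgroups/mem", "namespaces/pid", "network/cni"]
print(select_isolators(agent_enabled, ["namespaces/pid"]))
```

The whitelist variant is the same computation with the membership test inverted, but it requires callers to know the full isolator set up front.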





[jira] [Updated] (MESOS-5856) Logrotate ContainerLogger module does not rotate logs when run as root with --switch_user

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5856:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Logrotate ContainerLogger module does not rotate logs when run as root with 
> --switch_user
> -
>
> Key: MESOS-5856
> URL: https://issues.apache.org/jira/browse/MESOS-5856
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0, 0.28.0, 1.0.0
>Reporter: Joseph Wu
>Assignee: Sivaram Kannan
>Priority: Critical
>  Labels: logger, mesosphere, newbie
>
> The logrotate ContainerLogger module runs as the agent's user.  In most 
> cases, this is {{root}}.
> When {{logrotate}} is run as root, there is an additional check the 
> configuration files must pass (because a root {{logrotate}} needs to be 
> secured against non-root modifications to the configuration):
> https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815
> Log rotation will fail under the following scenario:
> 1) The agent is run with {{--switch_user}} (default: true)
> 2) A task is launched with a non-root user specified
> 3) The logrotate module spawns a few companion processes (as root) and this 
> creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and 
> {{stderr.logrotate.conf}} files (as root).  This step races with the next 
> step.
> 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to 
> the non-root user, including the files just created.
> 5) When {{logrotate}} is run, it will skip any non-root configuration files.  
> This means the files are not rotated.
> 
> Fix: The logrotate module's companion processes should call {{setuid}} and 
> {{setgid}}.





[jira] [Updated] (MESOS-6462) Design Doc: Mesos Support for Container Attach and Container Exec

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6462:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Design Doc: Mesos Support for Container Attach and Container Exec
> -
>
> Key: MESOS-6462
> URL: https://issues.apache.org/jira/browse/MESOS-6462
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Here is a link to the design doc:
> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU
> It is not yet complete, but it is filled out enough to start eliciting 
> feedback. Please feel free to add comments (or even add content!) as you wish.





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere 
Sprint 46  (was: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 
41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and attempting 
> to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certifications,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5966:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  (was: 
Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere 
Sprint 44, Mesosphere Sprint 45)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.





[jira] [Updated] (MESOS-6411) Add documentation for CNI port-mapper plugin.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6411:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Add documentation for CNI port-mapper plugin.
> -
>
> Key: MESOS-6411
> URL: https://issues.apache.org/jira/browse/MESOS-6411
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Need to add the CNI port-mapper plugin to the CNI documentation within Mesos.





[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
>
> Committed some basic documentation, so moving this to the pods-improvements 
> epic and targeting it for 1.2.0. I would like this ticket to track the more 
> comprehensive documentation.





[jira] [Updated] (MESOS-6366) Design doc for agent secrets

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Design doc for agent secrets
> 
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Produce a design for the passing of credentials to the agent, and their use 
> in the following three scenarios:
> * HTTP executor authentication
> * Container image fetching
> * Artifact fetching





[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.





[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>






[jira] [Updated] (MESOS-6230) Add support for health checks to the default executor.

2016-09-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6230:
-
Sprint: Mesosphere Sprint 43, Mesosphere Sprint 44  (was: Mesosphere Sprint 
43)

> Add support for health checks to the default executor.
> --
>
> Key: MESOS-6230
> URL: https://issues.apache.org/jira/browse/MESOS-6230
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, there is no health checking mechanism for the tasks in a task 
> group. Ideally, we would like to re-use the existing health checking 
> infrastructure and do health checking for all the tasks in a task group. If 
> one of them fails, we should kill all the tasks in the task group (default 
> policy). We would add support for specifying custom policies in the future.
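The default policy described above can be sketched as follows (illustrative Python, not executor code; `is_healthy` and `kill` are hypothetical stand-ins):

```python
# Default task-group health policy sketch: if any task in the group is
# unhealthy, kill every task in the group.

def enforce_default_policy(task_group, is_healthy, kill):
    """Return True if the group was torn down, False if all tasks are healthy."""
    if any(not is_healthy(task) for task in task_group):
        for task in task_group:
            kill(task)
        return True
    return False
```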





[jira] [Updated] (MESOS-6157) ContainerInfo is not validated.

2016-09-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6157:
-
Sprint: Mesosphere Sprint 42, Mesosphere Sprint 43, Mesosphere Sprint 44  
(was: Mesosphere Sprint 42, Mesosphere Sprint 43)

> ContainerInfo is not validated.
> ---
>
> Key: MESOS-6157
> URL: https://issues.apache.org/jira/browse/MESOS-6157
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: containerizer, mesos-containerizer, mesosphere
> Fix For: 1.1.0
>
>
> Currently Mesos does not validate {{ContainerInfo}} provided with 
> {{TaskInfo}} or {{ExecutorInfo}}, hence invalid task configurations can be 
> accepted.





[jira] [Updated] (MESOS-6119) TCP health checks are not portable.

2016-09-29 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6119:
-
Sprint: Mesosphere Sprint 42, Mesosphere Sprint 43, Mesosphere Sprint 44  
(was: Mesosphere Sprint 42, Mesosphere Sprint 43)

> TCP health checks are not portable.
> ---
>
> Key: MESOS-6119
> URL: https://issues.apache.org/jira/browse/MESOS-6119
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: health-check, mesosphere
> Fix For: 1.1.0
>
>
> MESOS-3567 introduced a dependency on "bash" for TCP health checks, which is 
> undesirable. We should implement a portable solution for TCP health checks.
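A portable TCP health check reduces to attempting a connection within a timeout, with no shell involved. A minimal sketch (illustrative Python; the Mesos implementation itself is in C++):

```python
import socket

def tcp_health_check(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles name resolution and the connect timeout;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_health_check("127.0.0.1", 5051)` would probe an agent listening on its default port.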





[jira] [Updated] (MESOS-5275) Add capabilities support for unified containerizer.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5275:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere 
Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 34, Mesosphere Sprint 
35, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, 
Mesosphere Sprint 40, Mesosphere Sprint 41)

> Add capabilities support for unified containerizer.
> ---
>
> Key: MESOS-5275
> URL: https://issues.apache.org/jira/browse/MESOS-5275
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Add capabilities support for unified containerizer. 
> Requirements:
> 1. Use the mesos capabilities API.
> 2. Frameworks should be able to add capability requests for containers.
> 3. Agents should be able to set the maximum allowed capabilities for all 
> containers launched.
> Design document: 
> https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd





[jira] [Updated] (MESOS-4690) Reorganize 3rdparty directory

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4690:
-
Sprint: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere 
Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42  
(was: Mesosphere Sprint 33, Mesosphere Sprint 34, Mesosphere Sprint 35, 
Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere 
Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41)

> Reorganize 3rdparty directory
> -
>
> Key: MESOS-4690
> URL: https://issues.apache.org/jira/browse/MESOS-4690
> Project: Mesos
>  Issue Type: Epic
>  Components: build, libprocess, stout
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> This issue is currently being discussed in the dev mailing list:
> http://www.mail-archive.com/dev@mesos.apache.org/msg34349.html





[jira] [Updated] (MESOS-5792) Add mesos tests to CMake (make check)

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5792:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42  
(was: Mesosphere Sprint 40, Mesosphere Sprint 41)

> Add mesos tests to CMake (make check)
> -
>
> Key: MESOS-5792
> URL: https://issues.apache.org/jira/browse/MESOS-5792
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Srinivas
>Assignee: Srinivas
>  Labels: build, mesosphere
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Provide CMakeLists.txt and configuration files to build mesos tests using 
> CMake.





[jira] [Updated] (MESOS-6076) Implement RunTaskGroup handler on the agent.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6076:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41)

> Implement RunTaskGroup handler on the agent.
> 
>
> Key: MESOS-6076
> URL: https://issues.apache.org/jira/browse/MESOS-6076
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to implement the {{RunTaskGroup}} handler on the agent. This would be 
> similar to the {{RunTask}} handler that already exists except that this would 
> have the relevant logic to send the task group to the executor atomically.
> Ideally, we would like to re-use as much of the existing functionality 
> from the {{runTask()}} handler as possible. We also need to add a state 
> {{queuedTaskGroups}} since it is needed for dispatching queued task groups to 
> the executor upon registration. We should also ensure that 
> {{queuedTasks}} is populated with the task group information, thereby enabling 
> users to query it via the `/state` endpoint, master reconciliation messages, etc.





[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5931:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41)

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, containerizer, mesosphere
>
> Currently in the Unified Containerizer, the copy backend is selected by 
> default. This is not ideal, especially in production environments: it can 
> take a long time to provision a huge container image by copying it from the 
> store to the provisioner.
> Ideally, we should support an `auto` backend, which would automatically 
> select the best backend for the image provisioner if the user does not 
> specify one via the agent flag.
> We should settle on a logic design first in this ticket, to determine how we 
> want to choose the right backend (e.g., overlayfs or aufs should be preferred 
> if available in the kernel).





[jira] [Updated] (MESOS-5303) Add capabilities support for mesos execute cli.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5303:
-
Sprint: Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 37, 
Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere 
Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 34, Mesosphere Sprint 
35, Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, 
Mesosphere Sprint 40, Mesosphere Sprint 41)

> Add capabilities support for mesos execute cli.
> ---
>
> Key: MESOS-5303
> URL: https://issues.apache.org/jira/browse/MESOS-5303
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> Add support for `user` and `capabilities` in the execute CLI. This will help 
> in testing the `capabilities` feature for the unified containerizer.





[jira] [Updated] (MESOS-3910) Libprocess: Implement cleanup of the SocketManager in process::finalize

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3910:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42  
(was: Mesosphere Sprint 40, Mesosphere Sprint 41)

> Libprocess: Implement cleanup of the SocketManager in process::finalize
> ---
>
> Key: MESOS-3910
> URL: https://issues.apache.org/jira/browse/MESOS-3910
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The {{socket_manager}} and {{process_manager}} are intricately tied together. 
>  Currently, only the {{process_manager}} is cleaned up by 
> {{process::finalize}}.
> To clean up the {{socket_manager}}, we must close all sockets, deallocate 
> any existing {{HttpProxy}} or {{Encoder}} objects, and prevent 
> further objects from being created/tracked by the {{socket_manager}}.
> *Proposal*
> # Clean up all processes other than {{gc}}.  This will clear all links and 
> delete all {{HttpProxy}} s while {{socket_manager}} still exists.
> # Close all sockets via {{SocketManager::close}}.  All of {{socket_manager}} 
> 's state is cleaned up via {{SocketManager::close}}, including termination of 
> {{HttpProxy}} (termination is idempotent, meaning that killing {{HttpProxy}} 
> s via {{process_manager}} is safe).
> # At this point, {{socket_manager}} should be empty and only the {{gc}} 
> process should be running.  (Since we're finalizing, assume there are no 
> threads trying to spawn processes.)  {{socket_manager}} can be deleted.
> # {{gc}} can be deleted.  This is currently a leaked pointer, so we'll also 
> need to track and delete that.
> # {{process_manager}} should be devoid of processes, so we can proceed with 
> cleanup (join threads, stop the {{EventLoop}}, etc).





[jira] [Updated] (MESOS-5779) Allow Docker v1 ImageManifests to be parsed from the output of `docker inspect`

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5779:
-
Sprint: Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 41, 
Mesosphere Sprint 42  (was: Mesosphere Sprint 38, Mesosphere Sprint 39, 
Mesosphere Sprint 41)

> Allow Docker v1 ImageManifests to be parsed from the output of `docker 
> inspect`
> ---
>
> Key: MESOS-5779
> URL: https://issues.apache.org/jira/browse/MESOS-5779
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>
> The `docker::spec::v1::ImageManifest` protobuf implements the
> official v1 image manifest specification found at:
> 
> https://github.com/docker/docker/blob/master/image/spec/v1.md
> 
> The field names in this spec are all written in snake_case as are the
> field names of the JSON representing the image manifest when reading
> it from disk (for example after performing a `docker save`). As such,
> the protobuf for ImageManifest also provides these fields in
> snake_case. Unfortunately, the `docker inspect` command also provides
> a method of retrieving the JSON for an image manifest, with one major
> caveat -- it represents all of its top level keys in CamelCase.
> 
> To allow both representations to be parsed in the same way, we
> should intercept the incoming JSON from either source (disk or `docker
> inspect`) and convert it to a canonical snake_case representation.
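The conversion described above can be illustrated with a small sketch (Python for illustration only; the actual interception would happen in Mesos' C++ Docker spec parsing):

```python
import re

def snake_case(key):
    # Insert "_" before every interior uppercase letter, then lowercase:
    # "DockerVersion" -> "docker_version", "Id" -> "id".
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()

def canonicalize(manifest):
    """Normalize top-level keys so `docker inspect` output parses like on-disk JSON."""
    return {snake_case(key): value for key, value in manifest.items()}
```

Only the top-level keys differ between the two sources, so the nested values are carried over unchanged.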





[jira] [Updated] (MESOS-6104) Potential FD double close in libevent's implementation of `sendfile`.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6104:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41)

> Potential FD double close in libevent's implementation of `sendfile`.
> -
>
> Key: MESOS-6104
> URL: https://issues.apache.org/jira/browse/MESOS-6104
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.3, 0.28.2, 1.0.1
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere, ssl
>
> Repro copied from: https://reviews.apache.org/r/51509/
> It is possible to make the master CHECK fail by repeatedly hitting the web UI 
> and reloading the static assets:
> 1) Paste a large amount of text (16KB or more) into 
> `src/webui/master/static/home.html`.  The more text, the more reliable the 
> repro.
> 2) Start the master with SSL enabled:
> {code}
> LIBPROCESS_SSL_ENABLED=true LIBPROCESS_SSL_KEY_FILE=key.pem 
> LIBPROCESS_SSL_CERT_FILE=cert.pem bin/mesos-master.sh --work_dir=/tmp/master
> {code}
> 3) Run two instances of this python script repeatedly:
> {code}
> import socket
> import ssl
> s = ssl.wrap_socket(socket.socket())
> s.connect(("localhost", 5050))
> s.sendall("""GET /static/home.html HTTP/1.1
> User-Agent: foobar
> Host: localhost:5050
> Accept: */*
> Connection: Keep-Alive
> """)
> # The HTTP part of the response
> print s.recv(1000)
> {code}
> i.e. 
> {code}
> while python test.py; do :; done & while python test.py; do :; done
> {code}





[jira] [Updated] (MESOS-6052) Unable to launch containers on CNI networks on CoreOS

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6052:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41)

> Unable to launch containers on CNI networks on CoreOS
> -
>
> Key: MESOS-6052
> URL: https://issues.apache.org/jira/browse/MESOS-6052
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> CoreOS does not have an `/etc/hosts` file. Currently, in the `network/cni` 
> isolator, if we don't see a `/etc/hosts` on the host filesystem we don't bind 
> mount the container's `hosts` file to this target for the `command executor`. 
> On distros such as CoreOS this fails the container launch since the 
> `libprocess` initialization of the `command executor` fails because it can't 
> resolve its `hostname`.
> We should be creating the `/etc/hosts` and `/etc/hostname` files when they 
> are absent on the host filesystem since creating these files should not 
> affect name resolution on the host network namespace, and it will allow the 
> `/etc/hosts` file to be bind mounted correctly and allow name resolution in 
> the container's network namespace as well.
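A sketch of the proposed behavior (illustrative Python; the real change belongs in the C++ `network/cni` isolator, and the `etc` parameter exists here only so the example is self-contained):

```python
import os
import socket

def ensure_host_files(etc="/etc"):
    """Create minimal hosts/hostname files if absent, so they can be bind mounted."""
    hostname = socket.gethostname()
    hosts_path = os.path.join(etc, "hosts")
    hostname_path = os.path.join(etc, "hostname")
    # Only create the files when missing; existing host configuration
    # (and thus host-namespace name resolution) is left untouched.
    if not os.path.exists(hosts_path):
        with open(hosts_path, "w") as f:
            f.write("127.0.0.1 localhost %s\n" % hostname)
    if not os.path.exists(hostname_path):
        with open(hostname_path, "w") as f:
            f.write(hostname + "\n")
```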





[jira] [Updated] (MESOS-3934) Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3934:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42  
(was: Mesosphere Sprint 40, Mesosphere Sprint 41)

> Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess
> 
>
> Key: MESOS-3934
> URL: https://issues.apache.org/jira/browse/MESOS-3934
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> Related to this 
> [TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L949-L950].
> The {{MetricsProcess}} and {{ReaperProcess}} are global processes 
> (singletons) which are initialized upon first use.  The two processes could 
> be initialized alongside the {{gc}}, {{help}}, {{logging}}, {{profiler}}, and 
> {{system}} (statistics) processes inside {{process::initialize}}.
> This is also necessary for libprocess re-initialization.





[jira] [Updated] (MESOS-5570) Improve CHANGELOG and upgrades.md

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5570:
-
Sprint: Mesosphere Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, 
Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 37, 
Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 41)

> Improve CHANGELOG and upgrades.md
> -
>
> Key: MESOS-5570
> URL: https://issues.apache.org/jira/browse/MESOS-5570
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>
> Currently we have a lot of data duplication between the CHANGELOG and 
> upgrades.md. We should try to improve this and potentially make the CHANGELOG 
> a markdown file as well. For inspiration see the Hadoop changelog: 
> https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-common-project/hadoop-common/src/site/markdown/release/1.2.0/CHANGES.1.2.0.md





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42  (was: Mesosphere Sprint 39, Mesosphere Sprint 40, 
Mesosphere Sprint 41)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certifications,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.
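As a sketch, a test fixture could drive these configurations through the environment before initializing libprocess. The variable names below follow the SSL user doc linked above, but treat the exact spellings, values, and paths as assumptions to be checked against that doc:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Sketch of configuring one SSL-enabled test scenario via the
// environment. Variable names follow the SSL user doc; the paths are
// placeholders, and the whole set is an assumption for illustration.
void configureSSLForTest()
{
  setenv("LIBPROCESS_SSL_ENABLED", "true", 1);             // Turn SSL on.
  setenv("LIBPROCESS_SSL_SUPPORT_DOWNGRADE", "false", 1);  // No plaintext fallback.
  setenv("LIBPROCESS_SSL_CERT_FILE", "/path/to/cert.pem", 1);
  setenv("LIBPROCESS_SSL_KEY_FILE", "/path/to/key.pem", 1);
  setenv("LIBPROCESS_SSL_VERIFY_CERT", "true", 1);   // Verify peer certificates.
  setenv("LIBPROCESS_SSL_REQUIRE_CERT", "true", 1);  // Require client certificates.
  setenv("LIBPROCESS_SSL_CA_FILE", "/path/to/ca.pem", 1);  // Custom CA.
}
```

Each bullet in the list above would then correspond to one such environment permutation, applied before the master or the scheduler library under test starts up.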





[jira] [Updated] (MESOS-4766) Improve allocator performance.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4766:
-
Sprint: Mesosphere Sprint 32, Mesosphere Sprint 33, Mesosphere Sprint 34, 
Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere Sprint 37, Mesosphere 
Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42  (was: Mesosphere Sprint 32, Mesosphere Sprint 33, 
Mesosphere Sprint 34, Mesosphere Sprint 35, Mesosphere Sprint 36, Mesosphere 
Sprint 37, Mesosphere Sprint 38, Mesosphere Sprint 39, Mesosphere Sprint 40, 
Mesosphere Sprint 41)

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.





[jira] [Updated] (MESOS-6088) Update 'launcher' to checkpoint exit status of launched process.

2016-09-03 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6088:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41)

> Update 'launcher' to checkpoint exit status of launched process.
> 
>
> Key: MESOS-6088
> URL: https://issues.apache.org/jira/browse/MESOS-6088
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
> Fix For: 1.1.0
>
>
> Currently the 'mesos-containerizer launch' binary simply execs
> into the actual command we want to launch after doing some
> preparatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the exit status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.  We should add support for this 
> checkpointing.
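A minimal POSIX sketch of the idea: an intermediate launcher forks instead of exec-ing directly, waits for the command, and persists the raw wait(2) status to a file that can be read back after a restart. The function name and checkpoint path are illustrative, not the actual Mesos implementation:

```cpp
#include <cassert>
#include <fstream>
#include <string>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch: rather than exec-ing straight into the command, fork it,
// wait for it, and checkpoint the raw wait(2) status so an agent that
// was offline when the command completed can still recover it.
int launchAndCheckpoint(char* const argv[], const std::string& checkpointPath)
{
  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127); // Only reached if exec fails.
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    return -1;
  }

  // Checkpoint the status; it now survives a launcher/agent restart.
  std::ofstream checkpoint(checkpointPath);
  checkpoint << status << std::endl;

  return status;
}
```

The trade-off is that the launcher must stay alive until the command exits, rather than disappearing into it via exec.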




