[jira] [Updated] (MESOS-8176) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8176:
---
Description: 
Observed it today in the internal CI:
{noformat}
../../src/tests/slave_recovery_tests.cpp:4708
Value of: usage->has_cpus_limit()
  Actual: false
Expected: true
{noformat}
Full log attached.

This seems to be different from MESOS-5048 and MESOS-6481.

  was:
Observed it today in the internal CI:
{noformat}
../../src/tests/slave_recovery_tests.cpp:4708
Value of: usage->has_cpus_limit()
  Actual: false
Expected: true
{noformat}
Full log attached.


> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky.
> ----------------------------------------------------------------
>
> Key: MESOS-8176
> URL: https://issues.apache.org/jira/browse/MESOS-8176
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
> Environment: ec2 Ubuntu 16.04, autotools, no SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: ResourceStatistics-badrun.txt
>
>
> Observed it today in the internal CI:
> {noformat}
> ../../src/tests/slave_recovery_tests.cpp:4708
> Value of: usage->has_cpus_limit()
>   Actual: false
> Expected: true
> {noformat}
> Full log attached.
> This seems to be different from MESOS-5048 and MESOS-6481.
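For reference, the failing check at slave_recovery_tests.cpp:4708 has roughly
this shape (a sketch assuming the usual Mesos test helpers, not the exact test
code):
{code}
// After restarting the agent and reconnecting the executor, the test asks
// the containerizer for the container's resource statistics and expects
// the cpu limit to be reported.
Future<ResourceStatistics> usage = containerizer->usage(containerId);
AWAIT_READY(usage);

EXPECT_TRUE(usage->has_cpus_limit()); // The flaky expectation.
{code}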



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246842#comment-16246842
 ] 

Alexander Rukletsov commented on MESOS-5048:


Observed in the internal CI, attached "ResourceStatistics-badrun2.txt" log.

> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---
>
> Key: MESOS-5048
> URL: https://issues.apache.org/jira/browse/MESOS-5048
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.28.0
> Environment: Ubuntu 15.04, Ubuntu 16.04
>Reporter: Jian Qiu
>  Labels: flaky-test
> Attachments: ResourceStatistics-badrun2.txt
>
>
> ./mesos-tests.sh 
> --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics 
> --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types 
> of failures; however, the failures do not appear when verbose logging is 
> enabled...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 
> 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}
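The second failure mode corresponds to an assertion of roughly this shape (a
sketch assuming the usual Mesos test helpers, not the exact test code):
{code}
// After agent restart and recovery, the recovered containerizer should
// still report exactly one container -- the executor's.
Future<hashset<ContainerID>> containers = containerizer->containers();
AWAIT_READY(containers);

ASSERT_EQ(1u, containers->size()); // Flaky: occasionally 0 after recovery.
{code}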



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5048:
---
Attachment: ResourceStatistics-badrun2.txt

> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---
>
> Key: MESOS-5048
> URL: https://issues.apache.org/jira/browse/MESOS-5048
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.28.0
> Environment: Ubuntu 15.04, Ubuntu 16.04
>Reporter: Jian Qiu
>  Labels: flaky-test
> Attachments: ResourceStatistics-badrun2.txt
>
>
> ./mesos-tests.sh 
> --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics 
> --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types 
> of failures; however, the failures do not appear when verbose logging is 
> enabled...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 
> 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5048:
---
Environment: Ubuntu 15.04, Ubuntu 16.04  (was: Ubuntu 15.04)

> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---
>
> Key: MESOS-5048
> URL: https://issues.apache.org/jira/browse/MESOS-5048
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.28.0
> Environment: Ubuntu 15.04, Ubuntu 16.04
>Reporter: Jian Qiu
>  Labels: flaky-test
>
> ./mesos-tests.sh 
> --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics 
> --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types 
> of failures; however, the failures do not appear when verbose logging is 
> enabled...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 
> 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8205) MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8205:
---
Attachment: FilterOrphanedTasks-badrun.txt

> MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.
> ----------------------------------------------------
>
> Key: MESOS-8205
> URL: https://issues.apache.org/jira/browse/MESOS-8205
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS 6
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: FilterOrphanedTasks-badrun.txt
>
>
> Observed today in the internal CI. Full log attached.
> {noformat}
> ../../src/tests/master_authorization_tests.cpp:2239
> Failed to wait 15secs for statusUpdate
> {noformat}
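The failure message comes from the test harness' await helpers, which give up
after 15 seconds by default. The waiting pattern looks roughly like this (a
sketch following the usual Mesos test idioms, not the exact test code):
{code}
using ::testing::_;

// The test captures the next status update into a future...
Future<TaskStatus> statusUpdate;
EXPECT_CALL(sched, statusUpdate(&driver, _))
  .WillOnce(FutureArg<1>(&statusUpdate));

// ...and waits for it; AWAIT_READY times out after 15 seconds by default,
// producing the "Failed to wait 15secs for statusUpdate" message above.
AWAIT_READY(statusUpdate);
{code}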



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8205) MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8205:
--

 Summary: MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.
 Key: MESOS-8205
 URL: https://issues.apache.org/jira/browse/MESOS-8205
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: CentOS 6
Reporter: Alexander Rukletsov


Observed today in the internal CI. Full log attached.
{noformat}
../../src/tests/master_authorization_tests.cpp:2239
Failed to wait 15secs for statusUpdate
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-09 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-8200:
--

Assignee: Yan Xu  (was: James Peach)

> Suppressed roles are not honoured for v1 scheduler subscribe requests.
> --
>
> Key: MESOS-8200
> URL: https://issues.apache.org/jira/browse/MESOS-8200
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler api, scheduler driver
>Reporter: Alexander Rukletsov
>Assignee: Yan Xu
>
> When triaging MESOS-7996 I found that the 
> {{Call.subscribe.suppressed_roles}} field is empty when the master processes 
> the request from a v1 HTTP scheduler. More precisely, [this 
> conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969]
>  wipes the field. This is likely because this conversion relies on a general 
> [protobuf conversion 
> utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50],
>  which fails to copy {{suppressed_roles}} because the field has different 
> tags in the two messages; compare 
> [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271]
>  and 
> [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258].
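For reference, the devolve utility linked above follows (roughly) a
serialize-and-reparse pattern; this is a sketch of that pattern, not the exact
Mesos code:
{code}
#include <string>

#include <google/protobuf/message.h>

// Wire-format based conversion: protobuf matches fields by tag number, not
// by name, so a field like suppressed_roles, whose tag differs between the
// v0 and v1 messages, is not copied -- it lands in the target message's
// unknown-field set instead.
template <typename T>
T devolve(const google::protobuf::Message& message)
{
  std::string data;
  message.SerializeToString(&data);

  T t;
  // ParsePartial* because the devolved message may lack fields that are
  // required in the target version.
  t.ParsePartialFromString(data);
  return t;
}
{code}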



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout

2017-11-09 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-8052:
--
Fix Version/s: (was: 1.4.1)

> "protoc" not found when running "make -j4 check" directly in stout
> --
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
>
> If we run {{make -j4 check}} without running {{make}} first, we will get the 
> following error message:
> {noformat}
> 3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. 
> ../tests/protobuf_tests.proto
> /bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory
> Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed
> make: *** [protobuf_tests.pb.cc] Error 127
> {noformat}
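A workaround until this is fixed (a sketch; it assumes the bundled protobuf is
built as part of the default target, as the path in the error suggests):
{noformat}
# Build the bundled dependencies first so that
# 3rdparty/protobuf-3.3.0/src/protoc exists, then run the tests.
make
make -j4 check
{noformat}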



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-11-09 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-7996:
--

Assignee: Yan Xu  (was: James Peach)

> ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
> --
>
> Key: MESOS-7996
> URL: https://issues.apache.org/jira/browse/MESOS-7996
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Observed on Ubuntu 17.04 with SSL enabled
>Reporter: Alexander Rukletsov
>Assignee: Yan Xu
>  Labels: flaky-test, mesosphere
> Attachments: NoOffersWithAllRolesSuppressed-modified.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
>
>
> Observed the failure on internal CI:
> {noformat}
> ../../src/tests/scheduler_tests.cpp:1474
> Mock function called more times than expected - returning directly.
> Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object 
> <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
>  Expected: to be never called
>Actual: called once - over-saturated and active
> {noformat}
> Full log attached.
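The failure comes from a gmock expectation of roughly this shape (a sketch
following the usual Mesos v1 test idioms, not the exact test code):
{code}
using ::testing::_;

// After the scheduler suppresses all of its roles, no further offers should
// arrive, so the test forbids additional calls to the offers() callback.
// If the allocator still sends an offer, gmock reports the call as
// "over-saturated" -- the failure shown above.
EXPECT_CALL(*scheduler, offers(_, _))
  .Times(0);
{code}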



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery

2017-11-09 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7939:
--
Fix Version/s: (was: 1.4.1)

> Early disk usage check for garbage collection during recovery
> -
>
> Key: MESOS-7939
> URL: https://issues.apache.org/jira/browse/MESOS-7939
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Critical
>
> Currently the default value for `disk_watch_interval` is 1 minute. This is 
> not fast enough and could lead to the following scenario:
> 1. The disk usage was checked and there was not enough headroom:
> {noformat}
> I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max 
> allowed age: 0ns
> {noformat}
> But no container was pruned because no container had been scheduled for GC.
> 2. A task was completed. The task itself contained a lot of nested 
> containers, each of which used a lot of disk space. Note that there is no way 
> for the Mesos agent to schedule individual nested containers for GC, since 
> nested containers are not necessarily tied to tasks. When the top-level 
> container completed, it was scheduled for GC, and the nested containers would 
> be GC'ed as well: 
> {noformat}
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.9466483852days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466405037days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.946635763days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466324148days in the future
> {noformat}
> 3. Since the next disk usage check was still 40ish seconds away, no GC was 
> performed even though the disk was full. As a result, Mesos agent failed to 
> checkpoint the task status:
> {noformat}
> I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> 4. When the agent restarted, it tried to checkpoint the task status again. 
> However, since the first disk usage check was scheduled 1 minute after 
> startup, the agent failed before GC kicked in, falling into a restart failure 
> loop:
> {noformat}
> F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> We should trigger GC earlier, so that the agent can recover from this state.
> Related ticket: MESOS-7031
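Until an early check lands, a possible mitigation (a sketch; the interval value
is illustrative and the trailing "..." stands for the rest of the agent's
flags) is to lower the agent's disk-watch interval so that GC runs soon after
startup:
{noformat}
# Check disk usage (and thus prune directories already scheduled for GC)
# every 10 seconds instead of the 1-minute default.
mesos-agent --work_dir=/var/lib/mesos/slave --disk_watch_interval=10secs ...
{noformat}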



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8204) HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8204:
---
Attachment: ROOT_DOCKER_DockerHealthyTaskViaHTTP-badrun.txt

> HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky.
> --
>
> Key: MESOS-8204
> URL: https://issues.apache.org/jira/browse/MESOS-8204
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 17.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: ROOT_DOCKER_DockerHealthyTaskViaHTTP-badrun.txt
>
>
> Observed today in the internal CI. Full log attached.
> {noformat}
> ../../src/tests/health_check_tests.cpp:2048
> Failed to wait 15secs for statusHealthy
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8204) HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8204:
--

 Summary: HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is 
flaky.
 Key: MESOS-8204
 URL: https://issues.apache.org/jira/browse/MESOS-8204
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: Ubuntu 17.04 with SSL
Reporter: Alexander Rukletsov


Observed today in the internal CI. Full log attached.
{noformat}
../../src/tests/health_check_tests.cpp:2048
Failed to wait 15secs for statusHealthy
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8203) SchedulerTest.TaskGroupRunning is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8203:
---
Attachment: TaskGroupRunning-badrun.txt

> SchedulerTest.TaskGroupRunning is flaky.
> ----------------------------------------
>
> Key: MESOS-8203
> URL: https://issues.apache.org/jira/browse/MESOS-8203
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: TaskGroupRunning-badrun.txt
>
>
> Observed today in the internal CI. Full log attached.
> {noformat}
> ../../src/tests/scheduler_tests.cpp:726
>   Expected: v1::TASK_RUNNING
>   Which is: TASK_RUNNING
> To be equal to: runningUpdate2->status().state()
>   Which is: TASK_FINISHED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8203) SchedulerTest.TaskGroupRunning is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8203:
--

 Summary: SchedulerTest.TaskGroupRunning is flaky.
 Key: MESOS-8203
 URL: https://issues.apache.org/jira/browse/MESOS-8203
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Alexander Rukletsov


Observed today in the internal CI. Full log attached.
{noformat}
../../src/tests/scheduler_tests.cpp:726
  Expected: v1::TASK_RUNNING
  Which is: TASK_RUNNING
To be equal to: runningUpdate2->status().state()
  Which is: TASK_FINISHED
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Attachment: LaunchNestedContainerSessionDisconnected-badrun.txt

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Gastón Kleiman
>  Labels: flaky-test, mesosphere-oncall
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> {code}
> [ RUN  ] 
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0629 05:49:33.182234 25306 master.cpp:436] Master 
> 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 
> 172.17.0.3:45726
> I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" 
> --zk_session_timeout="10secs"
> I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/a5h5J3/credentials'
> I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
> I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
> allocator process
> I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
> I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
> I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
> I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
> I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 183040ns
> I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 
> 6441ns; attempting to update the registry
> I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
> registry in 147200ns
> I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered 
> registrar
> I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0629 05:49:33.186769 25301 

[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
{noformat}
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
AgentAPITest.AttachContainerInputAuthorization/0
AgentAPITest.LaunchNestedContainerSessionWithTTY/0
AgentAPITest.LaunchNestedContainerSessionDisconnected/1
{noformat}

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" 
--zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" 

[jira] [Assigned] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-7519:
--

Assignee: Alexander Rukletsov
  Sprint: Mesosphere Sprint 68
Story Points: 1

> OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
> -
>
> Key: MESOS-7519
> URL: https://issues.apache.org/jira/browse/MESOS-7519
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt
>
>
> {noformat}
> [ RUN  ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable
> I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0517 10:43:58.155712 260517888 master.cpp:436] Master 
> a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on 
> 169.254.161.216:51870
> I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master"
>  --zk_session_timeout="10secs"
> I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials'
> I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled
> I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading 
> master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar
> I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in 
> 14us; attempting to update the registry
> I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered 
> registrar
> I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the 
> registry (164B); allowing 10mins for agents to re-register
> I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend 
> 'copy'
> I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on 
> (7)@169.254.161.216:51870
> I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> 

[jira] [Updated] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7519:
---
Summary: OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable 
is flaky.  (was: 
OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky)

> OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky.
> --
>
> Key: MESOS-7519
> URL: https://issues.apache.org/jira/browse/MESOS-7519
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt
>
>
> {noformat}
> [ RUN  ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable
> I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0517 10:43:58.155712 260517888 master.cpp:436] Master 
> a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on 
> 169.254.161.216:51870
> I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master"
>  --zk_session_timeout="10secs"
> I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials'
> I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled
> I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading 
> master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar
> I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in 
> 14us; attempting to update the registry
> I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered 
> registrar
> I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the 
> registry (164B); allowing 10mins for agents to re-register
> I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend 
> 'copy'
> I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on 
> (7)@169.254.161.216:51870
> I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" 
> 

[jira] [Commented] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246771#comment-16246771
 ] 

Alexander Rukletsov commented on MESOS-7519:


It looks like review https://reviews.apache.org/r/55893/ does not really fix 
what it aims to fix. The problem is in [this 
loop|https://github.com/apache/mesos/blob/master/src/tests/oversubscription_tests.cpp?utf8=%E2%9C%93#L549-L552]:
 {{offers.get()}} removes items from the collection we iterate over and hence 
defeats the purpose of merging resources from all offers.
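In other words, the loop pops from the same collection it is iterating over. An
illustrative sketch of the safer pattern (std::queue stands in for the test's
offer collection; this is not the actual test code):
{code}
#include <queue>
#include <vector>

std::queue<Offer> offers;   // Filled by the scheduler callbacks.

// Drain the queue into a local vector first...
std::vector<Offer> drained;
while (!offers.empty()) {
  drained.push_back(offers.front());
  offers.pop();
}

// ...then merge resources from *all* offers, which is what the original
// loop intended to do.
Resources resources;
for (const Offer& offer : drained) {
  resources += offer.resources();
}
{code}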

> OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
> -
>
> Key: MESOS-7519
> URL: https://issues.apache.org/jira/browse/MESOS-7519
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: flaky-test, mesosphere
> Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt
>
>
> {noformat}
> [ RUN  ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable
> I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0517 10:43:58.155712 260517888 master.cpp:436] Master 
> a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on 
> 169.254.161.216:51870
> I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master"
>  --zk_session_timeout="10secs"
> I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials'
> I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled
> I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading 
> master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar
> I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in 
> 14us; attempting to update the registry
> I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered 
> registrar
> I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the 
> registry (164B); allowing 10mins for agents to re-register
> I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend 
> 'copy'
> I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' 
> authorizer
> 

[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery

2017-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7939:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66  (was: Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67)

> Early disk usage check for garbage collection during recovery
> -
>
> Key: MESOS-7939
> URL: https://issues.apache.org/jira/browse/MESOS-7939
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Critical
> Fix For: 1.4.1
>
>
> Currently the default value for `disk_watch_interval` is 1 minute. This is 
> not fast enough and could lead to the following scenario:
> 1. The disk usage was checked and there was not enough headroom:
> {noformat}
> I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max 
> allowed age: 0ns
> {noformat}
> But no container was pruned because no container had been scheduled for GC.
> 2. A task was completed. The task itself contained a lot of nested 
> containers, each of which used a lot of disk space. Note that there is no way 
> for the Mesos agent to schedule individual nested containers for GC, since 
> nested containers are not necessarily tied to tasks. When the top-level 
> container completed, it was scheduled for GC, and the nested containers would 
> be GC'ed as well: 
> {noformat}
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.9466483852days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466405037days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.946635763days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466324148days in the future
> {noformat}
> 3. Since the next disk usage check was still 40ish seconds away, no GC was 
> performed even though the disk was full. As a result, Mesos agent failed to 
> checkpoint the task status:
> {noformat}
> I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> 4. When the agent restarted, it tried to checkpoint the task status again. 
> However, since the first disk usage check was scheduled 1 minute after 
> startup, the agent failed before GC kicked in, falling into a restart failure 
> loop:
> {noformat}
> F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> We should trigger GC earlier, so that the agent can recover from this state.

[jira] [Updated] (MESOS-7881) Building gRPC with CMake

2017-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7881:
--
Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66  (was: Mesosphere Sprint 61, 
Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere 
Sprint 66, Mesosphere Sprint 68)

> Building gRPC with CMake
> 
>
> Key: MESOS-7881
> URL: https://issues.apache.org/jira/browse/MESOS-7881
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
> Fix For: 1.4.0
>
>
> gRPC manages its own third-party libraries, which overlap with Mesos' 
> third-party library bundles. We need to write proper rules in CMake to 
> configure gRPC's CMake properly to build it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery

2017-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7939:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66  (was: Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 68)

> Early disk usage check for garbage collection during recovery
> -
>
> Key: MESOS-7939
> URL: https://issues.apache.org/jira/browse/MESOS-7939
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Critical
> Fix For: 1.4.1
>
>
> Currently the default value for `disk_watch_interval` is 1 minute. This is 
> not fast enough and could lead to the following scenario:
> 1. The disk usage was checked and there was not enough headroom:
> {noformat}
> I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max 
> allowed age: 0ns
> {noformat}
> But no container was pruned because no container had been scheduled for GC.
> 2. A task was completed. The task itself contained a lot of nested 
> containers, each of which used a lot of disk space. Note that there is no way 
> for the Mesos agent to schedule individual nested containers for GC, since 
> nested containers are not necessarily tied to tasks. When the top-level 
> container completed, it was scheduled for GC, and the nested containers would 
> be GC'ed as well: 
> {noformat}
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.9466483852days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466405037days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.946635763days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466324148days in the future
> {noformat}
> 3. Since the next disk usage check was still 40ish seconds away, no GC was 
> performed even though the disk was full. As a result, Mesos agent failed to 
> checkpoint the task status:
> {noformat}
> I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> 4. When the agent restarted, it tried to checkpoint the task status again. 
> However, since the first disk usage check was scheduled 1 minute after 
> startup, the agent failed before GC kicked in, falling into a restart failure 
> loop:
> {noformat}
> F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> We should trigger GC earlier, so that the agent can recover from this state.

[jira] [Created] (MESOS-8202) Eliminate agent failover after resource checkpointing failure

2017-11-09 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8202:
-

 Summary: Eliminate agent failover after resource checkpointing 
failure
 Key: MESOS-8202
 URL: https://issues.apache.org/jira/browse/MESOS-8202
 Project: Mesos
  Issue Type: Task
Reporter: Gastón Kleiman






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery

2017-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7939:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, 
Mesosphere Sprint 66, Mesosphere Sprint 68  (was: Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66)

> Early disk usage check for garbage collection during recovery
> -
>
> Key: MESOS-7939
> URL: https://issues.apache.org/jira/browse/MESOS-7939
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Critical
> Fix For: 1.4.1
>
>
> Currently the default value for `disk_watch_interval` is 1 minute. This is 
> not frequent enough and can lead to the following scenario:
> 1. The disk usage was checked and there was not enough headroom:
> {noformat}
> I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max 
> allowed age: 0ns
> {noformat}
> But no container was pruned because no container had been scheduled for GC.
> 2. A task completed. The task itself contained many nested containers, each 
> of which used a lot of disk space. Note that there is no way for the Mesos 
> agent to schedule individual nested containers for GC, since nested 
> containers are not necessarily tied to tasks. Only when the top-level 
> container completed was it scheduled for GC, at which point the nested 
> containers would be GC'ed as well: 
> {noformat}
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.9466483852days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466405037days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc'
>  for gc 1.946635763days in the future
> I0901 17:54:44.00 25510 gc.cpp:59] Scheduling 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e'
>  for gc 1.9466324148days in the future
> {noformat}
> 3. Since the next disk usage check was still 40ish seconds away, no GC was 
> performed even though the disk was full. As a result, Mesos agent failed to 
> checkpoint the task status:
> {noformat}
> I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> 4. When the agent restarted, it tried to checkpoint the task status again. 
> However, since the first disk usage check was scheduled 1 minute after 
> startup, the agent failed before GC kicked in, falling into a restart failure 
> loop:
> {noformat}
> F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: 
> Failed to open 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates'
>  for status updates: No space left on device Failed to handle status update 
> TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task 
> node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework 
> 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005
> {noformat}
> We should kick off GC early so that the agent can recover.
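
A minimal sketch of the idea, assuming a hypothetical early-check hook that
runs once during agent recovery instead of waiting for the first
--disk_watch_interval tick (illustrative names only, not the actual Mesos
code):

{code}
#include <sys/statvfs.h>

// Hypothetical helper: fraction of the work directory's filesystem in
// use, computed with POSIX statvfs(3).
static double diskUsage(const char* workDir)
{
  struct statvfs s;
  if (statvfs(workDir, &s) != 0) {
    return 0.0;  // Treat errors as "no pressure"; real code would log.
  }
  return 1.0 - static_cast<double>(s.f_bavail) / s.f_blocks;
}

// Sketch: check disk usage immediately during recovery and prune paths
// already scheduled for GC *before* checkpointing any status updates,
// so a full disk cannot trap the agent in a restart failure loop.
void recover(const char* workDir, double gcDiskHeadroom)
{
  if (diskUsage(workDir) > 1.0 - gcDiskHeadroom) {
    // pruneScheduledGCPaths();  // Hypothetical: run GC right away.
  }
  // ... continue with normal recovery and checkpointing ...
}
{code}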

[jira] [Created] (MESOS-8201) Add end to end tests for offer operation feedback

2017-11-09 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8201:
-

 Summary: Add end to end tests for offer operation feedback
 Key: MESOS-8201
 URL: https://issues.apache.org/jira/browse/MESOS-8201
 Project: Mesos
  Issue Type: Task
Reporter: Gastón Kleiman






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7519:
---
Attachment: RescindRevocableOfferWithIncreasedRevocable-badrun.txt

> OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
> -
>
> Key: MESOS-7519
> URL: https://issues.apache.org/jira/browse/MESOS-7519
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: flaky-test, mesosphere
> Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt
>
>
> {noformat}
> [ RUN  ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable
> I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0517 10:43:58.155712 260517888 master.cpp:436] Master 
> a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on 
> 169.254.161.216:51870
> I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master"
>  --zk_session_timeout="10secs"
> I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials'
> I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled
> I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading 
> master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar
> I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in 
> 14us; attempting to update the registry
> I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered 
> registrar
> I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the 
> registry (164B); allowing 10mins for agents to re-register
> I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend 
> 'copy'
> I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on 
> (7)@169.254.161.216:51870
> I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/mesos/store/appc"
>  

[jira] [Updated] (MESOS-8189) Master’s OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8189:
--
Story Points: 2

> Master’s OperationStatusUpdate handler should forward updates to the 
> framework when OfferOperationID is set.
> 
>
> Key: MESOS-8189
> URL: https://issues.apache.org/jira/browse/MESOS-8189
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8190:
--
Story Points: 3

> Update the master to accept OfferOperationIDs from frameworks.
> --
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Master’s {{ACCEPT}} handler should send failed operation updates when a 
> framework sets the {{OfferOperationID}} on an operation destined for an agent 
> without the {{RESOURCE_PROVIDER}} capability.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8192) Update the scheduler library to support request/response API calls.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8192:
--
Story Points: 3

> Update the scheduler library to support request/response API calls.
> ---
>
> Key: MESOS-8192
> URL: https://issues.apache.org/jira/browse/MESOS-8192
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The scheduler client/library should be updated to add support for API calls 
> following the request/response model, e.g., {{ReconcileOfferOperations}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8200:
---
Shepherd: Alexander Rukletsov

> Suppressed roles are not honoured for v1 scheduler subscribe requests.
> --
>
> Key: MESOS-8200
> URL: https://issues.apache.org/jira/browse/MESOS-8200
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler api, scheduler driver
>Reporter: Alexander Rukletsov
>Assignee: James Peach
>
> When triaging MESOS-7996 I found that the 
> {{Call.subscribe.suppressed_roles}} field is empty when the master processes 
> the request from a v1 HTTP scheduler. More precisely, [this 
> conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969]
>  wipes the field. This is likely because the conversion relies on a general 
> [protobuf conversion 
> utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50],
>  which fails to copy {{suppressed_roles}} because the field has a different 
> tag in each version; compare 
> [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271]
>  and 
> [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246671#comment-16246671
 ] 

Alexander Rukletsov commented on MESOS-7996:


The root cause is MESOS-8200. The test in its current form does not verify 
anything: when it fails, it is surfacing the underlying bug in the code; when 
it succeeds, it does so only because of the race. I've verified that fixing 
MESOS-8200 also fixes the improved test (attached to the ticket).

> ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
> --
>
> Key: MESOS-7996
> URL: https://issues.apache.org/jira/browse/MESOS-7996
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Observed on Ubuntu 17.04 with SSL enabled
>Reporter: Alexander Rukletsov
>Assignee: James Peach
>  Labels: flaky-test, mesosphere
> Attachments: NoOffersWithAllRolesSuppressed-modified.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
>
>
> Observed the failure on internal CI:
> {noformat}
> ../../src/tests/scheduler_tests.cpp:1474
> Mock function called more times than expected - returning directly.
> Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object 
> <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
>  Expected: to be never called
>Actual: called once - over-saturated and active
> {noformat}
> Full log attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7996:
---
Shepherd: Alexander Rukletsov

> ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
> --
>
> Key: MESOS-7996
> URL: https://issues.apache.org/jira/browse/MESOS-7996
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Observed on Ubuntu 17.04 with SSL enabled
>Reporter: Alexander Rukletsov
>Assignee: James Peach
>  Labels: flaky-test, mesosphere
> Attachments: NoOffersWithAllRolesSuppressed-modified.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
>
>
> Observed the failure on internal CI:
> {noformat}
> ../../src/tests/scheduler_tests.cpp:1474
> Mock function called more times than expected - returning directly.
> Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object 
> <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
>  Expected: to be never called
>Actual: called once - over-saturated and active
> {noformat}
> Full log attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8193:
--
Story Points: 2

> Update master’s OfferOperationStatusUpdate handler to acknowledge updates to 
> the agent if OfferOperationID is not set.
> --
>
> Key: MESOS-8193
> URL: https://issues.apache.org/jira/browse/MESOS-8193
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8194) Make agent’s ApplyOfferOperationMessage handler support operations affecting default resources.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8194:
--
Story Points: 8

> Make agent’s ApplyOfferOperationMessage handler support operations affecting 
> default resources.
> ---
>
> Key: MESOS-8194
> URL: https://issues.apache.org/jira/browse/MESOS-8194
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The handler should apply the operations and send 
> {{OperationStatusUpdates}} to the master.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-11-09 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-7996:
--

Assignee: James Peach

> ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
> --
>
> Key: MESOS-7996
> URL: https://issues.apache.org/jira/browse/MESOS-7996
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Observed on Ubuntu 17.04 with SSL enabled
>Reporter: Alexander Rukletsov
>Assignee: James Peach
>  Labels: flaky-test, mesosphere
> Attachments: NoOffersWithAllRolesSuppressed-modified.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
>
>
> Observed the failure on internal CI:
> {noformat}
> ../../src/tests/scheduler_tests.cpp:1474
> Mock function called more times than expected - returning directly.
> Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object 
> <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
>  Expected: to be never called
>Actual: called once - over-saturated and active
> {noformat}
> Full log attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-09 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-8200:
--

Assignee: James Peach

> Suppressed roles are not honoured for v1 scheduler subscribe requests.
> --
>
> Key: MESOS-8200
> URL: https://issues.apache.org/jira/browse/MESOS-8200
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler api, scheduler driver
>Reporter: Alexander Rukletsov
>Assignee: James Peach
>
> When triaging MESOS-7996 I found that the 
> {{Call.subscribe.suppressed_roles}} field is empty when the master processes 
> the request from a v1 HTTP scheduler. More precisely, [this 
> conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969]
>  wipes the field. This is likely because the conversion relies on a general 
> [protobuf conversion 
> utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50],
>  which fails to copy {{suppressed_roles}} because the field has a different 
> tag in each version; compare 
> [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271]
>  and 
> [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8195) Implement explicit offer operation reconciliation between the master, agent and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8195:
--
Story Points: 3

> Implement explicit offer operation reconciliation between the master, agent 
> and RPs.
> 
>
> Key: MESOS-8195
> URL: https://issues.apache.org/jira/browse/MESOS-8195
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Upon receiving an {{UpdateSlave}} message, the master should compare its list 
> of pending operations for the agent/LRPs to the list of pending operations 
> contained in the message. It should then build a {{ReconcileOfferOperations}} 
> message with all the operations missing from the {{UpdateSlave}} message and 
> send it to the agent.
> The agent will receive these messages and should handle them itself if the 
> operations affect the default resources, or forward them to the RP manager 
> otherwise.
> The agent/RP handler should check whether the operations are pending. If an 
> operation is not pending, then an {{ApplyOfferOperation}} message was 
> dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status 
> update to the master.
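
As a rough illustration of the master-side diff described above (UUID
strings stand in for the real pending-operation protobufs):

{code}
#include <set>
#include <string>
#include <vector>

// Operations the master believes are pending on the agent, minus those
// the agent just reported in its UpdateSlave message, are exactly the
// ones that need explicit reconciliation.
std::vector<std::string> operationsToReconcile(
    const std::set<std::string>& masterPending,
    const std::set<std::string>& agentReported)
{
  std::vector<std::string> missing;
  for (const std::string& uuid : masterPending) {
    if (agentReported.count(uuid) == 0) {
      missing.push_back(uuid);  // Goes into ReconcileOfferOperations.
    }
  }
  return missing;
}
{code}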



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8186:
--
Story Points: 3

> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The handler should handle acks for operations handled by the agent, and 
> forward the ack to the RP manager for all other operations.
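
A minimal sketch of that dispatch, using hypothetical stand-in types:

{code}
#include <string>

struct Acknowledgement { std::string operationUuid; };

struct ResourceProviderManager {
  void acknowledge(const Acknowledgement&) { /* forward to the RP */ }
};

// Hypothetical predicate: whether the operation touched only the
// agent's default (non-resource-provider) resources. Stub body.
bool isDefaultResourceOperation(const std::string&) { return true; }

void handleAcknowledgement(ResourceProviderManager& manager,
                           const Acknowledgement& ack)
{
  if (isDefaultResourceOperation(ack.operationUuid)) {
    // Handled by the agent itself: e.g., clean up checkpointed state
    // and stop retrying the corresponding status update.
  } else {
    manager.acknowledge(ack);  // Everything else goes to the RP manager.
  }
}
{code}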



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-09 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8200:
--

 Summary: Suppressed roles are not honoured for v1 scheduler 
subscribe requests.
 Key: MESOS-8200
 URL: https://issues.apache.org/jira/browse/MESOS-8200
 Project: Mesos
  Issue Type: Bug
  Components: scheduler api, scheduler driver
Reporter: Alexander Rukletsov


When triaging MESOS-7996 I found that the 
{{Call.subscribe.suppressed_roles}} field is empty when the master processes 
the request from a v1 HTTP scheduler. More precisely, [this 
conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969]
 wipes the field. This is likely because the conversion relies on a general 
[protobuf conversion 
utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50],
 which fails to copy {{suppressed_roles}} because the field has a different 
tag in each version; compare 
[v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271]
 and 
[v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258].
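
For reference, the linked conversion utility is essentially a wire-format
round trip, roughly as sketched below. Protobuf matches fields by tag number
when parsing, so a field like {{Call.Subscribe.suppressed_roles}}, whose tag
differs between the two definitions, lands in the target's unknown-field set
instead of the named field, which is exactly how it gets wiped:

{code}
#include <string>

#include <google/protobuf/message.h>

// Simplified sketch of the devolve utility: serialize the v1 message
// and re-parse the bytes as the v0 type T. Fields are matched purely
// by tag number, so a field whose tag differs between the two .proto
// definitions is silently shunted into T's unknown-field set rather
// than copied into the named field.
template <typename T>
T devolve(const google::protobuf::Message& message)
{
  T t;
  std::string data;
  message.SerializePartialToString(&data);
  t.ParsePartialFromString(data);
  return t;
}
{code}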



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8187) Enable LRP to send operation status updates, checkpoint, and retry using the SUM

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8187:
--
Story Points: 8

> Enable LRP to send operation status updates, checkpoint, and retry using the 
> SUM
> 
>
> Key: MESOS-8187
> URL: https://issues.apache.org/jira/browse/MESOS-8187
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8191) Implement ReconcileOfferOperations handler in the master

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8191:
--
Story Points: 5

> Implement ReconcileOfferOperations handler in the master
> 
>
> Key: MESOS-8191
> URL: https://issues.apache.org/jira/browse/MESOS-8191
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The master will synchronously respond to the framework with an 
> {{OFFER_OPERATIONS_RECONCILIATION}} response.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8188) Enable agent to send operation status updates, checkpoint, and retry using the SUM

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8188:
--
Story Points: 8

> Enable agent to send operation status updates, checkpoint, and retry using 
> the SUM
> --
>
> Key: MESOS-8188
> URL: https://issues.apache.org/jira/browse/MESOS-8188
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8195) Implement explicit offer operation reconciliation between the master, agent and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8195:
--
Summary: Implement explicit offer operation reconciliation between the 
master, agent and RPs.  (was: Add an explicit offer operation reconciliation 
between the master, agent and RPs.)

> Implement explicit offer operation reconciliation between the master, agent 
> and RPs.
> 
>
> Key: MESOS-8195
> URL: https://issues.apache.org/jira/browse/MESOS-8195
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Upon receiving an {{UpdateSlave}} message, the master should compare its list 
> of pending operations for the agent/LRPs to the list of pending operations 
> contained in the message. It should then build a {{ReconcileOfferOperations}} 
> message with all the operations missing from the {{UpdateSlave}} message and 
> send it to the agent.
> The agent will receive these messages and should handle them itself if the 
> operations affect the default resources, or forward them to the RP manager 
> otherwise.
> The agent/RP handler should check whether the operations are pending. If an 
> operation is not pending, then an {{ApplyOfferOperation}} message was 
> dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status 
> update to the master.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8199:
-

 Summary: Add plumbing for explicit offer operation reconciliation 
between master, agent, and RPs.
 Key: MESOS-8199
 URL: https://issues.apache.org/jira/browse/MESOS-8199
 Project: Mesos
  Issue Type: Bug
Reporter: Gastón Kleiman






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8197:
--
Issue Type: Task  (was: Bug)

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8197) Implement a library to send offer operation status updates

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-8197:
-

Shepherd: Greg Mann
Assignee: Gastón Kleiman

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8199:
--
Labels: mesosphere  (was: )

> Add plumbing for explicit offer operation reconciliation between master, 
> agent, and RPs.
> 
>
> Key: MESOS-8199
> URL: https://issues.apache.org/jira/browse/MESOS-8199
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8190:
--
Labels: mesosphere  (was: )

> Update the master to accept OfferOperationIDs from frameworks.
> --
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Master’s {{ACCEPT}} handler should send failed operation updates when a 
> framework sets the {{OfferOperationID}} on an operation destined for an agent 
> without the {{RESOURCE_PROVIDER}} capability.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8188) Enable agent to send operation status updates, checkpoint, and retry using the SUM

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8188:
--
Labels: mesosphere  (was: )

> Enable agent to send operation status updates, checkpoint, and retry using 
> the SUM
> --
>
> Key: MESOS-8188
> URL: https://issues.apache.org/jira/browse/MESOS-8188
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-8199:
-

Shepherd: Jie Yu
Assignee: Greg Mann

> Add plumbing for explicit offer operation reconciliation between master, 
> agent, and RPs.
> 
>
> Key: MESOS-8199
> URL: https://issues.apache.org/jira/browse/MESOS-8199
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8191) Implement ReconcileOfferOperations handler in the master

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8191:
--
Labels: mesosphere  (was: )

> Implement ReconcileOfferOperations handler in the master
> 
>
> Key: MESOS-8191
> URL: https://issues.apache.org/jira/browse/MESOS-8191
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The master will synchronously respond to the framework with an 
> {{OFFER_OPERATIONS_RECONCILIATION}} response.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8184:
--
Labels: mesosphere  (was: )

> Implement master's AcknowledgeOfferOperationMessage handler.
> 
>
> Key: MESOS-8184
> URL: https://issues.apache.org/jira/browse/MESOS-8184
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> This handler should validate the message and forward it to the corresponding 
> agent/ERP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8198:
--
Labels: mesosphere  (was: )

> Update the ReconcileOfferOperations protos
> --
>
> Key: MESOS-8198
> URL: https://issues.apache.org/jira/browse/MESOS-8198
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Some protos have been committed, but they follow an event-based API.
> We decided to follow the request/response model for this API, so we need to 
> update the protos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8186:
--
Labels: mesosphere  (was: )

> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The handler should handle acks for operations handled by the agent, and 
> forward the ack to the RP manager for all other operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8193:
--
Labels: mesosphere  (was: )

> Update master’s OfferOperationStatusUpdate handler to acknowledge updates to 
> the agent if OfferOperationID is not set.
> --
>
> Key: MESOS-8193
> URL: https://issues.apache.org/jira/browse/MESOS-8193
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8199:
--
  Sprint: Mesosphere Sprint 68
Story Points: 2

> Add plumbing for explicit offer operation reconciliation between master, 
> agent, and RPs.
> 
>
> Key: MESOS-8199
> URL: https://issues.apache.org/jira/browse/MESOS-8199
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8198:
--
Issue Type: Task  (was: Bug)

> Update the ReconcileOfferOperations protos
> --
>
> Key: MESOS-8198
> URL: https://issues.apache.org/jira/browse/MESOS-8198
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Some protos have been committed, but they follow an event-based API.
> We decided to follow the request/response model for this API, so we need to 
> update the protos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8199:
--
Issue Type: Task  (was: Bug)

> Add plumbing for explicit offer operation reconciliation between master, 
> agent, and RPs.
> 
>
> Key: MESOS-8199
> URL: https://issues.apache.org/jira/browse/MESOS-8199
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8192) Update the scheduler library to support request/response API calls.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8192:
--
Labels: mesosphere  (was: )

> Update the scheduler library to support request/response API calls.
> ---
>
> Key: MESOS-8192
> URL: https://issues.apache.org/jira/browse/MESOS-8192
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The scheduler client/library should be updated to add support for API calls 
> following the request/response model, e.g., {{ReconcileOfferOperations}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8189) Master’s OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8189:
--
Labels: mesosphere  (was: )

> Master’s OperationStatusUpdate handler should forward updates to the 
> framework when OfferOperationID is set.
> 
>
> Key: MESOS-8189
> URL: https://issues.apache.org/jira/browse/MESOS-8189
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8187) Enable LRP to send operation status updates, checkpoint, and retry using the SUM

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8187:
--
Labels: mesosphere  (was: )

> Enable LRP to send operation status updates, checkpoint, and retry using the 
> SUM
> 
>
> Key: MESOS-8187
> URL: https://issues.apache.org/jira/browse/MESOS-8187
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8195) Add an explicit offer operation reconciliation between the master, agent and RPs.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8195:
--
Labels: mesosphere  (was: )

> Add an explicit offer operation reconciliation between the master, agent and 
> RPs.
> -
>
> Key: MESOS-8195
> URL: https://issues.apache.org/jira/browse/MESOS-8195
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> Upon receiving an {{UpdateSlave}} message, the master should compare its list 
> of pending operations for the agent/LRPs to the list of pending operations 
> contained in the message. It should then build a {{ReconcileOfferOperations}} 
> message with all the operations missing from the {{UpdateSlave}} message and 
> send it to the agent.
> The agent will receive these messages and should handle them itself if the 
> operations affect the default resources, or forward them to the RP manager 
> otherwise.
> The agent/RP handler should check whether the operations are pending. If an 
> operation is not pending, then an {{ApplyOfferOperation}} message was 
> dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status 
> update to the master.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8055) Design doc for offer operations feedback

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8055:
--
Labels: mesosphere  (was: )

> Design doc for offer operations feedback
> 
>
> Key: MESOS-8055
> URL: https://issues.apache.org/jira/browse/MESOS-8055
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8197:
--
Labels: mesosphere  (was: )

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8194) Make agent’s ApplyOfferOperationMessage handler support operations affecting default resources.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8194:
--
Labels: mesosphere  (was: )

> Make agent’s ApplyOfferOperationMessage handler support operations affecting 
> default resources.
> ---
>
> Key: MESOS-8194
> URL: https://issues.apache.org/jira/browse/MESOS-8194
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> The handler should apply the operations and send 
> {{OperationStatusUpdates}} to the master.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8197:
--
  Sprint: Mesosphere Sprint 68
Story Points: 8

> Implement a library to send offer operation status updates
> --
>
> Key: MESOS-8197
> URL: https://issues.apache.org/jira/browse/MESOS-8197
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8186:
--
Summary: Implement the agent's AcknowledgeOfferOperationMessage handler.  
(was: Add an AcknowledgeOfferOperationMessage handler to the agent)

> Implement the agent's AcknowledgeOfferOperationMessage handler.
> ---
>
> Key: MESOS-8186
> URL: https://issues.apache.org/jira/browse/MESOS-8186
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>
> The handler should handle acks for operations handled by the agent, and 
> forward the ack to the RP manager for all other operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8184:
--
  Sprint: Mesosphere Sprint 68
Story Points: 3

> Implement master's AcknowledgeOfferOperationMessage handler.
> 
>
> Key: MESOS-8184
> URL: https://issues.apache.org/jira/browse/MESOS-8184
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>
> This handler should validate the message and forward it to the corresponding 
> agent/ERP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8198:
--
Sprint: Mesosphere Sprint 68

> Update the ReconcileOfferOperations protos
> --
>
> Key: MESOS-8198
> URL: https://issues.apache.org/jira/browse/MESOS-8198
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>
> Some protos have been committed, but they follow an event-based API.
> We decided to follow the request/response model for this API, so we need to 
> update the protos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos

2017-11-09 Thread Gastón Kleiman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-8198:
--
Story Points: 1

> Update the ReconcileOfferOperations protos
> --
>
> Key: MESOS-8198
> URL: https://issues.apache.org/jira/browse/MESOS-8198
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>
> Some protos have been committed, but they follow an event-based API.
> We decided to follow the request/response model for this API, so we need to 
> update the protos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8198) Update the ReconcileOfferOperations protos

2017-11-09 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8198:
-

 Summary: Update the ReconcileOfferOperations protos
 Key: MESOS-8198
 URL: https://issues.apache.org/jira/browse/MESOS-8198
 Project: Mesos
  Issue Type: Bug
Reporter: Gastón Kleiman


Some protos have been committed, but they follow an event-based API.

We decided to follow the request/response model for this API, so we need to 
update the protos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8197) Implement a library to send offer operation status updates

2017-11-09 Thread Gastón Kleiman (JIRA)
Gastón Kleiman created MESOS-8197:
-

 Summary: Implement a library to send offer operation status updates
 Key: MESOS-8197
 URL: https://issues.apache.org/jira/browse/MESOS-8197
 Project: Mesos
  Issue Type: Bug
Reporter: Gastón Kleiman






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6792) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6792:
---
Labels: flaky-test tech-debt  (was: tech-debt)

> MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
> -
>
> Key: MESOS-6792
> URL: https://issues.apache.org/jira/browse/MESOS-6792
> Project: Mesos
>  Issue Type: Bug
> Environment: Fedora 25, clang, w/ optimizations, SSL build
>Reporter: Benjamin Bannier
>  Labels: flaky-test, tech-debt
>
> The test {{MasterSlaveReconciliationTest.ReconcileLostTask}} is flaky for me 
> as of {{e99ea9ce8b1de01dd8b3cac6675337edb6320f38}},
> {code}
> Repeating all tests (iteration 912) . . .
> Note: Google Test filter = <...>
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from MasterSlaveReconciliationTest
> [ RUN  ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor
> I1214 04:41:11.559672  2005 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1214 04:41:11.560848  2045 master.cpp:380] Master 
> 87dd8179-dd7d-4270-ace2-ea771b57371c (gru1.hw.ca1.mesosphere.com) started on 
> 192.99.40.208:37659
> I1214 04:41:11.560878  2045 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/cXHI89/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/home/bbannier/src/mesos/build/P/share/mesos/webui" 
> --work_dir="/tmp/cXHI89/master" --zk_session_timeout="10secs"
> I1214 04:41:11.561079  2045 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1214 04:41:11.561089  2045 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1214 04:41:11.561095  2045 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1214 04:41:11.561101  2045 credentials.hpp:39] Loading credentials for 
> authentication from '/tmp/cXHI89/credentials'
> I1214 04:41:11.561194  2045 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1214 04:41:11.561236  2045 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1214 04:41:11.561274  2045 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1214 04:41:11.561301  2045 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1214 04:41:11.561326  2045 master.cpp:584] Authorization enabled
> I1214 04:41:11.562155  2039 master.cpp:2045] Elected as the leading master!
> I1214 04:41:11.562173  2039 master.cpp:1568] Recovering from registrar
> I1214 04:41:11.562347  2045 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 114944ns
> I1214 04:41:11.562441  2045 registrar.cpp:461] Applied 1 operations in 
> 7920ns; attempting to update the registry
> I1214 04:41:11.562621  2048 registrar.cpp:506] Successfully updated the 
> registry in 155136ns
> I1214 04:41:11.562664  2048 registrar.cpp:392] Successfully recovered 
> registrar
> I1214 04:41:11.562832  2044 master.cpp:1684] Recovered 0 agents from the 
> registry (166B); allowing 10mins for agents to re-register
> I1214 04:41:11.568444  2005 cluster.cpp:446] Creating default 'local' 
> authorizer
> I1214 04:41:11.569344  2005 sched.cpp:232] Version: 1.2.0
> I1214 04:41:11.569842  2035 slave.cpp:209] Mesos agent started on 
> (912)@192.99.40.208:37659
> I1214 04:41:11.570080  2040 sched.cpp:336] New master detected at 
> master@192.99.40.208:37659
> I1214 04:41:11.570117  2040 sched.cpp:402] Authenticating with master 
> master@192.99.40.208:37659
> I1214 04:41:11.570127  2040 sched.cpp:409] Using default CRAM-MD5 
> authenticatee
> I1214 04:41:11.570220  2040 authenticatee.cpp:121] Creating new client SASL 
> 

[jira] [Assigned] (MESOS-8000) DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-8000:
--

   Resolution: Duplicate
 Assignee: Armand Grillet
Fix Version/s: 1.5.0

> DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
> ---
>
> Key: MESOS-8000
> URL: https://issues.apache.org/jira/browse/MESOS-8000
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: Armand Grillet
>  Labels: flaky-test, mesosphere
> Fix For: 1.5.0
>
> Attachments: ROOT_VerifyContainerIP_badrun.txt, 
> ROOT_VerifyContainerIP_goodrun.txt
>
>
> Observed a failure on internal CI:
> {noformat}
> ../../src/tests/containerizer/cni_isolator_tests.cpp:1419
> Failed to wait 15secs for subscribed
> {noformat}
> Full log attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8049) MasterTest.RecoveredFramework is flaky and crashes.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8049:
---
Labels: flaky flaky-test  (was: crashed flaky flaky-test)

> MasterTest.RecoveredFramework is flaky and crashes.
> ---
>
> Key: MESOS-8049
> URL: https://issues.apache.org/jira/browse/MESOS-8049
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: ubuntu-17.04
>Reporter: Till Toenshoff
>  Labels: flaky, flaky-test
>
> Observed on internal CI:
> {noformat}
> 00:35:26 [ RUN  ] MasterTest.RecoveredFramework
> 00:35:26 I0930 00:35:26.319862 27033 cluster.cpp:162] Creating default 
> 'local' authorizer
> 00:35:26 I0930 00:35:26.321624 27053 master.cpp:445] Master 
> 94ab36ee-4c02-457d-ae35-2f130ae826f5 (ip-172-16-10-150) started on 
> 172.16.10.150:37345
> 00:35:26 I0930 00:35:26.321647 27053 master.cpp:447] Flags at startup: 
> --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/Z8B1GQ/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/Z8B1GQ/master" 
> --zk_session_timeout="10secs"
> 00:35:26 I0930 00:35:26.321758 27053 master.cpp:497] Master only allowing 
> authenticated frameworks to register
> 00:35:26 I0930 00:35:26.321768 27053 master.cpp:511] Master only allowing 
> authenticated agents to register
> 00:35:26 I0930 00:35:26.321772 27053 master.cpp:524] Master only allowing 
> authenticated HTTP frameworks to register
> 00:35:26 I0930 00:35:26.321777 27053 credentials.hpp:37] Loading credentials 
> for authentication from '/tmp/Z8B1GQ/credentials'
> 00:35:26 I0930 00:35:26.321853 27053 master.cpp:569] Using default 'crammd5' 
> authenticator
> 00:35:26 I0930 00:35:26.321892 27053 http.cpp:1045] Creating default 'basic' 
> HTTP authenticator for realm 'mesos-master-readonly'
> 00:35:26 I0930 00:35:26.321923 27053 http.cpp:1045] Creating default 'basic' 
> HTTP authenticator for realm 'mesos-master-readwrite'
> 00:35:26 I0930 00:35:26.321946 27053 http.cpp:1045] Creating default 'basic' 
> HTTP authenticator for realm 'mesos-master-scheduler'
> 00:35:26 I0930 00:35:26.321969 27053 master.cpp:649] Authorization enabled
> 00:35:26 I0930 00:35:26.322120 27048 hierarchical.cpp:171] Initialized 
> hierarchical allocator process
> 00:35:26 I0930 00:35:26.322145 27048 whitelist_watcher.cpp:77] No whitelist 
> given
> 00:35:26 I0930 00:35:26.322657 27053 master.cpp:2216] Elected as the leading 
> master!
> 00:35:26 I0930 00:35:26.322679 27053 master.cpp:1705] Recovering from 
> registrar
> 00:35:26 I0930 00:35:26.322721 27053 registrar.cpp:347] Recovering registrar
> 00:35:26 I0930 00:35:26.322829 27048 registrar.cpp:391] Successfully fetched 
> the registry (0B) in 90368ns
> 00:35:26 I0930 00:35:26.322856 27048 registrar.cpp:495] Applied 1 operations 
> in 4113ns; attempting to update the registry
> 00:35:26 I0930 00:35:26.322960 27053 registrar.cpp:552] Successfully updated 
> the registry in 89088ns
> 00:35:26 I0930 00:35:26.323011 27053 registrar.cpp:424] Successfully 
> recovered registrar
> 00:35:26 I0930 00:35:26.323148 27054 master.cpp:1809] Recovered 0 agents from 
> the registry (146B); allowing 10mins for agents to re-register
> 00:35:26 I0930 00:35:26.323161 27047 hierarchical.cpp:209] Skipping recovery 
> of hierarchical allocator: nothing to recover
> 00:35:26 W0930 00:35:26.325556 27033 process.cpp:3194] Attempted to spawn 
> already running process files@172.16.10.150:37345
> 00:35:26 I0930 00:35:26.325654 27033 cluster.cpp:448] Creating default 
> 'local' authorizer
> 00:35:26 I0930 00:35:26.326050 27048 slave.cpp:254] Mesos agent started on 
> (250)@172.16.10.150:37345
> 

[jira] [Commented] (MESOS-7985) Use ASF CI for automating RPM packaging and upload to bintray.

2017-11-09 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246312#comment-16246312
 ] 

Kapil Arya commented on MESOS-7985:
---

{code}
commit 1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9 (HEAD -> master, apache/master)
Author: Kapil Arya 
Date:   Fri Nov 3 10:37:32 2017 -0400

Added bintray publishing scripts.

Review: https://reviews.apache.org/r/63543
{code}

CI Job: https://builds.apache.org/job/Mesos/job/Packaging/job/CentosRPMs/20/

The failure was due to the maximum file upload size limit being set to ~250MB. 
The scripts have been updated to skip uploading the debug RPMs.

> Use ASF CI for automating RPM packaging and upload to bintray.
> --
>
> Key: MESOS-7985
> URL: https://issues.apache.org/jira/browse/MESOS-7985
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
> Fix For: 1.5.0
>
>
> RR: https://reviews.apache.org/r/63543/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-2239) MasterAuthorizationTest.DuplicateRegistration is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-2239:
--

Assignee: Vinod Kone  (was: Chen Zhiwei)

> MasterAuthorizationTest.DuplicateRegistration is flaky
> --
>
> Key: MESOS-2239
> URL: https://issues.apache.org/jira/browse/MESOS-2239
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS5 gcc-4.8
>Reporter: Jie Yu
>Assignee: Vinod Kone
>  Labels: flaky
>
> {noformat}
> 19:30:44 DEBUG: [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
> 19:30:44 DEBUG: Using temporary directory 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz'
> 19:30:44 DEBUG: I0121 19:30:44.583595 54842 leveldb.cpp:176] Opened db in 
> 2.002477ms
> 19:30:44 DEBUG: I0121 19:30:44.584470 54842 leveldb.cpp:183] Compacted db in 
> 848351ns
> 19:30:44 DEBUG: I0121 19:30:44.584492 54842 leveldb.cpp:198] Created db 
> iterator in 3830ns
> 19:30:44 DEBUG: I0121 19:30:44.584506 54842 leveldb.cpp:204] Seeked to 
> beginning of db in 962ns
> 19:30:44 DEBUG: I0121 19:30:44.584519 54842 leveldb.cpp:273] Iterated through 
> 0 keys in the db in 598ns
> 19:30:44 DEBUG: I0121 19:30:44.584537 54842 replica.cpp:744] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> 19:30:44 DEBUG: I0121 19:30:44.584684 54873 recover.cpp:449] Starting replica 
> recovery
> 19:30:44 DEBUG: I0121 19:30:44.584774 54859 recover.cpp:475] Replica is in 
> EMPTY status
> 19:30:44 DEBUG: I0121 19:30:44.586305 54881 replica.cpp:641] Replica in EMPTY 
> status received a broadcasted recover request
> 19:30:44 DEBUG: I0121 19:30:44.586943 54866 recover.cpp:195] Received a 
> recover response from a replica in EMPTY status
> 19:30:44 DEBUG: I0121 19:30:44.587247 54872 recover.cpp:566] Updating replica 
> status to STARTING
> 19:30:44 DEBUG: I0121 19:30:44.587838 54867 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 393697ns
> 19:30:44 DEBUG: I0121 19:30:44.587862 54867 replica.cpp:323] Persisted 
> replica status to STARTING
> 19:30:44 DEBUG: I0121 19:30:44.587920 54877 recover.cpp:475] Replica is in 
> STARTING status
> 19:30:44 DEBUG: I0121 19:30:44.588341 54868 replica.cpp:641] Replica in 
> STARTING status received a broadcasted recover request
> 19:30:44 DEBUG: I0121 19:30:44.588577 54877 recover.cpp:195] Received a 
> recover response from a replica in STARTING status
> 19:30:44 DEBUG: I0121 19:30:44.589040 54863 recover.cpp:566] Updating replica 
> status to VOTING
> 19:30:44 DEBUG: I0121 19:30:44.589344 54871 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 268257ns
> 19:30:44 DEBUG: I0121 19:30:44.589361 54871 replica.cpp:323] Persisted 
> replica status to VOTING
> 19:30:44 DEBUG: I0121 19:30:44.589426 54858 recover.cpp:580] Successfully 
> joined the Paxos group
> 19:30:44 DEBUG: I0121 19:30:44.589735 54858 recover.cpp:464] Recover process 
> terminated
> 19:30:44 DEBUG: I0121 19:30:44.593657 54866 master.cpp:262] Master 
> 20150121-193044-1711542956-52053-54842 (atlc-bev-05-sr1.corpdc.twttr.net) 
> started on 172.18.4.102:52053
> 19:30:44 DEBUG: I0121 19:30:44.593690 54866 master.cpp:308] Master only 
> allowing authenticated frameworks to register
> 19:30:44 DEBUG: I0121 19:30:44.593699 54866 master.cpp:313] Master only 
> allowing authenticated slaves to register
> 19:30:44 DEBUG: I0121 19:30:44.593708 54866 credentials.hpp:36] Loading 
> credentials for authentication from 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz/credentials'
> 19:30:44 DEBUG: I0121 19:30:44.593808 54866 master.cpp:357] Authorization 
> enabled
> 19:30:44 DEBUG: I0121 19:30:44.594319 54871 master.cpp:1219] The newly 
> elected leader is master@172.18.4.102:52053 with id 
> 20150121-193044-1711542956-52053-54842
> 19:30:44 DEBUG: I0121 19:30:44.594336 54871 master.cpp:1232] Elected as the 
> leading master!
> 19:30:44 DEBUG: I0121 19:30:44.594343 54871 master.cpp:1050] Recovering from 
> registrar
> 19:30:44 DEBUG: I0121 19:30:44.594403 54867 registrar.cpp:313] Recovering 
> registrar
> 19:30:44 DEBUG: I0121 19:30:44.594558 54858 log.cpp:660] Attempting to start 
> the writer
> 19:30:44 DEBUG: I0121 19:30:44.595000 54859 replica.cpp:477] Replica received 
> implicit promise request with proposal 1
> 19:30:44 DEBUG: I0121 19:30:44.595340 54859 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 319942ns
> 19:30:44 DEBUG: I0121 19:30:44.595360 54859 replica.cpp:345] Persisted 
> promised to 1
> 19:30:44 DEBUG: I0121 19:30:44.595700 54878 coordinator.cpp:230] Coordinator 
> attemping to fill missing position
> 19:30:44 DEBUG: I0121 19:30:44.596330 54859 replica.cpp:378] Replica received 
> explicit promise request for position 0 with proposal 2
> 

[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
{noformat}
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
ContentType/AgentAPITest.AttachContainerInputAuthorization/0
AgentAPITest.LaunchNestedContainerSessionWithTTY/0
{noformat}

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" -
-allocator="HierarchicalDRF" --authenticate_agents="true" 
--authenticate_frameworks="true" --authenticate_http_frameworks="true" 
--authenticate_http_readonly="true" --au
thenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
--framework_sorter="drf" --help="false" --hostn
ame_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" 
--logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="10
00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" 
--registry="in_memory" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registr
y_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" -
-version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 

[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-09 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-8171:

Sprint: Mesosphere Sprint 67

> Using a failoverTimeout of 0 with Mesos native scheduler client can result in 
> infinite subscribe loop
> -
>
> Key: MESOS-8171
> URL: https://issues.apache.org/jira/browse/MESOS-8171
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, java api, scheduler driver
>Affects Versions: 1.1.3, 1.2.2, 1.3.1, 1.4.0
>Reporter: Tim Harper
>Assignee: Meng Zhu
>Priority: Minor
>  Labels: mesosphere
>
> Over the past year, the Marathon team has been plagued by an issue that 
> periodically hits our CI builds, in which the scheduler driver enters a tight 
> loop, sending tens of thousands of SUBSCRIBE calls to the master per second. 
> I turned on debug logging for the client and the server, and it pointed to an 
> issue with the {{doReliableRegistration}} method in sched.cpp. Here are the 
> logs:
> {code}
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 
> 127.0.1.1:60957 with 8 worker threads
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151785 13791 group.cpp:341] Group process 
> (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size 
> (joins, cancels, datas) = (0, 0, 0)
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in 
> ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' 
> at '/mesos' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0')
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152762 13791 group.cpp:700] Trying to get 
> '/mesos/json.info_00' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master 
> (UPID=master@172.16.10.95:32856) is detected
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157347 13787 sched.cpp:336] New master detected at 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to 
> register without authentication
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159785 13789 
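
The pattern in the log above — "Will retry registration in 0ns if necessary" — 
suggests the retry backoff is being capped by the framework's 
{{failoverTimeout}}, so a timeout of 0 collapses the delay to zero. A minimal 
sketch of that failure mode (the names are illustrative, not the actual 
sched.cpp code):

{code}
#include <algorithm>
#include <cstdint>
#include <iostream>

// Hypothetical backoff calculation: double the delay on each retry, but
// never exceed the framework's failover timeout. With a failover timeout
// of 0ns, the computed delay is always 0ns, producing a tight loop.
int64_t nextBackoffNs(int64_t currentNs, int64_t failoverTimeoutNs)
{
  int64_t doubled = currentNs == 0 ? 1000000 : currentNs * 2;  // start at 1ms
  return std::min(doubled, failoverTimeoutNs);
}

int main()
{
  int64_t backoff = 0;
  for (int i = 0; i < 3; i++) {
    backoff = nextBackoffNs(backoff, /*failoverTimeoutNs=*/0);
    std::cout << "retry in " << backoff << "ns" << std::endl;  // always 0ns
  }
  return 0;
}
{code}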

[jira] [Commented] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)

2017-11-09 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246113#comment-16246113
 ] 

Vinod Kone commented on MESOS-7699:
---

[~bennoe] should this be in reviewable?

> "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable 
> freshly released)
> ---
>
> Key: MESOS-7699
> URL: https://issues.apache.org/jira/browse/MESOS-7699
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.2.0
>Reporter: Adam Cecile
>Assignee: Benno Evers
>  Labels: autotools
>
> Hi,
> It seems the issue comes from a workaround added a while ago:
> https://reviews.apache.org/r/40326/
> https://reviews.apache.org/r/40327/
> When building with external libraries, it turns out the build command lines 
> are created with -isystem /usr/include, which is clearly stated as wrong 
> according to the GCC developers:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129
> I'll do some testing by reverting all -isystem flags to -I and will let you 
> know if it builds.
> Regards, Adam.
> {noformat}
> configure:21642: result: no
> configure:21642: checking glog/logging.h presence
> configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include 
> -I/usr/include conftest.cpp
> In file included from /usr/include/c++/6/ext/string_conversions.h:41:0,
>  from /usr/include/c++/6/bits/basic_string.h:5417,
>  from /usr/include/c++/6/string:52,
>  from /usr/include/c++/6/bits/locale_classes.h:40,
>  from /usr/include/c++/6/bits/ios_base.h:41,
>  from /usr/include/c++/6/ios:42,
>  from /usr/include/c++/6/ostream:38,
>  from /usr/include/glog/logging.h:43,
>  from conftest.cpp:32:
> /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or 
> directory
>  #include_next 
>  ^
> compilation terminated.
> configure:21642: $? = 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME "mesos"
> | #define PACKAGE_TARNAME "mesos"
> | #define PACKAGE_VERSION "1.2.0"
> | #define PACKAGE_STRING "mesos 1.2.0"
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define PACKAGE "mesos"
> | #define VERSION "1.2.0"
> | #define STDC_HEADERS 1
> | #define HAVE_SYS_TYPES_H 1
> | #define HAVE_SYS_STAT_H 1
> | #define HAVE_STDLIB_H 1
> | #define HAVE_STRING_H 1
> | #define HAVE_MEMORY_H 1
> | #define HAVE_STRINGS_H 1
> | #define HAVE_INTTYPES_H 1
> | #define HAVE_STDINT_H 1
> | #define HAVE_UNISTD_H 1
> | #define HAVE_DLFCN_H 1
> | #define LT_OBJDIR ".libs/"
> | #define HAVE_CXX11 1
> | #define HAVE_PTHREAD_PRIO_INHERIT 1
> | #define HAVE_PTHREAD 1
> | #define HAVE_LIBZ 1
> | #define HAVE_FTS_H 1
> | #define HAVE_APR_POOLS_H 1
> | #define HAVE_LIBAPR_1 1
> | #define HAVE_BOOST_VERSION_HPP 1
> | #define HAVE_LIBCURL 1
> | /* end confdefs.h.  */
> | #include 
> configure:21642: result: no
> configure:21642: checking for glog/logging.h
> configure:21642: result: no
> configure:21674: error: cannot find glog
> ---
> You have requested the use of a non-bundled glog but no suitable
> glog could be found.
> You may want specify the location of glog by providing a prefix
> path via --with-glog=DIR, or check that the path you provided is
> correct if you're already doing this.
> ---
> {noformat}
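
A minimal repro sketch of the failure mode described above (assuming GCC 6 
with libstdc++; the compile commands are illustrative):

{code}
// repro.cpp
//
// Fails:  g++-6 -isystem /usr/include -c repro.cpp
// Works:  g++-6 -I /usr/include -c repro.cpp
//
// With -isystem /usr/include, the directory is moved within the system
// include chain, so libstdc++'s <cstdlib> wrapper, which does
// `#include_next <stdlib.h>`, finds no further directory to search and
// fails with "stdlib.h: No such file or directory" (GCC PR 70129).
#include <cstdlib>

int main()
{
  return EXIT_SUCCESS;
}
{code}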



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1

2017-11-09 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8078:
--
Shepherd: Greg Mann  (was: Till Toenshoff)

> Some fields went missing with no replacement in api/v1
> --
>
> Key: MESOS-8078
> URL: https://issues.apache.org/jira/browse/MESOS-8078
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Reporter: Dmitrii Rozhkov
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Hi friends, 
> These fields are available via state.json but went missing in v1 of the API:
> leader_info
> start_time
> elected_time
> As we're showing them on the Overview page of the DC/OS UI and would like to 
> stop using state.json, it would be great to have them somewhere in v1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2017-11-09 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7790:
--
Shepherd: Benjamin Mahler

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
> ^
>   /   \
> /   \
>eng (90 cpus)   sales (10 cpus)
>  ^
>/   \
>  /   \
>  ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build")?
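
A worked sketch of the undelegated-quota arithmetic in the example above 
(illustrative only; the role names and numbers mirror the diagram):

{code}
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
  // cpus quota per role, as in the tree above.
  std::map<std::string, int> quota = {
      {"eng", 90}, {"eng/ads", 50}, {"eng/build", 10}, {"sales", 10}};

  // Child roles per parent role.
  std::map<std::string, std::vector<std::string>> children = {
      {"eng", {"eng/ads", "eng/build"}}, {"sales", {}}};

  for (const auto& entry : children) {
    int delegated = 0;
    for (const auto& child : entry.second) {
      delegated += quota[child];
    }
    // eng: 90 - (50 + 10) = 30 undelegated; sales: 10 undelegated.
    std::cout << entry.first << ": " << quota[entry.first] - delegated
              << " cpus undelegated" << std::endl;
  }

  return 0;
}
{code}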



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7790) Design hierarchical quota allocation.

2017-11-09 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246079#comment-16246079
 ] 

Vinod Kone commented on MESOS-7790:
---

[~mcypark] Can you link the design doc and move this to reviewable?

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
> ^
>   /   \
> /   \
>eng (90 cpus)   sales (10 cpus)
>  ^
>/   \
>  /   \
>  ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build")?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7905) GarbageCollectorIntegrationTest.ExitedFramework is flaky

2017-11-09 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7905:
--
Sprint: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, 
Mesosphere Sprint 66  (was: Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67)

> GarbageCollectorIntegrationTest.ExitedFramework is flaky
> 
>
> Key: MESOS-7905
> URL: https://issues.apache.org/jira/browse/MESOS-7905
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>
> Observed this on ASF CI.
> {code}
> [ RUN  ] GarbageCollectorIntegrationTest.ExitedFramework
> I0818 23:51:42.881799  5882 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0818 23:51:42.884285  5907 master.cpp:442] Master 
> 6d3f4c59-27e2-4701-9f7f-7c1f301e7fba (ef22537e2401) started on 
> 172.17.0.3:57495
> I0818 23:51:42.884332  5907 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/rYJzr3/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/rYJzr3/master" --zk_session_timeout="10secs"
> I0818 23:51:42.884627  5907 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0818 23:51:42.884644  5907 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0818 23:51:42.884658  5907 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0818 23:51:42.884774  5907 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/rYJzr3/credentials'
> I0818 23:51:42.885066  5907 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0818 23:51:42.885213  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0818 23:51:42.885382  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0818 23:51:42.885512  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0818 23:51:42.885640  5907 master.cpp:646] Authorization enabled
> I0818 23:51:42.885818  5903 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0818 23:51:42.886016  5905 whitelist_watcher.cpp:77] No whitelist given
> I0818 23:51:42.889050  5908 master.cpp:2163] Elected as the leading master!
> I0818 23:51:42.889081  5908 master.cpp:1702] Recovering from registrar
> I0818 23:51:42.889387  5909 registrar.cpp:347] Recovering registrar
> I0818 23:51:42.889838  5909 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 409856ns
> I0818 23:51:42.889966  5909 registrar.cpp:495] Applied 1 operations in 
> 38859ns; attempting to update the registry
> I0818 23:51:42.890450  5909 registrar.cpp:552] Successfully updated the 
> registry in 425216ns
> I0818 23:51:42.890552  5909 registrar.cpp:424] Successfully recovered 
> registrar
> I0818 23:51:42.890890  5909 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0818 23:51:42.890969  5910 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0818 23:51:42.895795  5882 process.cpp:3228] Attempting to spawn already 
> spawned process files@172.17.0.3:57495
> I0818 23:51:42.896057  5882 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0818 23:51:42.897809  5904 slave.cpp:250] Mesos agent started on 
> (85)@172.17.0.3:57495
> I0818 23:51:42.897848  5904 slave.cpp:251] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> 

[jira] [Updated] (MESOS-8055) Design doc for offer operations feedback

2017-11-09 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8055:
--
Shepherd: Jie Yu

> Design doc for offer operations feedback
> 
>
> Key: MESOS-8055
> URL: https://issues.apache.org/jira/browse/MESOS-8055
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7905) GarbageCollectorIntegrationTest.ExitedFramework is flaky

2017-11-09 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya reassigned MESOS-7905:
-

Assignee: (was: Kapil Arya)

> GarbageCollectorIntegrationTest.ExitedFramework is flaky
> 
>
> Key: MESOS-7905
> URL: https://issues.apache.org/jira/browse/MESOS-7905
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>
> Observed this on ASF CI.
> {code}
> [ RUN  ] GarbageCollectorIntegrationTest.ExitedFramework
> I0818 23:51:42.881799  5882 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0818 23:51:42.884285  5907 master.cpp:442] Master 
> 6d3f4c59-27e2-4701-9f7f-7c1f301e7fba (ef22537e2401) started on 
> 172.17.0.3:57495
> I0818 23:51:42.884332  5907 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/rYJzr3/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/rYJzr3/master" --zk_session_timeout="10secs"
> I0818 23:51:42.884627  5907 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0818 23:51:42.884644  5907 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0818 23:51:42.884658  5907 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0818 23:51:42.884774  5907 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/rYJzr3/credentials'
> I0818 23:51:42.885066  5907 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0818 23:51:42.885213  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0818 23:51:42.885382  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0818 23:51:42.885512  5907 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0818 23:51:42.885640  5907 master.cpp:646] Authorization enabled
> I0818 23:51:42.885818  5903 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0818 23:51:42.886016  5905 whitelist_watcher.cpp:77] No whitelist given
> I0818 23:51:42.889050  5908 master.cpp:2163] Elected as the leading master!
> I0818 23:51:42.889081  5908 master.cpp:1702] Recovering from registrar
> I0818 23:51:42.889387  5909 registrar.cpp:347] Recovering registrar
> I0818 23:51:42.889838  5909 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 409856ns
> I0818 23:51:42.889966  5909 registrar.cpp:495] Applied 1 operations in 
> 38859ns; attempting to update the registry
> I0818 23:51:42.890450  5909 registrar.cpp:552] Successfully updated the 
> registry in 425216ns
> I0818 23:51:42.890552  5909 registrar.cpp:424] Successfully recovered 
> registrar
> I0818 23:51:42.890890  5909 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0818 23:51:42.890969  5910 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0818 23:51:42.895795  5882 process.cpp:3228] Attempting to spawn already 
> spawned process files@172.17.0.3:57495
> I0818 23:51:42.896057  5882 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0818 23:51:42.897809  5904 slave.cpp:250] Mesos agent started on 
> (85)@172.17.0.3:57495
> I0818 23:51:42.897848  5904 slave.cpp:251] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/GarbageCollectorIntegrationTest_ExitedFramework_DayibR/store/appc"
>  --authenticate_http_executors="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" 

[jira] [Assigned] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers

2017-11-09 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-8158:
---

Assignee: Gilbert Song

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --
>
> Key: MESOS-8158
> URL: https://issues.apache.org/jira/browse/MESOS-8158
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization, docker, executor
>Affects Versions: 1.4.0
> Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>Reporter: Charles Allen
>Assignee: Gilbert Song
>
> I have attempted to launch Mesos agents inside a docker container in such a 
> way that the agent container can be replaced and recovered. Unfortunately I 
> hit a major snag in the way the Mesos docker launching works.
> To test basic functionality, a marathon app is set up with the following 
> command: {{date && python -m SimpleHTTPServer $PORT0}} 
> That way the HTTP port can be accessed to verify things are being assigned 
> correctly, and the date is printed in the log.
> When I attempt to start this marathon app, the mesos agent (inside a docker 
> container) properly launches an executor, which in turn creates a second 
> docker container that runs the python code. Here's the output from the 
> executor logs (this looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
> d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
> 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
> HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
> MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS
> =1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e 
> MARATHON_APP_RESOURCE_MEM=128.0 -e 
> MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
> MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TA
> SK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e 
> PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v 
> /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp
> .fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
> --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 
> -c date && p
> ython -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
> not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov  1 20:34:06 UTC 2017
> {code}
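
The executor log above shows the inspect-with-retry pattern: poll `docker 
inspect` at a 500ms interval until the container reports as started. A minimal 
sketch of that pattern (a hypothetical helper, not the actual docker.cpp code):

{code}
#include <chrono>
#include <cstdlib>
#include <string>
#include <thread>

// Returns true once `docker inspect <name>` succeeds; a real
// implementation would also parse the JSON output and check that the
// container state is "Running".
bool inspectOnce(const std::string& name)
{
  const std::string cmd =
      "docker -H unix:///var/run/docker.sock inspect " + name +
      " > /dev/null 2>&1";
  return std::system(cmd.c_str()) == 0;
}

// Polls at a fixed 500ms interval, mirroring the log above; a false
// return would surface to the scheduler as TASK_FAILED.
bool waitForContainer(const std::string& name, int maxAttempts)
{
  for (int i = 0; i < maxAttempts; ++i) {
    if (inspectOnce(name)) {
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }
  return false;
}
{code}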
> But, somehow there is a TASK_FAILED message sent to marathon.
> Upon further investigation, the following snippet can be found in the agent 
> logs (running in a docker container)
> {code}
> I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  from gc
> I1101 20:34:00.950225 9 gc.cpp:93] Unscheduling 
> 

[jira] [Updated] (MESOS-7881) Building gRPC with CMake

2017-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7881:
--
Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, 
Mesosphere Sprint 64, Mesosphere Sprint 66  (was: Mesosphere Sprint 61, 
Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere 
Sprint 66, Mesosphere Sprint 67)

> Building gRPC with CMake
> 
>
> Key: MESOS-7881
> URL: https://issues.apache.org/jira/browse/MESOS-7881
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
> Fix For: 1.4.0
>
>
> gRPC manages its own third-party libraries, which overlap with Mesos' 
> bundled third-party libraries. We need to write proper rules in Mesos' 
> CMake to configure gRPC's own CMake build correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.

2017-11-09 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245903#comment-16245903
 ] 

Kevin Klues commented on MESOS-7924:


{noformat}
commit 844590611067d04de86a2de923b21ef377554728 (HEAD -> master, 
upstream/master)
Author: Armand Grillet 
Date:   Thu Nov 9 16:53:40 2017 +0100

Added JavaScript linter.

The linter runs when changes on a JavaScript file are being committed.
We use ESLint with a configuration based on our current JS code base.
The linter and its dependencies (i.e. Node.js) are installed in a
virtual environment using Virtualenv and then Nodeenv.

Review: https://reviews.apache.org/r/62214/
{noformat}

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
> Fix For: 1.5.0
>
>
> As far as I can tell, javascript linters (e.g. ESLint) also help catch some 
> functional errors; for example, we've made "strict" mistakes a few times 
> that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7506:
---
Description: 
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
SlaveTest.RestartSlaveRequireExecutorAuthentication
ShutdownUnregisteredExecutor
{noformat}

  was:
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
SlaveTest.RestartSlaveRequireExecutorAuthentication
ShutdownUnregisteredExecutor
{noformat}


> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
> SlaveRecoveryTest/0.RecoverUnregisteredExecutor
> SlaveRecoveryTest/0.CleanupExecutor
> SlaveRecoveryTest/0.RecoverTerminatedExecutor
> SlaveTest.ShutdownUnregisteredExecutor
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> ShutdownUnregisteredExecutor
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7506:
---
Description: 
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
SlaveTest.RestartSlaveRequireExecutorAuthentication
ShutdownUnregisteredExecutor
{noformat}

  was:
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
SlaveTest.RestartSlaveRequireExecutorAuthentication
ShutdownUnregisteredExecutor
{noformat}


> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
> SlaveRecoveryTest/0.RecoverUnregisteredExecutor
> SlaveRecoveryTest/0.CleanupExecutor
> SlaveRecoveryTest/0.RecoverTerminatedExecutor
> SlaveTest.ShutdownUnregisteredExecutor
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> ShutdownUnregisteredExecutor
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7924) Add a javascript linter to the webui.

2017-11-09 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-7924:
---
Shepherd: Kevin Klues  (was: Benjamin Mahler)

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
> Fix For: 1.5.0
>
>
> As far as I can tell, javascript linters (e.g. ESLint) also help catch some 
> functional errors; for example, we've made "strict" mistakes a few times 
> that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7924) Add a javascript linter to the webui.

2017-11-09 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245903#comment-16245903
 ] 

Kevin Klues edited comment on MESOS-7924 at 11/9/17 4:01 PM:
-

{noformat}
commit 844590611067d04de86a2de923b21ef377554728
Author: Armand Grillet 
Date:   Thu Nov 9 16:53:40 2017 +0100

Added JavaScript linter.

The linter runs when changes on a JavaScript file are being committed.
We use ESLint with a configuration based on our current JS code base.
The linter and its dependencies (i.e. Node.js) are installed in a
virtual environment using Virtualenv and then Nodeenv.

Review: https://reviews.apache.org/r/62214/
{noformat}


was (Author: klueska):
{noformat}
commit 844590611067d04de86a2de923b21ef377554728 (HEAD -> master, 
upstream/master)
Author: Armand Grillet 
Date:   Thu Nov 9 16:53:40 2017 +0100

Added JavaScript linter.

The linter runs when changes on a JavaScript file are being committed.
We use ESLint with a configuration based on our current JS code base.
The linter and its dependencies (i.e. Node.js) are installed in a
virtual environment using Virtualenv and then Nodeenv.

Review: https://reviews.apache.org/r/62214/
{noformat}

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
> Fix For: 1.5.0
>
>
> As far as I can tell, javascript linters (e.g. ESLint) also help catch some 
> functional errors; for example, we've made "strict" mistakes a few times 
> that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7506:
---
Attachment: ResourceLimitation-badrun2.txt

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ResourceLimitation-badrun.txt, ResourceLimitation-badrun2.txt, 
> TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
> SlaveRecoveryTest/0.RecoverUnregisteredExecutor
> SlaveRecoveryTest/0.CleanupExecutor
> SlaveRecoveryTest/0.RecoverTerminatedExecutor
> SlaveTest.ShutdownUnregisteredExecutor
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> ShutdownUnregisteredExecutor
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7506:
---
Description: 
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
SlaveTest.RestartSlaveRequireExecutorAuthentication
ShutdownUnregisteredExecutor
{noformat}

  was:
I've observed a number of flaky tests that leave orphan containers upon 
cleanup. A typical log looks like this:
{noformat}
../../src/tests/cluster.cpp:580: Failure
Value of: containers->empty()
  Actual: false
Expected: true
Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
{noformat}

All currently affected tests:
{noformat}
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
SlaveRecoveryTest/0.RecoverUnregisteredExecutor
SlaveRecoveryTest/0.CleanupExecutor
SlaveRecoveryTest/0.RecoverTerminatedExecutor
SlaveTest.ShutdownUnregisteredExecutor
ShutdownUnregisteredExecutor
{noformat}


> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0
> SlaveRecoveryTest/0.RecoverUnregisteredExecutor
> SlaveRecoveryTest/0.CleanupExecutor
> SlaveRecoveryTest/0.RecoverTerminatedExecutor
> SlaveTest.ShutdownUnregisteredExecutor
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> ShutdownUnregisteredExecutor
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
{noformat}
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
ContentType/AgentAPITest.AttachContainerInputAuthorization/0
{noformat}

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/a5h5J3/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master"
--zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 

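The {code} block above is cut off by the archive. For context on what the
flaky test exercises: the agent's v1 operator API accepts an
ATTACH_CONTAINER_INPUT call that streams input to a (possibly nested)
container. A hedged sketch of building such a call with the public v1
protobufs, assuming a Mesos build environment and a known container ID (the
transport, a streaming RecordIO connection to the agent's /api/v1 endpoint,
is omitted):

{code}
// Sketch only: constructing a v1 agent ATTACH_CONTAINER_INPUT call.
#include <string>

#include <mesos/v1/agent/agent.pb.h>
#include <mesos/v1/mesos.pb.h>

mesos::v1::agent::Call makeAttachInputCall(const std::string& containerId)
{
  mesos::v1::agent::Call call;
  call.set_type(mesos::v1::agent::Call::ATTACH_CONTAINER_INPUT);

  mesos::v1::agent::Call::AttachContainerInput* attach =
    call.mutable_attach_container_input();

  // The first message on the stream identifies the container; subsequent
  // messages carry ProcessIO payloads with the actual input.
  attach->set_type(
      mesos::v1::agent::Call::AttachContainerInput::CONTAINER_ID);
  attach->mutable_container_id()->set_value(containerId);

  return call;
}
{code}
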
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
ContentType/AgentAPITest.AttachContainerInputAuthorization/0

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/a5h5J3/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master"
--zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 

[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
AgentAPIStreamingTest.AttachInputToNestedContainerSession

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/a5h5J3/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master"
--zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" 

[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.

2017-11-09 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245805#comment-16245805
 ] 

Kevin Klues commented on MESOS-7924:


{noformat}
commit 0f674cb7fcc827ef241dc76fa40139e86717
Author: Armand Grillet 
Date:   Thu Nov 9 16:17:12 2017 +0100

Removed pylint from the CLI requirements.

Due to the new virtual environment located in /support, we do
not need to have pylint in the CLI virtual environment anymore.

Review: https://reviews.apache.org/r/63582/
{noformat}

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
> Fix For: 1.5.0
>
>
> As far as I can tell, JavaScript linters (e.g. ESLint) also help catch some 
> functional errors; for example, we've made "strict" mode mistakes a few 
> times that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/a5h5J3/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master"
--zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on 
(644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" 
