[jira] [Updated] (MESOS-8176) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8176: --- Description: Observed it today in the internal CI: {noformat} ../../src/tests/slave_recovery_tests.cpp:4708 Value of: usage->has_cpus_limit() Actual: false Expected: true {noformat} Full log attached. This seems to be different from MESOS-5048 and MESOS-6481 was: Observed it today in the internal CI: {noformat} ../../src/tests/slave_recovery_tests.cpp:4708 Value of: usage->has_cpus_limit() Actual: false Expected: true {noformat} Full log attached. > MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky. > > > Key: MESOS-8176 > URL: https://issues.apache.org/jira/browse/MESOS-8176 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.4.0 > Environment: ec2 Ubuntu 16.04, autotools, no SSL >Reporter: Alexander Rukletsov > Labels: flaky-test > Attachments: ResourceStatistics-badrun.txt > > > Observed it today in the internal CI: > {noformat} > ../../src/tests/slave_recovery_tests.cpp:4708 > Value of: usage->has_cpus_limit() > Actual: false > Expected: true > {noformat} > Full log attached. > This seems to be different from MESOS-5048 and MESOS-6481 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
[ https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246842#comment-16246842 ] Alexander Rukletsov commented on MESOS-5048: Observed in the internal CI, attached "ResourceStatistics-badrun2.txt" log. > MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky > --- > > Key: MESOS-5048 > URL: https://issues.apache.org/jira/browse/MESOS-5048 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.28.0 > Environment: Ubuntu 15.04, Ubuntu 16.04 >Reporter: Jian Qiu > Labels: flaky-test > Attachments: ResourceStatistics-badrun2.txt > > > ./mesos-tests.sh > --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics > --gtest_repeat=100 --gtest_break_on_failure > This is found in rb, and reproduced in my local machine. There are two types > of failures. However, the failure does not appear when enabling verbose... > {code} > ../../src/tests/environment.cpp:790: Failure > Failed > Tests completed with child processes remaining: > -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests > \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor >\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor > {code} > And > {code} > I0328 15:42:36.982471 5687 exec.cpp:150] Version: 0.29.0 > I0328 15:42:37.008765 5708 exec.cpp:225] Executor registered on slave > 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0 > Registered executor on mesos > ../../src/tests/slave_recovery_tests.cpp:3506: Failure > Value of: containers.get().size() > Actual: 0 > Expected: 1u > Which is: 1 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
[ https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5048: --- Attachment: ResourceStatistics-badrun2.txt > MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky > --- > > Key: MESOS-5048 > URL: https://issues.apache.org/jira/browse/MESOS-5048 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.28.0 > Environment: Ubuntu 15.04, Ubuntu 16.04 >Reporter: Jian Qiu > Labels: flaky-test > Attachments: ResourceStatistics-badrun2.txt > > > ./mesos-tests.sh > --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics > --gtest_repeat=100 --gtest_break_on_failure > This is found in rb, and reproduced in my local machine. There are two types > of failures. However, the failure does not appear when enabling verbose... > {code} > ../../src/tests/environment.cpp:790: Failure > Failed > Tests completed with child processes remaining: > -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests > \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor >\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor > {code} > And > {code} > I0328 15:42:36.982471 5687 exec.cpp:150] Version: 0.29.0 > I0328 15:42:37.008765 5708 exec.cpp:225] Executor registered on slave > 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0 > Registered executor on mesos > ../../src/tests/slave_recovery_tests.cpp:3506: Failure > Value of: containers.get().size() > Actual: 0 > Expected: 1u > Which is: 1 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
[ https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5048: --- Environment: Ubuntu 15.04, Ubuntu 16.04 (was: Ubuntu 15.04) > MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky > --- > > Key: MESOS-5048 > URL: https://issues.apache.org/jira/browse/MESOS-5048 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.28.0 > Environment: Ubuntu 15.04, Ubuntu 16.04 >Reporter: Jian Qiu > Labels: flaky-test > > ./mesos-tests.sh > --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics > --gtest_repeat=100 --gtest_break_on_failure > This is found in rb, and reproduced in my local machine. There are two types > of failures. However, the failure does not appear when enabling verbose... > {code} > ../../src/tests/environment.cpp:790: Failure > Failed > Tests completed with child processes remaining: > -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests > \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor >\--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor > {code} > And > {code} > I0328 15:42:36.982471 5687 exec.cpp:150] Version: 0.29.0 > I0328 15:42:37.008765 5708 exec.cpp:225] Executor registered on slave > 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0 > Registered executor on mesos > ../../src/tests/slave_recovery_tests.cpp:3506: Failure > Value of: containers.get().size() > Actual: 0 > Expected: 1u > Which is: 1 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8205) MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8205: --- Attachment: FilterOrphanedTasks-badrun.txt > MasterAuthorizerTest/1.FilterOrphanedTasks is flaky. > > > Key: MESOS-8205 > URL: https://issues.apache.org/jira/browse/MESOS-8205 > Project: Mesos > Issue Type: Bug > Components: test > Environment: CentOS 6 >Reporter: Alexander Rukletsov > Labels: flaky-test > Attachments: FilterOrphanedTasks-badrun.txt > > > Observed today in the internal CI. Full log attached. > {noformat} > ../../src/tests/master_authorization_tests.cpp:2239 > Failed to wait 15secs for statusUpdate > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8205) MasterAuthorizerTest/1.FilterOrphanedTasks is flaky.
Alexander Rukletsov created MESOS-8205: -- Summary: MasterAuthorizerTest/1.FilterOrphanedTasks is flaky. Key: MESOS-8205 URL: https://issues.apache.org/jira/browse/MESOS-8205 Project: Mesos Issue Type: Bug Components: test Environment: CentOS 6 Reporter: Alexander Rukletsov Observed today in the internal CI. Full log attached. {noformat} ../../src/tests/master_authorization_tests.cpp:2239 Failed to wait 15secs for statusUpdate {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-8200: -- Assignee: Yan Xu (was: James Peach) > Suppressed roles are not honoured for v1 scheduler subscribe requests. > -- > > Key: MESOS-8200 > URL: https://issues.apache.org/jira/browse/MESOS-8200 > Project: Mesos > Issue Type: Bug > Components: scheduler api, scheduler driver >Reporter: Alexander Rukletsov >Assignee: Yan Xu > > When triaging MESOS-7996 I found that the > {{Call.subscribe.suppressed_roles}} field is empty when the master processes > the request from a v1 HTTP scheduler. More precisely, [this > conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969] > wipes the field. This is likely because this conversion relies on a general > [protobuf conversion > utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50], > which fails to copy {{suppressed_roles}} because the field has different tags > in the two schemas; compare > [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271] > and > [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
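Why a tag mismatch silently wipes a field: protobuf identifies fields on the wire by tag number alone, never by name, so bytes written under the v1 tag look like an unknown field to a v0 schema that declares {{suppressed_roles}} under a different tag. The following minimal, self-contained C++ sketch illustrates the mechanism; it hand-rolls a tag-to-payload map as a stand-in for the protobuf wire format, and the tag numbers 2 and 3 are hypothetical, not the actual values from scheduler.proto.
{code}
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Stand-in for the protobuf wire format: a message is just a map from
// field tag to serialized payload. Field names never appear on the wire.
using Wire = std::map<uint32_t, std::string>;

int main() {
  // "v1 Call.Subscribe": suppose suppressed_roles was written under tag 2.
  Wire v1Bytes;
  v1Bytes[1] = "framework_info";  // a field whose tag matches in both schemas
  v1Bytes[2] = "role-a,role-b";   // suppressed_roles under the hypothetical v1 tag

  // A devolve-style conversion re-parses those bytes as the v0 type. If the
  // v0 schema declares suppressed_roles under tag 3, the parser looks there,
  // finds nothing, and the field comes out unset; the data at tag 2 survives
  // only as an "unknown field" and is effectively wiped.
  const uint32_t v0SuppressedRolesTag = 3;  // hypothetical mismatched v0 tag

  if (v1Bytes.find(v0SuppressedRolesTag) == v1Bytes.end()) {
    std::cout << "suppressed_roles lost: written at tag 2, read at tag "
              << v0SuppressedRolesTag << std::endl;
  }

  return 0;
}
{code}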
[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout
[ https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-8052: -- Fix Version/s: (was: 1.4.1) > "protoc" not found when running "make -j4 check" directly in stout > -- > > Key: MESOS-8052 > URL: https://issues.apache.org/jira/browse/MESOS-8052 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > Labels: compile-error > > If we run {{make -j4 check}} without running {{make}} first, we will get the > following error message: > {noformat} > 3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. > ../tests/protobuf_tests.proto > /bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory > Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed > make: *** [protobuf_tests.pb.cc] Error 127 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-7996: -- Assignee: Yan Xu (was: James Peach) > ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky. > -- > > Key: MESOS-7996 > URL: https://issues.apache.org/jira/browse/MESOS-7996 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 > Environment: Observed on Ubuntu 17.04 with SSL enabled >Reporter: Alexander Rukletsov >Assignee: Yan Xu > Labels: flaky-test, mesosphere > Attachments: NoOffersWithAllRolesSuppressed-modified.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt > > > Observed the failure on internal CI: > {noformat} > ../../src/tests/scheduler_tests.cpp:1474 > Mock function called more times than expected - returning directly. > Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object > <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>) > Expected: to be never called >Actual: called once - over-saturated and active > {noformat} > Full log attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
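For reference, "over-saturated and active" is GoogleMock's wording for an expectation whose cardinality (here {{Times(0)}}, i.e. "to be never called") has already been exceeded while the expectation is still in effect. A standalone reproduction, independent of Mesos; the {{MockScheduler}} type and its {{offers}} method are hypothetical, and the test must be linked against gtest/gmock:
{code}
#include <gmock/gmock.h>
#include <gtest/gtest.h>

struct Scheduler {
  virtual ~Scheduler() = default;
  virtual void offers(int count) = 0;
};

struct MockScheduler : Scheduler {
  MOCK_METHOD(void, offers, (int), (override));
};

TEST(Saturation, NeverCalledExpectationStillReceivesCall) {
  MockScheduler sched;

  // Mirrors the failing expectation: after suppressing all roles, the
  // test expects no further offer callbacks.
  EXPECT_CALL(sched, offers(::testing::_)).Times(0);

  // A stray call over-saturates the expectation; gMock fails the test with
  // "Expected: to be never called  Actual: called once - over-saturated".
  sched.offers(1);
}
{code}
This matches the failure above: an offer reached the scheduler after the point where the test expected offers to stay suppressed.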
[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery
[ https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7939: -- Fix Version/s: (was: 1.4.1) > Early disk usage check for garbage collection during recovery > - > > Key: MESOS-7939 > URL: https://issues.apache.org/jira/browse/MESOS-7939 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Critical > > Currently the default value for `disk_watch_interval` is 1 minute. This is > not fast enough and could lead to the following scenario: > 1. The disk usage was checked and there was not enough headroom: > {noformat} > I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max > allowed age: 0ns > {noformat} > But no container was pruned because no container had been scheduled for GC. > 2. A task was completed. The task itself contained a lot of nested > containers, each of which used a lot of disk space. Note that there is no way for > Mesos agent to schedule individual nested containers for GC since nested > containers are not necessarily tied to tasks. When the top-level container is > completed, it was scheduled for GC, and the nested containers would be GC'ed > as well: > {noformat} > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.9466483852days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466405037days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.946635763days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466324148days in the future > {noformat} > 3. Since the next disk usage check was still 40ish seconds away, no GC was > performed even though the disk was full. As a result, Mesos agent failed to > checkpoint the task status: > {noformat} > I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status > update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > 4.
When the agent restarted, it tried to checkpoint the task status again. > However, since the first disk usage check was scheduled 1 minute after > startup, the agent failed before GC kicked in, falling into a restart failure > loop: > {noformat} > F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > We should kick in GC early, so the agent can recover from this state. > Related ticket: MESOS-7031 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
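The proposed fix amounts to running the disk usage check once during agent recovery instead of waiting up to {{--disk_watch_interval}} (default: 1 minute) for the first periodic check. Below is a rough standalone sketch of such a check, assuming C++17 {{std::filesystem}}; the {{prune()}} hook and the 90% watermark are hypothetical, and this is not the agent's actual implementation:
{code}
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Fraction of the volume holding 'workDir' that is in use, in [0, 1].
double diskUsage(const fs::path& workDir) {
  const fs::space_info s = fs::space(workDir);
  return 1.0 - static_cast<double>(s.available) /
               static_cast<double>(s.capacity);
}

int main() {
  // The agent would use its --work_dir here (e.g. /var/lib/mesos/slave);
  // "." keeps the sketch runnable anywhere.
  const fs::path workDir = ".";
  const double watermark = 0.90;  // hypothetical "not enough headroom" bound

  // Run this during recovery, *before* replaying checkpointed status
  // updates, so a full disk gets pruned instead of crashing the agent
  // into a restart loop.
  if (diskUsage(workDir) > watermark) {
    std::cout << "Disk usage above watermark; pruning paths already "
                 "scheduled for GC before resuming recovery" << std::endl;
    // prune();  // hypothetical: delete the sandboxes scheduled for GC
  }

  return 0;
}
{code}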
[jira] [Updated] (MESOS-8204) HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8204: --- Attachment: ROOT_DOCKER_DockerHealthyTaskViaHTTP-badrun.txt > HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky. > -- > > Key: MESOS-8204 > URL: https://issues.apache.org/jira/browse/MESOS-8204 > Project: Mesos > Issue Type: Bug > Components: test > Environment: Ubuntu 17.04 with SSL >Reporter: Alexander Rukletsov > Labels: flaky-test > Attachments: ROOT_DOCKER_DockerHealthyTaskViaHTTP-badrun.txt > > > Observed today in the internal CI. Full log attached. > {noformat} > ../../src/tests/health_check_tests.cpp:2048 > Failed to wait 15secs for statusHealthy > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8204) HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky.
Alexander Rukletsov created MESOS-8204: -- Summary: HealthCheckTest.ROOT_DOCKER_DockerHealthyTaskViaHTTP is flaky. Key: MESOS-8204 URL: https://issues.apache.org/jira/browse/MESOS-8204 Project: Mesos Issue Type: Bug Components: test Environment: Ubuntu 17.04 with SSL Reporter: Alexander Rukletsov Observed today in the internal CI. Full log attached. {noformat} ../../src/tests/health_check_tests.cpp:2048 Failed to wait 15secs for statusHealthy {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8203) SchedulerTest.TaskGroupRunning is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8203: --- Attachment: TaskGroupRunning-badrun.txt > SchedulerTest.TaskGroupRunning is flaky. > > > Key: MESOS-8203 > URL: https://issues.apache.org/jira/browse/MESOS-8203 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Alexander Rukletsov > Labels: flaky-test > Attachments: TaskGroupRunning-badrun.txt > > > Observed today in the internal CI. Full log attached. > {noformat} > ../../src/tests/scheduler_tests.cpp:726 > Expected: v1::TASK_RUNNING > Which is: TASK_RUNNING > To be equal to: runningUpdate2->status().state() > Which is: TASK_FINISHED > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8203) SchedulerTest.TaskGroupRunning is flaky.
Alexander Rukletsov created MESOS-8203: -- Summary: SchedulerTest.TaskGroupRunning is flaky. Key: MESOS-8203 URL: https://issues.apache.org/jira/browse/MESOS-8203 Project: Mesos Issue Type: Bug Components: test Reporter: Alexander Rukletsov Observed today in the internal CI. Full log attached. {noformat} ../../src/tests/scheduler_tests.cpp:726 Expected: v1::TASK_RUNNING Which is: TASK_RUNNING To be equal to: runningUpdate2->status().state() Which is: TASK_FINISHED {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Attachment: LaunchNestedContainerSessionDisconnected-badrun.txt > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky > -- > > Key: MESOS-7742 > URL: https://issues.apache.org/jira/browse/MESOS-7742 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Gastón Kleiman > Labels: flaky-test, mesosphere-oncall > Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, > LaunchNestedContainerSessionDisconnected-badrun.txt > > > Observed this on ASF CI and internal Mesosphere CI. Affected tests: > {noformat} > AgentAPIStreamingTest.AttachInputToNestedContainerSession > AgentAPITest.LaunchNestedContainerSession > AgentAPITest.AttachContainerInputAuthorization/0 > AgentAPITest.LaunchNestedContainerSessionWithTTY/0 > AgentAPITest.LaunchNestedContainerSessionDisconnected/1 > {noformat} > {code} > [ RUN ] > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 > I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' > authorizer > I0629 05:49:33.182234 25306 master.cpp:436] Master > 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on > 172.17.0.3:45726 > I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/a5h5J3/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" > --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" > --registry="in_memory" --registry_fetch_timeout="1mins" > --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" > I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing > authenticated frameworks to register > I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing > authenticated agents to register > I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for > authentication from '/tmp/a5h5J3/credentials' > I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' > authenticator > I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 
'mesos-master-scheduler' > I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled > I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical > allocator process > I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given > I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! > I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar > I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar > I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the > registry (0B) in 183040ns > I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in > 6441ns; attempting to update the registry > I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the > registry in 147200ns > I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered > registrar > I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of > hierarchical allocator: nothing to recover > I0629 05:49:33.186769 25301
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI. Affected tests: {noformat} AgentAPIStreamingTest.AttachInputToNestedContainerSession AgentAPITest.LaunchNestedContainerSession AgentAPITest.AttachContainerInputAuthorization/0 AgentAPITest.LaunchNestedContainerSessionWithTTY/0 AgentAPITest.LaunchNestedContainerSessionDisconnected/1 {noformat} {code} [ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726 I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials' I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy' I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726 I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true"
[jira] [Assigned] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
[ https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov reassigned MESOS-7519: -- Assignee: Alexander Rukletsov Sprint: Mesosphere Sprint 68 Story Points: 1 > OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky > - > > Key: MESOS-7519 > URL: https://issues.apache.org/jira/browse/MESOS-7519 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Assignee: Alexander Rukletsov > Labels: flaky-test, mesosphere > Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt > > > {noformat} > [ RUN ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable > I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' > authorizer > I0517 10:43:58.155712 260517888 master.cpp:436] Master > a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on > 169.254.161.216:51870 > I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master" > --zk_session_timeout="10secs" > I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing > authenticated frameworks to register > I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing > authenticated agents to register > I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials' > I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' > authenticator > I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled > I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading > master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar > I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the > registry (0B) in 0ns > I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in > 14us; attempting to update the registry > I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the > registry in 0ns > I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered > registrar > I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the > registry (164B); allowing 10mins for agents to re-register > I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend > 'copy' > I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' > authorizer > I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on > (7)@169.254.161.216:51870 > I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://" >
[jira] [Updated] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7519: --- Summary: OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky. (was: OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky) > OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky. > -- > > Key: MESOS-7519 > URL: https://issues.apache.org/jira/browse/MESOS-7519 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Assignee: Alexander Rukletsov > Labels: flaky-test, mesosphere > Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt > > > {noformat} > [ RUN ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable > I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' > authorizer > I0517 10:43:58.155712 260517888 master.cpp:436] Master > a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on > 169.254.161.216:51870 > I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master" > --zk_session_timeout="10secs" > I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing > authenticated frameworks to register > I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing > authenticated agents to register > I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials' > I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' > authenticator > I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled > I0517 10:43:58.157871 263200768 
master.cpp:2161] Elected as the leading > master! > I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar > I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the > registry (0B) in 0ns > I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in > 14us; attempting to update the registry > I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the > registry in 0ns > I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered > registrar > I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the > registry (164B); allowing 10mins for agents to re-register > I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend > 'copy' > I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' > authorizer > I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on > (7)@169.254.161.216:51870 > I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" >
[jira] [Commented] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
[ https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246771#comment-16246771 ] Alexander Rukletsov commented on MESOS-7519: It looks like review https://reviews.apache.org/r/55893/ does not really fix what it aims to fix. The problem is in [this loop|https://github.com/apache/mesos/blob/master/src/tests/oversubscription_tests.cpp?utf8=%E2%9C%93#L549-L552]: {{offers.get()}} removes items from the collection we iterate over and hence defeats the purpose of merging resources from all offers. > OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky > - > > Key: MESOS-7519 > URL: https://issues.apache.org/jira/browse/MESOS-7519 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: flaky-test, mesosphere > Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt > > > {noformat} > [ RUN ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable > I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' > authorizer > I0517 10:43:58.155712 260517888 master.cpp:436] Master > a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on > 169.254.161.216:51870 > I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master" > --zk_session_timeout="10secs" > I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing > authenticated frameworks to register > I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing > authenticated agents to register > I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials' > I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' > authenticator > I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0517 10:43:58.156409 
260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled > I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading > master! > I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar > I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the > registry (0B) in 0ns > I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in > 14us; attempting to update the registry > I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the > registry in 0ns > I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered > registrar > I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the > registry (164B); allowing 10mins for agents to re-register > I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend > 'copy' > I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' > authorizer >
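The pitfall described in the comment is generic: consuming elements from a collection while looping over that same, shrinking collection visits only part of it, so the "merged" resources cover only some of the offers. A standalone illustration using a plain {{std::queue}}, not the actual test code:
{code}
#include <iostream>
#include <queue>

int main() {
  std::queue<int> offers;  // three "offers" worth 1, 2 and 4 cpus
  for (int cpus : {1, 2, 4}) {
    offers.push(cpus);
  }

  int merged = 0;

  // Intended: merge resources from *all* offers. But offers.size() is
  // re-evaluated on every iteration while pop() shrinks the queue, so the
  // loop exits after consuming only two of the three elements.
  for (size_t i = 0; i < offers.size(); ++i) {
    merged += offers.front();
    offers.pop();
  }

  std::cout << "merged: " << merged << " (expected 7)" << std::endl;  // prints 3

  return 0;
}
{code}
Draining with {{while (!offers.empty())}}, or snapshotting the size before the loop, merges everything.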
[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery
[ https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7939: -- Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66 (was: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67) > Early disk usage check for garbage collection during recovery > - > > Key: MESOS-7939 > URL: https://issues.apache.org/jira/browse/MESOS-7939 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Critical > Fix For: 1.4.1 > > > Currently the default value for `disk_watch_interval` is 1 minute. This is > not fast enough and could lead to the following scenario: > 1. The disk usage was checked and there was not enough headroom: > {noformat} > I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max > allowed age: 0ns > {noformat} > But no container was pruned because no container had been scheduled for GC. > 2. A task was completed. The task itself contained a lot of nested > containers, each of which used a lot of disk space. Note that there is no way for > Mesos agent to schedule individual nested containers for GC since nested > containers are not necessarily tied to tasks. When the top-level container is > completed, it was scheduled for GC, and the nested containers would be GC'ed > as well: > {noformat} > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.9466483852days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466405037days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.946635763days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466324148days in the future > {noformat} > 3. Since the next disk usage check was still 40ish seconds away, no GC was > performed even though the disk was full.
As a result, Mesos agent failed to > checkpoint the task status: > {noformat} > I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status > update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > 4. When the agent restarted, it tried to checkpoint the task status again. > However, since the first disk usage check was scheduled 1 minute after > startup, the agent failed before GC kicked in, falling into a restart failure > loop: > {noformat} > F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > We should kick in GC early, so the agent can recover
[jira] [Updated] (MESOS-7881) Building gRPC with CMake
[ https://issues.apache.org/jira/browse/MESOS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7881: -- Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66 (was: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 68) > Building gRPC with CMake > > > Key: MESOS-7881 > URL: https://issues.apache.org/jira/browse/MESOS-7881 > Project: Mesos > Issue Type: Improvement >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > Labels: storage > Fix For: 1.4.0 > > > gRPC manages its own third-party libraries, which overlap with Mesos' > third-party library bundles. We need to write proper CMake rules to > configure and build gRPC as part of the Mesos build. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery
[ https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7939: -- Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66 (was: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 68) > Early disk usage check for garbage collection during recovery > - > > Key: MESOS-7939 > URL: https://issues.apache.org/jira/browse/MESOS-7939 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Critical > Fix For: 1.4.1 > > > Currently the default value for `disk_watch_interval` is 1 minute. This is > not fast enough and could lead to the following scenario: > 1. The disk usage was checked and there was not enough headroom: > {noformat} > I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max > allowed age: 0ns > {noformat} > But no container was pruned because no container had been scheduled for GC. > 2. A task was completed. The task itself contained a lot of nested > containers, each of which used a lot of disk space. Note that there is no way for > Mesos agent to schedule individual nested containers for GC since nested > containers are not necessarily tied to tasks. When the top-level container is > completed, it was scheduled for GC, and the nested containers would be GC'ed > as well: > {noformat} > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.9466483852days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466405037days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.946635763days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466324148days in the future > {noformat} > 3. Since the next disk usage check was still 40ish seconds away, no GC was > performed even though the disk was full.
As a result, Mesos agent failed to > checkpoint the task status: > {noformat} > I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status > update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > 4. When the agent restarted, it tried to checkpoint the task status again. > However, since the first disk usage check was scheduled 1 minute after > startup, the agent failed before GC kicked in, falling into a restart failure > loop: > {noformat} > F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > We should kick in GC early, so the agent can recover
[jira] [Created] (MESOS-8202) Eliminate agent failover after resource checkpointing failure
Gastón Kleiman created MESOS-8202: - Summary: Eliminate agent failover after resource checkpointing failure Key: MESOS-8202 URL: https://issues.apache.org/jira/browse/MESOS-8202 Project: Mesos Issue Type: Task Reporter: Gastón Kleiman -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7939) Early disk usage check for garbage collection during recovery
[ https://issues.apache.org/jira/browse/MESOS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7939: -- Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 68 (was: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66) > Early disk usage check for garbage collection during recovery > - > > Key: MESOS-7939 > URL: https://issues.apache.org/jira/browse/MESOS-7939 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao >Priority: Critical > Fix For: 1.4.1 > > > Currently the default value for `disk_watch_interval` is 1 minute. This is > not fast enough and could lead to the following scenario: > 1. The disk usage was checked and there was not enough headroom: > {noformat} > I0901 17:54:33.00 25510 slave.cpp:5896] Current disk usage 99.87%. Max > allowed age: 0ns > {noformat} > But no container was pruned because no container had been scheduled for GC. > 2. A task was completed. The task itself contained a lot of nested > containers, each of which used a lot of disk space. Note that there is no way for > Mesos agent to schedule individual nested containers for GC since nested > containers are not necessarily tied to tasks. When the top-level container is > completed, it was scheduled for GC, and the nested containers would be GC'ed > as well: > {noformat} > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.9466483852days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466405037days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e/runs/5e70adb1-939e-4d0f-a513-0f77704620bc' > for gc 1.946635763days in the future > I0901 17:54:44.00 25510 gc.cpp:59] Scheduling > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__81953586-6f33-4abf-921d-2bba0481836e' > for gc 1.9466324148days in the future > {noformat} > 3. Since the next disk usage check was still 40ish seconds away, no GC was > performed even though the disk was full.
As a result, the Mesos agent failed to > checkpoint the task status: > {noformat} > I0901 17:54:49.00 25513 status_update_manager.cpp:323] Received status > update TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > F0901 17:54:49.00 25513 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: bf24c3da-db23-4c82-a09f-a3b859e8cad4) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > 4. When the agent restarted, it tried to checkpoint the task status again. > However, since the first disk usage check was scheduled 1 minute after > startup, the agent failed before GC kicked in, falling into a restart failure > loop: > {noformat} > F0901 17:55:06.00 31114 slave.cpp:4748] CHECK_READY(future): is FAILED: > Failed to open > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S5/frameworks/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005/executors/node__4ae69c7c-e32e-41d2-a485-88145a3e385c/runs/602befac-3ff5-44d7-acac-aeebdc0e4666/tasks/node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84/task.updates' > for status updates: No space left on device Failed to handle status update > TASK_FAILED (UUID: fb9c3951-9a93-4925-a7f0-9ba7e38d2398) for task > node-0-server__e5e468a3-b2ee-42ee-80e8-edc19a3aef84 of framework > 9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-0005 > {noformat} > GC should kick in earlier, so the agent can recover
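To make the MESOS-7939 proposal above concrete, here is a minimal, self-contained C++ sketch of the intended behavior. All names ({{GarbageCollector}}, {{Agent::recover}}, the watermark) are hypothetical, and the real agent drives these checks through libprocess delays rather than direct calls; this only illustrates the ordering change: check disk usage at the start of recovery instead of waiting one full {{disk_watch_interval}}.
{code}
// Hypothetical sketch, not the actual agent code: run the first
// disk-usage check immediately during recovery instead of waiting
// one full `disk_watch_interval` (1 minute by default).
#include <iostream>

struct GarbageCollector {
  // Prune entries already scheduled for GC, down to `maxAllowedAge` seconds.
  void prune(double maxAllowedAge) {
    std::cout << "Pruning entries older than " << maxAllowedAge << "s\n";
  }
};

class Agent {
public:
  explicit Agent(GarbageCollector* gc) : gc_(gc) {}

  void recover() {
    // Proposed change: check disk usage *before* replaying checkpointed
    // status updates, so recovery no longer crash-loops on a full disk.
    checkDiskUsage();
    // ... replay checkpointed status updates, reconnect executors, ...
  }

private:
  void checkDiskUsage() {
    const double usage = currentDiskUsage();
    if (usage > kHighWatermark) {
      // Mirrors the "Current disk usage 99.87%. Max allowed age: 0ns"
      // log above: at this usage, nothing is allowed to age.
      gc_->prune(/*maxAllowedAge=*/0.0);
    }
  }

  double currentDiskUsage() const { return 0.9987; }  // stub for the demo

  static constexpr double kHighWatermark = 0.90;  // hypothetical threshold
  GarbageCollector* gc_;
};

int main() {
  GarbageCollector gc;
  Agent agent(&gc);
  agent.recover();
  return 0;
}
{code}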
[jira] [Created] (MESOS-8201) Add end to end tests for offer operation feedback
Gastón Kleiman created MESOS-8201: - Summary: Add end to end tests for offer operation feedback Key: MESOS-8201 URL: https://issues.apache.org/jira/browse/MESOS-8201 Project: Mesos Issue Type: Task Reporter: Gastón Kleiman -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7519) OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
[ https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7519: --- Attachment: RescindRevocableOfferWithIncreasedRevocable-badrun.txt > OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky > - > > Key: MESOS-7519 > URL: https://issues.apache.org/jira/browse/MESOS-7519 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: flaky-test, mesosphere > Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt > > > {noformat} > [ RUN ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable > I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local' > authorizer > I0517 10:43:58.155712 260517888 master.cpp:436] Master > a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on > 169.254.161.216:51870 > I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/master" > --zk_session_timeout="10secs" > I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing > authenticated frameworks to register > I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing > authenticated agents to register > I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/C5v4kE/credentials' > I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5' > authenticator > I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled > I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading > master! 
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar > I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the > registry (0B) in 0ns > I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in > 14us; attempting to update the registry > I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the > registry in 0ns > I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered > registrar > I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the > registry (164B); allowing 10mins for agents to re-register > I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend > 'copy' > I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local' > authorizer > I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on > (7)@169.254.161.216:51870 > I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc0gn/T/mesos/store/appc" >
[jira] [Updated] (MESOS-8189) Master’s OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.
[ https://issues.apache.org/jira/browse/MESOS-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8189: -- Story Points: 2 > Master’s OperationStatusUpdate handler should forward updates to the > framework when OfferOperationID is set. > > > Key: MESOS-8189 > URL: https://issues.apache.org/jira/browse/MESOS-8189 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.
[ https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8190: -- Story Points: 3 > Update the master to accept OfferOperationIDs from frameworks. > -- > > Key: MESOS-8190 > URL: https://issues.apache.org/jira/browse/MESOS-8190 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Master’s {{ACCEPT}} handler should send failed operation updates when a > framework sets the {{OfferOperationID}} on an operation destined for an agent > without the {{RESOURCE_PROVIDER}} capability. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
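A hedged sketch of the validation described in MESOS-8190 above; the types and the function name are illustrative, not the master's actual {{ACCEPT}} handler code:
{code}
// Illustrative sketch (hypothetical types, not the actual master code):
// an operation carrying an OfferOperationID needs an agent that can
// report operation status, i.e. one with the RESOURCE_PROVIDER
// capability; otherwise the master should fail the operation up front.
#include <iostream>
#include <optional>
#include <string>

struct Operation {
  std::optional<std::string> offerOperationId;  // set iff feedback requested
};

struct AgentInfo {
  bool hasResourceProviderCapability;
};

// Returns an error message if a failed operation status update should be
// sent back to the framework, or std::nullopt if the operation may proceed.
std::optional<std::string> validateAccept(
    const Operation& op, const AgentInfo& agent) {
  if (op.offerOperationId.has_value() &&
      !agent.hasResourceProviderCapability) {
    return "Operation feedback was requested, but the target agent does "
           "not have the RESOURCE_PROVIDER capability";
  }
  return std::nullopt;
}

int main() {
  Operation op{std::string("op-1")};
  AgentInfo legacyAgent{false};
  if (auto error = validateAccept(op, legacyAgent)) {
    std::cout << "Sending failed update: " << *error << "\n";
  }
  return 0;
}
{code}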
[jira] [Updated] (MESOS-8192) Update the scheduler library to support request/response API calls.
[ https://issues.apache.org/jira/browse/MESOS-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8192: -- Story Points: 3 > Update the scheduler library to support request/response API calls. > --- > > Key: MESOS-8192 > URL: https://issues.apache.org/jira/browse/MESOS-8192 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The scheduler client/library should be updated to add support for API calls > following the request/response model, e.g., {{ReconcileOfferOperations}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
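To illustrate the request/response model mentioned in MESOS-8192, here is a speculative sketch of what such a scheduler-library call could look like. The class, method, and response type are invented for illustration and do not reflect the actual library API; the point is only that, unlike fire-and-forget calls, the caller gets a future satisfied with the master's synchronous response.
{code}
// Hypothetical shape of a request/response scheduler call (illustrative
// only, not the real Mesos scheduler library).
#include <future>
#include <iostream>
#include <string>
#include <vector>

struct ReconcileOfferOperationsResponse {
  std::vector<std::string> operationStatuses;  // one entry per operation
};

class SchedulerLibrary {
public:
  // Request/response model: returns a future satisfied with the
  // master's OFFER_OPERATIONS_RECONCILIATION response.
  std::future<ReconcileOfferOperationsResponse> reconcileOfferOperations(
      const std::vector<std::string>& operationUuids) {
    return std::async(std::launch::async, [operationUuids] {
      // A real library would POST the call to the master and parse the
      // HTTP response; here we fabricate a placeholder response.
      return ReconcileOfferOperationsResponse{
          std::vector<std::string>(operationUuids.size(), "UNKNOWN")};
    });
  }
};

int main() {
  SchedulerLibrary scheduler;
  auto response = scheduler.reconcileOfferOperations({"uuid-1"}).get();
  std::cout << "Received " << response.operationStatuses.size()
            << " operation status(es)\n";
  return 0;
}
{code}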
[jira] [Updated] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8200: --- Shepherd: Alexander Rukletsov > Suppressed roles are not honoured for v1 scheduler subscribe requests. > -- > > Key: MESOS-8200 > URL: https://issues.apache.org/jira/browse/MESOS-8200 > Project: Mesos > Issue Type: Bug > Components: scheduler api, scheduler driver >Reporter: Alexander Rukletsov >Assignee: James Peach > > When triaging MESOS-7996 I found that the > {{Call.subscribe.suppressed_roles}} field is empty when the master processes > the request from a v1 HTTP scheduler. More precisely, [this > conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969] > wipes the field. This is likely because this conversion relies on a general > [protobuf conversion > utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50], > which fails to copy {{suppressed_roles}} because the field has different tags > in the two versions; compare > [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271] > and > [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
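A self-contained toy model of the failure mode described in MESOS-8200 above. This is not protobuf itself, and the tag numbers (5 and 2) are made up; the real tags are in the linked v0/v1 protos. It only demonstrates why a tag-keyed serialize/parse conversion silently drops a field whose tag differs between message versions:
{code}
// Toy model of the devolve bug, not protobuf: the generic conversion
// round-trips through a tag-keyed wire format, so a field encoded
// under one tag in v1 is invisible to a v0 parser expecting another.
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Wire = std::map<int, std::string>;  // tag -> encoded payload

// v1 encodes suppressed_roles under tag 5 (hypothetical).
Wire encodeV1SuppressedRoles(const std::vector<std::string>& roles) {
  std::string joined;
  for (const std::string& role : roles) {
    joined += role + ";";
  }
  return Wire{{5, joined}};
}

// The v0 parser only knows tag 2 (hypothetical); tag 5 is "unknown"
// and skipped -- exactly how the field gets wiped in the conversion.
std::string parseV0SuppressedRoles(const Wire& wire) {
  const auto it = wire.find(2);
  return it == wire.end() ? "" : it->second;
}

int main() {
  const Wire wire = encodeV1SuppressedRoles({"roleA", "roleB"});
  std::cout << "v0 sees suppressed_roles = '"
            << parseV0SuppressedRoles(wire) << "'\n";  // prints ''
  return 0;
}
{code}
Given this, the fix is presumably either to align the field tags across the v0 and v1 protos or to copy {{suppressed_roles}} explicitly after the generic conversion.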
[jira] [Commented] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246671#comment-16246671 ] Alexander Rukletsov commented on MESOS-7996: The root cause is MESOS-8200. The test in its current form does not test anything: when it fails, it surfaces the actual bug in the code; when it succeeds, it does so only because of the race. I've verified that fixing MESOS-8200 also fixes the improved test (attached to the ticket). > ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky. > -- > > Key: MESOS-7996 > URL: https://issues.apache.org/jira/browse/MESOS-7996 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 > Environment: Observed on Ubuntu 17.04 with SSL enabled >Reporter: Alexander Rukletsov >Assignee: James Peach > Labels: flaky-test, mesosphere > Attachments: NoOffersWithAllRolesSuppressed-modified.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt > > > Observed the failure on internal CI: > {noformat} > ../../src/tests/scheduler_tests.cpp:1474 > Mock function called more times than expected - returning directly. > Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object > <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>) > Expected: to be never called >Actual: called once - over-saturated and active > {noformat} > Full log attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7996: --- Shepherd: Alexander Rukletsov > ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky. > -- > > Key: MESOS-7996 > URL: https://issues.apache.org/jira/browse/MESOS-7996 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 > Environment: Observed on Ubuntu 17.04 with SSL enabled >Reporter: Alexander Rukletsov >Assignee: James Peach > Labels: flaky-test, mesosphere > Attachments: NoOffersWithAllRolesSuppressed-modified.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt > > > Observed the failure on internal CI: > {noformat} > ../../src/tests/scheduler_tests.cpp:1474 > Mock function called more times than expected - returning directly. > Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object > <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>) > Expected: to be never called >Actual: called once - over-saturated and active > {noformat} > Full log attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.
[ https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8193: -- Story Points: 2 > Update master’s OfferOperationStatusUpdate handler to acknowledge updates to > the agent if OfferOperationID is not set. > -- > > Key: MESOS-8193 > URL: https://issues.apache.org/jira/browse/MESOS-8193 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8194) Make agent’s ApplyOfferOperationMessage handler support operations affecting default resources.
[ https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8194: -- Story Points: 8 > Make agent’s ApplyOfferOperationMessage handler support operations affecting > default resources. > --- > > Key: MESOS-8194 > URL: https://issues.apache.org/jira/browse/MESOS-8194 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The agent should apply the operations and send > {{OperationStatusUpdates}} to the master. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-7996: -- Assignee: James Peach > ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky. > -- > > Key: MESOS-7996 > URL: https://issues.apache.org/jira/browse/MESOS-7996 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 > Environment: Observed on Ubuntu 17.04 with SSL enabled >Reporter: Alexander Rukletsov >Assignee: James Peach > Labels: flaky-test, mesosphere > Attachments: NoOffersWithAllRolesSuppressed-modified.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, > SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt > > > Observed the failure on internal CI: > {noformat} > ../../src/tests/scheduler_tests.cpp:1474 > Mock function called more times than expected - returning directly. > Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object > <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 > 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>) > Expected: to be never called >Actual: called once - over-saturated and active > {noformat} > Full log attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-8200: -- Assignee: James Peach > Suppressed roles are not honoured for v1 scheduler subscribe requests. > -- > > Key: MESOS-8200 > URL: https://issues.apache.org/jira/browse/MESOS-8200 > Project: Mesos > Issue Type: Bug > Components: scheduler api, scheduler driver >Reporter: Alexander Rukletsov >Assignee: James Peach > > When triaging MESOS-7996 I found that the > {{Call.subscribe.suppressed_roles}} field is empty when the master processes > the request from a v1 HTTP scheduler. More precisely, [this > conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969] > wipes the field. This is likely because this conversion relies on a general > [protobuf conversion > utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50], > which fails to copy {{suppressed_roles}} because the field has different tags > in the two versions; compare > [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271] > and > [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8195) Implement explicit offer operation reconciliation between the master, agent and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8195: -- Story Points: 3 > Implement explicit offer operation reconciliation between the master, agent > and RPs. > > > Key: MESOS-8195 > URL: https://issues.apache.org/jira/browse/MESOS-8195 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Upon receiving an {{UpdateSlave}} message the master should compare its list > of pending operations for the agent/LRPs to the list of pending operations > contained in the message. It should then build a {{ReconcileOfferOperations}} > message with all the operations missing in the {{UpdateSlave}} message and > send it to the agent. > The agent will receive these messages and should handle them by itself if the > operations affect the default resources, or forward them to the RP manager > otherwise. > The agent/RP handler should check if the operations are pending. If an > operation is not pending, then an {{ApplyOfferOperation}} message got > dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status > update to the master. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
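A minimal sketch of the master-side diffing step described in MESOS-8195 above; the message struct and field names are illustrative, not the actual protobuf definitions:
{code}
// Illustrative sketch of the reconciliation step: compare the master's
// pending operations against those the agent reported in UpdateSlave,
// and ask the agent about any that are missing.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct ReconcileOfferOperations {
  std::vector<std::string> operationUuids;  // operations to reconcile
};

ReconcileOfferOperations buildReconcileMessage(
    const std::set<std::string>& masterPending,    // master's bookkeeping
    const std::set<std::string>& agentReported) {  // from UpdateSlave
  ReconcileOfferOperations message;
  for (const std::string& uuid : masterPending) {
    if (agentReported.count(uuid) == 0) {
      // The agent/RP must either still know this operation (and re-send
      // its status) or answer with OFFER_OPERATION_DROPPED.
      message.operationUuids.push_back(uuid);
    }
  }
  return message;
}

int main() {
  const ReconcileOfferOperations msg = buildReconcileMessage(
      {"uuid-1", "uuid-2", "uuid-3"}, {"uuid-1", "uuid-3"});
  for (const std::string& uuid : msg.operationUuids) {
    std::cout << "Reconciling operation " << uuid << "\n";  // uuid-2
  }
  return 0;
}
{code}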
[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.
[ https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8186: -- Story Points: 3 > Implement the agent's AcknowledgeOfferOperationMessage handler. > --- > > Key: MESOS-8186 > URL: https://issues.apache.org/jira/browse/MESOS-8186 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The handler should process acks for operations applied by the agent itself, and > forward the ack to the RP manager for all other operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.
Alexander Rukletsov created MESOS-8200: -- Summary: Suppressed roles are not honoured for v1 scheduler subscribe requests. Key: MESOS-8200 URL: https://issues.apache.org/jira/browse/MESOS-8200 Project: Mesos Issue Type: Bug Components: scheduler api, scheduler driver Reporter: Alexander Rukletsov When triaging MESOS-7996 I found that the {{Call.subscribe.suppressed_roles}} field is empty when the master processes the request from a v1 HTTP scheduler. More precisely, [this conversion|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/master/http.cpp#L969] wipes the field. This is likely because this conversion relies on a general [protobuf conversion utility|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/src/internal/devolve.cpp#L28-L50], which fails to copy {{suppressed_roles}} because the field has different tags in the two versions; compare [v0|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/scheduler/scheduler.proto#L271] and [v1|https://github.com/apache/mesos/blob/1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9/include/mesos/v1/scheduler/scheduler.proto#L258]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8187) Enable LRP to send operation status updates, checkpoint, and retry using the SUM
[ https://issues.apache.org/jira/browse/MESOS-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8187: -- Story Points: 8 > Enable LRP to send operation status updates, checkpoint, and retry using the > SUM > > > Key: MESOS-8187 > URL: https://issues.apache.org/jira/browse/MESOS-8187 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8191) Implement ReconcileOfferOperations handler in the master
[ https://issues.apache.org/jira/browse/MESOS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8191: -- Story Points: 5 > Implement ReconcileOfferOperations handler in the master > > > Key: MESOS-8191 > URL: https://issues.apache.org/jira/browse/MESOS-8191 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The master will synchronously respond to the framework with an > {{OFFER_OPERATIONS_RECONCILIATION}} response. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8188) Enable agent to send operation status updates, checkpoint, and retry using the SUM
[ https://issues.apache.org/jira/browse/MESOS-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8188: -- Story Points: 8 > Enable agent to send operation status updates, checkpoint, and retry using > the SUM > -- > > Key: MESOS-8188 > URL: https://issues.apache.org/jira/browse/MESOS-8188 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8195) Implement explicit offer operation reconciliation between the master, agent and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8195: -- Summary: Implement explicit offer operation reconciliation between the master, agent and RPs. (was: Add an explicit offer operation reconciliation between the master, agent and RPs.) > Implement explicit offer operation reconciliation between the master, agent > and RPs. > > > Key: MESOS-8195 > URL: https://issues.apache.org/jira/browse/MESOS-8195 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Upon receiving an {{UpdateSlave}} message the master should compare its list > of pending operations for the agent/LRPs to the list of pending operations > contained in the message. It should then build a {{ReconcileOfferOperations}} > message with all the operations missing in the {{UpdateSlave}} message and > send it to the agent. > The agent will receive these messages and should handle them by itself if the > operations affect the default resources, or forward them to the RP manager > otherwise. > The agent/RP handler should check if the operations are pending. If an > operation is not pending, then an {{ApplyOfferOperation}} message got > dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status > update to the master. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
Gastón Kleiman created MESOS-8199: - Summary: Add plumbing for explicit offer operation reconciliation between master, agent, and RPs. Key: MESOS-8199 URL: https://issues.apache.org/jira/browse/MESOS-8199 Project: Mesos Issue Type: Bug Reporter: Gastón Kleiman -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8197: -- Issue Type: Task (was: Bug) > Implement a library to send offer operation status updates > -- > > Key: MESOS-8197 > URL: https://issues.apache.org/jira/browse/MESOS-8197 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8197) Implement a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman reassigned MESOS-8197: - Shepherd: Greg Mann Assignee: Gastón Kleiman > Implement a library to send offer operation status updates > -- > > Key: MESOS-8197 > URL: https://issues.apache.org/jira/browse/MESOS-8197 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8199: -- Labels: mesosphere (was: ) > Add plumbing for explicit offer operation reconciliation between master, > agent, and RPs. > > > Key: MESOS-8199 > URL: https://issues.apache.org/jira/browse/MESOS-8199 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.
[ https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8190: -- Labels: mesosphere (was: ) > Update the master to accept OfferOperationIDs from frameworks. > -- > > Key: MESOS-8190 > URL: https://issues.apache.org/jira/browse/MESOS-8190 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Master’s {{ACCEPT}} handler should send failed operation updates when a > framework sets the {{OfferOperationID}} on an operation destined for an agent > without the {{RESOURCE_PROVIDER}} capability. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8188) Enable agent to send operation status updates, checkpoint, and retry using the SUM
[ https://issues.apache.org/jira/browse/MESOS-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8188: -- Labels: mesosphere (was: ) > Enable agent to send operation status updates, checkpoint, and retry using > the SUM > -- > > Key: MESOS-8188 > URL: https://issues.apache.org/jira/browse/MESOS-8188 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman reassigned MESOS-8199: - Shepherd: Jie Yu Assignee: Greg Mann > Add plumbing for explicit offer operation reconciliation between master, > agent, and RPs. > > > Key: MESOS-8199 > URL: https://issues.apache.org/jira/browse/MESOS-8199 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman >Assignee: Greg Mann > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8191) Implement ReconcileOfferOperations handler in the master
[ https://issues.apache.org/jira/browse/MESOS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8191: -- Labels: mesosphere (was: ) > Implement ReconcileOfferOperations handler in the master > > > Key: MESOS-8191 > URL: https://issues.apache.org/jira/browse/MESOS-8191 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The master will synchronously respond to the framework with an > {{OFFER_OPERATIONS_RECONCILIATION}} response. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.
[ https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8184: -- Labels: mesosphere (was: ) > Implement master's AcknowledgeOfferOperationMessage handler. > > > Key: MESOS-8184 > URL: https://issues.apache.org/jira/browse/MESOS-8184 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > This handler should validate the message and forward it to the corresponding > agent/ERP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8198: -- Labels: mesosphere (was: ) > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman > Labels: mesosphere > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.
[ https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8186: -- Labels: mesosphere (was: ) > Implement the agent's AcknowledgeOfferOperationMessage handler. > --- > > Key: MESOS-8186 > URL: https://issues.apache.org/jira/browse/MESOS-8186 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The handler should process acks for operations applied by the agent itself, and > forward the ack to the RP manager for all other operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8193) Update master’s OfferOperationStatusUpdate handler to acknowledge updates to the agent if OfferOperationID is not set.
[ https://issues.apache.org/jira/browse/MESOS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8193: -- Labels: mesosphere (was: ) > Update master’s OfferOperationStatusUpdate handler to acknowledge updates to > the agent if OfferOperationID is not set. > -- > > Key: MESOS-8193 > URL: https://issues.apache.org/jira/browse/MESOS-8193 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8199: -- Sprint: Mesosphere Sprint 68 Story Points: 2 > Add plumbing for explicit offer operation reconciliation between master, > agent, and RPs. > > > Key: MESOS-8199 > URL: https://issues.apache.org/jira/browse/MESOS-8199 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8198: -- Issue Type: Task (was: Bug) > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8199) Add plumbing for explicit offer operation reconciliation between master, agent, and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8199: -- Issue Type: Task (was: Bug) > Add plumbing for explicit offer operation reconciliation between master, > agent, and RPs. > > > Key: MESOS-8199 > URL: https://issues.apache.org/jira/browse/MESOS-8199 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8192) Update the scheduler library to support request/response API calls.
[ https://issues.apache.org/jira/browse/MESOS-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8192: -- Labels: mesosphere (was: ) > Update the scheduler library to support request/response API calls. > --- > > Key: MESOS-8192 > URL: https://issues.apache.org/jira/browse/MESOS-8192 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The scheduler client/library should be updated to add support for API calls > following the request/response model, e.g., {{ReconcileOfferOperations}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8189) Master’s OperationStatusUpdate handler should forward updates to the framework when OfferOperationID is set.
[ https://issues.apache.org/jira/browse/MESOS-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8189: -- Labels: mesosphere (was: ) > Master’s OperationStatusUpdate handler should forward updates to the > framework when OfferOperationID is set. > > > Key: MESOS-8189 > URL: https://issues.apache.org/jira/browse/MESOS-8189 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8187) Enable LRP to send operation status updates, checkpoint, and retry using the SUM
[ https://issues.apache.org/jira/browse/MESOS-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8187: -- Labels: mesosphere (was: ) > Enable LRP to send operation status updates, checkpoint, and retry using the > SUM > > > Key: MESOS-8187 > URL: https://issues.apache.org/jira/browse/MESOS-8187 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8195) Add an explicit offer operation reconciliation between the master, agent and RPs.
[ https://issues.apache.org/jira/browse/MESOS-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8195: -- Labels: mesosphere (was: ) > Add an explicit offer operation reconciliation between the master, agent and > RPs. > - > > Key: MESOS-8195 > URL: https://issues.apache.org/jira/browse/MESOS-8195 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > Upon receiving an {{UpdateSlave}} message the master should compare its list > of pending operations for the agent/LRPs to the list of pending operations > contained in the message. It should then build a {{ReconcileOfferOperations}} > message with all the operations missing in the {{UpdateSlave}} message and > send it to the agent. > The agent will receive these messages and should handle them by itself if the > operations affect the default resources, or forward them to the RP manager > otherwise. > The agent/RP handler should check if the operations are pending. If an > operation is not pending, then an {{ApplyOfferOperation}} message got > dropped, and the agent/LRP should send an {{OFFER_OPERATION_DROPPED}} status > update to the master. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8055) Design doc for offer operations feedback
[ https://issues.apache.org/jira/browse/MESOS-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8055: -- Labels: mesosphere (was: ) > Design doc for offer operations feedback > > > Key: MESOS-8055 > URL: https://issues.apache.org/jira/browse/MESOS-8055 > Project: Mesos > Issue Type: Documentation >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8197: -- Labels: mesosphere (was: ) > Implement a library to send offer operation status updates > -- > > Key: MESOS-8197 > URL: https://issues.apache.org/jira/browse/MESOS-8197 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8194) Make agent’s ApplyOfferOperationMessage handler support operations affecting default resources.
[ https://issues.apache.org/jira/browse/MESOS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8194: -- Labels: mesosphere (was: ) > Make agent’s ApplyOfferOperationMessage handler support operations affecting > default resources. > --- > > Key: MESOS-8194 > URL: https://issues.apache.org/jira/browse/MESOS-8194 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > Labels: mesosphere > > The agent should apply the operations and send > {{OperationStatusUpdates}} to the master. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8197) Implement a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8197: -- Sprint: Mesosphere Sprint 68 Story Points: 8 > Implement a library to send offer operation status updates > -- > > Key: MESOS-8197 > URL: https://issues.apache.org/jira/browse/MESOS-8197 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8186) Implement the agent's AcknowledgeOfferOperationMessage handler.
[ https://issues.apache.org/jira/browse/MESOS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8186: -- Summary: Implement the agent's AcknowledgeOfferOperationMessage handler. (was: Add an AcknowledgeOfferOperationMessage handler to the agent) > Implement the agent's AcknowledgeOfferOperationMessage handler. > --- > > Key: MESOS-8186 > URL: https://issues.apache.org/jira/browse/MESOS-8186 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > > The handler should process acks for operations applied by the agent itself, and > forward the ack to the RP manager for all other operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.
[ https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8184: -- Sprint: Mesosphere Sprint 68 Story Points: 3 > Implement master's AcknowledgeOfferOperationMessage handler. > > > Key: MESOS-8184 > URL: https://issues.apache.org/jira/browse/MESOS-8184 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman > > This handler should validate the message and forward it to the corresponding > agent/ERP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8198: -- Sprint: Mesosphere Sprint 68 > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-8198: -- Story Points: 1 > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8198) Update the ReconcileOfferOperations protos
Gastón Kleiman created MESOS-8198: - Summary: Update the ReconcileOfferOperations protos Key: MESOS-8198 URL: https://issues.apache.org/jira/browse/MESOS-8198 Project: Mesos Issue Type: Bug Reporter: Gastón Kleiman Some protos have been committed, but they follow an event-based API. We decided to follow the request/response model for this API, so we need to update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8197) Implement a library to send offer operation status updates
Gastón Kleiman created MESOS-8197: - Summary: Implement a library to send offer operation status updates Key: MESOS-8197 URL: https://issues.apache.org/jira/browse/MESOS-8197 Project: Mesos Issue Type: Bug Reporter: Gastón Kleiman -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-6792) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
[ https://issues.apache.org/jira/browse/MESOS-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6792: --- Labels: flaky-test tech-debt (was: tech-debt) > MasterSlaveReconciliationTest.ReconcileLostTask test is flaky > - > > Key: MESOS-6792 > URL: https://issues.apache.org/jira/browse/MESOS-6792 > Project: Mesos > Issue Type: Bug > Environment: Fedora 25, clang, w/ optimizations, SSL build >Reporter: Benjamin Bannier > Labels: flaky-test, tech-debt > > The test {{MasterSlaveReconciliationTest.ReconcileLostTask}} is flaky for me > as of {{e99ea9ce8b1de01dd8b3cac6675337edb6320f38}}, > {code} > Repeating all tests (iteration 912) . . . > Note: Google Test filter = <...> > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from MasterSlaveReconciliationTest > [ RUN ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor > I1214 04:41:11.559672 2005 cluster.cpp:160] Creating default 'local' > authorizer > I1214 04:41:11.560848 2045 master.cpp:380] Master > 87dd8179-dd7d-4270-ace2-ea771b57371c (gru1.hw.ca1.mesosphere.com) started on > 192.99.40.208:37659 > I1214 04:41:11.560878 2045 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/cXHI89/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/home/bbannier/src/mesos/build/P/share/mesos/webui" > --work_dir="/tmp/cXHI89/master" --zk_session_timeout="10secs" > I1214 04:41:11.561079 2045 master.cpp:432] Master only allowing > authenticated frameworks to register > I1214 04:41:11.561089 2045 master.cpp:446] Master only allowing > authenticated agents to register > I1214 04:41:11.561095 2045 master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1214 04:41:11.561101 2045 credentials.hpp:39] Loading credentials for > authentication from '/tmp/cXHI89/credentials' > I1214 04:41:11.561194 2045 master.cpp:504] Using default 'crammd5' > authenticator > I1214 04:41:11.561236 2045 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1214 04:41:11.561274 2045 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1214 04:41:11.561301 2045 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1214 04:41:11.561326 2045 master.cpp:584] Authorization enabled > I1214 04:41:11.562155 2039 master.cpp:2045] Elected as the leading master! 
> I1214 04:41:11.562173 2039 master.cpp:1568] Recovering from registrar > I1214 04:41:11.562347 2045 registrar.cpp:362] Successfully fetched the > registry (0B) in 114944ns > I1214 04:41:11.562441 2045 registrar.cpp:461] Applied 1 operations in > 7920ns; attempting to update the registry > I1214 04:41:11.562621 2048 registrar.cpp:506] Successfully updated the > registry in 155136ns > I1214 04:41:11.562664 2048 registrar.cpp:392] Successfully recovered > registrar > I1214 04:41:11.562832 2044 master.cpp:1684] Recovered 0 agents from the > registry (166B); allowing 10mins for agents to re-register > I1214 04:41:11.568444 2005 cluster.cpp:446] Creating default 'local' > authorizer > I1214 04:41:11.569344 2005 sched.cpp:232] Version: 1.2.0 > I1214 04:41:11.569842 2035 slave.cpp:209] Mesos agent started on > (912)@192.99.40.208:37659 > I1214 04:41:11.570080 2040 sched.cpp:336] New master detected at > master@192.99.40.208:37659 > I1214 04:41:11.570117 2040 sched.cpp:402] Authenticating with master > master@192.99.40.208:37659 > I1214 04:41:11.570127 2040 sched.cpp:409] Using default CRAM-MD5 > authenticatee > I1214 04:41:11.570220 2040 authenticatee.cpp:121] Creating new client SASL >
[jira] [Assigned] (MESOS-8000) DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov reassigned MESOS-8000: -- Resolution: Duplicate Assignee: Armand Grillet Fix Version/s: 1.5.0 > DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky. > --- > > Key: MESOS-8000 > URL: https://issues.apache.org/jira/browse/MESOS-8000 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 > Environment: Ubuntu 16.04 >Reporter: Alexander Rukletsov >Assignee: Armand Grillet > Labels: flaky-test, mesosphere > Fix For: 1.5.0 > > Attachments: ROOT_VerifyContainerIP_badrun.txt, > ROOT_VerifyContainerIP_goodrun.txt > > > Observed a failure on internal CI: > {noformat} > ../../src/tests/containerizer/cni_isolator_tests.cpp:1419 > Failed to wait 15secs for subscribed > {noformat} > Full log attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8049) MasterTest.RecoveredFramework is flaky and crashes.
[ https://issues.apache.org/jira/browse/MESOS-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8049: --- Labels: flaky flaky-test (was: crashed flaky flaky-test) > MasterTest.RecoveredFramework is flaky and crashes. > --- > > Key: MESOS-8049 > URL: https://issues.apache.org/jira/browse/MESOS-8049 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.5.0 > Environment: ubuntu-17.04 >Reporter: Till Toenshoff > Labels: flaky, flaky-test > > Observed on internal CI: > {noformat} > 00:35:26 [ RUN ] MasterTest.RecoveredFramework > 00:35:26 I0930 00:35:26.319862 27033 cluster.cpp:162] Creating default > 'local' authorizer > 00:35:26 I0930 00:35:26.321624 27053 master.cpp:445] Master > 94ab36ee-4c02-457d-ae35-2f130ae826f5 (ip-172-16-10-150) started on > 172.16.10.150:37345 > 00:35:26 I0930 00:35:26.321647 27053 master.cpp:447] Flags at startup: > --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/Z8B1GQ/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/Z8B1GQ/master" > --zk_session_timeout="10secs" > 00:35:26 I0930 00:35:26.321758 27053 master.cpp:497] Master only allowing > authenticated frameworks to register > 00:35:26 I0930 00:35:26.321768 27053 master.cpp:511] Master only allowing > authenticated agents to register > 00:35:26 I0930 00:35:26.321772 27053 master.cpp:524] Master only allowing > authenticated HTTP frameworks to register > 00:35:26 I0930 00:35:26.321777 27053 credentials.hpp:37] Loading credentials > for authentication from '/tmp/Z8B1GQ/credentials' > 00:35:26 I0930 00:35:26.321853 27053 master.cpp:569] Using default 'crammd5' > authenticator > 00:35:26 I0930 00:35:26.321892 27053 http.cpp:1045] Creating default 'basic' > HTTP authenticator for realm 'mesos-master-readonly' > 00:35:26 I0930 00:35:26.321923 27053 http.cpp:1045] Creating default 'basic' > HTTP authenticator for realm 'mesos-master-readwrite' > 00:35:26 I0930 00:35:26.321946 27053 http.cpp:1045] Creating default 'basic' > HTTP authenticator for realm 'mesos-master-scheduler' > 00:35:26 I0930 00:35:26.321969 27053 master.cpp:649] Authorization enabled > 00:35:26 I0930 00:35:26.322120 27048 hierarchical.cpp:171] Initialized > hierarchical allocator process > 00:35:26 I0930 00:35:26.322145 27048 whitelist_watcher.cpp:77] No whitelist > given > 00:35:26 I0930 00:35:26.322657 27053 master.cpp:2216] Elected as the leading > 
master! > 00:35:26 I0930 00:35:26.322679 27053 master.cpp:1705] Recovering from > registrar > 00:35:26 I0930 00:35:26.322721 27053 registrar.cpp:347] Recovering registrar > 00:35:26 I0930 00:35:26.322829 27048 registrar.cpp:391] Successfully fetched > the registry (0B) in 90368ns > 00:35:26 I0930 00:35:26.322856 27048 registrar.cpp:495] Applied 1 operations > in 4113ns; attempting to update the registry > 00:35:26 I0930 00:35:26.322960 27053 registrar.cpp:552] Successfully updated > the registry in 89088ns > 00:35:26 I0930 00:35:26.323011 27053 registrar.cpp:424] Successfully > recovered registrar > 00:35:26 I0930 00:35:26.323148 27054 master.cpp:1809] Recovered 0 agents from > the registry (146B); allowing 10mins for agents to re-register > 00:35:26 I0930 00:35:26.323161 27047 hierarchical.cpp:209] Skipping recovery > of hierarchical allocator: nothing to recover > 00:35:26 W0930 00:35:26.325556 27033 process.cpp:3194] Attempted to spawn > already running process files@172.16.10.150:37345 > 00:35:26 I0930 00:35:26.325654 27033 cluster.cpp:448] Creating default > 'local' authorizer > 00:35:26 I0930 00:35:26.326050 27048 slave.cpp:254] Mesos agent started on > (250)@172.16.10.150:37345 >
[jira] [Commented] (MESOS-7985) Use ASF CI for automating RPM packaging and upload to bintray.
[ https://issues.apache.org/jira/browse/MESOS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246312#comment-16246312 ] Kapil Arya commented on MESOS-7985: --- {code} commit 1132e1ddafa6a1a9bc8aa966bd01d7b35c7682d9 (HEAD -> master, apache/master) Author: Kapil Arya Date: Fri Nov 3 10:37:32 2017 -0400 Added bintray publishing scripts. Review: https://reviews.apache.org/r/63543 {code} CI Job: https://builds.apache.org/job/Mesos/job/Packaging/job/CentosRPMs/20/ The failure was due to the max file upload size limit set to ~250MB. The scripts have been updated to not upload debug RPMs. > Use ASF CI for automating RPM packaging and upload to bintray. > -- > > Key: MESOS-7985 > URL: https://issues.apache.org/jira/browse/MESOS-7985 > Project: Mesos > Issue Type: Task >Reporter: Kapil Arya >Assignee: Kapil Arya > Fix For: 1.5.0 > > > RR: https://reviews.apache.org/r/63543/ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-2239) MasterAuthorizationTest.DuplicateRegistration is flaky
[ https://issues.apache.org/jira/browse/MESOS-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov reassigned MESOS-2239: -- Assignee: Vinod Kone (was: Chen Zhiwei) > MasterAuthorizationTest.DuplicateRegistration is flaky > -- > > Key: MESOS-2239 > URL: https://issues.apache.org/jira/browse/MESOS-2239 > Project: Mesos > Issue Type: Bug > Components: test > Environment: CentOS5 gcc-4.8 >Reporter: Jie Yu >Assignee: Vinod Kone > Labels: flaky > > {noformat} > 19:30:44 DEBUG: [ RUN ] MasterAuthorizationTest.DuplicateRegistration > 19:30:44 DEBUG: Using temporary directory > '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz' > 19:30:44 DEBUG: I0121 19:30:44.583595 54842 leveldb.cpp:176] Opened db in > 2.002477ms > 19:30:44 DEBUG: I0121 19:30:44.584470 54842 leveldb.cpp:183] Compacted db in > 848351ns > 19:30:44 DEBUG: I0121 19:30:44.584492 54842 leveldb.cpp:198] Created db > iterator in 3830ns > 19:30:44 DEBUG: I0121 19:30:44.584506 54842 leveldb.cpp:204] Seeked to > beginning of db in 962ns > 19:30:44 DEBUG: I0121 19:30:44.584519 54842 leveldb.cpp:273] Iterated through > 0 keys in the db in 598ns > 19:30:44 DEBUG: I0121 19:30:44.584537 54842 replica.cpp:744] Replica > recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > 19:30:44 DEBUG: I0121 19:30:44.584684 54873 recover.cpp:449] Starting replica > recovery > 19:30:44 DEBUG: I0121 19:30:44.584774 54859 recover.cpp:475] Replica is in > EMPTY status > 19:30:44 DEBUG: I0121 19:30:44.586305 54881 replica.cpp:641] Replica in EMPTY > status received a broadcasted recover request > 19:30:44 DEBUG: I0121 19:30:44.586943 54866 recover.cpp:195] Received a > recover response from a replica in EMPTY status > 19:30:44 DEBUG: I0121 19:30:44.587247 54872 recover.cpp:566] Updating replica > status to STARTING > 19:30:44 DEBUG: I0121 19:30:44.587838 54867 leveldb.cpp:306] Persisting > metadata (8 bytes) to leveldb took 393697ns > 19:30:44 DEBUG: I0121 19:30:44.587862 54867 replica.cpp:323] Persisted > replica status to STARTING > 19:30:44 DEBUG: I0121 19:30:44.587920 54877 recover.cpp:475] Replica is in > STARTING status > 19:30:44 DEBUG: I0121 19:30:44.588341 54868 replica.cpp:641] Replica in > STARTING status received a broadcasted recover request > 19:30:44 DEBUG: I0121 19:30:44.588577 54877 recover.cpp:195] Received a > recover response from a replica in STARTING status > 19:30:44 DEBUG: I0121 19:30:44.589040 54863 recover.cpp:566] Updating replica > status to VOTING > 19:30:44 DEBUG: I0121 19:30:44.589344 54871 leveldb.cpp:306] Persisting > metadata (8 bytes) to leveldb took 268257ns > 19:30:44 DEBUG: I0121 19:30:44.589361 54871 replica.cpp:323] Persisted > replica status to VOTING > 19:30:44 DEBUG: I0121 19:30:44.589426 54858 recover.cpp:580] Successfully > joined the Paxos group > 19:30:44 DEBUG: I0121 19:30:44.589735 54858 recover.cpp:464] Recover process > terminated > 19:30:44 DEBUG: I0121 19:30:44.593657 54866 master.cpp:262] Master > 20150121-193044-1711542956-52053-54842 (atlc-bev-05-sr1.corpdc.twttr.net) > started on 172.18.4.102:52053 > 19:30:44 DEBUG: I0121 19:30:44.593690 54866 master.cpp:308] Master only > allowing authenticated frameworks to register > 19:30:44 DEBUG: I0121 19:30:44.593699 54866 master.cpp:313] Master only > allowing authenticated slaves to register > 19:30:44 DEBUG: I0121 19:30:44.593708 54866 credentials.hpp:36] Loading > credentials for authentication from > '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz/credentials' > 19:30:44 DEBUG: I0121 
19:30:44.593808 54866 master.cpp:357] Authorization > enabled > 19:30:44 DEBUG: I0121 19:30:44.594319 54871 master.cpp:1219] The newly > elected leader is master@172.18.4.102:52053 with id > 20150121-193044-1711542956-52053-54842 > 19:30:44 DEBUG: I0121 19:30:44.594336 54871 master.cpp:1232] Elected as the > leading master! > 19:30:44 DEBUG: I0121 19:30:44.594343 54871 master.cpp:1050] Recovering from > registrar > 19:30:44 DEBUG: I0121 19:30:44.594403 54867 registrar.cpp:313] Recovering > registrar > 19:30:44 DEBUG: I0121 19:30:44.594558 54858 log.cpp:660] Attempting to start > the writer > 19:30:44 DEBUG: I0121 19:30:44.595000 54859 replica.cpp:477] Replica received > implicit promise request with proposal 1 > 19:30:44 DEBUG: I0121 19:30:44.595340 54859 leveldb.cpp:306] Persisting > metadata (8 bytes) to leveldb took 319942ns > 19:30:44 DEBUG: I0121 19:30:44.595360 54859 replica.cpp:345] Persisted > promised to 1 > 19:30:44 DEBUG: I0121 19:30:44.595700 54878 coordinator.cpp:230] Coordinator > attemping to fill missing position > 19:30:44 DEBUG: I0121 19:30:44.596330 54859 replica.cpp:378] Replica received > explicit promise request for position 0 with proposal 2 >
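The log above steps the log replica through its recovery protocol: EMPTY, then STARTING, then VOTING, after which it joins the Paxos group. As a reading aid only, here is a minimal C++ sketch of those transitions; the enum and the {{advance()}} helper are hypothetical and are not Mesos' replica code:
{code}
#include <iostream>

// Hypothetical model of the transitions visible in the log:
// EMPTY -> STARTING -> VOTING, then the replica joins the Paxos group.
enum class ReplicaStatus { EMPTY, STARTING, VOTING };

ReplicaStatus advance(ReplicaStatus status)
{
  switch (status) {
    case ReplicaStatus::EMPTY:    return ReplicaStatus::STARTING;
    case ReplicaStatus::STARTING: return ReplicaStatus::VOTING;
    case ReplicaStatus::VOTING:   return ReplicaStatus::VOTING;
  }
  return status; // Unreachable; silences -Wreturn-type.
}

int main()
{
  ReplicaStatus s = ReplicaStatus::EMPTY;
  s = advance(s); // "Updating replica status to STARTING"
  s = advance(s); // "Updating replica status to VOTING"
  std::cout << (s == ReplicaStatus::VOTING) << std::endl; // Prints 1.
}
{code}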
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI. Affected tests: {noformat} AgentAPIStreamingTest.AttachInputToNestedContainerSession AgentAPITest.LaunchNestedContainerSession ContentType/AgentAPITest.AttachContainerInputAuthorization/0 AgentAPITest.LaunchNestedContainerSessionWithTTY/0 {noformat} {code} [ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726 I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" - -allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --au thenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostn ame_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="10 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registr y_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" - -version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials' I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! 
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy' I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726 I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5"
[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meng Zhu updated MESOS-8171: Sprint: Mesosphere Sprint 67 > Using a failoverTimeout of 0 with Mesos native scheduler client can result in > infinite subscribe loop > - > > Key: MESOS-8171 > URL: https://issues.apache.org/jira/browse/MESOS-8171 > Project: Mesos > Issue Type: Bug > Components: c++ api, java api, scheduler driver >Affects Versions: 1.1.3, 1.2.2, 1.3.1, 1.4.0 >Reporter: Tim Harper >Assignee: Meng Zhu >Priority: Minor > Labels: mesosphere > > Over the past year, the Marathon team has been plagued with an issue that > hits our CI builds periodically in which the scheduler driver enters a tight > loop, sending 10,000s of SUBSCRIBE calls to the master per second. I turned > on debug logging for the client and the server, and it pointed to an issue > with the {{doReliableRegistration}} method in sched.cpp. Here's the logs: > {code} > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on > 127.0.1.1:60957 with 8 worker threads > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151785 13791 group.cpp:341] Group process > (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size > (joins, cancels, datas) = (0, 0, 0) > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in > ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' > at '/mesos' in ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0') > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152762 13791 group.cpp:700] Trying to get > '/mesos/json.info_00' in ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master > (UPID=master@172.16.10.95:32856) is detected > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157347 13787 sched.cpp:336] New master detected at > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157557 13787 sched.cpp:352] No credentials provided. 
Attempting to > register without authentication > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159785 13789
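The quoted log shows "Will retry registration in 0ns if necessary" repeating, i.e. the backoff never grows. A minimal C++ sketch of the suspected failure mode, assuming (as the description implies) that {{doReliableRegistration}} caps its retry delay by the framework's failoverTimeout; the function name and the capping rule are illustrative, not sched.cpp's actual code:
{code}
#include <algorithm>
#include <cstdint>
#include <iostream>

// Illustrative only: if the retry delay is capped by failoverTimeout,
// a failoverTimeout of 0 collapses every delay to 0ns, so SUBSCRIBE
// calls are resent to the master in a tight loop.
int64_t retryDelayNs(int64_t nominalBackoffNs, int64_t failoverTimeoutNs)
{
  return std::min(nominalBackoffNs, failoverTimeoutNs);
}

int main()
{
  const int64_t nominal = 2000000000; // 2s nominal backoff.
  std::cout << retryDelayNs(nominal, 0) << "ns" << std::endl; // "0ns"
}
{code}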
[jira] [Commented] (MESOS-7699) "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable freshly released)
[ https://issues.apache.org/jira/browse/MESOS-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246113#comment-16246113 ] Vinod Kone commented on MESOS-7699: --- [~bennoe] should this be in reviewable? > "stdlib.h: No such file or directory" when building with GCC 6 (Debian stable > freshly released) > --- > > Key: MESOS-7699 > URL: https://issues.apache.org/jira/browse/MESOS-7699 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.2.0 >Reporter: Adam Cecile >Assignee: Benno Evers > Labels: autotools > > Hi, > It seems the issue comes from a workaround added a while ago: > https://reviews.apache.org/r/40326/ > https://reviews.apache.org/r/40327/ > When building with external libraries, the generated build command lines > contain -isystem /usr/include, which is clearly stated as being wrong > according to the GCC developers: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70129 > I'll do some testing by reverting all -isystem to -I and I'll let you know if > it builds. > Regards, Adam. > {noformat} > configure:21642: result: no > configure:21642: checking glog/logging.h presence > configure:21642: g++ -E -I/usr/include -I/usr/include/apr-1 > -I/usr/include/apr-1.0 -Wdate-time -D_FORTIFY_SOURCE=2 -isystem /usr/include > -I/usr/include conftest.cpp > In file included from /usr/include/c++/6/ext/string_conversions.h:41:0, > from /usr/include/c++/6/bits/basic_string.h:5417, > from /usr/include/c++/6/string:52, > from /usr/include/c++/6/bits/locale_classes.h:40, > from /usr/include/c++/6/bits/ios_base.h:41, > from /usr/include/c++/6/ios:42, > from /usr/include/c++/6/ostream:38, > from /usr/include/glog/logging.h:43, > from conftest.cpp:32: > /usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or > directory > #include_next <stdlib.h> > ^ > compilation terminated. > configure:21642: $? = 1 > configure: failed program was: > | /* confdefs.h */ > | #define PACKAGE_NAME "mesos" > | #define PACKAGE_TARNAME "mesos" > | #define PACKAGE_VERSION "1.2.0" > | #define PACKAGE_STRING "mesos 1.2.0" > | #define PACKAGE_BUGREPORT "" > | #define PACKAGE_URL "" > | #define PACKAGE "mesos" > | #define VERSION "1.2.0" > | #define STDC_HEADERS 1 > | #define HAVE_SYS_TYPES_H 1 > | #define HAVE_SYS_STAT_H 1 > | #define HAVE_STDLIB_H 1 > | #define HAVE_STRING_H 1 > | #define HAVE_MEMORY_H 1 > | #define HAVE_STRINGS_H 1 > | #define HAVE_INTTYPES_H 1 > | #define HAVE_STDINT_H 1 > | #define HAVE_UNISTD_H 1 > | #define HAVE_DLFCN_H 1 > | #define LT_OBJDIR ".libs/" > | #define HAVE_CXX11 1 > | #define HAVE_PTHREAD_PRIO_INHERIT 1 > | #define HAVE_PTHREAD 1 > | #define HAVE_LIBZ 1 > | #define HAVE_FTS_H 1 > | #define HAVE_APR_POOLS_H 1 > | #define HAVE_LIBAPR_1 1 > | #define HAVE_BOOST_VERSION_HPP 1 > | #define HAVE_LIBCURL 1 > | /* end confdefs.h. */ > | #include <glog/logging.h> > configure:21642: result: no > configure:21642: checking for glog/logging.h > configure:21642: result: no > configure:21674: error: cannot find glog > --- > You have requested the use of a non-bundled glog but no suitable > glog could be found. > You may want specify the location of glog by providing a prefix > path via --with-glog=DIR, or check that the path you provided is > correct if you're already doing this. > --- > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
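The configure excerpt above reduces to a tiny repro, following the GCC bug report linked in the description: GCC 6's {{<cstdlib>}} wraps the C header via {{#include_next <stdlib.h>}}, and injecting {{/usr/include}} with {{-isystem}} reorders the search path so that no directory after the C++ headers still contains {{stdlib.h}}. A sketch; the compile commands in the comments are illustrative:
{code}
// repro.cpp -- sketch of the include_next failure described above.
//
//   g++ -isystem /usr/include repro.cpp   # fatal error: stdlib.h: ...
//   g++ -I /usr/include repro.cpp         # builds
//
// -isystem moves /usr/include ahead of the compiler's own C++ headers,
// so <cstdlib>'s "#include_next <stdlib.h>" finds no later directory
// containing stdlib.h.
#include <cstdlib>

int main()
{
  return EXIT_SUCCESS;
}
{code}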
[jira] [Updated] (MESOS-8078) Some fields went missing with no replacement in api/v1
[ https://issues.apache.org/jira/browse/MESOS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-8078: -- Shepherd: Greg Mann (was: Till Toenshoff) > Some fields went missing with no replacement in api/v1 > -- > > Key: MESOS-8078 > URL: https://issues.apache.org/jira/browse/MESOS-8078 > Project: Mesos > Issue Type: Story > Components: HTTP API >Reporter: Dmitrii Rozhkov >Assignee: Vinod Kone > Labels: mesosphere > > Hi friends, > These fields are available via state.json but went missing in v1 of > the API: > leader_info > start_time > elected_time > As we're showing them on the Overview page of the DC/OS UI but would like > to stop using state.json, it would be great to have them somewhere in v1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.
[ https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7790: -- Shepherd: Benjamin Mahler > Design hierarchical quota allocation. > - > > Key: MESOS-7790 > URL: https://issues.apache.org/jira/browse/MESOS-7790 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Benjamin Mahler >Assignee: Michael Park > Labels: multitenancy > > When quota is assigned in the role hierarchy (see MESOS-6375), it's possible > for there to be "undelegated" quota for a role. For example:
> {noformat}
>             ^
>            / \
>           /   \
>   eng (90 cpus)  sales (10 cpus)
>       ^
>      / \
>     /   \
>   ads (50 cpus)  build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its > children, and 30 cpus remain undelegated. We need to design how to allocate > these 30 undelegated cpus. Are they allocated entirely to the "eng" > role? Are they allocated to the "eng" role tree? If so, how do we determine > how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", > "eng/build")? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7790) Design hierarchical quota allocation.
[ https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246079#comment-16246079 ] Vinod Kone commented on MESOS-7790: --- [~mcypark] Can you link the design doc and move this to reviewable? > Design hierarchical quota allocation. > - > > Key: MESOS-7790 > URL: https://issues.apache.org/jira/browse/MESOS-7790 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Benjamin Mahler >Assignee: Michael Park > Labels: multitenancy > > When quota is assigned in the role hierarchy (see MESOS-6375), it's possible > for there to be "undelegated" quota for a role. For example:
> {noformat}
>             ^
>            / \
>           /   \
>   eng (90 cpus)  sales (10 cpus)
>       ^
>      / \
>     /   \
>   ads (50 cpus)  build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its > children, and 30 cpus remain undelegated. We need to design how to allocate > these 30 undelegated cpus. Are they allocated entirely to the "eng" > role? Are they allocated to the "eng" role tree? If so, how do we determine > how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", > "eng/build")? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
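For concreteness, the "undelegated" figure in the example is just a role's own quota minus what its children hold. A minimal C++ sketch of that arithmetic; the function and types are illustrative, not the allocator's data model:
{code}
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical: undelegated quota = a role's quota minus the sum of
// the quotas delegated to its children.
double undelegatedCpus(double roleCpus, const std::vector<double>& childCpus)
{
  const double delegated =
      std::accumulate(childCpus.begin(), childCpus.end(), 0.0);
  return roleCpus - delegated;
}

int main()
{
  // "eng" holds 90 cpus; "eng/ads" and "eng/build" hold 50 and 10,
  // leaving the 30 cpus whose placement the design needs to settle.
  std::cout << undelegatedCpus(90.0, {50.0, 10.0}) << " cpus" << std::endl;
}
{code}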
[jira] [Updated] (MESOS-7905) GarbageCollectorIntegrationTest.ExitedFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-7905: -- Sprint: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66 (was: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67) > GarbageCollectorIntegrationTest.ExitedFramework is flaky > > > Key: MESOS-7905 > URL: https://issues.apache.org/jira/browse/MESOS-7905 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Vinod Kone > > Observed this on ASF CI. > {code} > [ RUN ] GarbageCollectorIntegrationTest.ExitedFramework > I0818 23:51:42.881799 5882 cluster.cpp:162] Creating default 'local' > authorizer > I0818 23:51:42.884285 5907 master.cpp:442] Master > 6d3f4c59-27e2-4701-9f7f-7c1f301e7fba (ef22537e2401) started on > 172.17.0.3:57495 > I0818 23:51:42.884332 5907 master.cpp:444] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/rYJzr3/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" > --work_dir="/tmp/rYJzr3/master" --zk_session_timeout="10secs" > I0818 23:51:42.884627 5907 master.cpp:494] Master only allowing > authenticated frameworks to register > I0818 23:51:42.884644 5907 master.cpp:508] Master only allowing > authenticated agents to register > I0818 23:51:42.884658 5907 master.cpp:521] Master only allowing > authenticated HTTP frameworks to register > I0818 23:51:42.884774 5907 credentials.hpp:37] Loading credentials for > authentication from '/tmp/rYJzr3/credentials' > I0818 23:51:42.885066 5907 master.cpp:566] Using default 'crammd5' > authenticator > I0818 23:51:42.885213 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0818 23:51:42.885382 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0818 23:51:42.885512 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0818 23:51:42.885640 5907 master.cpp:646] Authorization enabled > I0818 23:51:42.885818 5903 hierarchical.cpp:171] Initialized hierarchical > allocator process > I0818 23:51:42.886016 5905 whitelist_watcher.cpp:77] No whitelist given > I0818 23:51:42.889050 5908 master.cpp:2163] Elected as the leading master! 
> I0818 23:51:42.889081 5908 master.cpp:1702] Recovering from registrar > I0818 23:51:42.889387 5909 registrar.cpp:347] Recovering registrar > I0818 23:51:42.889838 5909 registrar.cpp:391] Successfully fetched the > registry (0B) in 409856ns > I0818 23:51:42.889966 5909 registrar.cpp:495] Applied 1 operations in > 38859ns; attempting to update the registry > I0818 23:51:42.890450 5909 registrar.cpp:552] Successfully updated the > registry in 425216ns > I0818 23:51:42.890552 5909 registrar.cpp:424] Successfully recovered > registrar > I0818 23:51:42.890890 5909 master.cpp:1801] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0818 23:51:42.890969 5910 hierarchical.cpp:209] Skipping recovery of > hierarchical allocator: nothing to recover > I0818 23:51:42.895795 5882 process.cpp:3228] Attempting to spawn already > spawned process files@172.17.0.3:57495 > I0818 23:51:42.896057 5882 cluster.cpp:448] Creating default 'local' > authorizer > I0818 23:51:42.897809 5904 slave.cpp:250] Mesos agent started on > (85)@172.17.0.3:57495 > I0818 23:51:42.897848 5904 slave.cpp:251] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; >
[jira] [Updated] (MESOS-8055) Design doc for offer operations feedback
[ https://issues.apache.org/jira/browse/MESOS-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-8055: -- Shepherd: Jie Yu > Design doc for offer operations feedback > > > Key: MESOS-8055 > URL: https://issues.apache.org/jira/browse/MESOS-8055 > Project: Mesos > Issue Type: Documentation >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > > https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7905) GarbageCollectorIntegrationTest.ExitedFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya reassigned MESOS-7905: - Assignee: (was: Kapil Arya) > GarbageCollectorIntegrationTest.ExitedFramework is flaky > > > Key: MESOS-7905 > URL: https://issues.apache.org/jira/browse/MESOS-7905 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Vinod Kone > > Observed this on ASF CI. > {code} > [ RUN ] GarbageCollectorIntegrationTest.ExitedFramework > I0818 23:51:42.881799 5882 cluster.cpp:162] Creating default 'local' > authorizer > I0818 23:51:42.884285 5907 master.cpp:442] Master > 6d3f4c59-27e2-4701-9f7f-7c1f301e7fba (ef22537e2401) started on > 172.17.0.3:57495 > I0818 23:51:42.884332 5907 master.cpp:444] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/rYJzr3/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" > --work_dir="/tmp/rYJzr3/master" --zk_session_timeout="10secs" > I0818 23:51:42.884627 5907 master.cpp:494] Master only allowing > authenticated frameworks to register > I0818 23:51:42.884644 5907 master.cpp:508] Master only allowing > authenticated agents to register > I0818 23:51:42.884658 5907 master.cpp:521] Master only allowing > authenticated HTTP frameworks to register > I0818 23:51:42.884774 5907 credentials.hpp:37] Loading credentials for > authentication from '/tmp/rYJzr3/credentials' > I0818 23:51:42.885066 5907 master.cpp:566] Using default 'crammd5' > authenticator > I0818 23:51:42.885213 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0818 23:51:42.885382 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0818 23:51:42.885512 5907 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0818 23:51:42.885640 5907 master.cpp:646] Authorization enabled > I0818 23:51:42.885818 5903 hierarchical.cpp:171] Initialized hierarchical > allocator process > I0818 23:51:42.886016 5905 whitelist_watcher.cpp:77] No whitelist given > I0818 23:51:42.889050 5908 master.cpp:2163] Elected as the leading master! 
> I0818 23:51:42.889081 5908 master.cpp:1702] Recovering from registrar > I0818 23:51:42.889387 5909 registrar.cpp:347] Recovering registrar > I0818 23:51:42.889838 5909 registrar.cpp:391] Successfully fetched the > registry (0B) in 409856ns > I0818 23:51:42.889966 5909 registrar.cpp:495] Applied 1 operations in > 38859ns; attempting to update the registry > I0818 23:51:42.890450 5909 registrar.cpp:552] Successfully updated the > registry in 425216ns > I0818 23:51:42.890552 5909 registrar.cpp:424] Successfully recovered > registrar > I0818 23:51:42.890890 5909 master.cpp:1801] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0818 23:51:42.890969 5910 hierarchical.cpp:209] Skipping recovery of > hierarchical allocator: nothing to recover > I0818 23:51:42.895795 5882 process.cpp:3228] Attempting to spawn already > spawned process files@172.17.0.3:57495 > I0818 23:51:42.896057 5882 cluster.cpp:448] Creating default 'local' > authorizer > I0818 23:51:42.897809 5904 slave.cpp:250] Mesos agent started on > (85)@172.17.0.3:57495 > I0818 23:51:42.897848 5904 slave.cpp:251] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/tmp/GarbageCollectorIntegrationTest_ExitedFramework_DayibR/store/appc" > --authenticate_http_executors="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true"
[jira] [Assigned] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song reassigned MESOS-8158: --- Assignee: Gilbert Song > Mesos Agent in docker neglects to retry discovering Task docker containers > -- > > Key: MESOS-8158 > URL: https://issues.apache.org/jira/browse/MESOS-8158 > Project: Mesos > Issue Type: Bug > Components: agent, containerization, docker, executor >Affects Versions: 1.4.0 > Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4 >Reporter: Charles Allen >Assignee: Gilbert Song > > I have attempted to launch Mesos agents inside of a docker container in such > a way that the agent's docker container can be replaced and recovered. Unfortunately I > hit a major snag in the way the mesos docker launching works. > To test simple functionality, a marathon app is set up that simply has the > following command: {{date && python -m SimpleHTTPServer $PORT0}} > That way the HTTP port can be accessed to ensure things are being assigned > correctly, and the date is printed out in the log. > When I attempt to start this marathon app, the mesos agent (inside a docker > container) launches an executor, which properly creates a second task > that launches the python code. Here's the output from the executor logs (this > looks correct): > {code} > I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0 > I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent > d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0 > I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on > 10.0.75.2 > I1101 20:34:03.428680 68281 executor.cpp:160] Starting task > testapp.fe35282f-bf43-11e7-a24b-0242ac110002 > I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H > unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e > HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e > MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS=1.0 -e > MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e > MARATHON_APP_RESOURCE_MEM=128.0 -e > MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e > MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e > MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e > PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v > /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox > --net host --entrypoint /bin/sh --name > mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 > --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 > -c date && python -m SimpleHTTPServer $PORT0 > I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H > unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 > I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero > status code. cmd: 'docker -H unix:///var/run/docker.sock inspect > mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms > I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H > unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 > I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero > status code.
cmd: 'docker -H unix:///var/run/docker.sock inspect > mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms > I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H > unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 > I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container > not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect > mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms > I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H > unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 > Wed Nov 1 20:34:06 UTC 2017 > {code} > But, somehow there is a TASK_FAILED message sent to marathon. > Upon further investigation, the following snippet can be found in the agent > logs (running in a docker container) > {code} > I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task > 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework > a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 > I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling > '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001' > from gc > I1101 20:34:00.950225 9 gc.cpp:93] Unscheduling >
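The executor log above shows the discovery pattern at work: run {{docker inspect}}, and on a non-zero exit (or a container that has not started yet) retry on a fixed 500ms interval. A self-contained C++ sketch of that polling loop follows; it shells out via {{std::system}} for brevity and is not the docker containerizer's actual implementation:
{code}
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>
#include <thread>

// Sketch of the retry loop visible in the executor log: poll
// `docker inspect` every 500ms until it succeeds or the attempt
// budget runs out. Illustrative only.
bool inspectUntilRunning(const std::string& container, int maxAttempts)
{
  const std::string cmd =
      "docker -H unix:///var/run/docker.sock inspect " + container;

  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (std::system(cmd.c_str()) == 0) {
      return true; // Inspect succeeded; the container is discoverable.
    }

    std::cout << "Retrying inspect. interval: 500ms" << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }

  // The report's complaint: if discovery is not retried (e.g. after the
  // agent container restarts), the task is wrongly declared TASK_FAILED.
  return false;
}

int main()
{
  inspectUntilRunning("mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75", 3);
}
{code}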
[jira] [Updated] (MESOS-7881) Building gRPC with CMake
[ https://issues.apache.org/jira/browse/MESOS-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7881: -- Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66 (was: Mesosphere Sprint 61, Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66, Mesosphere Sprint 67) > Building gRPC with CMake > > > Key: MESOS-7881 > URL: https://issues.apache.org/jira/browse/MESOS-7881 > Project: Mesos > Issue Type: Improvement >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > Labels: storage > Fix For: 1.4.0 > > > gRPC manages its own third-party libraries, which overlap with Mesos' > third-party library bundles. We need to write proper CMake rules that > configure gRPC's own CMake build correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245903#comment-16245903 ] Kevin Klues commented on MESOS-7924: {noformat} commit 844590611067d04de86a2de923b21ef377554728 (HEAD -> master, upstream/master) Author: Armand Grillet Date: Thu Nov 9 16:53:40 2017 +0100 Added JavaScript linter. The linter runs when changes on a JavaScript file are being committed. We use ESLint with a configuration based on our current JS code base. The linter and its dependencies (i.e. Node.js) are installed in a virtual environment using Virtualenv and then Nodeenv. Review: https://reviews.apache.org/r/62214/ {noformat} > Add a javascript linter to the webui. > - > > Key: MESOS-7924 > URL: https://issues.apache.org/jira/browse/MESOS-7924 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Benjamin Mahler >Assignee: Armand Grillet > Labels: tech-debt > Fix For: 1.5.0 > > > As far as I can tell, javascript linters (e.g. ESLint) help catch some > functional errors as well, for example, we've made some "strict" mistakes a > few times that ESLint can catch: MESOS-6624, MESOS-7912. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7506: --- Description: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor SlaveTest.RestartSlaveRequireExecutorAuthentication ShutdownUnregisteredExecutor {noformat} was: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor SlaveTest.RestartSlaveRequireExecutorAuthentication ShutdownUnregisteredExecutor {noformat} > Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > Attachments: KillMultipleTasks-badrun.txt, > ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt > > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. 
A typical log looks like this: > {noformat} > ../../src/tests/cluster.cpp:580: Failure > Value of: containers->empty() > Actual: false > Expected: true > Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } > {noformat} > All currently affected tests: > {noformat} > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 > SlaveRecoveryTest/0.RecoverUnregisteredExecutor > SlaveRecoveryTest/0.CleanupExecutor > SlaveRecoveryTest/0.RecoverTerminatedExecutor > SlaveTest.ShutdownUnregisteredExecutor > SlaveTest.RestartSlaveRequireExecutorAuthentication > ShutdownUnregisteredExecutor > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
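The quoted assertion is the harness's post-test invariant: once a test finishes, the containerizer must report no live containers, and any leftover ID is an orphan that fails the run. A tiny C++ sketch of that check with hypothetical types (the real one is the {{src/tests/cluster.cpp:580}} assertion above):
{code}
#include <iostream>
#include <set>
#include <string>

// Hypothetical mirror of the cleanup invariant in src/tests/cluster.cpp:
// the set of live container IDs must be empty after a test.
bool noOrphanContainers(const std::set<std::string>& containers)
{
  if (!containers.empty()) {
    std::cout << "Failed to destroy containers: { ";
    for (const std::string& id : containers) {
      std::cout << id << " ";
    }
    std::cout << "}" << std::endl;
    return false;
  }
  return true;
}

int main()
{
  // Reproduces the shape of the reported failure.
  noOrphanContainers({"da3e8aa8-98e7-4e72-a8fd-5d0bae960014"});
}
{code}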
[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7506: --- Description: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor SlaveTest.RestartSlaveRequireExecutorAuthentication ShutdownUnregisteredExecutor {noformat} was: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor SlaveTest.RestartSlaveRequireExecutorAuthentication ShutdownUnregisteredExecutor {noformat} > Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > Attachments: KillMultipleTasks-badrun.txt, > ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt > > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. 
A typical log looks like this: > {noformat} > ../../src/tests/cluster.cpp:580: Failure > Value of: containers->empty() > Actual: false > Expected: true > Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } > {noformat} > All currently affected tests: > {noformat} > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > SlaveRecoveryTest/0.RecoverUnregisteredExecutor > SlaveRecoveryTest/0.CleanupExecutor > SlaveRecoveryTest/0.RecoverTerminatedExecutor > SlaveTest.ShutdownUnregisteredExecutor > SlaveTest.RestartSlaveRequireExecutorAuthentication > ShutdownUnregisteredExecutor > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-7924: --- Shepherd: Kevin Klues (was: Benjamin Mahler) > Add a javascript linter to the webui. > - > > Key: MESOS-7924 > URL: https://issues.apache.org/jira/browse/MESOS-7924 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Benjamin Mahler >Assignee: Armand Grillet > Labels: tech-debt > Fix For: 1.5.0 > > > As far as I can tell, javascript linters (e.g. ESLint) help catch some > functional errors as well, for example, we've made some "strict" mistakes a > few times that ESLint can catch: MESOS-6624, MESOS-7912. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245903#comment-16245903 ] Kevin Klues edited comment on MESOS-7924 at 11/9/17 4:01 PM: - {noformat} commit 844590611067d04de86a2de923b21ef377554728 Author: Armand Grillet Date: Thu Nov 9 16:53:40 2017 +0100 Added JavaScript linter. The linter runs when changes on a JavaScript file are being committed. We use ESLint with a configuration based on our current JS code base. The linter and its dependencies (i.e. Node.js) are installed in a virtual environment using Virtualenv and then Nodeenv. Review: https://reviews.apache.org/r/62214/ {noformat} was (Author: klueska): {noformat} commit 844590611067d04de86a2de923b21ef377554728 (HEAD -> master, upstream/master) Author: Armand Grillet Date: Thu Nov 9 16:53:40 2017 +0100 Added JavaScript linter. The linter runs when changes on a JavaScript file are being committed. We use ESLint with a configuration based on our current JS code base. The linter and its dependencies (i.e. Node.js) are installed in a virtual environment using Virtualenv and then Nodeenv. Review: https://reviews.apache.org/r/62214/ {noformat} > Add a javascript linter to the webui. > - > > Key: MESOS-7924 > URL: https://issues.apache.org/jira/browse/MESOS-7924 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Benjamin Mahler >Assignee: Armand Grillet > Labels: tech-debt > Fix For: 1.5.0 > > > As far as I can tell, javascript linters (e.g. ESLint) help catch some > functional errors as well, for example, we've made some "strict" mistakes a > few times that ESLint can catch: MESOS-6624, MESOS-7912. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7506: --- Attachment: ResourceLimitation-badrun2.txt > Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > Attachments: KillMultipleTasks-badrun.txt, > ResourceLimitation-badrun.txt, ResourceLimitation-badrun2.txt, > TaskWithFileURI-badrun.txt > > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. A typical log looks like this: > {noformat} > ../../src/tests/cluster.cpp:580: Failure > Value of: containers->empty() > Actual: false > Expected: true > Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } > {noformat} > All currently affected tests: > {noformat} > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > SlaveRecoveryTest/0.RecoverUnregisteredExecutor > SlaveRecoveryTest/0.CleanupExecutor > SlaveRecoveryTest/0.RecoverTerminatedExecutor > SlaveTest.ShutdownUnregisteredExecutor > SlaveTest.RestartSlaveRequireExecutorAuthentication > ShutdownUnregisteredExecutor > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7506: --- Description: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor SlaveTest.RestartSlaveRequireExecutorAuthentication ShutdownUnregisteredExecutor {noformat} was: I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this: {noformat} ../../src/tests/cluster.cpp:580: Failure Value of: containers->empty() Actual: false Expected: true Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } {noformat} All currently affected tests: {noformat} ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 SlaveRecoveryTest/0.RecoverUnregisteredExecutor SlaveRecoveryTest/0.CleanupExecutor SlaveRecoveryTest/0.RecoverTerminatedExecutor SlaveTest.ShutdownUnregisteredExecutor ShutdownUnregisteredExecutor {noformat} > Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > Attachments: KillMultipleTasks-badrun.txt, > ResourceLimitation-badrun.txt, TaskWithFileURI-badrun.txt > > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. A typical log looks like this: > {noformat} > ../../src/tests/cluster.cpp:580: Failure > Value of: containers->empty() > Actual: false > Expected: true > Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } > {noformat} > All currently affected tests: > {noformat} > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.ResourceLimitation/0 > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > SlaveRecoveryTest/0.RecoverUnregisteredExecutor > SlaveRecoveryTest/0.CleanupExecutor > SlaveRecoveryTest/0.RecoverTerminatedExecutor > SlaveTest.ShutdownUnregisteredExecutor > SlaveTest.RestartSlaveRequireExecutorAuthentication > ShutdownUnregisteredExecutor > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI. Affected tests: {noformat} AgentAPIStreamingTest.AttachInputToNestedContainerSession AgentAPITest.LaunchNestedContainerSession ContentType/AgentAPITest.AttachContainerInputAuthorization/0 {noformat} {code} [ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726 I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" - -allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --au thenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostn ame_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="10 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registr y_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" - -version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials' I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! 
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy' I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726 I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local"
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI. Affected tests: AgentAPIStreamingTest.AttachInputToNestedContainerSession AgentAPITest.LaunchNestedContainerSession ContentType/AgentAPITest.AttachContainerInputAuthorization/0 {code} [ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726 I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" - -allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --au thenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostn ame_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="10 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registr y_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" - -version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials' I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! 
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local"
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI.

Affected tests:
AgentAPIStreamingTest.AttachInputToNestedContainerSession

{code}
[ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245805#comment-16245805 ] Kevin Klues commented on MESOS-7924:

{noformat}
commit 0f674cb7fcc827ef241dc76fa40139e86717
Author: Armand Grillet
Date:   Thu Nov 9 16:17:12 2017 +0100

    Removed pylint from the CLI requirements.

    Due to the new virtual environment located in /support, we do not
    need to have pylint in the CLI virtual environment anymore.

    Review: https://reviews.apache.org/r/63582/
{noformat}

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
> Issue Type: Improvement
> Components: webui
> Reporter: Benjamin Mahler
> Assignee: Armand Grillet
> Labels: tech-debt
>
> Fix For: 1.5.0
>
> As far as I can tell, javascript linters (e.g. ESLint) help catch some
> functional errors as well, for example, we've made some "strict" mistakes a
> few times that ESLint can catch: MESOS-6624, MESOS-7912.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
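[Editor's note] For readers unfamiliar with the class of bug the issue above refers to: assigning to a variable that was never declared silently creates a global in sloppy-mode JavaScript, but throws a ReferenceError under strict mode. ESLint's no-undef rule reports it at lint time instead. The snippet below is a hypothetical sketch (the function and variable names are illustrative, not taken from the Mesos webui sources):

{code}
'use strict';

// Hypothetical helper, not from the webui: update a summary counter.
function updateAgentCount(agents) {
  // Typo: 'total' was never declared with var/let/const. In sloppy mode
  // this silently creates a global; under 'use strict' it throws a
  // ReferenceError at runtime. ESLint's no-undef rule flags it statically:
  //
  //   error  'total' is not defined  no-undef
  total = agents.length;
  return total;
}
{code}

Catching this at lint time, rather than at runtime in the browser, is the benefit the quoted issue describes.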
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Description: Observed this on ASF CI and internal Mesosphere CI.

Affected tests:

{code}
[ RUN ] ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/a5h5J3/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the registry in 147200ns
I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered registrar
I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of hierarchical allocator: nothing to recover
I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
I0629 05:49:33.187396 25301 provisioner.cpp:249] Using default backend 'copy'
I0629 05:49:33.189133 25301 cluster.cpp:448] Creating default 'local' authorizer
I0629 05:49:33.189707 25306 slave.cpp:231] Mesos agent started on (644)@172.17.0.3:45726
I0629 05:49:33.189741 25306 slave.cpp:232] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos"
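[Editor's note] Flaky cases like the truncated run above are typically reproduced by stress-running the single test with googletest's standard flags. A minimal sketch, assuming a mesos-tests binary in the build tree (the exact path depends on the build setup):

{noformat}
./src/mesos-tests \
  --gtest_filter=ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 \
  --gtest_repeat=100 \
  --gtest_break_on_failure
{noformat}

--gtest_repeat reruns the filtered test in a loop, and --gtest_break_on_failure stops at the first failing iteration so the environment can be inspected.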