-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66644/#review201768
-----------------------------------------------------------
FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['66644']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66644

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66644/logs/mesos-tests-stdout.log):

```
[ OK ] OperationStatusUpdateManagerTest.RecoverNotCheckpointedStream (7 ms)
[ RUN ] OperationStatusUpdateManagerTest.RecoverEmptyFile
[ OK ] OperationStatusUpdateManagerTest.RecoverEmptyFile (14 ms)
[ RUN ] OperationStatusUpdateManagerTest.RecoverEmptyDirectory
[ OK ] OperationStatusUpdateManagerTest.RecoverEmptyDirectory (14 ms)
[ RUN ] OperationStatusUpdateManagerTest.RecoverTerminatedStream
[ OK ] OperationStatusUpdateManagerTest.RecoverTerminatedStream (19 ms)
[ RUN ] OperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[ OK ] OperationStatusUpdateManagerTest.IgnoreDuplicateUpdate (20 ms)
[ RUN ] OperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[ OK ] OperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover (16 ms)
[ RUN ] OperationStatusUpdateManagerTest.RejectDuplicateAck
[ OK ] OperationStatusUpdateManagerTest.RejectDuplicateAck (15 ms)
[ RUN ] OperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[ OK ] OperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover (15 ms)
[ RUN ] OperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[ OK ] OperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile (21 ms)
[ RUN ] OperationStatusUpdateManagerTest.StrictRecoveryCorruptedFile
[ OK ] OperationStatusUpdateManagerTest.StrictRecoveryCorruptedFile (20 ms)
[ RUN ] OperationStatusUpdateManagerTest.UpdateLatestWhenResending
[ OK ] OperationStatusUpdateManagerTest.UpdateLatestWhenResending (20 ms)
[----------] 16 tests from OperationStatusUpdateManagerTest (277 ms total)
[----------] 6 tests from PartitionTest
[ RUN ] PartitionTest.PartitionedSlave
[ OK ] PartitionTest.PartitionedSlave (286 ms)
[ RUN ] PartitionTest.PartitionedSlaveExitedExecutor
[ OK ] PartitionTest.PartitionedSlaveExitedExecutor (371 ms)
[ RUN ] PartitionTest.TaskCompletedOnPartitionedAgent
```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66644/logs/mesos-tests-stderr.log):

```
I0423 20:57:57.007681 18224 master.cpp:8517] Marked agent dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-S0 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) unreachable: health check timed out
I0423 20:57:57.007681 18224 master.cpp:10482] Updating the state of task 1 of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 (latest state: TASK_LOST, status update state: TASK_LOST)
I0423 20:57:57.009660 26088 hierarchical.cpp:609] Removed agent dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-S0
I0423 20:57:57.010648 18224 master.cpp:10581] Removing task 1 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 on agent dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-S0 at slave(87)@10.3.1.8:50409 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0423 20:57:57.010648 18224 master.cpp:8147] Sending status update TASK_LOST for task 1 of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 'health check timed out'
I0423 20:57:57.011663 18224 master.cpp:10610] Removing executor 'default' with resources [] of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 on agent dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-S0 at slave(87)@10.3.1.8:50409 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0423 20:57:57.013650 18224 master.cpp:2045] Notifying framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 (default) at [email protected]:50409 of lost agent dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-S0 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0423 20:57:57.014663 30016 slave.cpp:5243] Handling status update TASK_FINISHED (Status UUID: 632972b0-c915-4622-a421-7a0c7e536d4b) for task 1 of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 from executor(31)@10.3.1.8:50409
I0423 20:57:57.015681 30016 slave.cpp:1253] Lost leading master
I0423 20:57:57.015681 4088 task_status_update_manager.cpp:181] Pausing sending task status updates
I0423 20:57:57.016660 30016 slave.cpp:1315] Detecting new master
I0423 20:57:57.017660 30016 slave.cpp:1260] New master detected at [email protected]:50409
I0423 20:57:57.017660 29096 task_status_update_manager.cpp:181] Pausing sending task status updates
I0423 20:57:57.017660 30016 slave.cpp:1315] Detecting new master
I0423 20:57:57.019656 30016 slave.cpp:1342] Authenticating with master [email protected]:50409
I0423 20:57:57.018729 4128 task_status_update_manager.cpp:328] Received task status update TASK_FINISHED (Status UUID: 632972b0-c915-4622-a421-7a0c7e536d4b) for task 1 of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000
I0423 20:57:57.019656 30016 slave.cpp:1351] Using default CRAM-MD5 authenticatee
I0423 20:57:57.019656 26088 authenticatee.cpp:121] Creating new client SASL connection
I0423 20:57:57.019656 30016 slave.cpp:5644] Sending acknowledgement for status update TASK_FINISHED (Status UUID: 632972b0-c915-4622-a421-7a0c7e536d4b) for task 1 of framework dc3b2518-4a8c-4d3a-bd8e-a36dfba3d82a-0000 to executor(31)@10.3.1.8:50409
I0423 20:57:57.021664 31320 master.cpp:9227] Authenticating slave(87)@10.3.1.8:50409
I0423 20:57:57.021664 32920 authenticator.cpp:98] Creating new server SASL connection
I0423 20:57:57.022657 30164 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0423 20:57:57.023687 30164 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0423 20:57:57.023687 30656 authenticator.cpp:204] Received SASL authentication start
I0423 20:57:57.023687 30656 authenticator.cpp:326] Authentication requires more steps
I0423 20:57:57.023687 28796 authenticatee.cpp:259] Received SASL authentication step
I0423 20:57:57.024680 33024 authenticator.cpp:232] Received SASL authentication step
I0423 20:57:57.024680 33024 authenticator.cpp:318] Authentication success
I0423 20:57:57.024680 4088 authenticatee.cpp:299] Authentication success
I0423
```

- Mesos Reviewbot Windows


On April 23, 2018, 6:11 p.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66644/
> -----------------------------------------------------------
>
> (Updated April 23, 2018, 6:11 p.m.)
>
>
> Review request for mesos and Jiang Yan Xu.
>
>
> Bugs: 8750
>     https://issues.apache.org/jira/browse/8750
>
>
> Repository: mesos
>
>
> Description
> -------
>
> A RunTask message can be dropped for an agent while it is
> disconnected from the master. If that agent then goes unreachable,
> the task from the dropped message is added to the framework's
> unreachableTasks. When the agent re-registers, the master sends
> status updates for the tasks the agent reported during
> re-registration and removes those tasks from unreachableTasks.
> Because the agent never learned about the dropped task, it does not
> report it, so the task stays in unreachableTasks. This inconsistency
> leads to a check failure when it is detected during framework removal.
>
>
> Diffs
> -----
>
>   src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a
>   src/tests/partition_tests.cpp 9138e5c745cf354a3573e1ab0b251d46702833cc
>
>
> Diff: https://reviews.apache.org/r/66644/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Megha Sharma
>
>
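
For readers following the description above, the bookkeeping gap can be sketched in a few lines of C++. This is only an illustrative model, not the Mesos master code: `FrameworkSketch`, `markAgentUnreachable`, `reregisterAgent`, `simulateDroppedRunTask`, and the task IDs are hypothetical names chosen for this sketch, and only `unreachableTasks` is taken from the description. It assumes the master tracks unreachable tasks per framework as a set of task IDs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the master's per-framework state.
// Only the name `unreachableTasks` comes from the review description.
struct FrameworkSketch {
  std::set<std::string> unreachableTasks;
};

// Agent goes unreachable: every task the master believes it launched there
// is recorded, including one whose RunTask message never arrived.
void markAgentUnreachable(
    FrameworkSketch& framework,
    const std::vector<std::string>& tasksKnownToMaster) {
  for (const std::string& id : tasksKnownToMaster) {
    framework.unreachableTasks.insert(id);
  }
}

// Agent re-registers: only the tasks the agent itself reports are removed.
void reregisterAgent(
    FrameworkSketch& framework,
    const std::vector<std::string>& tasksReportedByAgent) {
  for (const std::string& id : tasksReportedByAgent) {
    framework.unreachableTasks.erase(id);
  }
}

// Replays the scenario and returns whatever is still marked unreachable.
std::set<std::string> simulateDroppedRunTask() {
  FrameworkSketch framework;

  // The master launched two tasks; the RunTask message for "task-2" was
  // dropped, so the agent only ever learned about "task-1".
  markAgentUnreachable(framework, {"task-1", "task-2"});
  assert(framework.unreachableTasks.size() == 2);

  // The agent comes back and reports only the task it knows about.
  reregisterAgent(framework, {"task-1"});

  // "task-2" is left behind; in the description, this kind of leftover is
  // what trips the consistency check during framework removal.
  return framework.unreachableTasks;
}
```

Calling `simulateDroppedRunTask()` leaves `task-2` in the set, which mirrors the stale entry the description attributes the check failure to. How the patch itself resolves this is not stated in this mail, so the sketch stops at reproducing the inconsistency.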
