-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/#review203811
-----------------------------------------------------------

FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['67264']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264/logs/mesos-tests-stdout.log):

```
[ OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (107 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1017 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[ OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (32 ms)
[ RUN ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[ OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (37 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (71 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[ OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (749 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (772 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (729 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (754 ms total)

[----------] Global test environment tear-down
[==========] 981 tests from 95 test cases ran. (435642 ms total)
[ PASSED ] 980 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange

 1 FAILED TEST
  YOU HAVE 220 DISABLED TESTS
```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264/logs/mesos-tests-stderr.log):

```
I0524 20:36:29.580374 18960 master.cpp:10843] Updating the state of task df2af7bf-2c19-4766-8bc4-c846bf77e848 of framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0524 20:36:29.580374 23248 slave.cpp:3935] Shutting down framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000
I0524 20:36:29.580374 23248 slave.cpp:6656] Shutting down executor 'df2af7bf-2c19-4766-8bc4-c846bf77e848' of framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 at executor(1)@192.10.1.6:62490
I0524 20:36:29.582537 23248 slave.cpp:929] Agent terminating
W0524 20:36:29.582537 23248 slave.cpp:3931] Ignoring shutdown framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 because it is terminating
I0524 20:36:29.583359 18960 master.cpp:10942] Removing task df2af7bf-2c19-4766-8bc4-c846bf77e848 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000]I0524 20:36:29.415405 19000 exec.cpp:162] Version: 1.7.0
I0524 20:36:29.440357 8332 exec.cpp:236] Executor registered on agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
I0524 20:36:29.443363 13296 executor.cpp:178] Received SUBSCRIBED event
I0524 20:36:29.448391 13296 executor.cpp:182] Subscribed executor on windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net
I0524 20:36:29.448391 13296 executor.cpp:178] Received LAUNCH event
I0524 20:36:29.453393 13296 executor.cpp:665] Starting task df2af7bf-2c19-4766-8bc4-c846bf77e848
I0524 20:36:29.535387 13296 executor.cpp:485] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0524 20:36:29.553393 13296 executor.cpp:678] Forked command at 7304
I0524 20:36:29.582537 11212 exec.cpp:445] Executor asked to shutdown
I0524 20:36:29.583359 20196 executor.cpp:178] Received SHUTDOWN event
I0524 20:36:29.583359 20196 executor.cpp:781] Shutting down
I0524 20:36:29.583359 20196 executor.cpp:894] Sending SIGTERM to process tree at pid 730 of framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 on agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0 at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.586357 18960 master.cpp:1293] Agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0 at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net) disconnected
I0524 20:36:29.586357 18960 master.cpp:3303] Disconnecting agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0 at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.586357 13776 hierarchical.cpp:344] Removed framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000
I0524 20:36:29.587376 18960 master.cpp:3322] Deactivating agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0 at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.587376 23252 hierarchical.cpp:766] Agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0 deactivated
I0524 20:36:29.587376 16924 containerizer.cpp:2401] Destroying container ea872d77-9ff4-42b0-9baf-e90055143d61 in RUNNING state
I0524 20:36:29.588369 16924 containerizer.cpp:3015] Transitioning the state of container ea872d77-9ff4-42b0-9baf-e90055143d61 from RUNNING to DESTROYING
I0524 20:36:29.588369 16924 launcher.cpp:156] Asked to destroy container ea872d77-9ff4-42b0-9baf-e90055143d61
I0524 20:36:29.629357 22740 containerizer.cpp:2854] Container ea872d77-9ff4-42b0-9baf-e90055143d61 has exited
I0524 20:36:29.659430 18772 master.cpp:1135] Master terminating
I0524 20:36:29.661396 23032 hierarchical.cpp:609] Removed agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
I0524 20:36:30.096397 12160 process.cpp:940] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On May 24, 2018, 7:48 p.m., Zhitao Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67264/
> -----------------------------------------------------------
>
> (Updated May 24, 2018, 7:48 p.m.)
>
>
> Review request for mesos, Jason Lai and Jie Yu.
>
>
> Bugs: MESOS-8830
>     https://issues.apache.org/jira/browse/MESOS-8830
>
>
> Repository: mesos
>
>
> Description
> -------
>
> In various corner cases, the agent may not get a chance to properly unmount
> persistent volumes mounted inside an executor's sandbox. When GC later
> gets to these sandbox directories, permanent data loss can happen (see
> MESOS-8830).
>
> This patch adds protection that unmounts any persistent volumes still
> mounted inside a path to be garbage collected, and skips the path if the
> unmount fails.
>
> NOTE: this means the agent will not garbage collect any path if it cannot
> read its own `mountinfo` table.
>
>
> Diffs
> -----
>
>   src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78
>   src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088
>   src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106
>   src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea
>   src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23
>   src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48
>   src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126
>   src/tests/cluster.cpp b56212f6529a4d307e65797ad9bb34f2104fc832
>   src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031
>   src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2
>   src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496
>   src/tests/slave_tests.cpp 65d860594572b58a50a89358e31e97fd2a10bf08
>
>
> Diff: https://reviews.apache.org/r/67264/diff/2/
>
>
> Testing
> -------
>
> Tested with the following procedure:
> 1. Start a test master and agent;
> 2. Create a persistent volume on the agent through the operator API;
> 3. Use `mesos-execute` to run a task;
> 4. Stop the agent;
> 5. Manually bind mount the persistent volume path into a `volume` directory
> inside the executor sandbox (to simulate a dangling mount in MESOS-8830);
> 6. Restart the agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force
> it to gc the path immediately.
>
> With this fix, we observed that the dangling mount is automatically cleaned
> up, and the agent produces the log line:
> ```
> W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume'
> of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1'
> inside garbage collected path
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
> ```
>
>
> Thanks,
>
> Zhitao Li
>
>
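
The protection described in the quoted review request can be sketched roughly as follows. This is an illustrative Python sketch only, not the patch itself (the actual change is C++ under `src/slave/`). The names `mounts_under`, `safe_to_gc`, `read_mountinfo`, and `unmount` are hypothetical. It assumes the mount point is the fifth whitespace-separated field of each `mountinfo` line and ignores octal-escaped characters in paths.

```python
import os
from typing import Callable, List, Optional


def mounts_under(mountinfo_text: str, path: str) -> List[str]:
    """Return mount points strictly inside `path`, deepest first."""
    root = os.path.normpath(path)
    prefix = root.rstrip(os.sep) + os.sep
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo entry
        target = os.path.normpath(fields[4])  # fifth field: mount point
        if target != root and target.startswith(prefix):
            found.append(target)
    # Deepest first, so nested mounts are handled before their parents.
    return sorted(found, key=lambda p: p.count(os.sep), reverse=True)


def safe_to_gc(
    path: str,
    read_mountinfo: Callable[[], Optional[str]],  # hypothetical: None on failure
    unmount: Callable[[str], bool],               # hypothetical: True on success
) -> bool:
    """Decide whether `path` may be deleted by garbage collection."""
    mountinfo = read_mountinfo()
    if mountinfo is None:
        # Cannot read the mount table: skip the path rather than risk data loss.
        return False
    for target in mounts_under(mountinfo, path):
        if not unmount(target):
            # A dangling mount could not be removed: skip the path.
            return False
    return True
```

In this sketch a path is deleted only when every mount beneath it has been unmounted, which mirrors the NOTE in the description: if the mount table cannot be read, nothing is garbage collected.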
