-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/#review204328
-----------------------------------------------------------


src/slave/gc.cpp
Lines 225 (patched)
<https://reviews.apache.org/r/67264/#comment286822>

    `isPersistentVolumePath` won't work for MOUNT type persistent volumes. Currently, the only mounts in the host mount namespace under the sandbox directories (i.e., `/var/lib/mesos/slaves/...`) are persistent volume mounts. I'd suggest we compare `entry.target` with that to determine if we need to unmount.


src/tests/gc_tests.cpp
Line 99 (original), 99 (patched)
<https://reviews.apache.org/r/67264/#comment286766>

    remove one `;`?


src/tests/mesos.cpp
Line 694 (original), 694 (patched)
<https://reviews.apache.org/r/67264/#comment286767>

    Please move `:` to the next line to follow our style guide.


- Jie Yu


On May 31, 2018, 8:38 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67264/
> -----------------------------------------------------------
> 
> (Updated May 31, 2018, 8:38 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-8830
>     https://issues.apache.org/jira/browse/MESOS-8830
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In various corner cases, agent may not get chance to properly unmount
> persistent volumes mounted inside an executor's sandbox. When GC later
> gets to these sandbox directories, permanent data loss can happen (see
> MESOS-8830).
> 
> This patch added some protection to unmount possible persistent
> volumes inside a path to gc, and skipped the path if unmount failed.
> 
> NOTE: this means agent will not garbage collect any path if it cannot
> read its own `mountinfo` table.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78 
>   src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088 
>   src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106 
>   src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea 
>   src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23 
>   src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48 
>   src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126 
>   src/tests/cluster.cpp b56212f6529a4d307e65797ad9bb34f2104fc832 
>   src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031 
>   src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2 
>   src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496 
>   src/tests/slave_tests.cpp 65d860594572b58a50a89358e31e97fd2a10bf08 
> 
> 
> Diff: https://reviews.apache.org/r/67264/diff/4/
> 
> 
> Testing
> -------
> 
> Added a unit test in following patch.
> 
> Tested with following procedures:
> 1. Start a test master and agent;
> 2. Created a persistent volume on agent through operator API;
> 3. Use `mesos-execute` to run a task;
> 4. Stop the agent;
> 5. Manually bind mount persistent volume path into a `volume` directory
> inside the executor sandbox (to simulate a dangling mount in MESOS-8830);
> 6. Restart agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it
> gc the path immediately.
> 
> With this fix, we observed that the dangling mount is automatically cleaned
> up, and agent produces log line:
> ```
> W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume'
> of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1'
> inside garbage collected path
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
> ```
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>
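
To illustrate the `entry.target` suggestion in the comment on src/slave/gc.cpp, here is a minimal, self-contained C++ sketch. It is not the patch's code and deliberately avoids Mesos and stout APIs: `MountEntry`, `isStrictlyUnder`, and `danglingTargets` are hypothetical names, and the struct only stands in for one parsed row of the `mountinfo` table. It assumes, as the comment states, that every host-namespace mount under the sandbox root is a persistent volume mount.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed mountinfo row; only the mount
// target is needed for this sketch.
struct MountEntry
{
  std::string target;
};

// Returns true if `path` is strictly inside directory `root`. The match is
// made on a path-component boundary, so "/a/bc" is not inside "/a/b".
bool isStrictlyUnder(const std::string& path, std::string root)
{
  assert(!root.empty());

  // Drop trailing slashes from the root, but keep a lone "/".
  while (root.size() > 1 && root.back() == '/') {
    root.pop_back();
  }

  if (root == "/") {
    return path.size() > 1 && path[0] == '/';
  }

  return path.size() > root.size() + 1 &&
         path.compare(0, root.size(), root) == 0 &&
         path[root.size()] == '/';
}

// Collects the mount targets under `root` in reverse table order, so that
// later (possibly nested) mounts come before the mounts they sit on.
std::vector<std::string> danglingTargets(
    const std::vector<MountEntry>& entries,
    const std::string& root)
{
  std::vector<std::string> targets;

  for (auto it = entries.rbegin(); it != entries.rend(); ++it) {
    if (isStrictlyUnder(it->target, root)) {
      targets.push_back(it->target);
    }
  }

  return targets;
}
```

The component-boundary check matters because a plain string prefix test would also match a sibling directory whose name merely begins with the root's name. A caller would then try to unmount each returned target and skip garbage collection of the path if any unmount fails, as the patch description outlines.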
