-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/#review203912
-----------------------------------------------------------


src/slave/gc.cpp
Lines 221 (patched)
<https://reviews.apache.org/r/67264/#comment286292>

    You may want to iterate over the mount entries in reverse order. Otherwise, you are likely to hit cases where you try to unmount a parent directory before its descendant mount points have been unmounted.


src/slave/gc.cpp
Lines 228 (patched)
<https://reviews.apache.org/r/67264/#comment286290>

    To check whether a path is a descendant of a directory, `strings::startsWith` alone is not enough: `strings::startsWith("/mnt/something-else", "/mnt/something")` returns `true`. It would be safer to check the following:

    1) Check if `entry.target` == `info->path`;
    2) Check if `strings::startsWith(entry.target, path::join(info->path, ""))` (`info->path` suffixed with a `"/"`).


- Jason Lai


On May 24, 2018, 7:48 p.m., Zhitao Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67264/
> -----------------------------------------------------------
>
> (Updated May 24, 2018, 7:48 p.m.)
>
>
> Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu.
>
>
> Bugs: MESOS-8830
>     https://issues.apache.org/jira/browse/MESOS-8830
>
>
> Repository: mesos
>
>
> Description
> -------
>
> In various corner cases, agent may not get chance to properly unmount
> persistent volumes mounted inside an executor's sandbox. When GC later
> gets to these sandbox directories, permanent data loss can happen (see
> MESOS-8830).
>
> This patch added some protection to unmount possible persistent
> volumes inside a path to gc, and skipped the path if unmount failed.
>
> NOTE: this means agent will not garbage collect any path if it cannot
> read its own `mountinfo` table.
>
>
> Diffs
> -----
>
>   src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78
>   src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088
>   src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106
>   src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea
>   src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23
>   src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48
>   src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126
>   src/tests/cluster.cpp b56212f6529a4d307e65797ad9bb34f2104fc832
>   src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031
>   src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2
>   src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496
>   src/tests/slave_tests.cpp 65d860594572b58a50a89358e31e97fd2a10bf08
>
>
> Diff: https://reviews.apache.org/r/67264/diff/2/
>
>
> Testing
> -------
>
> Tested with following procedures:
> 1. Start a test master and agent;
> 2. Created a persistent volume on agent through operator API;
> 3. Use `mesos-execute` to run a task;
> 4. Stop the agent;
> 5. Manually bind mount persistent volume path into a `volume` directory
> inside the executor sandbox (to simulate a dangling mount in MESOS-8830);
> 6. Restart agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it
> gc the path immediately.
>
> With this fix, we observed that the dangling mount is automatically cleaned
> up, and agent produces log line:
> ```
> W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume'
> of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1'
> inside garbage collected path
> '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
> ```
>
>
> Thanks,
>
> Zhitao Li
>
>
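
To make the two comments on src/slave/gc.cpp above concrete, here is a minimal standalone sketch. It is not the patch's code: it uses only the C++ standard library instead of stout, and the helper names `isSelfOrDescendant` and `mountsToUnmount` are hypothetical. It assumes mount targets are listed in mount-table order, so that walking the list backwards visits descendants before their parents.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): returns true if `target` is
// `dir` itself or a path beneath `dir`.
bool isSelfOrDescendant(const std::string& target, const std::string& dir)
{
  if (target == dir) {
    return true;
  }

  // Compare against `dir` with a trailing separator, so that
  // "/mnt/something" does not match "/mnt/something-else".
  const std::string prefix =
    (!dir.empty() && dir.back() == '/') ? dir : dir + "/";

  return target.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper (not part of Mesos): selects the mount targets at or
// under `dir`, walking the table in reverse so that entries listed later
// (assumed to be descendants) come before their parents.
std::vector<std::string> mountsToUnmount(
    const std::vector<std::string>& targets,
    const std::string& dir)
{
  std::vector<std::string> result;

  for (auto it = targets.rbegin(); it != targets.rend(); ++it) {
    if (isSelfOrDescendant(*it, dir)) {
      result.push_back(*it);
    }
  }

  return result;
}
```

In the actual patch the same effect would presumably come from iterating the mount table entries in reverse and combining the equality check with `strings::startsWith` on the slash-suffixed path, as suggested in the comments.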
