> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.hpp, line 120 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554434#file1554434line120> > > > > 'removal' and 'rm(dir)' sound too synonymous... > > > > > > To distinguish the two promises, perhaps name this one `gc` (to > > indicate it's for the overall gc and gc implies a sense of "scheduled for > > the future" to me). In any case we should use comments to make clear what > > is for what. > > > > e.g., > > > > ``` > > // The promise for the scheduled gc of this path. > > const process::Owned<process::Promise<Nothing>> gc; > > ```
Agreed, changed to gc/removal. > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.hpp, lines 128-131 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554434#file1554434line128> > > > > This is too low-level for a summary IMO. Some of the details, if > > necessary, could be put in the method definition. Removed from summary, added a few more details to the method def. > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, line 103 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line103> > > > > Here we can do > > > > ``` > > // Return `false` to be consistent with the behavior when `unschedule` > > is called after the path is removed. > > return info.rmdir->future() > > .then([]() { return false; }); > > ``` Makes sense, there isn't parity between true/false removal and true/false unschedule. > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, line 199 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line199> > > > > Would this be an abnormal case? If so we can log a warning. I think this is defensive, removing. I included this before changes to `reset()` > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, lines 213-217 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line213> > > > > 4 space indentation and put `self()` on the next line. > > > > Also, it looks like instead of having one callback for each rmdir, we > > should be probably `await` on all of them and have the callback for the > > overall result. This is because we need to reset the timer only when the > > batch of `rmdir`s completes. > > > > Of course then the overall callback needs to be informed about which > > paths this list of futures are for. > > > > i.e., something like this > > > > ``` > > hashmap<string, Future<Nothing>> futures; > > > > ... In foreach: > > > > futures[info.path] = async(&os::rmdir, info.path, true, true, true); > > > > ... > > > > await(values.values()) > > .onAny(...); > > > > ``` > > > > The `_remove()` method can be defined as: > > > > ``` > > void GarbageCollectorProcess::_remove( > > const Timeout& removalTime, > > const hashmap<string, Future<Nothing>>& futures) > > { > > } > > ``` Passing this down into the callback and looking it up in paths/timeouts is cleaner. > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, lines 225-228 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line225> > > > > Because you > > > > ``` > > if (it->second.rmdir.isSome()) { > > continue; > > } > > ``` > > > > above, it's possible for method to not trigger a `reset()` at all if we > > move it like this? If there is a removal to be done i.e. there are removalTime keys in our multimap the continuation `_remove` will trigger a `reset()`. > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, lines 117-118 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line117> > > > > Not yours but this should probaly be > > > > ``` > > UNREACHABLE() << "Inconsistent state across 'paths' and 'timeouts'"; > > ``` Hmmm like UNREACHABLE(); with a log line? > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, lines 188-193 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line188> > > > > I think we actually have to use the following? > > > > ``` > > map<process::Timeout, hashmap<std::string, PathInfo>> paths; > > ``` > > > > Because we rmdir asynchronously, when it's finished we need to retrieve > > the same `PathInfo` by both the removalTime and the path! > > > > If we do that, we can access all entries by reference so we won't have > > this problem. However we need to be cautious in cleaning up the outer entry > > when the inner map is all cleaned up. Ah hmmm are you saying instead of carrying the timeouts and paths data structures, carry a timeout to path/pathinfo map? I changed the logic to use `paths.get(removalTime)` and index into the new `futures` hashmap with the list of `pathInfo`. Maybe this is cleaner? > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, line 216 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line216> > > > > It's probably safe to pass a copy of `PathInfo` here but the fact that > > it's safe it's not obvious. > > > > We can probably just pass a `path` so in the continuation we can use > > `removalTime` and `path` to get the `PathInfo` back and we can just `CHECK` > > to assert that the entry should be there and add a comment on why it should > > be there (because we don't erase the any entry with a pending `rmdir`, we > > wait until it's finished). Removed in favor of looking up in `_remove` > On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote: > > src/slave/gc.cpp, lines 126-131 > > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line126> > > > > As it's written this way (along with the single path `_remove()`), in > > each batch of `rmdir()`s, the quickest one is going to trigger the next > > reset, by which time the head of the `paths` could be the same batch. > > > > We won't have this problem if we invoke `_remove()` once per "batch", > > in which case we don't need to change this method here. Right, this is unecessary with the batching since there is not trampling on toes when `_remove` streams in for each op. - Jacob ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53479/#review155639 ----------------------------------------------------------- On Nov. 12, 2016, 12:48 a.m., Jacob Janco wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53479/ > ----------------------------------------------------------- > > (Updated Nov. 12, 2016, 12:48 a.m.) > > > Review request for mesos, Benjamin Mahler and Jiang Yan Xu. > > > Bugs: MESOS-6549 > https://issues.apache.org/jira/browse/MESOS-6549 > > > Repository: mesos > > > Description > ------- > > Perform GC asynchronously. > > > Diffs > ----- > > src/slave/gc.hpp 5ea82456cffa374203371f227b6f0ee00296553e > src/slave/gc.cpp 961f547793984449ea113d9664b12b5033638c58 > > Diff: https://reviews.apache.org/r/53479/diff/ > > > Testing > ------- > > `make check` > > > Thanks, > > Jacob Janco > >
