> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 237 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line238> > > > > I'd just use lambda here: > > ``` > > driver->send(evolve(call)) > > .onFailed(defer(self(), [=](const string& failure) { > > LOG(ERROR) << "Failed to subscribe resource provider with type " > > << "'" info.type() << "' and name " > > << "'" << info.name() << "': " << failure; > > })) > > ... > > ```
Is it perferred to have `onFailed(...).onDiscarded(...)` with similar error logging code in both, or use `onAny(... LOG(ERROR) << ... << (future.isFailed() ? future.failure() : "future discarded"; ...)`? > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 291 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line293> > > > > Let's introduce a paths.hpp helper for SLRP: > > ``` > > src/resource_provider/storage/paths.hpp > > ``` > > > > This can be `getCSIEndpointDirSymlinkPath` Do you want to expose this in `src/slave/paths.hpp`? I didn't create a helper because I think it's an SLRP-internal thing. > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 296 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line298> > > > > What if someone choose a very long name for the provider? I would > > suggest we just use `os::mkdtemp` here. In fact, `os::temp` depends on > > TMPDIR env var. For instance, on my mac: > > > > ``` > > Jies-MacBook-Pro:tmp jie$ echo $TMPDIR > > /var/folders/cs/xp4tynrs69v0bbx55l4k5lx80000gn/T/ > > ``` > > > > It can be very long too. But let's punt that for now as it should be > > rare. I'm aware of `TMPDIR`. I've seen that both `os::temp` and `path::join(os::PATH_SEPARATOR, "tmp")` have been used in the codepath. Since we have the concern about the length, let's just use the latter, as it is safe to assume the standard `/tmp` dir exists in Un*x systems. > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 337 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line339> > > > > You need to create metadata dir as well. > > > > Although I don't quite like the fact that it'll be a CHECK failure if > > the creation fails. Ideally, we check the return value and call `fatal` if > > the creation failed. > > > > Also, depends on the order you create both directories, the recovery > > logic in `initialize` might be different. In the recovery logic, it's > > likely one directory does not have a symlink (depends on the order here). > > > > One way is to only treat this RP as an existing RP if both directories > > exists in `initialize`. If the agent reregistered with a different slave ID, then only the top-level RP work dir will exist. I'd treat the lifecycle of the RP and the plugins differently: RP is an existing one if the meta dir exists, and plugin exists if the csi endpoint dir exists. The recovery logic now does not involve anything about the RP itself yet. > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 341-363 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line343> > > > > Any reason do not move the logic in `initialize` related to csi > > endpoint dir here? It'll be much more readable if we group all logics > > related to CSIEndpointDir initialization here: > > > > Basically, you want to "ensure" that the directory and symlink is > > properly setup: > > > > 1) If symlink is there, and the linked dir is there, nothing needs to > > be one > > 2) If symlink is there, and the linked dir is not there, do the same as > > 3) > > 3) If symlink does not exist, create the tmp dir and link it The only reason is to fail fast. But yes I agree that it will be more readible if we put all related logic here. Will do. > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 385-386 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line387> > > > > My thinking is that in `StorageLocalResourceProviderProcess`, we > > maintain two `Future<csi::Client>`, one for controller service and one for > > node service. > > > > ``` > > Future<csi::Client> controllerService; > > Future<csi::Client> nodeService; > > ``` > > > > The above two will become ready once CSI plugin containers are properly > > launched. > > > > Whenever the SLRP wants to use the csi client to talk to the plugin, > > it'll do something like this: > > > > ``` > > controllerSerivce.then(defer(self(), [](const csi::Client& client) { > > ... > > })); > > ``` > > > > We probably should make sure `controllerService` is either pending or > > ready, never failed or discarded. > > > > Now, there will be a loop in the background continuesly trying to > > launch the CSI container, and wait for the container, and relaunch it if it > > dies. This is kind of related to the `ContainerDaemon` idea in joseph's > > standalone container design: > > > > ``` > > Try<ContainerDaemon> ContainerDaemon::create( > > const URL& endpoint, > > const ContainerID& id, > > const CommandInfo& command, > > const Option<ContainerInfo>& container, > > const vector<lambda::function<void()>>& preStartHooks, > > const vector<lambda::function<void()>>& postStopHooks); > > > > void ContainerDaemon::terminate(); > > Future<Nothing> ContainerDaemon::wait(); > > ``` > > > > In SLRP, you still need to wait for the socket file to appear, and set > > the Future mentioned above. The `postStopHooks` above allow you to set the > > Future to pending if container is not running. In my next patch, I didn't wait for the future but just check `controllerService.isReady()`, but will change it accordingly. We have to do `controllerService.then(defer(self(), [](csi::Client client)` though, since I intentionally did not declare those call functions as `const`. But `csi::Client` is designed to just use the underlying copyable `process::grpc` wrappers so should be fine. > On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote: > > src/resource_provider/storage/provider.cpp > > Lines 629-630 (patched) > > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line631> > > > > This is a bit hard to read, how about: > > ``` > > if (plugin.name() == info.storage().controller_plugin()) { > > hasControllerPlugin = true; > > } > > if (plugin.name() == info.storage().node_plugin()) { > > hasNodePlugin = true; > > } > > ``` This is just for saving a little extra string comparisons once we have found the plugins, but this may not worth sacrificing the reabilibily. Will channge it. - Chun-Hung ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63021/#review188726 ----------------------------------------------------------- On Oct. 19, 2017, 3:49 a.m., Chun-Hung Hsiao wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63021/ > ----------------------------------------------------------- > > (Updated Oct. 19, 2017, 3:49 a.m.) > > > Review request for mesos, James DeFelice, Jie Yu, Joseph Wu, and Jan Schlicht. > > > Bugs: MESOS-8032 > https://issues.apache.org/jira/browse/MESOS-8032 > > > Repository: mesos > > > Description > ------- > > During initialization, the storage local resource provider first tries > to recover its ID and CSI socket dir of the last session through reading > the actual path linked by > `work_dir/resource_providers/<type>/<name>/latest`, then subscribe to > the agent's resource provider manager. If this is a new subscription, > it will set up a new dir for CSI plugins to put their socket files. > > Once the CSI socket dir is set up, the storage local resource provider > can connect to a CSI plugin through the following procedure: > 1. It first tries to connect to the existing socket file if there is > one. On success, just return the connection. > 2. On failure, kill the running plugin and remove the socket file. > 3. Launch a container to run the plugin, and w for the socket file to > appear, then connect to the socket file. > > > Diffs > ----- > > src/resource_provider/storage/provider.cpp > 46224997430ac0c568904d80014166a6f059907f > > > Diff: https://reviews.apache.org/r/63021/diff/5/ > > > Testing > ------- > > > Thanks, > > Chun-Hung Hsiao > >