> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 237 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line238>
> >
> >     I'd just use lambda here:
> >     ```
> >     driver->send(evolve(call))
> >       .onFailed(defer(self(), [=](const string& failure) {
> >         LOG(ERROR) << "Failed to subscribe resource provider with type "
> >                    << "'" info.type() << "' and name "
> >                    << "'" << info.name() << "': " << failure;
> >       }))
> >       ...
> >     ```

Is it perferred to have `onFailed(...).onDiscarded(...)` with similar error 
logging code in both, or use `onAny(... LOG(ERROR) << ... << (future.isFailed() 
? future.failure() : "future discarded"; ...)`?


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 291 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line293>
> >
> >     Let's introduce a paths.hpp helper for SLRP:
> >     ```
> >     src/resource_provider/storage/paths.hpp
> >     ```
> >     
> >     This can be `getCSIEndpointDirSymlinkPath`

Do you want to expose this in `src/slave/paths.hpp`? I didn't create a helper 
because I think it's an SLRP-internal thing.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 296 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line298>
> >
> >     What if someone choose a very long name for the provider? I would 
> > suggest we just use `os::mkdtemp` here. In fact, `os::temp` depends on 
> > TMPDIR env var. For instance, on my mac:
> >     
> >     ```
> >     Jies-MacBook-Pro:tmp jie$ echo $TMPDIR
> >     /var/folders/cs/xp4tynrs69v0bbx55l4k5lx80000gn/T/
> >     ```
> >     
> >     It can be very long too. But let's punt that for now as it should be 
> > rare.

I'm aware of `TMPDIR`. I've seen that both `os::temp` and 
`path::join(os::PATH_SEPARATOR, "tmp")` have been used in the codepath. Since 
we have the concern about the length, let's just use the latter, as it is safe 
to assume the standard `/tmp` dir exists in Un*x systems.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 337 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line339>
> >
> >     You need to create metadata dir as well.
> >     
> >     Although I don't quite like the fact that it'll be a CHECK failure if 
> > the creation fails. Ideally, we check the return value and call `fatal` if 
> > the creation failed.
> >     
> >     Also, depends on the order you create both directories, the recovery 
> > logic in `initialize` might be different. In the recovery logic, it's 
> > likely one directory does not have a symlink (depends on the order here).
> >     
> >     One way is to only treat this RP as an existing RP if both directories 
> > exists in `initialize`.

If the agent reregistered with a different slave ID, then only the top-level RP 
work dir will exist. I'd treat the lifecycle of the RP and the plugins 
differently: RP is an existing one if the meta dir exists, and plugin exists if 
the csi endpoint dir exists. The recovery logic now does not involve anything 
about the RP itself yet.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 341-363 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line343>
> >
> >     Any reason do not move the logic in `initialize` related to csi 
> > endpoint dir here? It'll be much more readable if we group all logics 
> > related to CSIEndpointDir initialization here:
> >     
> >     Basically, you want to "ensure" that the directory and symlink is 
> > properly setup:
> >     
> >     1) If symlink is there, and the linked dir is there, nothing needs to 
> > be one
> >     2) If symlink is there, and the linked dir is not there, do the same as 
> > 3)
> >     3) If symlink does not exist, create the tmp dir and link it

The only reason is to fail fast. But yes I agree that it will be more readible 
if we put all related logic here. Will do.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 385-386 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line387>
> >
> >     My thinking is that in `StorageLocalResourceProviderProcess`, we 
> > maintain two `Future<csi::Client>`, one for controller service and one for 
> > node service.
> >     
> >     ```
> >     Future<csi::Client> controllerService;
> >     Future<csi::Client> nodeService;
> >     ```
> >     
> >     The above two will become ready once CSI plugin containers are properly 
> > launched.
> >     
> >     Whenever the SLRP wants to use the csi client to talk to the plugin, 
> > it'll do something like this:
> >     
> >     ```
> >     controllerSerivce.then(defer(self(), [](const csi::Client& client) {
> >       ...
> >     }));
> >     ```
> >     
> >     We probably should make sure `controllerService` is either pending or 
> > ready, never failed or discarded.
> >     
> >     Now, there will be a loop in the background continuesly trying to 
> > launch the CSI container, and wait for the container, and relaunch it if it 
> > dies. This is kind of related to the `ContainerDaemon` idea in joseph's 
> > standalone container design:
> >     
> >     ```
> >     Try<ContainerDaemon> ContainerDaemon::create(
> >         const URL& endpoint,
> >         const ContainerID& id,
> >         const CommandInfo& command,
> >         const Option<ContainerInfo>& container,
> >         const vector<lambda::function<void()>>& preStartHooks,
> >         const vector<lambda::function<void()>>& postStopHooks);
> >     
> >     void ContainerDaemon::terminate();
> >     Future<Nothing> ContainerDaemon::wait();
> >     ```
> >     
> >     In SLRP, you still need to wait for the socket file to appear, and set 
> > the Future mentioned above. The `postStopHooks` above allow you to set the 
> > Future to pending if container is not running.

In my next patch, I didn't wait for the future but just check 
`controllerService.isReady()`, but will change it accordingly. We have to do 
`controllerService.then(defer(self(), [](csi::Client client)` though, since I 
intentionally did not declare those call functions as `const`. But 
`csi::Client` is designed to just use the underlying copyable `process::grpc` 
wrappers so should be fine.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 629-630 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line631>
> >
> >     This is a bit hard to read, how about:
> >     ```
> >     if (plugin.name() == info.storage().controller_plugin()) {
> >       hasControllerPlugin = true;
> >     }
> >     if (plugin.name() == info.storage().node_plugin()) {
> >       hasNodePlugin = true;
> >     }
> >     ```

This is just for saving a little extra string comparisons once we have found 
the plugins, but this may not worth sacrificing the reabilibily. Will channge 
it.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63021/#review188726
-----------------------------------------------------------


On Oct. 19, 2017, 3:49 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63021/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 3:49 a.m.)
> 
> 
> Review request for mesos, James DeFelice, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8032
>     https://issues.apache.org/jira/browse/MESOS-8032
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During initialization, the storage local resource provider first tries
> to recover its ID and CSI socket dir of the last session through reading
> the actual path linked by
> `work_dir/resource_providers/<type>/<name>/latest`, then subscribe to
> the agent's resource provider manager. If this is a new subscription,
> it will set up a new dir for CSI plugins to put their socket files.
> 
> Once the CSI socket dir is set up, the storage local resource provider
> can connect to a CSI plugin through the following procedure:
> 1. It first tries to connect to the existing socket file if there is
>    one. On success, just return the connection.
> 2. On failure, kill the running plugin and remove the socket file.
> 3. Launch a container to run the plugin, and w for the socket file to
>    appear, then connect to the socket file.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 
> 46224997430ac0c568904d80014166a6f059907f 
> 
> 
> Diff: https://reviews.apache.org/r/63021/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

Reply via email to