-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63021/#review188726
-----------------------------------------------------------




src/resource_provider/storage/provider.cpp
Line 57 (original), 84 (patched)
<https://reviews.apache.org/r/63021/#comment265794>

    I would suggest to use a shorter name for this: `mesos-slrp-`



src/resource_provider/storage/provider.cpp
Lines 106-110 (patched)
<https://reviews.apache.org/r/63021/#comment265800>

    This can be moved to the paths.hpp as I mentioned above:
    ```
    string getCSIEndpointPath(
        const string& rootDir,
        const string& pluginName);
    ```



src/resource_provider/storage/provider.cpp
Lines 152-156 (patched)
<https://reviews.apache.org/r/63021/#comment265754>

    We usually inline this for many reasons:
    1) no need to merge string messages if there are multiple: e.g., `LOG(INFO) 
<< message1 << message2 << ...`
    2) For readability, jumping around to find the definition of this helper 
method is not good.



src/resource_provider/storage/provider.cpp
Lines 205 (patched)
<https://reviews.apache.org/r/63021/#comment265755>

    Do you need this? Let's remove this for now.



src/resource_provider/storage/provider.cpp
Lines 236 (patched)
<https://reviews.apache.org/r/63021/#comment265757>

    YOu should follow `src/launcher/default_executor.cpp` to see how to do 
reliable registration (`doReliableRegistration`).
    
    So basically, you need to keep a state in this process, and set it 
SUBSCRIBED once you received Subscribed message. You need to retry to make it 
reliable.



src/resource_provider/storage/provider.cpp
Lines 237 (patched)
<https://reviews.apache.org/r/63021/#comment265756>

    I'd just use lambda here:
    ```
    driver->send(evolve(call))
      .onFailed(defer(self(), [=](const string& failure) {
        LOG(ERROR) << "Failed to subscribe resource provider with type "
                   << "'" info.type() << "' and name "
                   << "'" << info.name() << "': " << failure;
      }))
      ...
    ```



src/resource_provider/storage/provider.cpp
Lines 282 (patched)
<https://reviews.apache.org/r/63021/#comment265774>

    as we discussed, `StorageLocalResourceProvider::create` will pass in 
SlaveID. So let's keep track `slaveId` in `StorageLocalResourceProviderProcess` 
as a member field.
    
    Idealy, here, you need to check both the metadata dir (metaDir) and the 
root dir (rootDir), and make sure the symlink points to the same provider id.
    
    In fact, i would `s/latestDir/rootDir/`



src/resource_provider/storage/provider.cpp
Lines 285-288 (patched)
<https://reviews.apache.org/r/63021/#comment265762>

    What if `_latestDir` is None()? You just treat this as a new SLRP? I think 
that's fine, but let's add some comments about that.



src/resource_provider/storage/provider.cpp
Lines 291 (patched)
<https://reviews.apache.org/r/63021/#comment265786>

    Let's introduce a paths.hpp helper for SLRP:
    ```
    src/resource_provider/storage/paths.hpp
    ```
    
    This can be `getCSIEndpointDirSymlinkPath`



src/resource_provider/storage/provider.cpp
Lines 293 (patched)
<https://reviews.apache.org/r/63021/#comment265787>

    I would call it `csiEndpointDir`



src/resource_provider/storage/provider.cpp
Lines 296 (patched)
<https://reviews.apache.org/r/63021/#comment265795>

    What if someone choose a very long name for the provider? I would suggest 
we just use `os::mkdtemp` here. In fact, `os::temp` depends on TMPDIR env var. 
For instance, on my mac:
    
    ```
    Jies-MacBook-Pro:tmp jie$ echo $TMPDIR
    /var/folders/cs/xp4tynrs69v0bbx55l4k5lx80000gn/T/
    ```
    
    It can be very long too. But let's punt that for now as it should be rare.



src/resource_provider/storage/provider.cpp
Lines 298-301 (patched)
<https://reviews.apache.org/r/63021/#comment265781>

    I'd suggest we introduce a helper method in this class to handle fatal 
scenarios:
    ```
    void StorageLocalResourceProviderProcess::fatal(
        const string& failure) 
    {
      LOG(ERROR) << failure;
      terminate(self());
    }
    ```
    
    In the future, we might want to expose a method allowing caller to wait for 
the exit of this process, and this method can be extended to set a promise that 
the caller expects.



src/resource_provider/storage/provider.cpp
Lines 337 (patched)
<https://reviews.apache.org/r/63021/#comment265804>

    You need to create metadata dir as well.
    
    Although I don't quite like the fact that it'll be a CHECK failure if the 
creation fails. Ideally, we check the return value and call `fatal` if the 
creation failed.
    
    Also, depends on the order you create both directories, the recovery logic 
in `initialize` might be different. In the recovery logic, it's likely one 
directory does not have a symlink (depends on the order here).
    
    One way is to only treat this RP as an existing RP if both directories 
exists in `initialize`.



src/resource_provider/storage/provider.cpp
Lines 341-343 (patched)
<https://reviews.apache.org/r/63021/#comment265807>

    You can use the SLRP paths helper here.



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265808>

    Any reason do not move the logic in `initialize` related to csi endpoint 
dir here? It'll be much more readable if we group all logics related to 
CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly 
setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265809>

    Any reason do not move the logic in `initialize` related to csi endpoint 
dir here? It'll be much more readable if we group all logics related to 
CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly 
setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265810>

    Any reason do not move the logic in `initialize` related to csi endpoint 
dir here? It'll be much more readable if we group all logics related to 
CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly 
setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 385-386 (patched)
<https://reviews.apache.org/r/63021/#comment265818>

    My thinking is that in `StorageLocalResourceProviderProcess`, we maintain 
two `Future<csi::Client>`, one for controller service and one for node service.
    
    ```
    Future<csi::Client> controllerService;
    Future<csi::Client> nodeService;
    ```
    
    The above two will become ready once CSI plugin containers are properly 
launched.
    
    Whenever the SLRP wants to use the csi client to talk to the plugin, it'll 
do something like this:
    
    ```
    controllerSerivce.then(defer(self(), [](const csi::Client& client) {
      ...
    }));
    ```
    
    We probably should make sure `controllerService` is either pending or 
ready, never failed or discarded.
    
    Now, there will be a loop in the background continuesly trying to launch 
the CSI container, and wait for the container, and relaunch it if it dies. This 
is kind of related to the `ContainerDaemon` idea in joseph's standalone 
container design:
    
    ```
    Try<ContainerDaemon> ContainerDaemon::create(
        const URL& endpoint,
        const ContainerID& id,
        const CommandInfo& command,
        const Option<ContainerInfo>& container,
        const vector<lambda::function<void()>>& preStartHooks,
        const vector<lambda::function<void()>>& postStopHooks);
    
    void ContainerDaemon::terminate();
    Future<Nothing> ContainerDaemon::wait();
    ```
    
    In SLRP, you still need to wait for the socket file to appear, and set the 
Future mentioned above. The `postStopHooks` above allow you to set the Future 
to pending if container is not running.



src/resource_provider/storage/provider.cpp
Lines 427-433 (patched)
<https://reviews.apache.org/r/63021/#comment265819>

    What if the CSI container failed again? Who will be responsible to launch 
it again?
    
    I think the `ContainerDaemon` suggestion I gave above can solve the issue.
    
    ```
    src/slave/containerizer/daemon.hpp|cpp
    ```



src/resource_provider/storage/provider.cpp
Lines 629-630 (patched)
<https://reviews.apache.org/r/63021/#comment265778>

    This is a bit hard to read, how about:
    ```
    if (plugin.name() == info.storage().controller_plugin()) {
      hasControllerPlugin = true;
    }
    if (plugin.name() == info.storage().node_plugin()) {
      hasNodePlugin = true;
    }
    ```


- Jie Yu


On Oct. 19, 2017, 3:49 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63021/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 3:49 a.m.)
> 
> 
> Review request for mesos, James DeFelice, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8032
>     https://issues.apache.org/jira/browse/MESOS-8032
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During initialization, the storage local resource provider first tries
> to recover its ID and CSI socket dir of the last session through reading
> the actual path linked by
> `work_dir/resource_providers/<type>/<name>/latest`, then subscribe to
> the agent's resource provider manager. If this is a new subscription,
> it will set up a new dir for CSI plugins to put their socket files.
> 
> Once the CSI socket dir is set up, the storage local resource provider
> can connect to a CSI plugin through the following procedure:
> 1. It first tries to connect to the existing socket file if there is
>    one. On success, just return the connection.
> 2. On failure, kill the running plugin and remove the socket file.
> 3. Launch a container to run the plugin, and w for the socket file to
>    appear, then connect to the socket file.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 
> 46224997430ac0c568904d80014166a6f059907f 
> 
> 
> Diff: https://reviews.apache.org/r/63021/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

Reply via email to