> On March 4, 2016, 11:30 p.m., Avinash sridharan wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.cpp, line 134
> > <https://reviews.apache.org/r/44269/diff/4/?file=1280944#file1280944line134>
> >
> >     I don't understand this comment. We just made sure the plugin does not 
> > exist? So what does the comment imply "it can
> >             // still be valid as long as operator puts the CNI plugin binary
> >             // that it uses under '--network_cni_plugins_dir'." ?
> >             
>     I think at this point we should return an error. If we can't find an 
> > executable for a named network, the behavior will become undefined. We 
> > should bail at this point.
> 
> Qian Zhang wrote:
>     My point is, if we cannot find a plugin for a named network during 
> initialization, we log a warning message to let the operator know about the 
> issue. Afterward the operator can put the plugin in the plugin directory 
> without restarting the agent, and the named network can still work.
> 
> Avinash sridharan wrote:
>     Let's not rely on the operator heeding WARNING messages and fixing the 
> problem. My concern is that this is a `FATAL` error: if containers are 
> launched before the operator can rectify the error, the behavior becomes 
> undefined.
> 
> Qian Zhang wrote:
>     Agree, let's return an error :-)
> 
> Qian Zhang wrote:
>     After more thought, I think in this case it makes more sense to log a 
> warning message and ignore the network config file rather than bail at this 
> point, because there can be other valid network config files. If in the end 
> there are no valid network config files, we should definitely bail.
> 
> Avinash sridharan wrote:
>     I think we should not allow any errors in the configs/plugins passed by 
> the operator. The reason is that frameworks are going to learn about networks 
> out-of-band, and if there are config/plugin errors we will have to throw 
> errors during task launch. Why should we allow the system to proceed knowing 
> that this is going to lead to an erroneous situation? The only way the 
> operator can fix this error is by fixing the config and restarting the slave, 
> so we might as well bail out sooner rather than later.
> 
> Qian Zhang wrote:
>     Can you please let me know how this can lead to an erroneous situation? 
> If a network config file is invalid for whatever reason, "network/isolator" 
> will NOT load it and will just ignore it, so how can a framework launch a 
> task to join an invalid network which is not loaded by the isolator? I do not 
> think a framework user has such knowledge. Or do you think a framework user 
> will somehow know all the network config files (valid or invalid) under 
> "--network_cni_config_dir"?
> 
> Avinash sridharan wrote:
>     Frameworks would know only the network name. It's the responsibility of 
> the operator to install the right config for the given `name`. Hence the 
> erroneous case. The fact that the config was not loaded for a valid network 
> name causes an inconsistency between the framework's view of what is 
> available and the isolator's view of what is configurable.
> 
> Qian Zhang wrote:
>     What if a framework specifies a wrong network name by mistake? Even if in 
> the `create()` method we ensure every network config file is valid and the 
> agent is started successfully, there is still a chance for a framework to 
> specify a wrong network name (which is actually out of our control), right? 
> So my point is, we have to handle this erroneous case when launching a task 
> anyway.
>     
>     And I do not quite understand what `the framework's view of what is 
> available` means. Can you please elaborate on how a framework can know what 
> is available? My thinking is that in the future we may expose the available 
> CNI networks to frameworks via an HTTP endpoint or even via an offer as 
> shared resources, but the networks exposed in this way must be the valid 
> ones. The invalid ones will not be loaded by the isolator, and hence will not 
> be exposed.
> 
> Avinash sridharan wrote:
>     Let me try explaining my perspective differently. If the operator were to 
> provide an erroneous parameter to the isolator, the isolator would bail out. 
> My perspective is that an erroneous network config is an erroneous parameter 
> being provided to the isolator, and we should treat it as such. I believe 
> this would simplify any error handling that needs to be done when frameworks 
> launch tasks on misconfigured networks.

Can you please elaborate on how this would simplify any error handling that 
needs to be done when frameworks launch tasks on misconfigured networks? How 
can a framework launch a task on a misconfigured network that will not 
actually be loaded by the isolator?
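
To make the two positions concrete, here is a minimal sketch of both 
policies. It is not the code in this patch: `NetworkConfig`, `Policy`, 
`loadNetworks` and `canJoin` are hypothetical names used only for 
illustration, and the plugin lookup under '--network_cni_plugins_dir' is 
replaced by a plain set of plugin names.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed network config file.
struct NetworkConfig {
  std::string name;    // Network name a framework would ask for.
  std::string plugin;  // Name of the plugin binary the config needs.
};

// BAIL: any invalid config is an error (Avinash's position).
// SKIP: warn and ignore the invalid config (Qian's position).
enum class Policy { BAIL, SKIP };

struct LoadResult {
  std::string error;  // Empty means loading succeeded.
  std::map<std::string, NetworkConfig> networks;
};

// Assumption: a config is "invalid" only when its plugin is missing.
LoadResult loadNetworks(
    const std::vector<NetworkConfig>& configs,
    const std::set<std::string>& installedPlugins,
    Policy policy)
{
  LoadResult result;

  for (const NetworkConfig& config : configs) {
    if (installedPlugins.count(config.plugin) == 0) {
      const std::string message =
        "No plugin '" + config.plugin + "' for network '" +
        config.name + "'";

      if (policy == Policy::BAIL) {
        // Fail the whole load on the first invalid config.
        result.error = message;
        result.networks.clear();
        return result;
      }

      // Warn, skip this config and keep loading the others.
      std::cerr << "WARNING: " << message << "; ignoring it" << std::endl;
      continue;
    }

    result.networks[config.name] = config;
  }

  // Under either policy, no valid network at all is an error.
  if (result.networks.empty()) {
    result.error = "No valid network config found";
  }

  return result;
}

// Launch-time check: only a loaded network can be joined.
bool canJoin(const LoadResult& result, const std::string& name)
{
  return result.error.empty() && result.networks.count(name) > 0;
}
```

Under either policy, a launch that names a network missing from the loaded 
map has to be rejected, for example when a framework mistypes the name. What 
differs is whether an operator mistake stops the agent or only removes that 
one network.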


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44269/#review122077
-----------------------------------------------------------


On March 7, 2016, 12:03 a.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44269/
> -----------------------------------------------------------
> 
> (Updated March 7, 2016, 12:03 a.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-4759
>     https://issues.apache.org/jira/browse/MESOS-4759
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the framework of 'network/cni' isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 8f57a5701073bf1eaaa223383e928cf5db8f8ae4 
>   src/Makefile.am a41e95ddeb838fdebf4ced953c4a29181916e261 
>   src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/44269/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>
