> On Dec. 14, 2018, 9:39 p.m., Chun-Hung Hsiao wrote:
> > src/common/protobuf_utils.cpp
> > Lines 470 (patched)
> > <https://reviews.apache.org/r/69569/diff/1/?file=2114077#file2114077line470>
> >
> >     Since we set the agent ID and RP ID here in SLRP, how about doing the 
> > following changes:
> >     
> >     1) Change 
> > https://github.com/apache/mesos/blob/b6fc9706c03c93239128f4a27051997c9647d8ff/src/slave/slave.cpp#L7859-L7869
> >  to:
> >     ```
> >     // Inject the agent ID into the status update if the resource
> >     // provider does not provide it (because the resource provider
> >     // may not know it).
> >     ...
> >     if (!update.has_slave_id()) {
> >       update.mutable_slave_id()->CopyFrom(...);
> >     }
> >     if (update.has_latest_status() &&
> >         !update.latest_status().has_slave_id()) {
> >       update.mutable_latest_status()->mutable_slave_id()->CopyFrom(...);
> >     }
> >     ```
> >     
> >     2) Change 
> > https://github.com/apache/mesos/blob/b6fc9706c03c93239128f4a27051997c9647d8ff/src/resource_provider/manager.cpp#L886-L893
> >  to:
> >     ```
> >     CHECK_EQ(update.status().resource_provider_id(),
> >              resourceProvider->info().id());
> >     
> >     ...
> >     if (update.has_latest_status()) {
> >       CHECK_EQ(update.latest_status().resource_provider_id(),
> >                resourceProvider->info().id());
> >     
> >       body.update.mutable_latest_status()->CopyFrom(...);
> >     }
> >     ```
> >     Then a followup work would be introduce another event to surface these 
> > errors and shutdown the RP instead of crash the RP manager. Could you 
> > create a ticket for this?
> >     
> >     
> >     Or alternatively, make the agent ID injection code into CHECKs since 
> > `LocalResourceProviderDaemon` provides the agent ID when creating LRPs. You 
> > might argue that it leaks informations to RP, but my counter argument would 
> > be that an LRP needs to learn about the agent lifecycle so it can know when 
> > to subscribe with no ID and when to resubscribe with a new ID, so it either 
> > needs to learn about the current agent ID, or we need provide feedback 
> > through the HTTP response of the `SUBSCRIBE` call.
> >     
> >     We could pick either approach for now.
> 
> Benjamin Bannier wrote:
>     Good suggestions.
>     
>     1) Not sure what making the modifications conditional would gain us. It 
> would seem to increase the complexity and number of possible combinations 
> while just setting unconditionally shouldn't introduce additional cases. 
> Suggest to always set.
>     
>     2) Since it isn't directly related, I added these changes in 
> https://reviews.apache.org/r/69572/. I filed 
> https://issues.apache.org/jira/browse/MESOS-9482 for making the RP manage 
> code safer.

Then how about checking the `slave_id` and print a warning if it does not match 
the current agent ID?


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69569/#review211342
-----------------------------------------------------------


On Dec. 17, 2018, 9:57 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69569/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2018, 9:57 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9479
>     https://issues.apache.org/jira/browse/MESOS-9479
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The helper function to set up the message was set up to receive
> arguments, but we missed implementing the code making use of the values.
> This went undiscovered as previously only used a mock resource provider.
> 
> 
> Diffs
> -----
> 
>   src/common/protobuf_utils.cpp 583f0ab0dd874602bbd20731f1f32116a425ebc4 
>   src/tests/master_tests.cpp 50b905b5ab508378eddf070eabfe744b68fd4cea 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7887bd5ec300be3cc093886525d71b3e09bbf085 
> 
> 
> Diff: https://reviews.apache.org/r/69569/diff/2/
> 
> 
> Testing
> -------
> 
> * `make check`
> * ran added test in repeatition, both with and without additional stress on 
> the system
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Reply via email to