> On Dec. 14, 2018, 10:39 p.m., Chun-Hung Hsiao wrote:
> > src/common/protobuf_utils.cpp
> > Lines 470 (patched)
> > <https://reviews.apache.org/r/69569/diff/1/?file=2114077#file2114077line470>
> >
> >     Since we set the agent ID and RP ID here in SLRP, how about doing the 
> > following changes:
> >     
> >     1) Change 
> > https://github.com/apache/mesos/blob/b6fc9706c03c93239128f4a27051997c9647d8ff/src/slave/slave.cpp#L7859-L7869
> >  to:
> >     ```
> >     // Inject the agent ID into the status update if the resource
> >     // provider does not provide it (because the resource provider
> >     // may not know it).
> >     ...
> >     if (!update.has_slave_id()) {
> >       update.mutable_slave_id()->CopyFrom(...);
> >     }
> >     if (update.has_latest_status() &&
> >         !update.latest_status().has_slave_id()) {
> >       update.mutable_latest_status()->mutable_slave_id()->CopyFrom(...);
> >     }
> >     ```
> >     
> >     2) Change 
> > https://github.com/apache/mesos/blob/b6fc9706c03c93239128f4a27051997c9647d8ff/src/resource_provider/manager.cpp#L886-L893
> >  to:
> >     ```
> >     CHECK_EQ(update.status().resource_provider_id(),
> >              resourceProvider->info().id());
> >     
> >     ...
> >     if (update.has_latest_status()) {
> >       CHECK_EQ(update.latest_status().resource_provider_id(),
> >                resourceProvider->info().id());
> >     
> >       body.update.mutable_latest_status()->CopyFrom(...);
> >     }
> >     ```
> >     Then a followup work would be introduce another event to surface these 
> > errors and shutdown the RP instead of crash the RP manager. Could you 
> > create a ticket for this?
> >     
> >     
> >     Or alternatively, make the agent ID injection code into CHECKs since 
> > `LocalResourceProviderDaemon` provides the agent ID when creating LRPs. You 
> > might argue that it leaks informations to RP, but my counter argument would 
> > be that an LRP needs to learn about the agent lifecycle so it can know when 
> > to subscribe with no ID and when to resubscribe with a new ID, so it either 
> > needs to learn about the current agent ID, or we need provide feedback 
> > through the HTTP response of the `SUBSCRIBE` call.
> >     
> >     We could pick either approach for now.
> 
> Benjamin Bannier wrote:
>     Good suggestions.
>     
>     1) Not sure what making the modifications conditional would gain us. It 
> would seem to increase the complexity and number of possible combinations 
> while just setting unconditionally shouldn't introduce additional cases. 
> Suggest to always set.
>     
>     2) Since it isn't directly related, I added these changes in 
> https://reviews.apache.org/r/69572/. I filed 
> https://issues.apache.org/jira/browse/MESOS-9482 for making the RP manage 
> code safer.
> 
> Chun-Hung Hsiao wrote:
>     Then how about checking the `slave_id` and print a warning if it does not 
> match the current agent ID?

Hmm, we would overwrite the value regardless, and the RP isn't even able to in 
general know its value. The only reason it is present is that we reuse the same 
message to communicate information between all sorts of components (RP-agent, 
agent-master, master-framework). Ideally we'd have a message here which 
wouldn't even contain the agent ID.

I think just ignoring and overwriting its value makes more sense. Maybe we can 
think of some test to enforce the invariant you have in mind?


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69569/#review211342
-----------------------------------------------------------


On Dec. 17, 2018, 10:57 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69569/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2018, 10:57 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9479
>     https://issues.apache.org/jira/browse/MESOS-9479
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The helper function to set up the message was set up to receive
> arguments, but we missed implementing the code making use of the values.
> This went undiscovered as previously only used a mock resource provider.
> 
> 
> Diffs
> -----
> 
>   src/common/protobuf_utils.cpp 583f0ab0dd874602bbd20731f1f32116a425ebc4 
>   src/tests/master_tests.cpp 50b905b5ab508378eddf070eabfe744b68fd4cea 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7887bd5ec300be3cc093886525d71b3e09bbf085 
> 
> 
> Diff: https://reviews.apache.org/r/69569/diff/2/
> 
> 
> Testing
> -------
> 
> * `make check`
> * ran added test in repeatition, both with and without additional stress on 
> the system
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Reply via email to