> On May 31, 2017, 12:09 a.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6018-6025 (original), 6018-6025 (patched)
> > <https://reviews.apache.org/r/59320/diff/2/?file=1721802#file1721802line6018>
> >
> >     Hm.. what does slaveWasRemoved mean? Is it equivalen to the agent being 
> > marked as unreachable? Or is being removed something beyond being marked as 
> > unreachable (what it sounds like)? I had a hard time grasping the code 
> > below when it was referring to "unreachable" but the variable is called 
> > "removed".

`slaveWasRemoved` means that _this_ instance of the master was the one that 
removed the agent (because `removed` is only a cache, there might be false 
negatives). Similarly, `removed` contains agents that _this_ instance of the 
master has marked unreachable. No objection to renaming both variables, as long 
as we keep them in sync, although I can't think of a ton of better names.


> On May 31, 2017, 12:09 a.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6027-6043 (original), 6027-6043 (patched)
> > <https://reviews.apache.org/r/59320/diff/2/?file=1721802#file1721802line6027>
> >
> >     Hm.. it seems like we could maybe make this code easier to read. 
> > Currently, the loop for dealing with frameworks is down below and the loop 
> > for dealing with tasks is up here. However, it seems to me that these are 
> > highly related, and it would be easier to reason about if it were 
> > consolidated into a loop over the framework and then a loop over the tasks 
> > belonging to that framework. That would also eliminate the need for 
> > `partitionAwareFrameworks`.
> >     
> >     E.g.
> >     
> >     ```
> >       foreach (const FrameworkInfo& framework, frameworks):
> >         if completed:
> >           shut down the framework, drop the tasks
> >           continue
> >           
> >         if agent was unreachable/removed and framework is not partition 
> > aware:
> >           shut down the framework, drop the tasks
> >           continue
> >         
> >         for each task in the framework:
> >           recoveredTasks.push_back(task)
> >           remove from unreachable tasks
> >     ```

Hmmm. I can see how this would makes things clearer, but it might result in 
some subtle behavioral differences. Right now, the master notifies the agent 
that it has re-registered, then sends it `ShutdownFrameworkMessage`s in some 
cases. Adopting the rewrite you suggested would mean shutting down frameworks 
while the agent thinks it hasn't re-registered yet. Not wrong but confusing.

Since this code is likely to change (see the "RFC: Partition Awareness" thread 
on `dev`), I'm inclined to leave this as-is in the short term.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59320/#review176399
-----------------------------------------------------------


On May 31, 2017, 8:35 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59320/
> -----------------------------------------------------------
> 
> (Updated May 31, 2017, 8:35 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Add an explanatory comment and a test case for a particular case in the
> master's logic for handling agent re-registration.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp a5376f9fc3c3c6c72b22cfd36ec56a1de0d03a3a 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/59320/diff/3/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Checked that if you add `CHECK_NOTNULL(framework)` to the master code (circa 
> the newly added comment), the incorrect `CHECK` is triggered by this test 
> case (but not by any existing test cases).
> 
> 
> Thanks,
> 
> Neil Conway
> 
>

Reply via email to