> On Dec. 3, 2016, 2:12 a.m., Vinod Kone wrote:
> > src/master/master.cpp, line 5602
> > <https://reviews.apache.org/r/53897/diff/6/?file=1574956#file1574956line5602>
> >
> >     shouldn't we send a ShutdownFrameworkMessage in this case?

This is implemented in a later review in this chain, 
https://reviews.apache.org/r/54232/


> On Dec. 3, 2016, 2:12 a.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 5578-5580
> > <https://reviews.apache.org/r/53897/diff/6/?file=1574956#file1574956line5578>
> >
> >     yea, i think this should be
> >     
> >     ```
> >     if (framework != nullptr && framework->connected()) {
> >     
> >     }
> >     ```

I did this in a separate review, https://reviews.apache.org/r/54380/


> On Dec. 3, 2016, 2:12 a.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 7132-7137
> > <https://reviews.apache.org/r/53897/diff/6/?file=1574956#file1574956line7132>
> >
> >     hmm. it's a bit weird that activating a framework also updates it. this 
> > should be done by the callers before calling this method.

Hmmm, why is this weird? To me it seems reasonable that when we activate a 
recovered framework, we're given the `FrameworkInfo` that was presented by the 
framework on re-registration; we use that `FrameworkInfo` to update whatever 
`FrameworkInfo` we previously had for the recovered framework.

If we move this outside of `activateRecoveredFramework`, we'd need identical 
code in both of its call-sites -- that's not too bad, but I'm not sure why it 
is an improvement.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53897/#review157867
-----------------------------------------------------------


On Dec. 4, 2016, 2:06 a.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53897/
> -----------------------------------------------------------
> 
> (Updated Dec. 4, 2016, 2:06 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6419
>     https://issues.apache.org/jira/browse/MESOS-6419
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> After master failover, the new master doesn't know which frameworks were
> registered with the previous master (because this information is not
> currently stored in the registry). In the period after the master fails
> over but before the framework scheduler has re-registered, the master
> learns about the frameworks in the cluster when agents re-register (an
> agent reports the FrameworkInfo for all of the frameworks it is running
> when it re-registers).
> 
> Such frameworks were previously represented separately from the normal
> list of frameworks in the master: the master kept a collection of
> `FrameworkInfo` for these "recovered" frameworks.
> 
> This commit removes this separate collection of "recovered" frameworks.
> Instead, the master now treats recovered frameworks very similarly to
> frameworks that are registered but currently disconnected. For example,
> recovered frameworks will now have a `Framework` object which tracks the
> tasks/executors running under that framework; recovered frameworks will
> be reported via the normal "frameworks" key when querying HTTP
> endpoints. Similarly, "teardown" operations on recovered frameworks will
> now work correctly (MESOS-6419).
> 
> This means there is no longer a concept of "orphan tasks" [1]: if the
> master knows about a task, the task will be running under a framework
> (albeit the framework might be recovered or disconnected). A new
> "recovered" key has been added to various HTTP endpoints/APIs to
> determine if a framework hasn't yet re-registered after master failover.
> 
> [1] The exception here is if the cluster contains Mesos agents older
> than 1.0, because old Mesos agents don't report `FrameworkInfo`s when
> they re-register.
> 
> 
> Diffs
> -----
> 
>   include/mesos/master/master.proto 966105cf14bf92ad15a38653cd5c7f3322821289 
>   include/mesos/v1/master/master.proto 
> 0e5a8674a76a46c6f2cc17e11cd735f756d94fd1 
>   src/master/http.cpp ac560d1fdd219d0de0c5d987a32a7112e149602f 
>   src/master/master.hpp b444b2352360fb4f7179acd97dffc0cd81cc7afa 
>   src/master/master.cpp b0670d993348d189fafff0f83f9da0c5b18d1c51 
>   src/tests/api_tests.cpp afae6a7e0809174f48f280f170fad9315e80a906 
>   src/tests/fault_tolerance_tests.cpp 
> 59f68f65fac9fca4a6941793b712bfe7bb30c880 
>   src/tests/master_allocator_tests.cpp 
> bb94e38d5bb472801366c172cfc036f2eecdcbcb 
>   src/tests/master_authorization_tests.cpp 
> 06c6e47250003cd467bf37e1daa8bd636ef8fef5 
>   src/tests/master_tests.cpp dfedbbdf78e8054813872e9eeebccc7504097751 
>   src/tests/partition_tests.cpp 5a0d4bd2de6a5aa0e9fdf0d34cd10d16fd4e34a1 
>   src/tests/teardown_tests.cpp 125e1808e51da53fdf1bb1babc956ff062cbb102 
> 
> Diff: https://reviews.apache.org/r/53897/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Neil Conway
> 
>

Reply via email to