> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Lines 539-546 (original), 537 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line539>
> >
> >     Since the guard above was removed, this CHECK could potentially be hit 
> > now.

Good catch, I'll remove the CHECK.


> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Line 558 (original), 549 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line558>
> >
> >     What happens when the executor is disconnected (as is now allowed) and 
> > attempts to launch some health checks?
> >     
> >     Any nested command checks would definitely fail.  But I suppose this is 
> > better than shutting down the executor.
> >     
> >     Seems like you need to either delay the creation of the health checks 
> > or pause them immediately after creation.

The checker process treats connection errors as transient failures and 
reschedules the check: 
https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L531-L538


Transient failures are logged, but not treated as a health check failure:
https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/checks/checker_process.cpp#L353-L356
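
To make the distinction concrete, here is a minimal sketch of that policy. It 
is not the code in `checker_process.cpp`: `Outcome`, `HealthTracker`, and 
`record()` are hypothetical names made up for illustration, and the counters 
only stand in for the real scheduling and status reporting.

```cpp
#include <iostream>
#include <string>

// Hypothetical outcome of one check attempt; not a Mesos type.
enum class Outcome { Success, Failure, TransientError };

// Hypothetical tracker that only illustrates the policy described above.
struct HealthTracker
{
  int consecutiveFailures = 0; // Real check failures seen in a row.
  int scheduled = 0;           // How many follow-up checks were scheduled.

  // Records one attempt and schedules the next one.
  void record(Outcome outcome, const std::string& message = "")
  {
    switch (outcome) {
      case Outcome::Success:
        consecutiveFailures = 0;
        break;
      case Outcome::Failure:
        ++consecutiveFailures;
        break;
      case Outcome::TransientError:
        // Assumption: a connection error is only logged, so it leaves
        // the failure counter untouched.
        std::clog << "Transient check error: " << message << std::endl;
        break;
    }

    // Every outcome, including a transient error, leads to another check.
    ++scheduled;
  }
};
```

With this shape, a disconnected executor keeps retrying the check without the 
task ever being reported as unhealthy because of the disconnection alone.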


> On Feb. 9, 2018, 6:36 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Lines 626-631 (original), 617-622 (patched)
> > <https://reviews.apache.org/r/65550/diff/1/?file=1954029#file1954029line626>
> >
> >     This will be dropped if the executor isn't subscribed.  And as far as I 
> > can tell, this status update is not sent in any other location.

If the executor isn't subscribed, the status updates will be added to the 
`unacknowledgedUpdates` map, and sent by `doReliableRegistration()` in the next 
`SUBSCRIBE` call: 


https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L309-L343
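
Roughly, the pattern looks like the sketch below. This is a simplified 
illustration, not the executor's actual code: `Update`, `ExecutorSketch`, and 
all of its members are hypothetical, and the `sent` vector stands in for the 
real call to the agent.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a status update; not the Mesos protobuf.
struct Update
{
  std::string uuid;
  std::string state;
};

// Hypothetical executor that buffers updates until they are acknowledged.
struct ExecutorSketch
{
  bool subscribed = false;
  std::map<std::string, Update> unacknowledged; // Keyed by update UUID.
  std::vector<std::string> sent;                // UUIDs sent right away.

  // Always remembers the update; sends it immediately only if subscribed.
  void forward(const Update& update)
  {
    unacknowledged[update.uuid] = update;
    if (subscribed) {
      sent.push_back(update.uuid);
    }
  }

  // Returns the updates that the next SUBSCRIBE call would carry.
  std::vector<Update> subscribe()
  {
    std::vector<Update> pending;
    for (const auto& entry : unacknowledged) {
      pending.push_back(entry.second);
    }
    subscribed = true;
    return pending;
  }

  // An acknowledgement is the only thing that removes a buffered update.
  void acknowledge(const std::string& uuid)
  {
    unacknowledged.erase(uuid);
  }
};
```

An update forwarded while unsubscribed is therefore not lost: it stays in the 
map and goes out with the next subscription attempt.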


The executor doesn't wait for the updates to be acknowledged before shutting down 
(https://github.com/apache/mesos/blob/a86ff8c36532f97b6eb6b44c6f871de24afbcc4d/src/launcher/default_executor.cpp#L1020-L1024),
so these updates can be dropped if the executor is not connected to the agent 
when it shuts down. This is tracked in 
https://issues.apache.org/jira/browse/MESOS-8537.


- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197217
-----------------------------------------------------------


On Feb. 7, 2018, 11 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2018, 11 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor would unnecessarily shut down if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>
