Re: Review Request 32543: Documented problem and solution with slave recovery and systemd settings.

Benjamin Hindman Sun, 05 Jul 2015 21:24:46 -0700


> On March 27, 2015, 9:17 a.m., Adam B wrote:
> > docs/slave-recovery.md, line 71
> > <https://reviews.apache.org/r/32543/diff/2/?file=907123#file907123line71>
> >
> >     (If the slave does not come back, each executorDriver shuts itself down 
> > after $MESOS_RECOVERY_TIMEOUT.)
> >     
> >     Important question: If an executor is killed, does this systemd mode 
> > affect whether its tasks would get killed?
> 
> Alexander Rukletsov wrote:
>     Adam, could you please explain what use case you have in mind and how 
> it is related to slave recovery?
> 
> Adam B wrote:
>     It's not related to slave recovery necessarily, but to how this KillMode 
> impacts other processes like a custom executor. Some frameworks (like HDFS) 
> have a custom executor that launches task(s) as a separate 
> process/subprocess. If the executor is killed (kill -9, or shut down by the 
> framework/admin), will this change in KillMode affect whether the executor's 
> task subprocesses also get killed?
>     I'm mostly worried about this KillMode change suddenly leaving stranded 
> task processes if/when executors are killed.
> 
> Alexander Rukletsov wrote:
>     I thought that's exactly why we have containerizers: to clean up all 
> stranded processes.
> 
> Adam B wrote:
>     Fair enough, when the slave is running. But what if the executor is 
> killed while the slave (thus also the containerizer) is shutdown/recovering?
>     I'm not claiming there's anything necessarily wrong with using this 
> KillMode. I just ask the question to make sure we don't recommend a setting 
> that may fix one issue but cause others.
> 
> Alexander Rukletsov wrote:
>     I see your point. I would be surprised if this setting caused the 
> issue, but let's check: better safe than sorry.

The KillMode is only relevant when stopping the "root" process of a systemd 
unit (e.g., via 'systemctl stop'). When another process within the same cgroup 
dies, systemd doesn't do anything about it; the normal Linux/init reaping takes 
place. Thus, the suggestion documented in this review is correct. However, it 
only applies when using 'posix' isolation, since with 'cgroups' isolation the 
processes are in another cgroup. I updated the documentation accordingly 
before committing.
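
For anyone reading this thread without the diff open, here is a minimal sketch 
of the kind of setting being discussed. The thread itself does not spell out 
the value, so 'KillMode=process' is an assumption on my part, and the unit 
name and drop-in path are hypothetical; adjust them to your installation.

```ini
# Hypothetical drop-in: /etc/systemd/system/mesos-slave.service.d/killmode.conf
# (unit name and path are assumptions, not taken from the review)
[Service]
# Assumed value. With KillMode=process, 'systemctl stop' signals only the
# unit's main process (the slave) instead of every process in the unit's
# control group, which is what the systemd default (control-group) would do.
KillMode=process
```

After adding a drop-in like this, 'systemctl daemon-reload' makes systemd pick 
it up. Per the note above, this matters with 'posix' isolation; with 'cgroups' 
isolation the executors and tasks are not in the unit's cgroup to begin with.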

- Benjamin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32543/#review78025
-----------------------------------------------------------

On March 27, 2015, 2:09 p.m., Joerg Schad wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32543/
> -----------------------------------------------------------
> 
> (Updated March 27, 2015, 2:09 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Brenden Matthews.
> 
> 
> Bugs: Mesos-2555
>     https://issues.apache.org/jira/browse/Mesos-2555
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Documented the problem and solution encountered in MESOS-2419.
> 
> 
> Diffs
> -----
> 
>   docs/slave-recovery.md 4bb4a71c6945bd70121743a1e9209a26906773c1 
>   docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 
> 
> Diff: https://reviews.apache.org/r/32543/diff/
> 
> 
> Testing
> -------
> 
> markdown check
> 
> 
> Thanks,
> 
> Joerg Schad
> 
>
