-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53403/
-----------------------------------------------------------

Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and Stephan 
Erb.


Bugs: AURORA-1808
    https://issues.apache.org/jira/browse/AURORA-1808


Repository: aurora


Description
-------

This is a WIP patch showing a possible fix to AURORA-1808.

# Problem

Processes can daemonize and escape the supervision of a coordinator. Using the 
Docker Containerizer or the Mesos Containerizer with pid isolation means that 
such processes are reparented to the `sh` process that launches the 
executor. For example, `daemon.py` (PID 47) below is no longer under the runner:
````
root@aurora:/# ps xf
  PID TTY      STAT   TIME COMMAND
   48 ?        Ss     0:00 /bin/bash
   86 ?        R+     0:00  \_ ps xf
    1 ?        Ss     0:00 /bin/sh -c ${MESOS_SANDBOX=.}/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/va
    5 ?        Sl     0:02 python2.7 /mnt/mesos/sandbox/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vag
   23 ?        S      0:00  \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be152 --
   29 ?        Ss     0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be15
   32 ?        S      0:00      |   \_ /bin/bash -c      while true; do       echo hello world       sleep 10     done
   81 ?        S      0:00      |       \_ sleep 10
   31 ?        Ss     0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be15
   33 ?        S      0:00          \_ /bin/bash -c      while true; do       echo hello world       sleep 10     done
   82 ?        S      0:00              \_ sleep 10
   47 ?        S      0:00 python ./daemon.py
````

# Solution
Ensure processes that escape the supervision of the coordinator reparent to the 
runner, which can send signals to them on task teardown.
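
The description does not say how the reparenting is achieved. One Linux mechanism that produces this behaviour is the child subreaper flag set through `prctl(2)`: orphaned descendants are then reparented to the flagged process instead of PID 1. The sketch below illustrates that mechanism under this assumption; it is not code from the patch. The helper names `set_child_subreaper` and `is_child_subreaper` are hypothetical, and the two constants come from `<linux/prctl.h>` (Linux 3.4 or later).

```python
import ctypes
import os
import sys

# Assumption: values taken from <linux/prctl.h>; only meaningful on Linux >= 3.4.
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37

IS_LINUX = sys.platform.startswith("linux")


def _libc():
    # CDLL(None) exposes the symbols already loaded into this process,
    # which includes libc's prctl on Linux.
    return ctypes.CDLL(None, use_errno=True)


def set_child_subreaper(enable=True):
    """Hypothetical helper: make the calling process adopt orphaned descendants."""
    zero = ctypes.c_ulong(0)
    flag = ctypes.c_ulong(1 if enable else 0)
    if _libc().prctl(PR_SET_CHILD_SUBREAPER, flag, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_child_subreaper():
    """Hypothetical helper: report whether the subreaper flag is currently set."""
    zero = ctypes.c_ulong(0)
    value = ctypes.c_int(0)
    if _libc().prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(value), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return bool(value.value)
```

In such a design the flag would be set once, early in the runner, before any coordinator is forked, so that every later descendant is covered.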

With the patch applied, the process tree looks like:
````
root@aurora:/# ps xf
  PID TTY      STAT   TIME COMMAND
   66 ?        Ss     0:00 /bin/bash
   70 ?        R+     0:00  \_ ps xf
    1 ?        Ss     0:00 /bin/sh -c ${MESOS_SANDBOX=.}/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/va
    5 ?        Sl     0:02 python2.7 /mnt/mesos/sandbox/thermos_executor.pex --announcer-ensemble localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vag
   23 ?        S      0:00  \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b849 --
   33 ?        Ss     0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b84
   40 ?        S      0:00      |   \_ /bin/bash -c      while true; do       echo hello world       sleep 10     done
   63 ?        S      0:00      |       \_ sleep 10
   36 ?        Ss     0:00      \_ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex --task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b84
   37 ?        S      0:00      |   \_ /bin/bash -c      while true; do       echo hello world       sleep 10     done
   62 ?        S      0:00      |       \_ sleep 10
   55 ?        S      0:00      \_ python ./daemon.py
````

Now the runner is aware of the reparented processes and can tear them down 
cleanly when the task is torn down.
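
As a rough illustration of what teardown could look like once escaped processes are direct children of the runner, the sketch below lists the caller's children by scanning `/proc` and signals each one. It is a minimal sketch and not the patch's implementation: the function names `child_pids` and `terminate_children` are hypothetical, it is Linux only, and a real runner would presumably also escalate to `SIGKILL` after a grace period and reap the children.

```python
import os
import signal


def child_pids(parent_pid):
    """Hypothetical helper: PIDs whose parent is parent_pid, found via /proc."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry, "rb") as stat_file:
                stat = stat_file.read()
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        # The command name is parenthesised and may contain spaces, so split
        # after the last ')'. The remaining fields start with state, then ppid.
        fields = stat.rsplit(b")", 1)[1].split()
        if int(fields[1]) == parent_pid:
            children.append(int(entry))
    return children


def terminate_children(sig=signal.SIGTERM):
    """Hypothetical helper: signal every direct child of the calling process."""
    pids = child_pids(os.getpid())
    for pid in pids:
        try:
            os.kill(pid, sig)
        except OSError:
            pass  # already gone
    return pids
```

Scanning for direct children is only sufficient if daemonized processes have been reparented to the caller; without that, a process such as `daemon.py` in the first listing would not appear in the result.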


Diffs
-----

  src/main/python/apache/thermos/common/process_util.py abd2c0ef35858d13971319b0a7436ce2293824ce 
  src/main/python/apache/thermos/core/helper.py 68855e1e54ba1cd4456e18a36fb237ce6a468c34 
  src/main/python/apache/thermos/core/process.py 3ec43e2719ef97026f399c4b2aa23002559b3153 
  src/main/python/apache/thermos/core/runner.py 7b9013d11f6ff4172b6b7bf56e62299b0d11c977 

Diff: https://reviews.apache.org/r/53403/diff/


Testing
-------

No automated tests yet.

Validated behaviour with `ps` and `strace`.


Thanks,

Zameer Manji
