[ https://issues.apache.org/jira/browse/AURORA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526955#comment-14526955 ]
Bill Farner commented on AURORA-1303:
-------------------------------------

Thanks for reporting! Are you able to reproduce this in our vagrant image?

> Thermos runner broken with non-root account
> -------------------------------------------
>
>                 Key: AURORA-1303
>                 URL: https://issues.apache.org/jira/browse/AURORA-1303
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>    Affects Versions: 0.7.0
>            Reporter: Ovidiu Predescu
>
> This happens with the latest code from github.
> I'm trying to schedule the hello_world example using a non-root role. The
> thermos_runner crashes when it tries to write the checkpoint in the
> fetch_package process.
> It looks like what is happening is the runner is executing as the non-root
> user, but the checkpoint is owned by root.
> Unfortunately the error handling in Aurora is not very good. The exception
> thrown by the runner is silently swallowed, and the fetch_package process is
> running without showing any failures in the log files. I was able to figure
> out what's going on by manually running the command.
> As a workaround I added user 'ovidiu' to group 'root', since the directory
> containing the checkpoint has 'rwx' permissions for the group.
> This is the command:
> /usr/bin/python2.7
> /var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex
> --setuid=ovidiu
> --thermos_json=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/task.json
> --sandbox=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/sandbox
> --log_dir=.
> --task_id=1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1
> --log_to_disk=DEBUG --checkpoint_root=/var/run/thermos --hostname=m1a.dc
> And here is the output:
> Writing log files to disk in .
> ERROR] Found existing runner, cannot take control.
> ERROR] Unknown exception: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
> ERROR] Traceback (most recent call last):
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/bin/thermos_runner.py", line 176, in proxy_main
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 859, in run
> ERROR]     with self.control(force):
> ERROR]   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
> ERROR]     return self.gen.next()
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 552, in control
> ERROR]     raise self.PermissionError('Unable to open checkpoint %s' % ckpt_file)
> ERROR] PermissionError: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
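
For anyone trying to confirm the ownership mismatch described in the report (runner running as the --setuid user, checkpoint directory owned by root), the following is a minimal diagnostic sketch. It is not part of Aurora or Thermos; the helper names mode_allows_write, groups_for and report are hypothetical, and the check only looks at plain Unix permission bits (no ACLs, capabilities or read-only mounts), which is an assumption about the host setup.

```python
import grp
import os
import pwd
import stat


def mode_allows_write(st_mode, st_uid, st_gid, uid, gids):
    """Return True if plain Unix permission bits let (uid, gids) write.

    Hypothetical helper: ignores ACLs, capabilities and read-only
    mounts, and treats root (uid 0) as always allowed.
    """
    if uid == 0:
        return True
    if uid == st_uid:
        # The owner is judged by the owner bits only.
        return bool(st_mode & stat.S_IWUSR)
    if st_gid in gids:
        # Group members are judged by the group bits only.
        return bool(st_mode & stat.S_IWGRP)
    return bool(st_mode & stat.S_IWOTH)


def groups_for(username):
    """Return (uid, set of primary and supplementary gids) for a user."""
    pw = pwd.getpwnam(username)
    gids = {pw.pw_gid}
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    return pw.pw_uid, gids


def report(path, username):
    """Print owner, group, mode and a write verdict for one path."""
    st = os.stat(path)
    uid, gids = groups_for(username)
    ok = mode_allows_write(st.st_mode, st.st_uid, st.st_gid, uid, gids)
    print("%s owner=%d group=%d mode=%o writable_by_%s=%s" % (
        path, st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode), username, ok))
    return ok
```

Calling report("/var/run/thermos/checkpoints", "ovidiu") on the affected slave, and again on the task's checkpoint directory underneath it, should show whether the role account can write there. A root-owned directory with group 'rwx' would come back as not writable until the user is added to the owning group, which matches the workaround in the report.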