> On Jan. 14, 2019, 4:31 p.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> > Will this affect the tasks run by Mesos? E.g., a task may want to run a program which has the `set-user-ID` bit set.
>
> Andrei Budnik wrote:
> Yes, the `no_new_privs` flag affects a task that wants to run a program which has the `set-user-ID` bit set.
> E.g., launching `ping -c 3 8.8.8.8` fails with seccomp. You'll see a message in the executor logs:
> ```
> I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
> ping: socket: Operation not permitted
> I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 2 (pid: 13276)
> ```
>
> Also, see my previous comment https://reviews.apache.org/r/68018/#comment297000
>
> Qian Zhang wrote:
> In your previous comment, you mentioned that the Docker daemon launches its containers with the `SCMP_FLTATR_CTL_NNP` flag set by default. Does that mean that no container launched by the Docker daemon can run a program which has the set-user-ID bit set?
>
> This seems unfortunate since it might break some use cases or applications that we already support. And can you please elaborate a bit on `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do you mean that the task launched by Mesos can call the libseccomp API to revert the filter itself?
>
> If we have to live with this limitation (i.e., tasks cannot run programs which have the set-user-ID bit set), then we need to highlight it in the documentation.
>
> Gilbert Song wrote:
> Seems like we asked the same question.
>
> Andrei, shall we align on this thread? Thanks :)
>
> Andrei Budnik wrote:
> > Does that mean that no container launched by the Docker daemon can run a program which has the set-user-ID bit set?
>
> The Docker daemon cannot be used to run arbitrary programs (in contrast to the Mesos containerizer).
> So, when one launches a Docker container, the Docker daemon launches the container process with the `NNP` bit set, which means that the container process (and its descendants) can't gain more privileges **outside** its container. The Mesos containerizer has exactly the same behaviour:
>
> 1) Run the system-provided `/bin/ping` (*outside* its container) as a non-privileged user:
> ```
> $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FAILED for task 'a'
> message: 'Command exited with status 2'
> source: SOURCE_EXECUTOR
> ```
>
> 2) Run the system-provided `/bin/ping` (*outside* its container) as a privileged user:
> ```
> $ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FINISHED for task 'a'
> message: 'Command exited with status 0'
> source: SOURCE_EXECUTOR
> ```
>
> 3) Run the `ping` provided by the container image (*inside* its image/container) as a non-privileged user:
> ```
> $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FINISHED for task 'a'
> message: 'Command exited with status 0'
> source: SOURCE_EXECUTOR
>
> $ cat /path/to/container/stdout
> ...
> PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> 64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
> ```
>
> > This seems unfortunate since it might break some use cases or applications that we already support.
>
> It's very unlikely that the agent launches tasks whose binary has the `setuid`/`setgid` bit set. Because... what would be the point?
> I doubt that any of the following programs are launched as Mesos containers:
> ```
> $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
> /bin/newgrp
> /bin/pkexec
> /bin/mount
> /bin/umount
> /bin/newuidmap
> /bin/newgidmap
> /bin/sudo
> /bin/crontab
> /bin/su
> /bin/gpasswd
> /bin/chage
> /bin/passwd
> /bin/staprun
> /bin/fusermount
> /bin/fusermount-glusterfs
> /bin/chfn
> /bin/chsh
> /bin/at
> ```
>
> > And can you please elaborate a bit on "Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"? How will the Seccomp filter be reverted? Do you mean that the task launched by Mesos can call the libseccomp API to revert the filter itself?
>
> Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call the `seccomp` Linux syscall to install an empty Seccomp filter.
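The behaviour discussed above can be probed directly with `stat(2)` and `prctl(2)`. It covers set-user-ID binaries like those listed by `find -perm -u=s`, and the `no_new_privs` flag that libseccomp sets before loading a filter when `SCMP_FLTATR_CTL_NNP` is enabled. The following is a minimal sketch and assumes Linux 3.5 or later. All helper names are hypothetical and are not part of the Mesos `SeccompFilter` class. The sketch also shows that the flag is one-way: once it is set, an attempt to clear it fails. After that, `execve()` no longer honours set-user-ID/set-group-ID bits or file capabilities, which is consistent with the `ping: socket: Operation not permitted` failure quoted earlier in the thread.

```cpp
#include <sys/prctl.h>  // prctl(), PR_GET_NO_NEW_PRIVS, PR_SET_NO_NEW_PRIVS
#include <sys/stat.h>   // stat(), S_ISUID, mode_t

// Hypothetical helper: true if the mode bits include set-user-ID, i.e. the
// bit that `find -perm -u=s` matches (the "s" in `-rwsr-xr-x`).
bool mode_has_setuid(mode_t mode) {
  return (mode & S_ISUID) != 0;
}

// Hypothetical helper: true if `path` exists and has the set-user-ID bit.
// Returns false if stat() fails (e.g. the file does not exist).
bool has_setuid_bit(const char* path) {
  struct stat st;
  if (stat(path, &st) != 0) {
    return false;
  }
  return mode_has_setuid(st.st_mode);
}

// Hypothetical helper: reads the calling thread's no_new_privs flag.
bool no_new_privs_enabled() {
  return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1;
}

// Hypothetical helper: sets no_new_privs on the calling thread. No privilege
// is required; the flag is inherited by children and preserved across
// execve(), which from then on ignores set-user-ID/set-group-ID bits and
// file capabilities.
bool enable_no_new_privs() {
  return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0;
}

// Hypothetical helper: tries to clear the flag. The kernel accepts only
// arg2 == 1 for PR_SET_NO_NEW_PRIVS (anything else fails with EINVAL), so
// this is expected to return false: once set, the flag cannot be unset.
bool try_clear_no_new_privs() {
  return prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) == 0;
}
```

Because `enable_no_new_privs()` permanently affects the calling thread and everything it later executes, it is best tried in a throwaway process. From then on, running a set-user-ID `/bin/ping` from that process would no longer gain privileges.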
> Run system-provided /bin/ping (outside its container) as a non-privileged user:

As you mentioned in the above comment, this task will fail, but only **after** your seccomp patches are applied. Before your seccomp patches are applied (e.g., with the latest code in the Mesos master branch), it succeeds:
```
$ ./src/mesos-execute --master=192.168.56.5:5050 --name=test --command="ping -c 3 8.8.8.8" --checkpoint
I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at master@192.168.56.5:5050
Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
Received status update TASK_STARTING for task 'test'
source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'test'
source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'test'
message: 'Command exited with status 0'
source: SOURCE_EXECUTOR
```
To me, this kind of breaks an existing feature, i.e., some previously supported use cases or applications will fail after your seccomp patches are applied.

> when one launches a Docker container, the Docker daemon launches the container process with the NNP bit set, which means that the container process (and its descendants) can't gain more privileges outside its container.

This does not match what I found with Docker. I created a Docker image with ping installed and a non-root user added:
```
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y iputils-ping
RUN adduser --disabled-password --gecos "" stack
```
Then I created a Docker container from that image running as the non-root user, and ping worked:
```
docker run --rm -it --user stack ubuntu:stack sh
$ id
uid=1000(stack) gid=1000(stack) groups=1000(stack)
$ ls -la /bin/ping
-rwsr-xr-x. 1 root root 64424 Mar 9 2017 /bin/ping
$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
```
So the Docker daemon actually can create a container that runs a program with the set-user-ID bit set. I am a bit confused about the impact of the `SCMP_FLTATR_CTL_NNP` flag which, as you mentioned, the Docker daemon sets for its containers.

- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 11:24 p.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
>
> (Updated Nov. 8, 2018, 11:24 p.m.)
>
>
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
>
>
> Bugs: MESOS-9034
> https://issues.apache.org/jira/browse/MESOS-9034
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The `SeccompFilter` class is a wrapper for the `libseccomp` API. Its main
> purpose is to translate the `ContainerSeccompProfile` message into calls
> of the `libseccomp` API.
>
>
> Diffs
> -----
>
> src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf
> src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132
> src/linux/seccomp/seccomp.hpp PRE-CREATION
> src/linux/seccomp/seccomp.cpp PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68018/diff/16/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andrei Budnik
>
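Qian's Docker experiment turns on whether the container process really runs with `no_new_privs` set. That can be checked directly instead of being inferred from `ping`'s behaviour: the kernel exposes `NoNewPrivs:` and `Seccomp:` fields in `/proc/<pid>/status`. The following is a minimal C++17 sketch. It assumes a Linux kernel recent enough to report `NoNewPrivs` (4.10 or later). `read_status_field` is a hypothetical helper, not part of Mesos or libseccomp.

```cpp
#include <fstream>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the integer value of a "Key:<TAB>value" line
// from a /proc/<pid>/status file, e.g. "NoNewPrivs" (0 or 1) or "Seccomp"
// (0 = disabled, 1 = strict, 2 = filter). Returns std::nullopt if the file
// cannot be read, the key is absent, or the value is not a number.
std::optional<int> read_status_field(
    const std::string& key,
    const std::string& path = "/proc/self/status") {
  std::ifstream in(path);
  const std::string prefix = key + ":";
  std::string line;
  while (std::getline(in, line)) {
    // Require the colon right after the key, so "Seccomp" does not match
    // "Seccomp_filters" and "Pid" does not match "PPid".
    if (line.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    try {
      // std::stoi skips the whitespace that follows the colon.
      return std::stoi(line.substr(prefix.size()));
    } catch (const std::logic_error&) {
      // std::invalid_argument and std::out_of_range derive from logic_error.
      return std::nullopt;
    }
  }
  return std::nullopt;
}
```

Inside a container, `read_status_field("NoNewPrivs")` returning 1 would indicate that NNP is in effect for that process. A rough shell equivalent is `grep -E '^(NoNewPrivs|Seccomp):' /proc/self/status`.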