> On Jan. 14, 2019, 8:31 a.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> > Will this affect the tasks run by Mesos? E.g., a task may want to run a program which has the `set-user-ID` bit.
>
> Andrei Budnik wrote:
> Yes, the `no_new_privs` flag affects a task that wants to run a program which has the `set-user-ID` bit.
> E.g., launching `ping -c 3 8.8.8.8` fails with seccomp. You'll see a message in the executor logs:
> ```
> I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
> ping: socket: Operation not permitted
> I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 2 (pid: 13276)
> ```
>
> Also, see my previous comment: https://reviews.apache.org/r/68018/#comment297000
>
> Qian Zhang wrote:
> In your previous comment, you mentioned that the Docker daemon launches its containers with the `SCMP_FLTATR_CTL_NNP` flag set by default. Does that mean any containers launched by the Docker daemon cannot run a program which has the set-user-ID bit?
>
> This seems unfortunate since it might break some use cases or applications that we already support. And can you please elaborate a bit on `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do you mean the task launched by Mesos can call the libseccomp API to revert the filter itself?
>
> If we have to live with this limitation (i.e., tasks cannot run a program which has the set-user-ID bit), then we need to highlight it in the documentation.
>
> Gilbert Song wrote:
> Seems like we asked the same question.
>
> Andrei, let's align on this thread? Thanks :)
>
> Andrei Budnik wrote:
> > does that mean any containers launched by Docker daemon cannot run program which has set-user-ID bit?
>
> The Docker daemon cannot be used to run arbitrary programs (unlike the Mesos containerizer).
> So, when one launches a Docker container, the Docker daemon launches a container process with the `NNP` bit set, which means that the container process (and its descendants) can't gain more privileges **outside** its container. The Mesos containerizer has exactly the same behaviour:
>
> 1) Run system-provided `/bin/ping` (*outside* its container) as a non-privileged user:
> ```
> $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FAILED for task 'a'
>   message: 'Command exited with status 2'
>   source: SOURCE_EXECUTOR
> ```
>
> 2) Run system-provided `/bin/ping` (*outside* its container) as a privileged user:
> ```
> $ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FINISHED for task 'a'
>   message: 'Command exited with status 0'
>   source: SOURCE_EXECUTOR
> ```
>
> 3) Run container image provided `ping` (*inside* its image/container) as a non-privileged user:
> ```
> $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
> ...
> Received status update TASK_FINISHED for task 'a'
>   message: 'Command exited with status 0'
>   source: SOURCE_EXECUTOR
>
> $ cat /path/to/container/stdout
> ...
> PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> 64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
> ```
>
> > This seems unfortunate since it might break some use cases or applications that we already supported.
>
> It's very unlikely that the agent launches tasks whose binary has the `setuid`/`setgid` bit set. Because... what's the point?
> I doubt that any of the following programs are launched as a Mesos container:
> ```
> $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
> /bin/newgrp
> /bin/pkexec
> /bin/mount
> /bin/umount
> /bin/newuidmap
> /bin/newgidmap
> /bin/sudo
> /bin/crontab
> /bin/su
> /bin/gpasswd
> /bin/chage
> /bin/passwd
> /bin/staprun
> /bin/fusermount
> /bin/fusermount-glusterfs
> /bin/chfn
> /bin/chsh
> /bin/at
> ```
>
> > And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp filter can be reverted anytime"? How will the Seccomp filter be reverted? Do you mean the task launched by Mesos can call libseccomp API to revert the filter itself?
>
> Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call the `seccomp` Linux syscall to install an empty Seccomp filter.
>
> Qian Zhang wrote:
> > Run system-provided /bin/ping (outside its container) as a non-privileged user:
>
> As you mentioned in the above comment, this task will fail, but that's **after** your seccomp patches are applied.
> Before your seccomp patches are applied (e.g., I am using the latest code in the Mesos master branch), it succeeds:
> ```
> $ ./src/mesos-execute --master=192.168.56.5:5050 --name=test --command="ping -c 3 8.8.8.8" --checkpoint
> I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
> I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP authenticatee
> I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at master@192.168.56.5:5050
> Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
> Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
> Received status update TASK_STARTING for task 'test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_RUNNING for task 'test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'test'
>   message: 'Command exited with status 0'
>   source: SOURCE_EXECUTOR
> ```
> To me, this is a regression, i.e., some previously supported use cases or applications will fail after your seccomp patches are applied.
>
> > when one launches a Docker container, Docker daemon launches a container process with NNP bit set, which means that a container process (and it descendants) can't gain more previleges outside its container.
>
> This is not what I found with Docker. I created a Docker image with ping installed and a non-root user added:
> ```
> FROM ubuntu:18.04
>
> RUN apt-get update && apt-get install -y iputils-ping
> RUN adduser --disabled-password --gecos "" stack
> ```
>
> And then I created a Docker container from that image with the non-root user, and I found that ping worked:
> ```
> $ docker run --rm -it --user stack ubuntu:stack sh
> $ id
> uid=1000(stack) gid=1000(stack) groups=1000(stack)
> $ ls -la /bin/ping
> -rwsr-xr-x. 1 root root 64424 Mar 9 2017 /bin/ping
> $ ping 8.8.8.8
> PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> 64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
> 64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
> 64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
> ^C
> --- 8.8.8.8 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2002ms
> rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
> ```
> So the Docker daemon actually can create a container that runs a program which has the set-user-ID bit. I am a bit confused about the impact of the `SCMP_FLTATR_CTL_NNP` flag, which the Docker daemon sets for its containers as you mentioned.
>
> Andrei Budnik wrote:
> The example you have provided with the Docker daemon is identical to the 3rd case from my previous comment:
> ```
> $ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
> ```
> In this case we behave exactly as Docker does.
>
> At the same time, the first two cases are not supported by Docker but are supported by the Mesos containerizer. Hence the difference in behaviour.
>
> Anyway, we need to set the `NNP` bit both for a non-privileged user (otherwise, we have no permission to install a Seccomp filter; more details in the seccomp man page) and for a privileged user (otherwise, it does not make sense to install a Seccomp filter, as it can easily be reverted later).
>
> Andrei Budnik wrote:
> I will highlight this nuance in the Seccomp documentation.
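
For anyone following the thread, here is a minimal sketch of the `no_new_privs` behaviour discussed above. It is not code from this patch: it calls `prctl(2)` directly through Python's `ctypes` instead of going through `libseccomp`, and the helper names (`set_no_new_privs`, `get_no_new_privs`, `child_inherits_no_new_privs`) are made up for illustration. It sets the bit in a throwaway child process and checks that a grandchild still sees it, which is the property that stops a task's descendants from shedding the restriction.

```python
import ctypes
import os

# Option values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

# Resolve prctl() from the C library already loaded into this process.
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    """Call prctl(2); the kernel requires the unused trailing arguments to be zero."""
    zero = ctypes.c_ulong(0)
    result = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2), zero, zero, zero)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def get_no_new_privs():
    """Return True if the calling thread has the no_new_privs bit set."""
    return _prctl(PR_GET_NO_NEW_PRIVS) == 1


def set_no_new_privs():
    """Set the no_new_privs bit for the calling thread (hypothetical helper)."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def child_inherits_no_new_privs():
    """Set the bit in a forked child, fork again, and report what the grandchild sees."""
    pid = os.fork()
    if pid == 0:
        status = 2  # reported if anything below raises
        try:
            set_no_new_privs()
            grandchild = os.fork()
            if grandchild == 0:
                # Grandchild: exit 0 only if the bit was inherited.
                os._exit(0 if get_no_new_privs() else 1)
            _, raw = os.waitpid(grandchild, 0)
            status = os.WEXITSTATUS(raw) if os.WIFEXITED(raw) else 2
        finally:
            os._exit(status)
    _, raw = os.waitpid(pid, 0)
    return os.WIFEXITED(raw) and os.WEXITSTATUS(raw) == 0
```

On a Linux kernel that supports `PR_SET_NO_NEW_PRIVS` (3.5 or later), `child_inherits_no_new_privs()` should return `True`. The bit is set in a forked child rather than in the caller because, per the kernel documentation, it cannot be cleared once set.
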
Added a note in the Seccomp doc: https://reviews.apache.org/r/69493/diff/4-5/


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 3:24 p.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
>
> (Updated Nov. 8, 2018, 3:24 p.m.)
>
>
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
>
>
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
>
>
> Repository: mesos
>
>
> Description
> -------
>
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
>
>
> Diffs
> -----
>
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132
>   src/linux/seccomp/seccomp.hpp PRE-CREATION
>   src/linux/seccomp/seccomp.cpp PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68018/diff/16/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andrei Budnik
>
>