> On Jan. 14, 2019, 8:31 a.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect the task run by Mesos? E.g., a task may want to run a 
> > program which has `set-user-ID` bit.
> 
> Andrei Budnik wrote:
>     Yes, `no_new_privs` flag affects the task that wants to run a program 
> which has `set-user-ID` bit.
>     E.g., launching a `ping -c 3 8.8.8.8` fails with seccomp. You'll see a 
> message in executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 
> 2 (pid: 13276)
>     ```
>     
>     Also, see my previous comment 
> https://reviews.apache.org/r/68018/#comment297000
> 
> Qian Zhang wrote:
>     In your previous comment, you mentioned that the Docker daemon launches 
> its containers with the `SCMP_FLTATR_CTL_NNP` flag set by default. Does that 
> mean containers launched by the Docker daemon cannot run a program which has 
> the set-user-ID bit?
>     
>     This seems unfortunate since it might break some use cases or 
> applications that we already supported. And can you please elaborate a bit 
> about `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp 
> filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do 
> you mean the task launched by Mesos can call libseccomp API to revert the 
> filter itself?
>     
>     If we have to live with this limitation (i.e., cannot run a program which 
> has the set-user-ID bit), then we need to highlight it in the documentation.
> 
> Gilbert Song wrote:
>     Seems like we asked the same question.
>     
>     Andrei, let's align on this thread? Thanks :)
> 
> Andrei Budnik wrote:
>     >does that mean any containers launched by Docker daemon cannot run 
> program which has set-user-ID bit?
>     
>     The Docker daemon cannot be used to run arbitrary programs (unlike the 
> Mesos containerizer). So, when one launches a Docker container, the Docker 
> daemon launches a container process with the `NNP` bit set, which means that 
> a container process (and its descendants) can't gain more privileges 
> **outside** its container. The Mesos containerizer has exactly the same 
> behaviour:
>     
>     1) Run system-provided `/bin/ping` (*outside* its container) as a 
> non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FAILED for task 'a'
>       message: 'Command exited with status 2'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     2) Run system-provided `/bin/ping` (*outside* its container) as a 
> privileged user:
>     ```
>     sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     3) Run container image provided `ping` (*inside* its image/container) as 
> a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --docker_image="fedora:latest" --command="yum -y 
> install iputils;ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     
>     $ cat /path/to/container/stdout
>     ...
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
>     ```
>     
>     > This seems unfortunate since it might break some use cases or 
> applications that we already supported.
>     
>     It's very unlikely that the agent launches tasks whose binary has the 
> `setuid`/`setgid` bit set. Because... what's the point?
>     I doubt that any of the following programs are launched as a Mesos 
> container:
>     ```
>     $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
>     /bin/newgrp
>     /bin/pkexec
>     /bin/mount
>     /bin/umount
>     /bin/newuidmap
>     /bin/newgidmap
>     /bin/sudo
>     /bin/crontab
>     /bin/su
>     /bin/gpasswd
>     /bin/chage
>     /bin/passwd
>     /bin/staprun
>     /bin/fusermount
>     /bin/fusermount-glusterfs
>     /bin/chfn
>     /bin/chsh
>     /bin/at
>     ```
>     
>     > And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP 
> flag for a root means that Seccomp filter can be reverted anytime"? How will 
> the Seccomp filter be reverted? Do you mean the task launched by Mesos can 
> call libseccomp API to revert the filter itself?
>     
>     Yes, without `NNP` (`no_new_privs`) bit set, a privileged task might call 
> `seccomp` Linux syscall to install an empty Seccomp filter.
> 
> Qian Zhang wrote:
>     > Run system-provided /bin/ping (outside its container) as a 
> non-privileged user:
>     
>     As you mentioned in the above comment, this task will fail, but that's 
> **after** your seccomp patches are applied. Before your seccomp patches are 
> applied (e.g., I am using the latest code in Mesos master branch), it will 
> succeed:
>     ```
>     $ ./src/mesos-execute --master=192.168.56.5:5050 --name=test 
> --command="ping -c 3 8.8.8.8" --checkpoint  
>     I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
>     I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP 
> authenticatee
>     I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at 
> master@192.168.56.5:5050
>     Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
>     Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
>     Received status update TASK_STARTING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_RUNNING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_FINISHED for task 'test'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     To me, this is a kind of regression, i.e., some previously supported 
> use cases or applications will fail after your seccomp patches are applied.
>     
>     > when one launches a Docker container, Docker daemon launches a 
> container process with NNP bit set, which means that a container process (and 
> its descendants) can't gain more privileges outside its container.
>     
>     This is not what I found with Docker. I created a Docker image with 
> ping installed and a non-root user added:
>     ```
>     FROM ubuntu:18.04
>     
>     RUN apt-get update && apt-get install -y iputils-ping
>     RUN adduser --disabled-password --gecos "" stack
>     ```
>     
>     And then I created a Docker container from that image with the non-root 
> user, and I found ping worked.
>     ```
>     docker run --rm -it --user stack ubuntu:stack sh   
>     $ id 
>     uid=1000(stack) gid=1000(stack) groups=1000(stack)
>     $ ls -la /bin/ping 
>     -rwsr-xr-x. 1 root root 64424 Mar  9  2017 /bin/ping
>     $ ping 8.8.8.8 
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
>     64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
>     64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
>     ^C
>     --- 8.8.8.8 ping statistics ---
>     3 packets transmitted, 3 received, 0% packet loss, time 2002ms
>     rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
>     ```
>     So the Docker daemon actually can create a container that runs a program 
> which has the set-user-ID bit. I am a bit confused about the impact of the 
> `SCMP_FLTATR_CTL_NNP` flag which, as you mentioned, the Docker daemon sets 
> for its containers.

The example you provided with the Docker daemon is identical to the 3rd case 
from my previous comment:
```
$ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
--containerizer=mesos --docker_image="fedora:latest" --command="yum -y install 
iputils;ping -c 3 8.8.8.8"
```
In this case, we behave exactly as Docker does.

At the same time, the first two cases are not supported by Docker, but they 
are supported by the Mesos containerizer. Hence the difference in behaviour.

Anyway, we need to set the `NNP` bit both for a non-privileged user 
(otherwise, we have no permission to install a Seccomp filter; see the seccomp 
man page for details) and for a privileged user (otherwise, installing a 
Seccomp filter makes little sense, as it can easily be reverted later).
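
To make the `NNP` bit itself concrete, here is a minimal sketch that is 
separate from the patch: it sets and reads `no_new_privs` through `prctl(2)` 
from Python via `ctypes` (Linux only). The helper names `_prctl`, 
`get_no_new_privs` and `set_no_new_privs` are made up for illustration; the 
option numbers 38 and 39 are the `PR_SET_NO_NEW_PRIVS` and 
`PR_GET_NO_NEW_PRIVS` constants from `<linux/prctl.h>`. As far as I know, 
`libseccomp` issues the same `prctl` call when a filter is loaded with 
`SCMP_FLTATR_CTL_NNP` enabled.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

# Handle to the C library already loaded into this process (Linux only).
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    """Hypothetical helper: call prctl(2) with the unused arguments zeroed."""
    zero = ctypes.c_ulong(0)
    result = _libc.prctl(
        ctypes.c_int(option), ctypes.c_ulong(arg2), zero, zero, zero)
    if result < 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))
    return result


def get_no_new_privs() -> int:
    """Return 1 if no_new_privs is set for the calling thread, else 0."""
    return _prctl(PR_GET_NO_NEW_PRIVS)


def set_no_new_privs() -> None:
    """Set no_new_privs; the kernel offers no way to clear it again."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


if __name__ == "__main__":
    print("before:", get_no_new_privs())
    set_no_new_privs()
    print("after:", get_no_new_privs())
```

Once the bit is set, it is inherited across `fork` and `execve`, and `execve` 
no longer honours the set-user-ID bit. That is why the host `/bin/ping` in 
the first case fails with `socket: Operation not permitted`.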


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 3:24 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 3:24 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
> 
> 
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf 
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132 
>   src/linux/seccomp/seccomp.hpp PRE-CREATION 
>   src/linux/seccomp/seccomp.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68018/diff/16/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
