> On Jan. 14, 2019, 8:31 a.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect tasks run by Mesos? E.g., a task may want to run a 
> > program which has the `set-user-ID` bit set.
> 
> Andrei Budnik wrote:
>     Yes, the `no_new_privs` flag affects any task that wants to run a program 
> which has the `set-user-ID` bit set.
>     E.g., launching `ping -c 3 8.8.8.8` fails with seccomp enabled. You'll 
> see this message in the executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 
> 2 (pid: 13276)
>     ```
>     
>     Also, see my previous comment 
> https://reviews.apache.org/r/68018/#comment297000
> 
> Qian Zhang wrote:
>     In your previous comment, you mentioned that Docker daemon launches its 
> containers with `SCMP_FLTATR_CTL_NNP` flag set by default, does that mean any 
> containers launched by Docker daemon cannot run program which has set-user-ID 
> bit?
>     
>     This seems unfortunate since it might break some use cases or 
> applications that we already supported. And can you please elaborate a bit 
> about `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp 
> filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do 
> you mean the task launched by Mesos can call libseccomp API to revert the 
> filter itself?
>     
>     If we have to live with this limitation (i.e., tasks cannot run programs 
> which have the set-user-ID bit), then we need to highlight it in the document.
> 
> Gilbert Song wrote:
>     Seems like we asked the same question.
>     
>     Andrei, let's align on this thread? Thanks :)
> 
> Andrei Budnik wrote:
>     >does that mean any containers launched by Docker daemon cannot run 
> program which has set-user-ID bit?
>     
>     Docker daemon cannot be used to run arbitrary programs (as opposed to 
> Mesos containerizer). So, when one launches a Docker container, Docker daemon 
> launches a container process with `NNP` bit set, which means that a container 
> process (and its descendants) can't gain more privileges **outside** its 
> container. Mesos containerizer has exactly the same behaviour:
>     
>     1) Run system-provided `/bin/ping` (*outside* its container) as a 
> non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FAILED for task 'a'
>       message: 'Command exited with status 2'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     2) Run system-provided `/bin/ping` (*outside* its container) as a 
> privileged user:
>     ```
>     sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --command="ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     
>     3) Run container image provided `ping` (*inside* its image/container) as 
> a non-privileged user:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --docker_image="fedora:latest" --command="yum -y 
> install iputils;ping -c 3 8.8.8.8"
>     ...
>     Received status update TASK_FINISHED for task 'a'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     
>     $ cat /path/to/container/stdout
>     ...
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
>     ```
>     
>     > This seems unfortunate since it might break some use cases or 
> applications that we already supported.
>     
>     It's very unlikely that the agent launches tasks whose binary has the 
> `setuid`/`setgid` bit set. Because... what's the point?
>     I doubt that any of the following programs are launched as a Mesos 
> container:
>     ```
>     $ sudo find /bin/ -perm -u=s -type f 2>/dev/null
>     /bin/newgrp
>     /bin/pkexec
>     /bin/mount
>     /bin/umount
>     /bin/newuidmap
>     /bin/newgidmap
>     /bin/sudo
>     /bin/crontab
>     /bin/su
>     /bin/gpasswd
>     /bin/chage
>     /bin/passwd
>     /bin/staprun
>     /bin/fusermount
>     /bin/fusermount-glusterfs
>     /bin/chfn
>     /bin/chsh
>     /bin/at
>     ```
>     
>     > And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP 
> flag for a root means that Seccomp filter can be reverted anytime"? How will 
> the Seccomp filter be reverted? Do you mean the task launched by Mesos can 
> call libseccomp API to revert the filter itself?
>     
>     Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might 
> call the `seccomp` Linux syscall to install an empty Seccomp filter.
> 
> Qian Zhang wrote:
>     > Run system-provided /bin/ping (outside its container) as a 
> non-privileged user:
>     
>     As you mentioned in the above comment, this task will fail, but that's 
> **after** your seccomp patches are applied. Before your seccomp patches are 
> applied (e.g., I am using the latest code in Mesos master branch), it will 
> succeed:
>     ```
>     $ ./src/mesos-execute --master=192.168.56.5:5050 --name=test 
> --command="ping -c 3 8.8.8.8" --checkpoint  
>     I0116 10:15:02.699398 14271 scheduler.cpp:189] Version: 1.8.0
>     I0116 10:15:02.977327 14287 scheduler.cpp:355] Using default 'basic' HTTP 
> authenticatee
>     I0116 10:15:02.979837 14285 scheduler.cpp:538] New master detected at 
> master@192.168.56.5:5050
>     Subscribed with ID ea9488e1-a171-423f-8eb5-4d70187349fb-0001
>     Submitted task 'test' to agent '12866186-dc2b-48a9-88ad-f9d951cf8c7f-S0'
>     Received status update TASK_STARTING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_RUNNING for task 'test'
>       source: SOURCE_EXECUTOR
>     Received status update TASK_FINISHED for task 'test'
>       message: 'Command exited with status 0'
>       source: SOURCE_EXECUTOR
>     ```
>     To me, this breaks an existing feature, i.e., some previously supported 
> use cases or applications will fail after your seccomp patches are applied.
>     
>     > when one launches a Docker container, Docker daemon launches a 
> container process with NNP bit set, which means that a container process (and 
> its descendants) can't gain more privileges outside its container.
>     
>     This is not what I found with Docker. I created a Docker image with 
> ping installed and a non-root user added:
>     ```
>     FROM ubuntu:18.04
>     
>     RUN apt-get update && apt-get install -y iputils-ping
>     RUN adduser --disabled-password --gecos "" stack
>     ```
>     
>     And then I created a Docker container from that image with the non-root 
> user, and I found ping worked.
>     ```
>     docker run --rm -it --user stack ubuntu:stack sh   
>     $ id 
>     uid=1000(stack) gid=1000(stack) groups=1000(stack)
>     $ ls -la /bin/ping 
>     -rwsr-xr-x. 1 root root 64424 Mar  9  2017 /bin/ping
>     $ ping 8.8.8.8 
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=3.25 ms
>     64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=3.20 ms
>     64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=3.48 ms
>     ^C
>     --- 8.8.8.8 ping statistics ---
>     3 packets transmitted, 3 received, 0% packet loss, time 2002ms
>     rtt min/avg/max/mdev = 3.200/3.312/3.481/0.121 ms
>     ```
>     So Docker daemon actually can create a container that runs a program 
> which has the set-user-ID bit. I am a bit confused about the impact of the 
> `SCMP_FLTATR_CTL_NNP` flag which, as you mentioned, Docker daemon sets for 
> its containers.
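
As a side note on the `ls -la /bin/ping` output above: the `s` in `-rwsr-xr-x` is the set-user-ID bit that this thread keeps coming back to. Below is a minimal sketch of checking for it programmatically, assuming Python is available on the host or in the image. The helper names `has_setuid_bit` and `is_setuid_file` are hypothetical and are not part of Mesos or of this patch:

```python
import os
import stat


def has_setuid_bit(mode: int) -> bool:
    """Return True if the set-user-ID bit is present in an st_mode value."""
    return bool(mode & stat.S_ISUID)


def is_setuid_file(path: str) -> bool:
    """Return True if `path` is a regular file with the set-user-ID bit.

    Hypothetical helper for illustration; `path` could be a binary on the
    host or inside an unpacked container image.
    """
    st = os.stat(path)
    return stat.S_ISREG(st.st_mode) and has_setuid_bit(st.st_mode)
```

For example, `is_setuid_file("/bin/ping")` should be true for an image like the one above, where `ping` is installed setuid-root. Whether it is true on a given host depends on how the distribution ships `ping`.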
> 
> Andrei Budnik wrote:
>     The example you have provided with Docker daemon is identical to the 3rd 
> case from my previous comment:
>     ```
>     $ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
> --containerizer=mesos --docker_image="fedora:latest" --command="yum -y 
> install iputils;ping -c 3 8.8.8.8"
>     ```
>     We behave in this case exactly as Docker.
>     
>     At the same time, the first two cases are not supported by Docker but 
> are supported by Mesos containerizer. Hence the difference in behaviour.
>     
>     Anyway, we need to set `NNP` bit both for a non-privileged user 
> (otherwise, we have no permissions to install Seccomp filter - more details 
> in seccomp man page) and for privileged user (otherwise, it does not make 
> sense to install a Seccomp filter as it can be easily reverted later).
> 
> Andrei Budnik wrote:
>     I will highlight this nuance in Seccomp documentation.
> 
> Andrei Budnik wrote:
>     Added a note in the Seccomp doc: 
> https://reviews.apache.org/r/69493/diff/4-5/
> 
> Qian Zhang wrote:
>     > The example you have provided with Docker daemon is identical to the 
> 3rd case from my previous comment:
>     
>     I think they are different: the `ping` binary I installed with the 
> `ubuntu` image has the set-user-ID bit, but the `ping` binary you installed 
> with the `fedora` image has **no** set-user-ID bit. So my example proves 
> Docker daemon actually can create a container with a non-root user to run a 
> program which has the set-user-ID bit. Can you please try your 3rd case with 
> the `ubuntu` image? If it fails, then I think that's not acceptable since the 
> same use case can be supported by Docker but not by us.
>     
>     > Added a note in the Seccomp doc: 
> https://reviews.apache.org/r/69493/diff/4-5/
>     
>     I see you added the statement below in the doc:
>     ```
>     So, when a framework wants to launch an OS-provided `ping` task as a 
> non-privileged user, the task will fail.
>     ```
>     My concern is, when a framework wants to launch an image-provided (e.g., 
> ubuntu image) `ping` task as a non-privileged user, will the task fail too? 
> And why do we need to distinguish OS-provided from image-provided binaries? I 
> think the point should be whether the binary that the task will execute (no 
> matter whether it is OS-provided or image-provided) has the set-user-ID bit 
> or not, right?
>     
>     > So, when one launches a Docker container, Docker daemon launches a 
> container process with NNP bit set
>     
>     This is not what I found with Docker:
>     ```
>     $ docker run --rm --user operator alpine sleep 1000
>     $ ps -ef | grep sleep 
>     stack    25409 23826  0 10:49 pts/0    00:00:00 docker run --rm --user 
> operator alpine sleep 1000
>     11       25478 25455  0 10:49 ?        00:00:00 sleep 1000
>     $ cat /proc/25478/status | grep NoNewPrivs
>     NoNewPrivs:     0
>     ```
>     So as you see, the NNP bit is **not** set for the container process. I 
> think it will only be set when one specifies 
> `--security-opt="no-new-privileges:true"` when launching a Docker container.
>     
>     
>     > we need to set NNP bit both for a non-privileged user (otherwise, we 
> have no permissions to install Seccomp filter - more details in seccomp man 
> page)
>     
>     Can you please elaborate a bit on why a Seccomp filter cannot be 
> installed for a non-privileged user if the NNP bit is not set? That seems not 
> true for Docker: Docker daemon can install the Seccomp filter defined in the 
> default Seccomp profile without the NNP bit set. I can create a Docker 
> container successfully with a command like `"docker run --rm -it --user 
> operator --security-opt 
> seccomp=/home/stack/workspace/mesos/build/default.json 
> --security-opt="no-new-privileges:false" alpine sh"`.

`no_new_privs` bit is explicitly disabled. I've updated the patch chain.
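
For anyone who wants to verify the state of the bit for a running container process, the same `/proc/<pid>/status` check that Qian used above can be scripted. This is a rough sketch, assuming a Linux kernel that exposes the `NoNewPrivs` field. The names `parse_no_new_privs` and `read_no_new_privs` are illustrative, not functions from the patch:

```python
from typing import Optional


def parse_no_new_privs(status_text: str) -> Optional[int]:
    """Extract the NoNewPrivs value from the text of /proc/<pid>/status.

    Returns 0 or 1 when the field is present, or None when it is absent
    (for example, on a kernel that does not report it).
    """
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "NoNewPrivs":
            return int(value.strip())
    return None


def read_no_new_privs(pid: str = "self") -> Optional[int]:
    """Read NoNewPrivs for `pid` (assumption: procfs is mounted at /proc)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_no_new_privs(f.read())
```

Calling `read_no_new_privs("25478")` for the `sleep` process in the Docker example above should give `0`, matching the `NoNewPrivs:     0` line Qian quoted.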


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Jan. 18, 2019, 8 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2019, 8 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
> 
> 
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf 
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132 
>   src/linux/seccomp/seccomp.hpp PRE-CREATION 
>   src/linux/seccomp/seccomp.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68018/diff/17/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
