On Thu, Jul 23, 2020 at 01:50:35PM +0100, Daniel P. Berrangé wrote: > On Thu, Jul 23, 2020 at 01:46:11PM +0100, Stefan Hajnoczi wrote: > > On Wed, Jul 22, 2020 at 06:03:28PM +0100, Daniel P. Berrangé wrote: > > > On Wed, Jul 22, 2020 at 02:02:06PM +0100, Stefan Hajnoczi wrote: > > > > An assertion failure is raised during request processing if > > > > unshare(CLONE_FS) fails. Implement a probe at startup so the problem can > > > > be detected right away. > > > > > > > > Unfortunately Docker/Moby does not include unshare in the seccomp.json > > > > list unless CAP_SYS_ADMIN is given. Other seccomp.json lists always > > > > include unshare (e.g. podman is unaffected): > > > > https://raw.githubusercontent.com/seccomp/containers-golang/master/seccomp.json > > > > > > > > Use "docker run --security-opt seccomp=path/to/seccomp.json ..." if the > > > > default seccomp.json is missing unshare. > > > > > > > > Cc: Misono Tomohiro <misono.tomoh...@jp.fujitsu.com> > > > > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > > > > --- > > > > tools/virtiofsd/fuse_virtio.c | 13 +++++++++++++ > > > > 1 file changed, 13 insertions(+) > > > > > > > > diff --git a/tools/virtiofsd/fuse_virtio.c > > > > b/tools/virtiofsd/fuse_virtio.c > > > > index 3b6d16a041..ebeb352514 100644 > > > > --- a/tools/virtiofsd/fuse_virtio.c > > > > +++ b/tools/virtiofsd/fuse_virtio.c > > > > @@ -949,6 +949,19 @@ int virtio_session_mount(struct fuse_session *se) > > > > { > > > > int ret; > > > > > > > > + /* > > > > + * Test that unshare(CLONE_FS) works. fv_queue_worker() will need > > > > it. It's > > > > + * an unprivileged system call but some Docker/Moby versions are > > > > known to > > > > + * reject it via seccomp when CAP_SYS_ADMIN is not given. > > > > + */ > > > > + ret = unshare(CLONE_FS); > > > > + if (ret == -1 && errno == EPERM) { > > > > + fuse_log(FUSE_LOG_ERR, "unshare(CLONE_FS) failed with EPERM. > > > > If " > > > > + "running in a container please check that the > > > > container " > > > > + "runtime seccomp policy allows unshare.\n"); > > > > + return -1; > > > > + } > > > > + > > > > > > This describes the unshare() call as a "probe" and a "test", but that's > > > misleading IMHO. A "probe" / "test" implies that after it has completed, > > > there's no lingering side-effect, which isn't the case here. > > > > > > This is actively changing the process' namespace environment in the > > > success case, and not putting it back how it was originally. > > > > > > May be this is in fact OK, but if so I think the commit message and > > > comment should explain/justify what its fine to have this lingering > > > side-effect. > > > > > > If we want to avoid the side-effect then we need to fork() and run > > > unshare() in the child, and use a check of exit status of the child > > > to determine the result. > > > > Thanks for pointing this out. I'll add a comment explaining that > > virtiofsd is single-threaded at this point. No other threads share the > > file system attributes so the call has no observable side-effects. > > Also, if we do an unshare() here, do we still need the unshare() later > on in the code ?
I think so. That unshare() later is to isolate one thread from other. Thanks Vivek