Thanks for this great explanation, now everything is more clear to me. Basically this problem happens every time there is permissions configuration just a little bit more advanced and at the moment there isn't an on the fly solution that doesn't need kernel patching and recompiling.
Good to know. 2 giu 2021, 10:32 da [email protected]: > On Wed, Jun 2, 2021 at 1:01 AM <[email protected]> wrote: > >> >> Hello to everyone, >> >> I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, >> virtiofsd doesn't garant write access at users allowed by group permission. >> >> The virtiofsd bin included in proxmox is v 7.32, but I have also tested the >> bins compiled from source from stable branch (v7.27) and dev branch (v7.33), >> and I also have tested the bin virtiofsd-rs. Always same problem. >> >> I opened an issue on gitlab and I asked in the proxmox forum but with almost >> zero interaction. Seems to me that virtiofsd isn't a lot used. Someone >> suggested me to write in the mailinglist for these technical things and here >> I am. >> >> To better make understand what problem I'm noticing I link the issue opened >> in the gitlab in which I have reported a lot of useful info and logs with a >> good formatting: >> https://gitlab.com/qemu-project/qemu/-/issues/368 >> >> To me seems really strange what is happening, am I doing some error or this >> really is a virtiofsd bug? >> > > I'm pretty sure this is a virtiofsd issue because we ran into the same > problem on crosvm. The issue is that the server changes its uid/gid > to the uid/gid in the fuse context before making the syscall. This > ensures that the file/directory appears atomically with the correct > metadata. However, this causes an EACCES error when the following > conditions are > met: > > * The parent directory has g+rw permissions with gid A > * The process has gid B but has A in its list of supplementary groups > > In this case the fuse context only contains gid B, which doesn't have > permission to modify the parent directory. > > There are a couple of ways to fix this problem: > > The first one we tried was to split file/directory creation into 2 > stages [1]. Basically for files we first create a temporary with > O_TMPFILE and then initialize the metadata before linking it into the > directory tree. The main issue with this is that we're duplicating > the work that kernel already does on open and turning a single syscall > in the VM into several syscalls on the host, which adds a significant > amount of latency. You also have to deal with a bunch of esoteric > corner cases for file systems that the kernel would normally just > handle automatically [2][3]. For directories, there is no O_TMPFILE > equivalent so we had to do a gross hack of creating a directory with a > random name and then renaming it to the correct one once all the > metadata was properly initialized. In theory you could create the > directory in a separate "work dir" first but you have to be careful if > the original directory uses selinux. From what I understand, rename > preserves the security context so to ensure the security context is > properly inherited from the parent directory you need to create a new > directory anyway. Or figure out what the correct context should be > and set it in the work dir before the rename. > > The second solution, which is also what we're using now, is to set the > SECBIT_NO_SETUID_FIXUP flag. This flag prevents the kernel from > dropping caps when the process changes its uid/gid so the permission > checks are skipped as long as the server has the appropriate > capabilities (CAP_DAC_OVERRIDE, I think?). Doing this lets us drop > all the special handling code and just go back to making a single > syscall and letting the kernel figure out the rest [4]. The crosvm fs > device always runs as root in a user namespace with the proper caps so > this works for us but obviously will not work if virtiofsd doesn't > have the caps to begin with. At that point the first option may be > the only choice. > > I guess a third option is to change the fuse protocol so that it also > includes the supplementary groups of the calling process. Then the > server can also set its supplementary groups the same way before > making the syscall. Merging the necessary changes into the kernel is > left as an exercise for the reader ;-) > > > Chirantan > > [1]: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2217534 > [2]: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2253493 > [3]: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2260253 > [4]: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2684067 >
_______________________________________________ Virtio-fs mailing list [email protected] https://listman.redhat.com/mailman/listinfo/virtio-fs
