Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-28 Thread Vivek Goyal
On Tue, Sep 27, 2022 at 07:27:02PM +0200, German Maglione wrote:
> On Tue, Sep 27, 2022 at 6:57 PM Vivek Goyal  wrote:
> >
> > On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> > > On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > > > We previously had a chat here 
> > > > https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fda...@www.fastmail.com/T/
> > > > around virtiofsd and privileges and the case of trying to run virtiofsd 
> > > > inside an unprivileged (Kubernetes) container.
> > > >
> > > > Right now we're still using 9p, and it has bugs (basically it seems 
> > > > like the 9p inode flushing callback tries to allocate memory to send an 
> > > > RPC, and this causes OOM problems)
> > > > https://github.com/coreos/coreos-assembler/issues/1812
> > > >
> > > > Coming back to this...as of lately in Linux, there's support for 
> > > > strongly isolated filesystem access via openat2():
> > > > https://lwn.net/Articles/796868/
> > > >
> > > > Is there any reason we couldn't do an -o sandbox=openat2 ?  This 
> > > > operates without any privileges at all, and should be usable (and 
> > > > secure enough) in our use case.
> > >
> > > [ cc virtio-fs-list, german, sergio ]
> > >
> > > Hi Colin,
> > >
> > > Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> > > good idea. We talked about it few times but nobody ever wrote a patch to
> > > implement it.
> > >
> > > And it probably makes sense with all the sandboxes (chroot(), namespaces).
> > >
> > > I am wondering that it probably should not be a new sandbox mode at all.
> > > It probably should be the default if kernel offers openat2() syscall.
> > >
> > > Now all the development has moved to rust virtiofsd.
> > >
> > > https://gitlab.com/virtio-fs/virtiofsd
> > >
> > > C version of virtiofsd is just seeing small critical fixes.
> > >
> > > And rust version allows running unprivileged (inside a user namespace).
> > > German is also working on allowing running unprivileged without
> > > user namespaces but this will not allow arbitrary uid/gid switching.
> > >
> > > https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> > >
> > > If one wants to run unprivileged and also do arbitrary uid/gid switching,
> > > then you need to use user namepsaces and map a range of subuid/subgid
> > > into the user namepsace virtiofsd is running in.
> > >
> > > If possible, please try to use rust virtiofsd for your situation. Its
> > > already packaged for fedora.
> > >
> > > Coming back to original idea of using openat2(), I think we should
> > > probably give it a try in rust virtiofsd and if it works, it should
> > > work across all the sandboxing modes.
> >
> > Thinking more about it, enabling openat2() usage conditionally based on
> > some option probably is not a bad idea. I was assuming that using
> > openat2() by default will not break any of the existing use cases. But
> > I am not sure. I have burnt my fingers so many times and had to back
> > out on default settings that enabling usage of openat2() conditionally
> > will probably be a safer choice. :-)
> >
> 
> I could work on this for the next major version and see if anything breaks.
> But I prefer to add this as a compilation feature, instead of a command line
> option that we will then have to maintain for a while.

What does compilation feature mean? One can compile it out? If it is
compiled in, is it enabled by default?

> 
> Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> process can call openat2() without RESOLVE_IN_ROOT. I did some test with
> Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> kernel 5.13

landlock sounds interesting. May be use it by default if kernel offers it.

Question will be, security mechanisms we are using, how many of these
are mutually exclusive and how many can be used together.

A. pivot_root()
B. chroot()
C. openat2()
D. landlock
E. seccomp

Seccomp goes well with everything. 
landlock probably will go well as well.

pivot_root() and chroot() are currently mutually exlusive.

openat2() is probably redundant if pivot_root()/chroot()/landlock is
being used. But should work anyway.

Something to document as Stefan suggested.

Vivek




Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread Colin Walters



On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
>
>> > Now all the development has moved to rust virtiofsd.

Oh, awesome!!  The code there looks great.

> I could work on this for the next major version and see if anything breaks.
> But I prefer to add this as a compilation feature, instead of a command line
> option that we will then have to maintain for a while.

Hmm, what would be the issue with having the code there by default?  I think 
rather than any new command line option, we automatically use 
`openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.

> Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> process can call openat2() without RESOLVE_IN_ROOT. 

I'm a bit skeptical honestly about how secure the existing namespace code is 
against a compromised virtiofsd process.  The primary worry is guest filesystem 
traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust 
makes this dramatically safer.

> I did some test with
> Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> kernel 5.13

But yes, landlock and other things make sense, I just don't see these things as 
strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd 
on more sandboxing than openat2 already gives us.



Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread German Maglione
On Tue, Sep 27, 2022 at 6:57 PM Vivek Goyal  wrote:
>
> On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> > On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > > We previously had a chat here 
> > > https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fda...@www.fastmail.com/T/
> > > around virtiofsd and privileges and the case of trying to run virtiofsd 
> > > inside an unprivileged (Kubernetes) container.
> > >
> > > Right now we're still using 9p, and it has bugs (basically it seems like 
> > > the 9p inode flushing callback tries to allocate memory to send an RPC, 
> > > and this causes OOM problems)
> > > https://github.com/coreos/coreos-assembler/issues/1812
> > >
> > > Coming back to this...as of lately in Linux, there's support for strongly 
> > > isolated filesystem access via openat2():
> > > https://lwn.net/Articles/796868/
> > >
> > > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates 
> > > without any privileges at all, and should be usable (and secure enough) 
> > > in our use case.
> >
> > [ cc virtio-fs-list, german, sergio ]
> >
> > Hi Colin,
> >
> > Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> > good idea. We talked about it few times but nobody ever wrote a patch to
> > implement it.
> >
> > And it probably makes sense with all the sandboxes (chroot(), namespaces).
> >
> > I am wondering that it probably should not be a new sandbox mode at all.
> > It probably should be the default if kernel offers openat2() syscall.
> >
> > Now all the development has moved to rust virtiofsd.
> >
> > https://gitlab.com/virtio-fs/virtiofsd
> >
> > C version of virtiofsd is just seeing small critical fixes.
> >
> > And rust version allows running unprivileged (inside a user namespace).
> > German is also working on allowing running unprivileged without
> > user namespaces but this will not allow arbitrary uid/gid switching.
> >
> > https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> >
> > If one wants to run unprivileged and also do arbitrary uid/gid switching,
> > then you need to use user namepsaces and map a range of subuid/subgid
> > into the user namepsace virtiofsd is running in.
> >
> > If possible, please try to use rust virtiofsd for your situation. Its
> > already packaged for fedora.
> >
> > Coming back to original idea of using openat2(), I think we should
> > probably give it a try in rust virtiofsd and if it works, it should
> > work across all the sandboxing modes.
>
> Thinking more about it, enabling openat2() usage conditionally based on
> some option probably is not a bad idea. I was assuming that using
> openat2() by default will not break any of the existing use cases. But
> I am not sure. I have burnt my fingers so many times and had to back
> out on default settings that enabling usage of openat2() conditionally
> will probably be a safer choice. :-)
>

I could work on this for the next major version and see if anything breaks.
But I prefer to add this as a compilation feature, instead of a command line
option that we will then have to maintain for a while.

Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
process can call openat2() without RESOLVE_IN_ROOT. I did some test with
Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
kernel 5.13

Cheers,
-- 
German




Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread Vivek Goyal
On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > We previously had a chat here 
> > https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fda...@www.fastmail.com/T/
> > around virtiofsd and privileges and the case of trying to run virtiofsd 
> > inside an unprivileged (Kubernetes) container.
> > 
> > Right now we're still using 9p, and it has bugs (basically it seems like 
> > the 9p inode flushing callback tries to allocate memory to send an RPC, and 
> > this causes OOM problems)
> > https://github.com/coreos/coreos-assembler/issues/1812
> > 
> > Coming back to this...as of lately in Linux, there's support for strongly 
> > isolated filesystem access via openat2():
> > https://lwn.net/Articles/796868/
> > 
> > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates 
> > without any privileges at all, and should be usable (and secure enough) in 
> > our use case.
> 
> [ cc virtio-fs-list, german, sergio ]
> 
> Hi Colin,
> 
> Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> good idea. We talked about it few times but nobody ever wrote a patch to
> implement it.
> 
> And it probably makes sense with all the sandboxes (chroot(), namespaces).
> 
> I am wondering that it probably should not be a new sandbox mode at all.
> It probably should be the default if kernel offers openat2() syscall.
> 
> Now all the development has moved to rust virtiofsd.
> 
> https://gitlab.com/virtio-fs/virtiofsd
> 
> C version of virtiofsd is just seeing small critical fixes.
> 
> And rust version allows running unprivileged (inside a user namespace).
> German is also working on allowing running unprivileged without
> user namespaces but this will not allow arbitrary uid/gid switching.
> 
> https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> 
> If one wants to run unprivileged and also do arbitrary uid/gid switching,
> then you need to use user namepsaces and map a range of subuid/subgid
> into the user namepsace virtiofsd is running in.
> 
> If possible, please try to use rust virtiofsd for your situation. Its
> already packaged for fedora.
> 
> Coming back to original idea of using openat2(), I think we should
> probably give it a try in rust virtiofsd and if it works, it should
> work across all the sandboxing modes.

Thinking more about it, enabling openat2() usage conditionally based on
some option probably is not a bad idea. I was assuming that using
openat2() by default will not break any of the existing use cases. But
I am not sure. I have burnt my fingers so many times and had to back
out on default settings that enabling usage of openat2() conditionally
will probably be a safer choice. :-)

Vivek




Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread Vivek Goyal
On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> We previously had a chat here 
> https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fda...@www.fastmail.com/T/
> around virtiofsd and privileges and the case of trying to run virtiofsd 
> inside an unprivileged (Kubernetes) container.
> 
> Right now we're still using 9p, and it has bugs (basically it seems like the 
> 9p inode flushing callback tries to allocate memory to send an RPC, and this 
> causes OOM problems)
> https://github.com/coreos/coreos-assembler/issues/1812
> 
> Coming back to this...as of lately in Linux, there's support for strongly 
> isolated filesystem access via openat2():
> https://lwn.net/Articles/796868/
> 
> Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates 
> without any privileges at all, and should be usable (and secure enough) in 
> our use case.

[ cc virtio-fs-list, german, sergio ]

Hi Colin,

Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
good idea. We talked about it few times but nobody ever wrote a patch to
implement it.

And it probably makes sense with all the sandboxes (chroot(), namespaces).

I am wondering that it probably should not be a new sandbox mode at all.
It probably should be the default if kernel offers openat2() syscall.

Now all the development has moved to rust virtiofsd.

https://gitlab.com/virtio-fs/virtiofsd

C version of virtiofsd is just seeing small critical fixes.

And rust version allows running unprivileged (inside a user namespace).
German is also working on allowing running unprivileged without
user namespaces but this will not allow arbitrary uid/gid switching.

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136

If one wants to run unprivileged and also do arbitrary uid/gid switching,
then you need to use user namepsaces and map a range of subuid/subgid
into the user namepsace virtiofsd is running in.

If possible, please try to use rust virtiofsd for your situation. Its
already packaged for fedora.

Coming back to original idea of using openat2(), I think we should
probably give it a try in rust virtiofsd and if it works, it should
work across all the sandboxing modes.

Thanks
Vivek

> 
> I may try a patch if this sounds OK...
>