Re: RFC: nfsd in a vnet jail
On Mon, Dec 19, 2022 at 9:36 AM Bjoern A. Zeeb wrote:
> On Mon, 19 Dec 2022, Rick Macklem wrote:
> [good stuff snipped]
> > Unfortunately, this does not deal with vnet'ng the kgssapi, rpcsec_gss for
> > Kerberized mounts or vnet'ng NFS-over-TLS, but those could be handled in a
> > similar manner, I think?
>
> Could be, yes.

I have now created a patch for the NFS-over-TLS part of the krpc. It uses the same technique, except the macros are called KRPC_VNETxxx instead of NFSD_VNETxxx.

The patches are in phabricator as:
D37519 - Most of the changes.
D3 - The krpc changes for NFS-over-TLS
D37741 - The vfs_mount.c changes in D37519

Although I listed a few possible reviewers, anyone is welcome to test and/or review them.

The patches are also here (in a form that "patch" might prefer):
https://people.freebsd.org/~rmacklem/vnet.patch
https://people.freebsd.org/~rmacklem/vnetsmall-rpctls.patch

rick

> > So, what do others think of this alternate plan?
> >
> > rick
> > ps: Every use of the vnet'd variables is currently wrapped in a macro called
> >     NFSD_VNET(), so the change is pretty easy to do by just re-writing this
> >     macro.
>
> --
> Bjoern A. Zeeb r15:7
Re: RFC: nfsd in a vnet jail
On Mon, 19 Dec 2022, Rick Macklem wrote:

> Hi,
>
> Kostik expressed some concern w.r.t. using a non-default VNET_NFSD kernel build option and I understand his concern, given that many prefer to use a GENERIC kernel and binary updates.

Yes, I may have hinted towards that (at least in my mind) while looking at the review.

There is a reason that (at least for now) I do like having it. Personally (mostly due to lack of time) I haven't figured out if I would want this to be a vnfs one day or part of vnet. The former (like the kernel option) could more easily address possible security concerns for people (especially while this is a moving target with more possibly coming later). Hence my having asked for a dedicated macro rather than mangling the VNET macros directly in place, as this gives us a lot of flexibility to easily move this around in the future if we wanted to.

Removing the option will make the code simpler.

> Right now there are 29 NFS variables VNET_DEFINE()'d and several of them are arrays currently sized at 500. One of the reasons for the non-default VNET_NFSD kernel option was the bloat this caused to the vnet. (Chris expressed concern that adding mountd/nfsd to the vnet would result in bloat/overhead in a previous post to this thread.)
>
> Another issue with putting all these variables in the vnet is that the nfsd.ko cannot be loaded (it complains the vnet is out of space) and, as such, options NFSD must be used with VNET_NFSD so that the nfsd is linked into the kernel.
>
> As such, I am wondering what others think of this alternate plan?
> - Pull all the VNET_DEFINE()'d variables (except the 3 manipulated by sysctls) into a structure.
> - Define a single VNET_DEFINE()'d variable that is a pointer to that structure and then malloc() the structure in the function called by VNET_SYSINIT().
>
> This would result in a malloc'd structure for each vnet jail (for kernels built with VIMAGE), but would only add 4 variables to the vnet.
>
> If a small C file that only consists of the VNET_DEFINE()s for the 4 variables is linked into the kernel whenever the VIMAGE option is specified, then I think nfsd.ko would be loadable.

It is not the number of variables added that is a problem; it is (as you point out above) their size which is a problem. So 500 uint8_t variables are as expensive as 1 uint8_t[500]. There is only a "small" amount of space allocated per-vnet to replicate state for modules.

Multicast also had that problem, needing huge chunks, and we eventually switched to malloc to fix this and make the module loadable again. (See 1a117215c7f90e6ef8c50ef3bfe099490aaa98f9 for one of the changes -- I think I screwed something up there about sizing, so there are probably follow-ups, but it'll show the concept.)

So whether you make this one huge malloc or multiple small ones for the few big variables is up to you in the end. I also wonder which is easier to deal with for "vnet0/prison0/'nfs0-bits'", as NFS_ROOT and other parts will eventually need some of these things to be in place early on for the base. Also, sysctl combined with malloc and virtualisation can be a bit tricky. The "common" theme would be to malloc only the complex data types but leave the simple-type variables as normal virtualised ones.

> Unfortunately, this does not deal with vnet'ng the kgssapi, rpcsec_gss for Kerberized mounts or vnet'ng NFS-over-TLS, but those could be handled in a similar manner, I think?

Could be, yes.

> So, what do others think of this alternate plan?
>
> rick
> ps: Every use of the vnet'd variables is currently wrapped in a macro called NFSD_VNET(), so the change is pretty easy to do by just re-writing this macro.

--
Bjoern A. Zeeb r15:7
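The sizing point in this reply can be illustrated with a small userland sketch. It is only a model under stated assumptions: `struct inline_layout`, `struct pointer_layout`, `TABLE_SIZE` and the helper functions are hypothetical names, not anything from the nfsd patches, and plain `calloc()` stands in for the kernel's malloc(9). It shows that an inline array costs its full size in the replicated per-vnet region, while keeping the simple variable inline and allocating only the large table leaves a pointer-sized footprint.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 500 /* array size mentioned in the thread */

/* Hypothetical layout: everything inline, so the whole table counts
 * against the limited per-vnet space. */
struct inline_layout {
	int     enabled;
	uint8_t table[TABLE_SIZE];
};

/* Hypothetical layout: the simple variable stays inline and only the
 * large table is allocated separately. */
struct pointer_layout {
	int      enabled;
	uint8_t *table;
};

/* Allocate the large table; returns 0 on success, -1 on failure. */
int
pointer_layout_init(struct pointer_layout *p)
{
	assert(p != NULL);
	p->enabled = 0;
	p->table = calloc(TABLE_SIZE, sizeof(*p->table));
	return (p->table != NULL ? 0 : -1);
}

/* Release the table again, as a per-vnet teardown hook would. */
void
pointer_layout_fini(struct pointer_layout *p)
{
	assert(p != NULL);
	free(p->table);
	p->table = NULL;
}
```

Comparing `sizeof(struct inline_layout)` with `sizeof(struct pointer_layout)` shows the difference in replicated space; the price is one extra allocation per vnet and a pointer dereference on each access to the table.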
Re: RFC: nfsd in a vnet jail
Hi,

Kostik expressed some concern w.r.t. using a non-default VNET_NFSD kernel build option and I understand his concern, given that many prefer to use a GENERIC kernel and binary updates.

Right now there are 29 NFS variables VNET_DEFINE()'d and several of them are arrays currently sized at 500. One of the reasons for the non-default VNET_NFSD kernel option was the bloat this caused to the vnet. (Chris expressed concern that adding mountd/nfsd to the vnet would result in bloat/overhead in a previous post to this thread.)

Another issue with putting all these variables in the vnet is that the nfsd.ko cannot be loaded (it complains the vnet is out of space) and, as such, options NFSD must be used with VNET_NFSD so that the nfsd is linked into the kernel.

As such, I am wondering what others think of this alternate plan?
- Pull all the VNET_DEFINE()'d variables (except the 3 manipulated by sysctls) into a structure.
- Define a single VNET_DEFINE()'d variable that is a pointer to that structure and then malloc() the structure in the function called by VNET_SYSINIT().

This would result in a malloc'd structure for each vnet jail (for kernels built with VIMAGE), but would only add 4 variables to the vnet.

If a small C file that only consists of the VNET_DEFINE()s for the 4 variables is linked into the kernel whenever the VIMAGE option is specified, then I think nfsd.ko would be loadable.

Unfortunately, this does not deal with vnet'ng the kgssapi, rpcsec_gss for Kerberized mounts or vnet'ng NFS-over-TLS, but those could be handled in a similar manner, I think?

So, what do others think of this alternate plan?

rick
ps: Every use of the vnet'd variables is currently wrapped in a macro called NFSD_VNET(), so the change is pretty easy to do by just re-writing this macro.
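The alternate plan above can be modelled in userland as follows. This is a hedged sketch, not the code in the patches: `struct vnet_model`, `struct nfsd_state`, its fields and the init/uninit functions are hypothetical stand-ins, `calloc()` replaces the kernel malloc(), and the accessor takes an explicit instance pointer where the real NFSD_VNET() macro takes only a variable name.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the formerly separate per-vnet variables. */
struct nfsd_state {
	int           active_clients;  /* illustrative scalar */
	unsigned char hash_heads[500]; /* stand-in for an array sized at 500 */
};

/* Hypothetical stand-in for one vnet instance: it carries a single
 * pointer instead of every nfsd variable. */
struct vnet_model {
	struct nfsd_state *nfsd;
};

/* Accessor in the spirit of NFSD_VNET(): call sites keep naming the
 * variable and only the macro body knows where it lives. */
#define NFSD_VNET(vp, field) ((vp)->nfsd->field)

/* Plays the role of the function called by VNET_SYSINIT(). */
int
nfsd_vnet_init(struct vnet_model *vp)
{
	assert(vp != NULL);
	vp->nfsd = calloc(1, sizeof(*vp->nfsd));
	return (vp->nfsd != NULL ? 0 : -1);
}

/* Plays the role of the matching per-vnet teardown function. */
void
nfsd_vnet_uninit(struct vnet_model *vp)
{
	assert(vp != NULL);
	free(vp->nfsd);
	vp->nfsd = NULL;
}
```

Each modelled vnet gets its own zeroed structure, so state written through the macro in one instance is not visible in another, while the per-instance footprint stays at one pointer.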
Re: RFC: nfsd in a vnet jail
I think this is worthy of third party testing now. See https://people.freebsd.org/~rmacklem/nfsd-vnet-prison-setup.txt I still haven't tried NFSv3 and I have not ported nfsuserd into the vnet, but most NFSv4 setups don't need it anyhow. Good luck with it if you test it, rick ps: Just replied to a random post for this. On Fri, Dec 2, 2022 at 7:41 AM Olivier Certner wrote: > > To enforce it for cases where mountd/nfsd is not being run would > > definitely be a POLA violation. > > I could not agree more. > > Thanks for the clarification. > > -- > Olivier Certner > > > >
Re: RFC: nfsd in a vnet jail
> To enforce it for cases where mountd/nfsd is not being run would > definitely be a POLA violation. I could not agree more. Thanks for the clarification. -- Olivier Certner
Re: RFC: nfsd in a vnet jail
On Fri, Dec 2, 2022 at 2:03 AM Olivier Certner wrote: > Hi, > > > (snip) > > > > #2 - Require separate file systems and run mountd inside the jail(s). > > > > I think that allowing both alternatives would be too confusing > > and it seems that most want mountd to run within the jail(s). > > As such, unless others prefer #1, I think #2 is the way to go. > > Just to be sure I've understood correctly: You plan to make a separate > filesystem as jail's root a requirement but only in the case of using > mountd(8) in the jail? Or in general? > Certainly not in general. Current plan is for the case of mountd/nfsd. To enforce it for cases where mountd/nfsd is not being run would definitely be a POLA violation. rick > > While I think doing so in the NFSv4/mountd case is indeed a good idea, I > don't > think enforcing it in general is. It would generally degrade the multiple > jails management experience on UFS (in the absence of a volume manager), > where > all jails have roots in the same filesystem (to avoid > allocating/deallocating > space as jails come and go or must be resized). > > Regards. > > -- > Olivier Certner > > >
Re: RFC: nfsd in a vnet jail
On Fri, 02 Dec 2022 11:03:01 +0100 Olivier Certner wrote:
> Hi,
>
> > (snip)
> >
> > #2 - Require separate file systems and run mountd inside the jail(s).
> >
> > I think that allowing both alternatives would be too confusing
> > and it seems that most want mountd to run within the jail(s).
> > As such, unless others prefer #1, I think #2 is the way to go.
>
> Just to be sure I've understood correctly: You plan to make a separate filesystem as jail's root a requirement but only in the case of using mountd(8) in the jail? Or in general?
>
> While I think doing so in the NFSv4/mountd case is indeed a good idea, I don't think enforcing it in general is. It would generally degrade the multiple jails management experience on UFS (in the absence of a volume manager), where all jails have roots in the same filesystem (to avoid allocating/deallocating space as jails come and go or must be resized).

Exactly my thoughts. If forced generally, it would mean jails are effectively no longer usable on UFS-based devices. Or, possibly, the 'entry cost' of using jails would be much higher, and jails would thus be used less. In my eyes, they would no longer be a lightweight virtualisation tool, which is the main selling point of jails for me.

Regards,
Milan
Re: RFC: nfsd in a vnet jail
Hi, > (snip) > > #2 - Require separate file systems and run mountd inside the jail(s). > > I think that allowing both alternatives would be too confusing > and it seems that most want mountd to run within the jail(s). > As such, unless others prefer #1, I think #2 is the way to go. Just to be sure I've understood correctly: You plan to make a separate filesystem as jail's root a requirement but only in the case of using mountd(8) in the jail? Or in general? While I think doing so in the NFSv4/mountd case is indeed a good idea, I don't think enforcing it in general is. It would generally degrade the multiple jails management experience on UFS (in the absence of a volume manager), where all jails have roots in the same filesystem (to avoid allocating/deallocating space as jails come and go or must be resized). Regards. -- Olivier Certner
Re: RFC: nfsd in a vnet jail
On 2022-12-01 17:32, Rick Macklem wrote: On Thu, Dec 1, 2022 at 8:23 AM Chris wrote: On 2022-11-29 16:21, Rick Macklem wrote: > On Sun, Nov 27, 2022 at 10:04 AM Peter Eriksson wrote: > >> Keep the global variables as defaults that apply to all nfsds and allow >> (at least some subset) to be overridden inside the net jails if some things >> need to be changed from the defaults? >> >> This is pretty much a reply to one of the posts selected at random, > but I thought that better than starting a new email thread. > > bz@ and asomers@ have both asked about running mountd within a vnet prison > (one via offlist email and the other on phabricator). > > I think it is worth discussing here... > mountd (rightly or wrongly) does two distinctly different things: > 1 - It pushes the exports into the kernel via nmount() so they > can be hung off of the "struct mount" for a file system's > mount point. > --> This can only work for file system mount points and can > only be done once for any given file system mount point. > > At this time, I have it done once globally outside of the prisons. > The alternative I can see is doing it within each prison, but I > think that would require that each prison have its own file system(s). > (ie. The prison's root would always be a file system mount point.) > > 2 - It handles RPC Mount protocol requests from NFSv3 clients. This one > is NFSv3 specific, which is why I have done this NFSv4 only at > this time. To do this, it must be able to register with rpcbind, > and I have no idea if running rpcbind in a vnet jail is practical. > > Enforcing the use for separate file systems for each jail also makes > things safer, since the exports are enforced by the kernel. Without > this, a malicious NFSv4 client could "guess" a file handle for a file > outside the jail and gain access to that file. Put another way, without > a separate file system, there is no way to stop a malicious client from > finding files above the Root file handle. 
(Normal clients will use > PutRootFH and LookupParent and these won't be able to go above the top > of the jail.) > > So, what do others think of enforcing the requirement that each jail > have its own file systems for this? I don't care for any of it. It looks like additional overhead with the addition of potential security risks. All for a very limited (and as yet unknown) use case. I am thinking that if/when this goes into main, it would be under a new kernel build option called something like NFSD_VIMAGE. I think that would avoid the overhead/security risks for those that do not need/want it. Brilliant. Count me in. :-) --chris rick --chris > > rick > > >> - Peter >> >> >> On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote: >> >>> Hi, >>> >>> bz@ has encouraged me to fiddle with the nfsd >>> so that it works in a vnet jail. >>> I have now basically done so, specifically for >>> NFSv4, since NFSv3 presents various issues. >>> >>> What I have not yet done is put global variables >>> in the vnet. This needs to be done so that the nfsd >>> can be run in multiple jail instances and/or in and >>> outside of a jail. >>> The problem is that there are 100s of global variables. >>> >>> I can see two approaches: >>> 1 - Move them all into the vnet jail. This would imply >>> that all the sysctls need to somehow be changed, >>> which would seem to be a POLA violation. >>> It also implies a lot of stuff in the vnet. >>> 2 - Just move the global variables that will always >>> differ from one nfsd to another (this would make >>> the sysctls global and apply to all nfsds). >>> This will keep the number of globals in the vnet >>> smaller. >>> >>> I am currently leaning towards #2, put what do others >>> think? >>> >>> rick >>> ps: Personally, I don't know what use there is of >>> running the nfsd inside a vnet jail, but bz@ has >>> some use case. >>> >> >>
Re: RFC: nfsd in a vnet jail
On Thu, Dec 1, 2022 at 8:23 AM Chris wrote: > On 2022-11-29 16:21, Rick Macklem wrote: > > On Sun, Nov 27, 2022 at 10:04 AM Peter Eriksson > wrote: > > > >> Keep the global variables as defaults that apply to all nfsds and allow > >> (at least some subset) to be overridden inside the net jails if some > things > >> need to be changed from the defaults? > >> > >> This is pretty much a reply to one of the posts selected at random, > > but I thought that better than starting a new email thread. > > > > bz@ and asomers@ have both asked about running mountd within a vnet > prison > > (one via offlist email and the other on phabricator). > > > > I think it is worth discussing here... > > mountd (rightly or wrongly) does two distinctly different things: > > 1 - It pushes the exports into the kernel via nmount() so they > > can be hung off of the "struct mount" for a file system's > > mount point. > > --> This can only work for file system mount points and can > > only be done once for any given file system mount point. > > > > At this time, I have it done once globally outside of the prisons. > > The alternative I can see is doing it within each prison, but I > > think that would require that each prison have its own file > system(s). > > (ie. The prison's root would always be a file system mount point.) > > > > 2 - It handles RPC Mount protocol requests from NFSv3 clients. This one > > is NFSv3 specific, which is why I have done this NFSv4 only at > > this time. To do this, it must be able to register with rpcbind, > > and I have no idea if running rpcbind in a vnet jail is practical. > > > > Enforcing the use for separate file systems for each jail also makes > > things safer, since the exports are enforced by the kernel. Without > > this, a malicious NFSv4 client could "guess" a file handle for a file > > outside the jail and gain access to that file. 
Put another way, without > > a separate file system, there is no way to stop a malicious client from > > finding files above the Root file handle. (Normal clients will use > > PutRootFH and LookupParent and these won't be able to go above the top > > of the jail.) > > > > So, what do others think of enforcing the requirement that each jail > > have its own file systems for this? > > I don't care for any of it. It looks like additional overhead with the > addition of potential security risks. All for a very limited (and as yet > unknown) use case. > I am thinking that if/when this goes into main, it would be under a new kernel build option called something like NFSD_VIMAGE. I think that would avoid the overhead/security risks for those that do not need/want it. rick > > --chris > > > > rick > > > > > >> - Peter > >> > >> > >> On Fri, Nov 25, 2022, 4:24 PM Rick Macklem > wrote: > >> > >>> Hi, > >>> > >>> bz@ has encouraged me to fiddle with the nfsd > >>> so that it works in a vnet jail. > >>> I have now basically done so, specifically for > >>> NFSv4, since NFSv3 presents various issues. > >>> > >>> What I have not yet done is put global variables > >>> in the vnet. This needs to be done so that the nfsd > >>> can be run in multiple jail instances and/or in and > >>> outside of a jail. > >>> The problem is that there are 100s of global variables. > >>> > >>> I can see two approaches: > >>> 1 - Move them all into the vnet jail. This would imply > >>> that all the sysctls need to somehow be changed, > >>> which would seem to be a POLA violation. > >>> It also implies a lot of stuff in the vnet. > >>> 2 - Just move the global variables that will always > >>> differ from one nfsd to another (this would make > >>> the sysctls global and apply to all nfsds). > >>> This will keep the number of globals in the vnet > >>> smaller. > >>> > >>> I am currently leaning towards #2, put what do others > >>> think? 
> >>> > >>> rick > >>> ps: Personally, I don't know what use there is of > >>> running the nfsd inside a vnet jail, but bz@ has > >>> some use case. > >>> > >> > >> >
Re: RFC: nfsd in a vnet jail
On Thu, Dec 1, 2022 at 2:01 AM Milan Obuch wrote: > On Thu, 01 Dec 2022 10:29:25 +0100 > Alexander Leidinger wrote: > > > Quoting Alan Somers (from Tue, 29 Nov 2022 > > 17:28:10 -0700): > > > > > On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem > > > wrote: > > > > >> So, what do others think of enforcing the requirement that each > > >> jail have its own file systems for this? > > > > > > I think that's a totally reasonable requirement. Especially so for > > > ZFS users, who already create a filesystem per jail for other > > > reasons. > > > > While I agree that it is a reasonable requirement, just a note that > > we can not assume that every existing jail resides on its own file > > system. The base system jail infrastructure doesn't check this, and > > the ezjail port doesn't either. The iocage port does it. > > > > My position would be 'recommended, but not forced-to' one. I have > various installations with jails sharing parts of filesystem (like > ports or src tree for development, or even local git repository), or > even running with exactly the same directory as root of number of > jails. Probably not a common scenario for sure, but still useful. > Others indicate they want mountd to run inside the jail. To get that to work, the jail needs to be in a separate file system, since it is the file system(s) mount point(s) that the export information is attached to in the kernel. It comes down to... #1 - Run mountd outside of the jails and encourage use of separate file systems. (Also, since the exports information would be applied to the file system(s) and not the jails, a malicious NFS client could "guess" a file handle and access files outside of the jail. This seems counter to what a jail should provide.) OR #2 - Require separate file systems and run mountd inside the jail(s). I think that allowing both alternatives would be too confusing and it seems that most want mountd to run within the jail(s). As such, unless others prefer #1, I think #2 is the way to go. 
rick > > Regards, > Milan >
Re: RFC: nfsd in a vnet jail
On Thu, Dec 1, 2022 at 1:29 AM Alexander Leidinger wrote: > > Quoting Alan Somers (from Tue, 29 Nov 2022 > 17:28:10 -0700): > > > On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem > wrote: > > >> So, what do others think of enforcing the requirement that each jail > >> have its own file systems for this? > > > > I think that's a totally reasonable requirement. Especially so for > > ZFS users, who already create a filesystem per jail for other reasons. > > While I agree that it is a reasonable requirement, just a note that we > can not assume that every existing jail resides on its own file > system. The base system jail infrastructure doesn't check this, and > the ezjail port doesn't either. The iocage port does it. > > Is there a way to detect this inside a jail and error out in nfsd/mountd? I think the check (...->pr_root->v_vflag & VV_ROOT) is sufficient. At least it is working for current testing. rick > > Bye, > Alexander. > > -- > http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF > http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF >
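The kernel-side check mentioned above tests whether the vnode at the jail's root is the root of a mounted file system. For readers who want to check a prospective jail directory from userland, a rough approximation is sketched below. It is an assumption-laden stand-in, not what the patch does: `is_mount_point()` is a hypothetical helper that compares `stat(2)` results for the directory and its parent, a heuristic that can miss mounts which report the same device number as their parent.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

#define PARENT_BUF_SIZE 4096 /* local buffer size chosen for this sketch */

/*
 * Returns 1 if path looks like the root of a mounted file system,
 * 0 if it does not, and -1 on error (missing path or path too long).
 */
int
is_mount_point(const char *path)
{
	char parent[PARENT_BUF_SIZE];
	struct stat st, pst;
	int n;

	assert(path != NULL);
	n = snprintf(parent, sizeof(parent), "%s/..", path);
	if (n < 0 || (size_t)n >= sizeof(parent))
		return (-1);
	if (stat(path, &st) != 0 || stat(parent, &pst) != 0)
		return (-1);
	if (!S_ISDIR(st.st_mode))
		return (0);
	/*
	 * A different device than the parent means path is on another
	 * file system. The same device and inode means path is "/".
	 */
	return (st.st_dev != pst.st_dev || st.st_ino == pst.st_ino);
}
```

An administrator's script could call such a helper on the jail's path before enabling mountd/nfsd in that jail and print a clear message when it returns 0, in line with the request elsewhere in the thread for good error messages and documentation.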
Re: RFC: nfsd in a vnet jail
On 2022-12-01 08:37, Alan Somers wrote:
>> I don't care for any of it. It looks like additional overhead with the addition of potential security risks. All for a very limited (and as yet unknown) use case.
>
> Here's an example of a real-world use case. I'm responsible for supporting multiple products involving NFS, iSCSI, and other protocols. For security reasons, each product is placed on its own VLAN. Sometimes it's not practical to dedicate a physical server to a single product, so I have to double-up. For the products that don't involve NFS or iSCSI, I place them in a VNET jail. That way their processes can only access the correct VLAN. But NFS and iSCSI can't (yet) be jailed, so those products need to be served by JID 0. Therefore, those products' processes can access each other's VLANs. Clearly that's not ideal.
>
> Jailing different products is also good for manageability. It's easier to manage the list of packages that must be installed for each product, config file settings, etc. For example, some of our NFS products require vfs.nfsd.enable_stringtouid=1, but others could work without it. Right now, we're forced to turn it on for all products.

OK. I can see that. Assuming I understand your intent correctly, I might choose bhyve(8) for that. Tho that *would* be a little more expensive. I rely on jail(8) daily. They're cheap, fast && easy. As yet, I've never had unreasonable concern for security. The proposed change looks like a potentially high addition of overhead (jail not so cheap now?). RPC && NFS are not cheap && have a comparatively high attack surface. So I guess my concerns/view are affected by this understanding.

Thanks for the clarification, Alan.

--chris

> -Alan
Re: RFC: nfsd in a vnet jail
> I don't care for any of it. It looks like additional overhead with the > addition of potential security risks. All for a very limited (and as yet > unknown) use case. Here's an example of a real-world use case. I'm responsible for supporting multiple products involving NFS, iSCSI, and other protocols. For security reasons, each product is placed on its own VLAN. Sometimes it's not practical to dedicate a physical server to a single product, so I have to double-up. For the products that don't involve NFS or iSCSI, I place them in a VNET jail. That way their processes can only access the correct VLAN. But NFS and iSCSI can't (yet) be jailed, so those products need to be served by JID 0. Therefore, those products' processes can access each other's VLANs. Clearly that's not ideal. Jailing different products is also good for manageability. It's easier to manage the list of packages that must be installed for each product, config file settings, etc. For example, some of our NFS products require vfs.nfsd.enable_stringtouid=1, but others could work without it. Right now, we're forced to turn it on for all products. -Alan
Re: RFC: nfsd in a vnet jail
On 2022-11-29 16:21, Rick Macklem wrote: On Sun, Nov 27, 2022 at 10:04 AM Peter Eriksson wrote: Keep the global variables as defaults that apply to all nfsds and allow (at least some subset) to be overridden inside the net jails if some things need to be changed from the defaults? This is pretty much a reply to one of the posts selected at random, but I thought that better than starting a new email thread. bz@ and asomers@ have both asked about running mountd within a vnet prison (one via offlist email and the other on phabricator). I think it is worth discussing here... mountd (rightly or wrongly) does two distinctly different things: 1 - It pushes the exports into the kernel via nmount() so they can be hung off of the "struct mount" for a file system's mount point. --> This can only work for file system mount points and can only be done once for any given file system mount point. At this time, I have it done once globally outside of the prisons. The alternative I can see is doing it within each prison, but I think that would require that each prison have its own file system(s). (ie. The prison's root would always be a file system mount point.) 2 - It handles RPC Mount protocol requests from NFSv3 clients. This one is NFSv3 specific, which is why I have done this NFSv4 only at this time. To do this, it must be able to register with rpcbind, and I have no idea if running rpcbind in a vnet jail is practical. Enforcing the use for separate file systems for each jail also makes things safer, since the exports are enforced by the kernel. Without this, a malicious NFSv4 client could "guess" a file handle for a file outside the jail and gain access to that file. Put another way, without a separate file system, there is no way to stop a malicious client from finding files above the Root file handle. (Normal clients will use PutRootFH and LookupParent and these won't be able to go above the top of the jail.) 
So, what do others think of enforcing the requirement that each jail have its own file systems for this? I don't care for any of it. It looks like additional overhead with the addition of potential security risks. All for a very limited (and as yet unknown) use case. --chris rick - Peter On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote: Hi, bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues. What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables. I can see two approaches: 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet. 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller. I am currently leaning towards #2, put what do others think? rick ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
Re: RFC: nfsd in a vnet jail
On Thu, Dec 1, 2022 at 2:30 AM Alexander Leidinger wrote: > > Quoting Alan Somers (from Tue, 29 Nov 2022 > 17:28:10 -0700): > > > On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem > wrote: > > >> So, what do others think of enforcing the requirement that each jail > >> have its own file systems for this? > > > > I think that's a totally reasonable requirement. Especially so for > > ZFS users, who already create a filesystem per jail for other reasons. > > While I agree that it is a reasonable requirement, just a note that we > can not assume that every existing jail resides on its own file > system. The base system jail infrastructure doesn't check this, and > the ezjail port doesn't either. The iocage port does it. > I have several jails that all live on the same zfs data set that I setup ages ago before I understood the full benefits of ZFS... but I could migrate in a pinch. But they aren't in their own vnet, so maybe that doesn't apply. > Is there a way to detect this inside a jail and error out in nfsd/mountd? > Whatever we do, there will be people bitten by it, so we need to make the messaging around it good (the error messages from the system, as well as the documentation). Warner
Re: RFC: nfsd in a vnet jail
On Thu, 01 Dec 2022 10:29:25 +0100 Alexander Leidinger wrote: > Quoting Alan Somers (from Tue, 29 Nov 2022 > 17:28:10 -0700): > > > On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem > > wrote: > > >> So, what do others think of enforcing the requirement that each > >> jail have its own file systems for this? > > > > I think that's a totally reasonable requirement. Especially so for > > ZFS users, who already create a filesystem per jail for other > > reasons. > > While I agree that it is a reasonable requirement, just a note that > we can not assume that every existing jail resides on its own file > system. The base system jail infrastructure doesn't check this, and > the ezjail port doesn't either. The iocage port does it. > My position would be 'recommended, but not forced-to' one. I have various installations with jails sharing parts of filesystem (like ports or src tree for development, or even local git repository), or even running with exactly the same directory as root of number of jails. Probably not a common scenario for sure, but still useful. Regards, Milan
Re: RFC: nfsd in a vnet jail
Quoting Alan Somers (from Tue, 29 Nov 2022 17:28:10 -0700):

> On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem wrote:
>> So, what do others think of enforcing the requirement that each jail have its own file systems for this?
>
> I think that's a totally reasonable requirement. Especially so for ZFS users, who already create a filesystem per jail for other reasons.

While I agree that it is a reasonable requirement, just a note that we can not assume that every existing jail resides on its own file system. The base system jail infrastructure doesn't check this, and the ezjail port doesn't either. The iocage port does it.

Is there a way to detect this inside a jail and error out in nfsd/mountd?

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: RFC: nfsd in a vnet jail
On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem wrote:
> On Sun, Nov 27, 2022 at 10:04 AM Peter Eriksson wrote:
>> Keep the global variables as defaults that apply to all nfsds and allow (at least some subset) to be overridden inside the vnet jails if some things need to be changed from the defaults?
>
> This is pretty much a reply to one of the posts selected at random, but I thought that better than starting a new email thread.
>
> bz@ and asomers@ have both asked about running mountd within a vnet prison (one via offlist email and the other on phabricator).
>
> I think it is worth discussing here...
> mountd (rightly or wrongly) does two distinctly different things:
> 1 - It pushes the exports into the kernel via nmount() so they can be hung off of the "struct mount" for a file system's mount point.
>     --> This can only work for file system mount points and can only be done once for any given file system mount point.
>
>     At this time, I have it done once globally outside of the prisons. The alternative I can see is doing it within each prison, but I think that would require that each prison have its own file system(s). (i.e., the prison's root would always be a file system mount point.)
>
> 2 - It handles RPC Mount protocol requests from NFSv3 clients. This one is NFSv3 specific, which is why I have done this NFSv4 only at this time. To do this, it must be able to register with rpcbind, and I have no idea if running rpcbind in a vnet jail is practical.
>
> Enforcing the use of separate file systems for each jail also makes things safer, since the exports are enforced by the kernel. Without this, a malicious NFSv4 client could "guess" a file handle for a file outside the jail and gain access to that file. Put another way, without a separate file system, there is no way to stop a malicious client from finding files above the Root file handle. (Normal clients will use PutRootFH and LookupParent and these won't be able to go above the top of the jail.)
>
> So, what do others think of enforcing the requirement that each jail have its own file systems for this?

I think that's a totally reasonable requirement. Especially so for ZFS users, who already create a filesystem per jail for other reasons.

> rick
>
>> - Peter
>>
>> On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote:
>>> Hi,
>>>
>>> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>>>
>>> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>>>
>>> I can see two approaches:
>>> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
>>> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>>>
>>> I am currently leaning towards #2, but what do others think?
>>>
>>> rick
>>> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
Re: RFC: nfsd in a vnet jail
On Sun, Nov 27, 2022 at 10:04 AM Peter Eriksson wrote:
> Keep the global variables as defaults that apply to all nfsds and allow (at least some subset) to be overridden inside the vnet jails if some things need to be changed from the defaults?

This is pretty much a reply to one of the posts selected at random, but I thought that better than starting a new email thread.

bz@ and asomers@ have both asked about running mountd within a vnet prison (one via offlist email and the other on phabricator).

I think it is worth discussing here...
mountd (rightly or wrongly) does two distinctly different things:
1 - It pushes the exports into the kernel via nmount() so they can be hung off of the "struct mount" for a file system's mount point.
    --> This can only work for file system mount points and can only be done once for any given file system mount point.

    At this time, I have it done once globally outside of the prisons. The alternative I can see is doing it within each prison, but I think that would require that each prison have its own file system(s). (i.e., the prison's root would always be a file system mount point.)

2 - It handles RPC Mount protocol requests from NFSv3 clients. This one is NFSv3 specific, which is why I have done this NFSv4 only at this time. To do this, it must be able to register with rpcbind, and I have no idea if running rpcbind in a vnet jail is practical.

Enforcing the use of separate file systems for each jail also makes things safer, since the exports are enforced by the kernel. Without this, a malicious NFSv4 client could "guess" a file handle for a file outside the jail and gain access to that file. Put another way, without a separate file system, there is no way to stop a malicious client from finding files above the Root file handle. (Normal clients will use PutRootFH and LookupParent and these won't be able to go above the top of the jail.)

So, what do others think of enforcing the requirement that each jail have its own file systems for this?

rick

> - Peter
>
> On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote:
>> Hi,
>>
>> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>>
>> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>>
>> I can see two approaches:
>> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
>> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>>
>> I am currently leaning towards #2, but what do others think?
>>
>> rick
>> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
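The file handle argument in the message above can be illustrated with a toy model. The sketch assumes, purely for illustration, that a file handle carries a file system identifier plus a file identifier and that exports are checked per file system. The types and the function toy_fh_allowed are hypothetical and far simpler than real NFS file handles, which are opaque to clients.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified file handle. */
struct toy_fh {
	uint64_t fsid;		/* which file system the file lives on */
	uint64_t fileid;	/* which file within that file system */
};

/* Hypothetical export record: one exported file system per jail. */
struct toy_export {
	uint64_t fsid;
};

/*
 * Per-file-system export check. The server can tell whether a handle
 * belongs to the exported file system, but not where in the directory
 * tree the file sits. Returns 1 if the handle is accepted, 0 if not.
 */
int
toy_fh_allowed(const struct toy_export *exp, const struct toy_fh *fh)
{
	return (fh->fsid == exp->fsid);
}
```

In this model a guessed handle for a file on the host's file system is rejected when the jail exports its own file system, because the identifiers differ. When the jail merely lives in a subdirectory of the host's file system, the same guessed handle passes the check, which is the exposure described above.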
Re: RFC: nfsd in a vnet jail
On Fri, Nov 25, 2022 at 9:06 PM Alan Somers wrote:
> On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote:
>> Hi,
>>
>> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>>
>> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>>
>> I can see two approaches:
>> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
>> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>>
>> I am currently leaning towards #2, but what do others think?
>>
>> rick
>> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
>
> This is super-awesome! Thank you so much! I've got a use case too. I think it would be fine to leave most of the settings global, like max_threads. But we should probably decide on a case by case basis.

The minthreads and maxthreads settings happen to be handled via nfsd command line options, so the sysctls are not needed and they can be set per-prison.

Most of the sysctls are for weird cases or tuning of the DRC. Since the DRC is only used for NFSv4.0 mounts and not NFSv4.1 or NFSv4.2 ones, tuning the DRC should not usually be necessary. I have left them global for now. If anyone identifies one that needs to be set per-prison, I can move it into the vnet.

If you want to see them all:
    # sysctl -a | fgrep vfs.nfsd

I have put a first patch up on phabricator as D37519. Although I listed three people as reviewers, anyone is welcome to test/comment/review. If you can't easily get the patch from phabricator, just email me and I'll send it to you. I think it will apply cleanly to main and, maybe, stable/13.

You only need to build a kernel from patched sources to test it. There is a change to rc.d/nfsd, which you only need in the prison's etc/rc.d/nfsd.

A very basic setup document (also definitely a work in progress) can be found at...
https://people.freebsd.org/~rmacklem/nfsd-vnet-prison-setup.txt

Let me know if you test it or have other suggestions,
rick
ps: Thanks everyone for your comments. If I have specific questions related to them, I'll post. Otherwise I am digesting them.
Re: RFC: nfsd in a vnet jail
On 11/27/22 11:13 AM, Bjoern A. Zeeb wrote:
> On Sun, 27 Nov 2022, James Gritton wrote:
>> On 2022-11-25 15:17, Rick Macklem wrote:
>>> Hi,
>>>
>>> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>>>
>>> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>>>
>>> I can see two approaches:
>>> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation.
>
> Not a POLA violation. The sysctl (names) don't change. Just the values are duplicated per-jail.

As Marko and I (mostly Marko) were assigning the different variables and sysctls when vnet first hit, it was generally pretty obvious which were local and which were global. The sysctl values need to mean the same thing in the jail as they do in an unjailed system, so the sysctl names don't change; just the value reported changes. Some sysctls become read-only or invisible in jails. Sometimes this takes adding some boilerplate virtualization code to each sysctl; however, there is already some inbuilt support in the SYSCTL framework for VNET isolation. You just need to set the CTLFLAG_VNET bit in the sysctl definition.

I agree with what Bjoern says below. There is a slight added complication in that it is not just vnet but jailing as well that needs to be taken into account, because vnet doesn't affect VFS, but jails do.

>>> It also implies a lot of stuff in the vnet.
>>> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>>>
>>> I am currently leaning towards #2, but what do others think?
>>>
>>> rick
>>> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
>>
>> I would prefer closer to #2, unless you want to support only one jail running nfsd (which is admittedly one of the more likely scenarios). I imagine it's a case-by-case judgement call, as to whether a particular knob should be global or per-jail.
>
> I think the call is: every setting that, if changed in a vnet jail, could cause the entire system to be DoSed definitely stays global. Everything which needs to be writeable on a per-instance basis probably needs to be virtualised.
>
> My main concern with virtualising the variables will be early boot and NFSROOT scenarios that will need access to them early on, before the virtual network stacks are properly initialized; I can help sorting that out if my concerns become real. This was most probably sorted out before for NFSROOT with the IP stack, so it's likely an okay job now.
>
> Also, given I have once done it before for another subsystem, we could think of a V_FS bit (and I write it that way to not confuse it with VFS). Now NFS sits somewhere between FS and NET, so I am not sure where it should belong should we ponder that route as yet another option (and if we think that VNETs will be way too big one day? -- there's probably a lot of other fish still to fry, though some of that has been burnt in the past already).
>
> I think I have another email or two on the subject (possibly privately); sorry Rick that I haven't gotten to them in a more timely manner. I'll have a look later tonight.
>
> /bz
Re: RFC: nfsd in a vnet jail
On Sun, 27 Nov 2022, James Gritton wrote:
> On 2022-11-25 15:17, Rick Macklem wrote:
>> Hi,
>>
>> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>>
>> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>>
>> I can see two approaches:
>> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation.

Not a POLA violation. The sysctl (names) don't change. Just the values are duplicated per-jail.

>> It also implies a lot of stuff in the vnet.
>> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>>
>> I am currently leaning towards #2, but what do others think?
>>
>> rick
>> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
>
> I would prefer closer to #2, unless you want to support only one jail running nfsd (which is admittedly one of the more likely scenarios). I imagine it's a case-by-case judgement call, as to whether a particular knob should be global or per-jail.

I think the call is: every setting that, if changed in a vnet jail, could cause the entire system to be DoSed definitely stays global. Everything which needs to be writeable on a per-instance basis probably needs to be virtualised.

My main concern with virtualising the variables will be early boot and NFSROOT scenarios that will need access to them early on, before the virtual network stacks are properly initialized; I can help sorting that out if my concerns become real. This was most probably sorted out before for NFSROOT with the IP stack, so it's likely an okay job now.

Also, given I have once done it before for another subsystem, we could think of a V_FS bit (and I write it that way to not confuse it with VFS). Now NFS sits somewhere between FS and NET, so I am not sure where it should belong should we ponder that route as yet another option (and if we think that VNETs will be way too big one day? -- there's probably a lot of other fish still to fry, though some of that has been burnt in the past already).

I think I have another email or two on the subject (possibly privately); sorry Rick that I haven't gotten to them in a more timely manner. I'll have a look later tonight.

/bz

--
Bjoern A. Zeeb r15:7
Re: RFC: nfsd in a vnet jail
On 2022-11-25 15:17, Rick Macklem wrote:
> Hi,
>
> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>
> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>
> I can see two approaches:
> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>
> I am currently leaning towards #2, but what do others think?
>
> rick
> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.

I would prefer closer to #2, unless you want to support only one jail running nfsd (which is admittedly one of the more likely scenarios). I imagine it's a case-by-case judgement call, as to whether a particular knob should be global or per-jail.

- Jamie
Re: RFC: nfsd in a vnet jail
Keep the global variables as defaults that apply to all nfsds and allow (at least some subset) to be overridden inside the vnet jails if some things need to be changed from the defaults?

- Peter

On Fri, Nov 25, 2022, 4:24 PM Rick Macklem (rick.mack...@gmail.com) wrote:
> Hi,
>
> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>
> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>
> I can see two approaches:
> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>
> I am currently leaning towards #2, but what do others think?
>
> rick
> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
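Peter's suggestion of global defaults with optional per-jail overrides could look roughly like the userland sketch below. Everything in it is hypothetical: the names toy_knob_default, struct toy_jail_nfsd and toy_effective_knob are made up for this example, and the value 4096 is arbitrary. No existing nfsd variable or sysctl is implied.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global default, shared by every nfsd instance. */
static int toy_knob_default = 4096;

/* An optional per-jail override for one knob. */
struct toy_override {
	bool	set;	/* true once the jail has changed the knob */
	int	value;	/* only meaningful when set is true */
};

/* Hypothetical per-jail nfsd settings. */
struct toy_jail_nfsd {
	struct toy_override knob;
};

/*
 * Return the value an nfsd instance should use: the jail's own value
 * if it was overridden, otherwise the global default. A NULL jail
 * stands for the nfsd running outside of any jail.
 */
int
toy_effective_knob(const struct toy_jail_nfsd *jail)
{
	if (jail != NULL && jail->knob.set)
		return (jail->knob.value);
	return (toy_knob_default);
}
```

With this layout a change to the global default still reaches every jail that has not set its own value, while a jail that did override the knob keeps its setting.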
Re: RFC: nfsd in a vnet jail
On Fri, Nov 25, 2022, 4:24 PM Rick Macklem wrote:
> Hi,
>
> bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.
>
> What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.
>
> I can see two approaches:
> 1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
> 2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.
>
> I am currently leaning towards #2, but what do others think?
>
> rick
> ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.

This is super-awesome! Thank you so much! I've got a use case too. I think it would be fine to leave most of the settings global, like max_threads. But we should probably decide on a case by case basis.
RFC: nfsd in a vnet jail
Hi,

bz@ has encouraged me to fiddle with the nfsd so that it works in a vnet jail. I have now basically done so, specifically for NFSv4, since NFSv3 presents various issues.

What I have not yet done is put global variables in the vnet. This needs to be done so that the nfsd can be run in multiple jail instances and/or in and outside of a jail. The problem is that there are 100s of global variables.

I can see two approaches:
1 - Move them all into the vnet jail. This would imply that all the sysctls need to somehow be changed, which would seem to be a POLA violation. It also implies a lot of stuff in the vnet.
2 - Just move the global variables that will always differ from one nfsd to another (this would make the sysctls global and apply to all nfsds). This will keep the number of globals in the vnet smaller.

I am currently leaning towards #2, but what do others think?

rick
ps: Personally, I don't know what use there is of running the nfsd inside a vnet jail, but bz@ has some use case.
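To make approach #2 concrete, here is a userland mock of the split: state that must differ between nfsd instances hangs off each (mock) vnet, while a tunable stays a single global shared by all of them. This is not kernel code and does not use the real VNET macros; struct toy_vnet, struct toy_nfsd_state, toy_tunable_limit and the functions are hypothetical names chosen for this example.

```c
#include <stdlib.h>

/* Stays global under approach #2: one value applies to all nfsds. */
static unsigned long toy_tunable_limit = 10;

/* State that necessarily differs from one nfsd instance to another. */
struct toy_nfsd_state {
	int		running;
	unsigned long	requests;
};

/* Userland stand-in for a vnet; not the kernel's struct vnet. */
struct toy_vnet {
	struct toy_nfsd_state *nfsd;
};

/* Allocate zeroed per-instance state. Returns 0 on success, -1 on failure. */
int
toy_vnet_init(struct toy_vnet *vn)
{
	vn->nfsd = calloc(1, sizeof(*vn->nfsd));
	return (vn->nfsd != NULL ? 0 : -1);
}

/* Release the per-instance state. */
void
toy_vnet_fini(struct toy_vnet *vn)
{
	free(vn->nfsd);
	vn->nfsd = NULL;
}

/* Account for one request in this instance only. */
void
toy_nfsd_request(struct toy_vnet *vn)
{
	vn->nfsd->running = 1;
	vn->nfsd->requests++;
}

/* Compare per-instance state against the shared global tunable. */
int
toy_nfsd_over_limit(const struct toy_vnet *vn)
{
	return (vn->nfsd->requests > toy_tunable_limit);
}
```

Requests counted in one mock vnet do not show up in another, while lowering toy_tunable_limit affects every instance at once. That shared effect is the trade-off of leaving the tunables global.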