On 02/12/2016 10:34 PM, Lennart Poettering wrote:
> On Fri, 12.02.16 17:49, Simon McVittie (simon.mcvit...@collabora.co.uk) wrote:
>
>> On 11/02/16 17:06, Lennart Poettering wrote:
>>> 5) Here's the controversial one I think: support for booting up
>>> without /var. We have kludges at quite a few places because we
>>> cannot access /var early during boot.
>>
>> I don't think /var is really the same thing as /usr: for a start, it has
>> to be read/write, whereas /usr and / can be read-only for at least the
>> early stages of boot.
>>
>> On stateless systems with a read-only / and /etc, requiring /var to be
>> mounted from the initramfs would mean that the mechanism for setting up
>> /var (NFS or tmpfs or whatever) would have to move into the
>> initramfs.
>
> Since initrds tend to cover root-on-nfs, root-on-iscsi and so on
> anyway, that sounds like no change in behaviour really..

Well, kind of. Root-on-NFS and root-on-iSCSI are dumbed-down versions of what's possible once a system is booted.

iSCSI: currently the rootfs works fine, because for the rootfs one can easily tell the initramfs implementation explicitly that it's on iSCSI. If your rootfs is on network storage, you have to do so anyway, so that's not an issue. But there's no way to determine *just* from looking at /etc/fstab that a given file system is on iSCSI (or nbd for that matter), because those just look like regular SCSI block devices (which don't exist yet if the initramfs hasn't logged in to the iSCSI session). This is already somewhat problematic for /usr, but since I've never seen a setup where people put /usr on iSCSI but not /, this was never a huge issue in that regard. On the other hand, what I have seen in practice are systems with /var/log on iSCSI.

Also, if you look at how iSCSI login in the initramfs works currently, it's basically just running a binary called 'iscsistart' that tells the kernel to log in to the specific session the rootfs is on; the real daemon isn't started yet. So only one specific session, which is configured separately (!) from all the other configured sessions, is logged into in the initramfs, and the daemon that reads the proper configuration is only started after the system has booted. So in order to support generic filesystems on iSCSI in the initramfs, one would need to start the full daemon already in the initramfs, the full configuration database would have to be available in the initramfs (it can change with just some admin commands, after which the admin would need to remember to regenerate the initramfs image), and the daemon would need to be modified to support that.

NFS: nfsroot is supported only for NFSv2/3 and (depending on the initramfs implementation) for NFSv4 with sec=sys without idmapping. If you need NFSv4 with idmapping or want to actually have a secure NFS mount (e.g. encrypted + authenticated via Kerberos), that currently doesn't work at all from the initramfs. idmapping requires that request-key works within the initramfs and properly calls the nfsidmap binary, which will in turn usually require the full NSS stack of the system to be available. For Kerberos you need rpc.gssd to be running, and you need a program like k5start running to get a ticket for the root user, otherwise the file system is inaccessible at the kernel level. (And Kerberos also requires idmapping, btw.)

Also, in contrast to e.g. iSCSI, where you could probably get away with killing the daemon before switching to the rootfs and then restarting it, both the idmapping binaries and rpc.gssd have to remain available (the former as an upcall from the kernel, the latter as a running daemon), otherwise the kernel won't be able to handle the filesystem properly.

And NFS and iSCSI are just the two things I have quite a bit of experience with. You could also imagine that people put /var/log on sshfs, or any other FUSE filesystem for that matter, which works as of now but will break if you introduce the change, because the vast majority of FUSE filesystems don't support running from an initramfs (if any do at all). Or you could have /var/log as a bind mount of a directory within an OCFS2 filesystem on a multi-master DRBD. It's not that difficult to set up on a normal system, but good luck getting that to work from an initramfs.

Of course, it's not impossible to make all these setups work. But it would require changes to a lot of other software that's currently in use; those changes are probably going to be relatively painful, and they are going to be a lot of work for a lot of other people. The maintenance burden in systemd for buffering things in /run before /var, /var/log, etc. are available is minuscule compared to the amount of pain this change would cause other people.

Which in turn means that, more likely, this would simply not be implemented in many cases, and once the version of systemd with this requirement hits distributions, it would break users' systems, leaving them unable to run their setup as designed. I think that would be really bad.

Note that this is different from /usr: booting with /usr not mounted was already broken beforehand, whereas booting without /var currently isn't. A lot of the scenarios I've described above never worked for /usr anyway (e.g. I haven't seen a single distribution that didn't have the Kerberos stuff in /usr even before any UsrMerge, so /usr via Kerberized NFSv4 wasn't possible in the first place). There were already many, many more constraints for /usr, so the breakage in that case was quite limited.

Also, in contrast to /usr, where a merged /usr actually has very real advantages, such as enabling stateless systems, I don't see any advantages for /var here, other than making systemd simpler in a very minuscule way. I don't think that trade-off is warranted.

Regards,
Christian

PS: Btw., if you do run journald already in the initramfs (which I think is a good thing to have), then it still needs to have code to flush /run/log/journal to /var/log/journal. So in that case you don't actually gain anything.
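
To illustrate the "buffer in /run, flush to /var later" pattern referred to above, here is a minimal sketch in Python. It is not how systemd-journald is implemented; the function name flush_buffered_logs and the file names are made up for illustration, and temporary directories stand in for /run/log and /var/log.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def flush_buffered_logs(run_dir: Path, var_dir: Path) -> List[str]:
    """Move files buffered in run_dir into var_dir and return their names.

    If var_dir does not exist yet (think: /var not mounted), nothing is
    moved and an empty list is returned, so the caller can retry later.
    """
    if not var_dir.is_dir():
        return []
    moved = []
    for entry in sorted(run_dir.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(var_dir / entry.name))
            moved.append(entry.name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for /run/log and /var/log.
        run_dir = Path(tmp) / "run" / "log"
        var_dir = Path(tmp) / "var" / "log"
        run_dir.mkdir(parents=True)
        (run_dir / "early-boot.log").write_text("logged before /var was mounted\n")

        # Persistent directory not available yet: data stays buffered.
        print(flush_buffered_logs(run_dir, var_dir))

        # Persistent directory appears: buffered data is flushed.
        var_dir.mkdir(parents=True)
        print(flush_buffered_logs(run_dir, var_dir))
```

The point of the sketch is only that the flush step is small and is needed regardless of whether the logger starts in the initramfs or later.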
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel