I haven't chimed in on previous iterations of this topic as I was still coming to grips with the bigger picture. I've spent a great deal of time recently trying to execute every combination of rump kernels on most of the target platforms (I haven't done much with ISO yet, but I have with all the others), so hopefully I can contribute.
#### Beginner's perspective

From the perspective of someone new to the project: there is a significant mental model to form of what the rump project is and how all the pieces fit together. It has only recently become clear to me that the primary functionality builds a kernel, the secondary functionality launches that kernel in a specific environment, and that currently these two things are somewhat combined. Because all this functionality is integrated, it is a lot harder to grasp for someone new to the project (though I'm not sure if that matters). It would be easier to understand if it were divided into one kernel build and many run scripts, one for each run target. Then, for example, if I were interested in running on EC2, I would need to understand only the "build-kernel" and "run-on-ec2" scripts and documentation.

#### Core kernel build functionality

Has an *inward* perspective: it builds a unikernel and deals only with the inner workings of the unikernel. Some ideas (some of which may already be current behaviour?):

* Use initramfs/initrd or similar for a built-in rootfs: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
* initramfs loaded by grub along with the kernel at boot
* Kernels are diskless: no requirement for disk beyond the initrd rootfs
* The kernel automounts all attached block volumes by default, unless overridden by json.cfg
* The kernel configures itself with DHCP if available, unless overridden with specific network configuration in json.cfg
* json.cfg is stored in the initial ramfs, or may be overridden by passing the text of json.cfg as a kernel argument at launch

The build process produces three files:

* (optional) json.cfg
* kernel
* initial ramdisk containing the rootfs and a default json.cfg

json.cfg:

* defines configuration that happens INSIDE the unikernel
* does not attempt to specify anything about configuration that happens OUTSIDE the unikernel
* specifies the names of required block devices and their mountpoints
* specifies network device configuration
* other things the kernel needs to configure?
* is not needed at all in, for example, a diskless configuration where there are no block devices to mount and only DHCP is required

Where possible, and where it makes sense, the build process provides EXTERNAL VM host configuration information somehow, suggesting for example the command line required to boot on the various target platforms. Maybe this is just text, or some sort of configuration json structure.

Boot process: grub starts, then:

    config.json —> unikernel <— initrd

I'll give away my lack of understanding of the technical details here, but if there were any way to avoid specifying the rumpbake target, that would lead to there only ever being one sort of rump unikernel. The rumpbake process means that there are four different, incompatible rump unikernels and no easy way to tell one from another. Maybe this is not practical, but it would be a good thing if there were just one single type of rump kernel.

#### Separate launch scripts

Simple solutions for common use cases, one script per execution target:

* Xen
* KVM/QEMU
* Build ISO
* EC2 PV
* EC2 HVM
* Other cloud targets

#### The hardest bit (for me)

The hardest bit currently is trying to understand network device naming: what names are needed by which target, and how to map the kernel's naming to the external name. I'm still trying to figure this out. I'm not sure if this needs code or just a written explanation; working out network device names feels a bit like black magic at the moment. I need to dive into the code to fully understand this area.

#### Responses to Antti's questions

To present the above wall of text as responses to Antti's questions:

>> 1: a distributable format which does not require the toolchain to launch
>> (what rumpbake currently gives you)

* A multiboot kernel is a good, well-defined distributable format.
* Ideally there would be only one type of kernel, not different types for xen_pv, xen_pci, hw_generic, hw_virtio (perhaps foolish to suggest, as I don't know enough about the technical reasons for needing a different kernel for each). It would be great if there were only one single type of rump kernel.

>> 2: a mechanism to configure the runtime behaviour of the distributable
>> format (what the rumprun tool currently does)

* json.cfg stored on the initrd/initramfs, which can be overridden (or initially provided) by passing json.cfg as a kernel argument at boot time.

>> 3: a mechanism to easily launch the result of 1+2 *where available* (what
>> the rumprun tool does for xen+kvm+qemu)

* Separate launch scripts, one for each target. The goal here is just to be helpful in common launch cases, not to provide a universally applicable solution to all launch configuration requirements.

>> 4: a mechanism to "specialize" the distributed format, e.g. "baked in root
>> files" or even including "2" (to enable running without block storage)

* An initrd/initramfs provided as part of the boot process solves the "baked in root files" case.
* A kernel able to run diskless, without block storage, would be great and would simplify things greatly in some configurations.
* Mounting all found block devices by default would be great, optionally turned off in json.cfg.

>> There are actually 2 different configs, the Rumprun runtime config and the
>> config of whatever you launch on. We can't always control the latter from
>> software, consider e.g. the case where you're launching on an embedded
>> system, so this bit is only about the Rumprun config. We now use a json
>> config. I'm thinking that the best option is to not provide a config
>> generating tool at all, simply polish the json format spec and be happy with
>> it.

* Agreed. json.cfg should deal only with the internal configuration of the kernel and specify nothing about the external configuration required by the virtual host.

>> Then, "3" or launching.
>> Now, there needs to be a congruence between your config from "2" and what
>> you launch on. The current rumprun tool sort of attempts to help you
>> there, but with anything except a trivial setup you need to know what
>> you're doing anyway, e.g. "-I ,,'-net tap,ifname=tap0" constructs and so
>> forth. So the tool is not *really* catching you because it cannot read
>> your mind. Do we need a quasi-abstracting tool? I'm going to say "no".
>> The real eye-opener here for me was working on EC2. We simply have no way
>> to sensibly abstract the 3 billion toggles available via EC2. Even for
>> trivial systems like xl.conf (when compared to EC2), if you know xl.conf
>> you'll just have to learn a second syntax to do what you already know. If
>> you don't know xl.conf, we can provide an example or two. Not sure if we
>> should provide some case-specific helper scripts, though nothing which
>> pretends that everything is the same, hides power and throws off people
>> who'd know how to use the relevant backend tools.

* This is spot on. Simple launch scripts and examples for common cases, without attempting to build universally functional VM launcher scripts.
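To make the "internal configuration only" idea concrete, a json.cfg in the spirit described above might look like the sketch below. The key names are purely illustrative (my own invention, not the current rumprun schema); the point is that everything in the file is inside-the-unikernel state: interface configuration, mounts, and the program's arguments, with nothing about the host:

```json
{
  "net": {
    "if": "vioif0",
    "method": "dhcp"
  },
  "blk": [
    { "dev": "ld0a", "mountpoint": "/data" }
  ],
  "cmdline": "myprogram -p 8080"
}
```

A diskless, DHCP-only guest would simply omit the file entirely, per the bullet above.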
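As a sketch of what a per-target launch script could look like, here is the shape I have in mind for the KVM/QEMU case. The flag choices and the DRY_RUN knob are my own invention, not existing rumprun tooling; the idea is just "helpful in the common case, transparent about the underlying command":

```shell
#!/bin/sh
# Hypothetical per-target launch helper for KVM/QEMU.
# Not rumprun's actual interface; a sketch of the shape such a
# script could take.
run_qemu() {
    kernel=$1
    initrd=$2
    shift 2
    # Assemble the QEMU invocation: multiboot kernel plus initrd
    # (rootfs + default json.cfg); remaining args pass through so
    # power users keep full control of the backend tool.
    set -- qemu-system-x86_64 -m 128 -nographic \
        -kernel "$kernel" -initrd "$initrd" "$@"
    if [ -n "$DRY_RUN" ]; then
        # Print the command instead of running it, so users can
        # inspect it and adapt it to their own setup.
        printf '%s\n' "$*"
    else
        exec "$@"
    fi
}
```

Setting DRY_RUN prints the assembled command line rather than running it, which doubles as the "provide an example or two" documentation role discussed above.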

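On the network-naming point above: as far as I can tell, the in-kernel interface name follows whichever driver the baked-in kernel attaches, which is why each launch target wants a different name in json.cfg. The mapping below reflects my current (possibly wrong or incomplete) understanding; corrections welcome:

```shell
# Rough mapping from rumpbake target to the interface name the
# kernel sees internally. This is my current understanding only
# and may well be incomplete or wrong.
guess_ifname() {
    case $1 in
        xen_pv)     echo xenif0 ;;  # Xen netfront
        hw_virtio)  echo vioif0 ;;  # virtio-net under KVM/QEMU
        hw_generic) echo "depends-on-emulated-NIC-driver" ;;
        *)          echo unknown ;;
    esac
}
```

If this mapping could be written down authoritatively (or emitted by the build process), it would remove most of the black magic for newcomers.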