Re: Creating a hackable kernel for AMD64
On Tue, 23 Apr 2024, Taylor R Campbell wrote:

> Cool, thanks! The main thing you're trying to do -- pass a kernel on the
> command line to qemu -- doesn't work on x86.

It does with these patches:
https://github.com/NetBSDfr/NetBSD-src/tree/nbfr_master

They need to be reviewed before I merge them into current.

Cheers,
Emile `iMil' Heitor | https://imil.net
Re: [GSOC][NEWBIE]
On 3/28/24 17:44, Jamgadepatil Shivraj Shashikant wrote:

> I even tried a gzipped kernel image stored in the releasedir directory, so
> I guess it's not only PVH boot that was causing the problem. I suspected I
> might have built the kernel incorrectly, so I also tried booting from the
> daily kernel build provided on GitHub, but the KVM error persists.
> The qemu command I am using for booting the system is:
>
>   qemu-system-x86_64 -m 2048 -kernel netbsd-GENERIC.gz \
>     -drive if=virtio,file=disk.qcow2,format=qcow2 -enable-kvm

Don't gzip the kernel, here's an example session:

  $ git branch
  * perf
  $ ./build.sh -U -u -j4 -T obj/tooldir -m amd64 tools
  $ ./build.sh -U -u -j4 -T obj/tooldir -m amd64 kernel=MICROVM
  $ KERNEL=sys/arch/amd64/compile/obj/MICROVM/netbsd
  $ IMG=NetBSD-10.99.10-amd64-live.img
  # fetch and gunzip this from
  # https://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202403200640Z/images/NetBSD-10.99.10-amd64-live.img.gz
  $ qemu-system-x86_64 \
      -M microvm,x-option-roms=off,rtc=on,acpi=off,pic=off \
      -enable-kvm -m 512 -cpu host,+invtsc \
      -append "root=ld0a console=com rw -v" -display none \
      -device virtio-blk-device,drive=hd0 \
      -drive file=${IMG},format=raw,id=hd0 \
      -netdev user,id=net0,hostfwd=tcp::10022-:22 \
      -device virtio-net-device,netdev=net0 \
      -kernel ${KERNEL} -serial mon:stdio

FYI I'll be mostly AFK till next Tuesday.

Cheers,
--
Emile `iMil' Heitor | https://imil.net
Re: [GSOC][NEWBIE]
Hi Shivraj,

On 3/27/24 18:27, Jamgadepatil Shivraj Shashikant wrote:

> I am using Linux as my host OS and have cross-compiled the NetBSD kernel
> but loading the custom kernel results in a KVM emulation error. I am

If you want to run a NetBSD/amd64 kernel with the qemu/kvm -kernel option, I
suggest you try this development branch:
https://github.com/NetBSDfr/NetBSD-src/tree/perf

The official NetBSD source tree doesn't have PVH boot merged yet; this
branch has it, along with performance improvements.

Cheers,
--
Emile `iMil' Heitor | https://imil.net
NVMM sync up from DragonFlyBSD
I've synced up our NVMM code (kmod, lib and tool) with its current state in
DragonFlyBSD; if you're using NVMM on NetBSD as your hypervisor you might
want to give it a try:
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:nvmm

I've also added a VMware-compatible CPU frequency cpuid leaf, which can be
used to get the CPU frequency from the host instead of spending 100ms in
DELAY(). Qemu knows how to expose it via the -cpu host,+invtsc flag.

--
Emile `iMil' Heitor | https://imil.net
NetBSD/amd64 current performance patch
ernel and minimal root disk at https://imil.net/NetBSD/netbsd-perf and https://imil.net/NetBSD/disk.img Feedback as always very welcome. Cheers, [1]: https://www.qemu.org/docs/master/system/i386/microvm.html -- Emile `iMil' Heitor | https://imil.net
Re: *oldlenp comes back with wrong value in helper sysctl_createv() function
>     case CTLTYPE_STRING: {
>         unsigned char buf[1024], *tbuf;
>
>         tbuf = buf;
>         sz = sizeof(buf);
>         rc = prog_sysctl(&name[0], namelen, tbuf, &sz, NULL, 0);
>
> The sysctl command first tries with a buffer of 1024 bytes and retries
> with the right size when that was too small.

Got it, makes sense. Thanks for the reply!

--
Emile `iMil' Heitor | https://imil.net
Re: *oldlenp comes back with wrong value in helper sysctl_createv() function
> The helper function produces the value that is returned in *oldlenp.
>
> If you happen to use CTL_DESCRIBE (e.g. running sysctl -d), it's not your
> helper function being called but the sysctl_describe helper that returns
> a value of 1024.
>
> Maybe you can show your helper routine and how you call sysctl ?

Sure, I call sysctl this way:

    sysctl debug.tslog

Here's the helper function code:

    static int
    sysctl_debug_tslog(SYSCTLFN_ARGS)
    {
        char buf[LINE_MAX] = "";
        char *where = oldp;
        size_t slen, i, limit;
        int error = 0;
        static size_t needed = 0;

        /* Come back with the right size */
        /* here at second call, *oldlenp == 1024 */
        if (*oldlenp < needed) {
            *oldlenp = needed;
            return 0; /* will be back with correct size */
        }

        /* Add data logged within the kernel. */
        limit = MIN(nrecs, nitems(timestamps));
        for (i = 0; i < limit; i++) {
            snprintf(buf, LINE_MAX, "0x%x %llu",
                timestamps[i].lid,
                (unsigned long long)timestamps[i].tsc);
            switch (timestamps[i].type) {
            case TS_ENTER:
                strcat(buf, " ENTER");
                break;
            case TS_EXIT:
                strcat(buf, " EXIT");
                break;
            case TS_THREAD:
                strcat(buf, " THREAD");
                break;
            case TS_EVENT:
                strcat(buf, " EVENT");
                break;
            }
            snprintf(buf, LINE_MAX, "%s %s", buf,
                timestamps[i].f ? timestamps[i].f : "(null)");
            if (timestamps[i].s)
                snprintf(buf, LINE_MAX, "%s %s\n", buf,
                    timestamps[i].s);
            else
                strcat(buf, "\n");

            slen = strlen(buf) + 1;

            if (where == NULL) /* 1st pass, calculate needed */
                needed += slen;
            else {
                if (i > 0)
                    where--; /* overwrite last \0 */
                if ((error = copyout(buf, where, slen)))
                    break;
                where += slen;
            }
        }
        /* Come back with an address */
        if (oldp == NULL)
            *oldlenp = needed;

        return error;
    }

Here's the setup:

    SYSCTL_SETUP(sysctl_tslog_setup, "tslog sysctl")
    {
        sysctl_createv(NULL, 0, NULL, NULL,
            CTLFLAG_PERMANENT|CTLFLAG_READONLY, CTLTYPE_STRING, "tslog",
            SYSCTL_DESCR("Dump recorded event timestamps"),
            sysctl_debug_tslog, 0, NULL, 0,
            CTL_DEBUG, CTL_CREATE, CTL_EOL);
    }

--
Emile `iMil' Heitor | https://imil.net
*oldlenp comes back with wrong value in helper sysctl_createv() function
I am creating a sysctl entry for TSLOG https://man.freebsd.org/cgi/man.cgi?query=tslog=4=freebsd-release In every example I see in our source tree, I read that when oldp is NULL, *oldlenp should be set to desired size, return 0, and the sysctl_createv() helper function will be again called with the right *oldlenp value. Except it does not, the first time it calls back the helper function, *oldlenp value is 1024 no matter what I set it to before. But if I return once again (either with ENOMEM or 0, doesn't matter), the helper function will now be called with the right *oldlenp value. I fail to see where this behavior comes from and I'd like to understand the logic behind it. -- Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On 1/8/24 23:05, Manuel Bouyer wrote:

> in consinit.c you have:
>
> +#if defined(XENPVHVM) || defined(GENPVH)
> +#ifndef GENPVH
>         if (vm_guest == VM_GUEST_XENPVH) {
>                 if (xen_pvh_consinit() != 0)
>                         return;
>                 /* fallback to native console selection, usefull for dom0 PVH */
>         }
> +#endif
>
> shouldn't the #ifndef GENPVH really be #ifdef XENPVHVM ?

Oh, absolutely.

> In the same way, the #ifndef GENPVH in xen_machdep.c should either be
> #ifdef XENPVHVM or #ifdef XEN

Indeed. I've changed those, thank you for the review!

--
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
This morning I was given the idea of making it possible to build a Xen-free
kernel that is still GENPVH capable.

This doesn't impact GENERIC, which is still able to boot both Xen and GENPVH
with the following configuration:

    options         XENPVHVM
    options         XEN
    hypervisor*     at mainbus?             # Xen hypervisor
    xenbus*         at hypervisor?          # Xen virtual bus
    xencons*        at hypervisor?          # Xen virtual console
    ...

Now for GENPVH only we would have a single kernel configuration option:

    options         GENPVH

The only drawback I see is that it adds quite a few #ifdef/#ifndef GENPVH
blocks.

Here's the patch: https://imil.net/NetBSD/noxen.patch

Does this look reasonable to you?

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
Here's the full patch for GENPVH + cmdline MMIO + Firecracker support:
https://imil.net/NetBSD/genpvh+mmio.patch

This patch includes:

- generic PVH boot for qemu, Firecracker etc... see the "PVH boot with
  qemu" thread for details
- a new bus, pvbus, used to attach devices that could be used in more than
  one hypervisor, such as the cmdline MMIO device or pvclock (not ported
  yet)
- MMIO cmdline support, used by qemu and FC to advertise virtio devices
  memory mappings to guests
- a "fix" for MP tables, or like Colin says "bug for bug compatibility",
  needed to detect IOAPIC on guests with no ACPI
- a fix for the ld_virtio block device enabling FC to use it

The patch applies on the latest current.

Kernel configuration:

    pv*     at pvbus?
    virtio* at pv?
    ld*     at virtio?
    vioif*  at virtio?
    viornd* at virtio?

This is how fast this thing boots:
https://twitter.com/iMilnb/status/1743576957046931647

Feedback appreciated.

Cheers,
--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 1/6/24 08:33, Emile 'iMil' Heitor wrote:

> To support this, I've attached `virtio*` to a newly created bus,
> `cmdlinebus?` which is attached to mainbus in amd64_mainbus.c
>
> I am not entirely sure this is the cleanest way to proceed and would like
> to have your feedback.

I like the design of OpenBSD's pvbus, it would be well suited for this and
more: https://github.com/openbsd/src/tree/master/sys/dev/pv

Thoughts?

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
Update

My NetBSD/amd64 branch can now boot using PVH on qemu/x86_64, qemu/microvm
and Firecracker using MMIO-backed block devices. It also supports multiple
VirtIO devices such as NICs.

The last blocking point was that Firecracker only supports one data segment
and as such needs DMA bouncing, which was not enabled for that case (thanks
Jason for the hint!).

Now before opening a PR I'd like your advice on the "cmdline bus" part. As I
explained earlier in this thread, without PCI or ACPI, MMIO devices are
"detected" through a kernel parameter, for example:

    root=ld0a console=com rw -v virtio_mmio.device=512@0xfeb00e00:12 virtio_mmio.device=512@0xfeb00c00:11

To support this, I've attached `virtio*` to a newly created bus,
`cmdlinebus?`, which is attached to mainbus in amd64_mainbus.c.

I am not entirely sure this is the cleanest way to proceed and would like to
have your feedback.

The working branch is here:
https://github.com/NetBSDfr/NetBSD-src/tree/mmio_cmdline

Cheers,
--
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
Here's a final patch: https://imil.net/NetBSD/GENPVH.patch

It implements PVH boot from both qemu with the -kernel flag and Firecracker
with Colin Percival's PVH patches:
https://github.com/firecracker-microvm/firecracker/tree/feature/pvh

The patch should apply on current as-is:
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:GENPVH

I've already explained the rationale in this thread; this iteration is
mostly refinements from Gregory, especially regarding the assembly part.

As previously requested, it works on GENERIC without modification of the
kernel's configuration.

Can we have a review before I do a PR?

Thanks,
Emile `iMil' Heitor | https://imil.net
Re: pvclock (kvm_clock) support: where to attach
On 12/31/23 15:17, Taylor R Campbell wrote:

> https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/xen_clock.c

oh.

> Unless there's a compelling reason that the pvclock and xenclock
> interfaces are different enough to warrant having multiple copies of the
> logic in src, I think we should adapt the existing code to work in both
> settings. I put a lot of work into the xen_clock.c driver to record
> useful diagnostics about when the host's time is not behaving right (vs
> when NetBSD itself has done something wrong), which we've seen in
> practice on various hosts, and it would be a shame to lose that.

Ok, the point is to have an interface that is able to expose kvm_clock,
which is used by Firecracker. I guess this could be added without much pain
to the existing Xen code.

> If not, I think long-term we should introduce a new sys/dev/pv or
> something, move the bulk of xen_clock.c to that (other than the
> Xen-specific parts), and have both the Firecracker code and the Xen code
> use it.

Actually that's what OpenBSD does, they do have a sys/dev/pv with a pvbus
and it's honestly a classy way of dealing with various hypervisors.

On a side note: I'm not used to these clock/rtc mechanisms, but something
puzzles me. When the virtual machine is started without MC146818 RTC
support, it hangs at todr_gettime_ymdhms, which is mapped to rtc_get_ymdhms
in sys/arch/x86/isa/rtc.c, which at the end of the day calls
mc146818_read(). Shouldn't sys/arch/x86/isa/clock.c:startrtclock() return
when mc146818_read() fails? There seems to be nothing but MC146818 for RTC
on x86.

--
Emile `iMil' Heitor | https://imil.net
pvclock (kvm_clock) support: where to attach
I ported pvclock / kvm_clock from OpenBSD in order for Firecracker to have
an RTC. It's working but I'm not entirely sure where to attach it. The
device is x86 only, so I added the source code in arch/x86/x86, and for now
I have it attached at cpufeaturebus, which felt the most natural.

Thoughts?

Here's the code:
https://github.com/NetBSD/src/commit/c09440be5aca7e16ce845c3ccbdfb47bac03fb63

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
Update

I've got current/amd64 booting on an MMIO-backed block device with qemu's
microvm machine, in both ACPI mode and command-line-hack mode (which passes
the "device" address as a kernel parameter).

If anybody wants to give it a try:

    $ qemu-system-x86_64 -M microvm,rtc=on -enable-kvm -m 256 -cpu host \
        -kernel netbsd.gdb -append "root=ld0a console=com rw" \
        -serial stdio -display none \
        -device virtio-blk-device,drive=hd0 \
        -drive file=netbsd.img,format=raw,id=hd0 -no-acpi

Note the "rtc=on": without it, init_main.c/inittodr(rootfstime) hangs.

On the other hand, Firecracker loops on an init sig11 crash. I suspect that
might also be related to the RTC; I guess I'll have to port a virtio RTC.

The branch with this feature is here:
https://github.com/NetBSDfr/NetBSD-src/tree/mmio_cmdline

It includes the GENPVH mode to boot the NetBSD kernel from qemu's -kernel
flag.

You'll need at least these two:

    virtio* at cmdlinebus?
    #virtio* at acpi?
    ld*     at virtio?      # Virtio disk device

As previously said, virtio* at acpi? also works.

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 12/26/23 11:35, Emile 'iMil' Heitor wrote:

> intercepted (or so it seems); I thought the hang, which actually happens
> sys/kern/subr_disk.c:disk_read_sectors/biowait,

Found it. Wrong bus_space_write_4() in virtio_mmio_setup_queue(), 100% my
fault, sorry for the noise.

IRQ is triggered, block device is recognized, we're close :)

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
Following up

I ported the "Linux MPtable bug"-feature found by Colin and the IOAPIC is
now detected:
https://github.com/NetBSD/src/commit/37e57b3a0f464246611a005a350e01e25e091bfe

Nevertheless, the disk IRQ from Firecracker is still not intercepted (or so
it seems). I thought the hang, which actually happens in
sys/kern/subr_disk.c:disk_read_sectors/biowait, might be related to
bad/missing VirtIO device features, but I checked on FreeBSD and the
features are correctly computed.

Ideas are welcome.

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 12/20/23 16:56, Taylor R Campbell wrote:

> My guess is that this is not going to be MSI -- it'll either be i8259 or
> ioapic. In that case, it may work to do this:

Thanks a lot for that, here's the result. Unfortunately it still hangs at
the first VOP_OPEN() from subr_disk_open.c and the handler function never
gets triggered:

    static int
    virtio_mmio_cmdline_alloc_interrupts(struct virtio_mmio_softc *msc)
    {
        struct virtio_mmio_cmdline_softc *const sc =
            (struct virtio_mmio_cmdline_softc *)msc;
        struct virtio_softc *const vsc = &msc->sc_sc;
        struct ioapic_softc *ioapic;
        struct pic *pic;
        int irq = sc->margs.irq;
        int pin = irq;

        ioapic = ioapic_find_bybase(irq);

        /* we don't enter here */
        if (ioapic != NULL) {
            pic = &ioapic->sc_pic;
            pin = irq - pic->pic_vecbase;
            irq = -1;
        } else
            pic = &i8259_pic;

        msc->sc_ih = intr_establish_xname(irq, pic, pin, IST_EDGE, IPL_BIO,
            virtio_mmio_intr, msc, false, device_xname(vsc->sc_dev));
        if (msc->sc_ih == NULL) {
            aprint_error_dev(vsc->sc_dev,
                "failed to establish interrupt\n");
            return -1;
        }
        aprint_normal_dev(vsc->sc_dev, "interrupting on %d\n", irq);

        return 0;
    }

intr_establish_xname() does get through:

    [ 1.030] viommio: 4K@0xd000:5
    [ 1.030] virtio0: block device (id 2, rev. 0x02)
    [ 1.030] ld0 at virtio0: features: 0x1
    [ 1.030] virtio0: interrupting on 5 <---
    [ 1.030] ld0: 30720 KB, 60 cyl, 16 head, 63 sec, 512 bytes/sect x 61441 sectors
    [ 1.0107398] boot device: ld0

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 12/20/23 06:55, Emile 'iMil' Heitor wrote:

> Well that's the thing, I can't find where does MMIO attaches on FreeBSD,
> they have a very simple way of creating the resources:

After a bit of digging, their virtio_mmio device attaches to "nexus0",
which, if I understand correctly, is the equivalent of our mainbus.

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 12/19/23 13:11, Taylor R Campbell wrote:

> to queue a task and then run it. Normally softint_schedule is called from
> a hard interrupt handler.
>
> Here, you need something in the host to get hard interrupts. For example,
> on ACPI systems there are ACPI interrupt resources that can be used with
> acpi_intr_establish; on FDT systems, device nodes have interrupt
> properties that can be used with fdtbus_intr_establish.
>
> How does FreeBSD get x86 mmio intrs on these systems?

Well that's the thing, I can't find where MMIO attaches on FreeBSD. They
have a very simple way of creating the resources:
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c#L109

and even Colin doesn't know where this goes:
https://x.com/cperciva/status/173720959840762?s=20

What I can say: it's neither ACPI nor PCI, as they are both unsupported on
Firecracker.

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 12/16/23 10:25, Emile 'iMil' Heitor wrote:

> FYI I'm on it, based on Colin Percival's work here
> https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c
>
> I'm getting some results but Firecracker uses MMIO v2 and we only have v1
> so there's still quite some work to do.

So far I get the block device and the correct geometry is reported:

    [ 1.030] virtio0 at mainbus0
    [ 1.030] kernel parameters: console=com root=ld0e virtio_mmio.device=4K@0xd000:5
    [ 1.030] viommio: 4K@0xd000:5
    [ 1.030] virtio0: block device (id 2, rev. 0x02)
    [ 1.030] ld0 at virtio0: features: 0x1
    [ 1.030] ld0: 30720 KB, 60 cyl, 16 head, 63 sec, 512 bytes/sect x 61441 sectors
    [ 1.0199904] boot device: ld0

but I get stuck in sys/kern/subr_disk_open.c, in opendisk(), at:

    error = VOP_OPEN(tmpvn, FREAD | FSILENT, NOCRED);

Any ideas on what could be locking here?

Maybe related: the interrupt handler function I wrote uses
softint_establish() as there's no "real" hardware behind this block device.
Is this the correct way to deal with it?

For anybody wanting to have a look: https://imil.net/NetBSD/mmio.patch

It applies over this branch:
https://github.com/NetBSDfr/NetBSD-src/tree/GENPVH

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 2/23/22 20:26, el16095 wrote:

> Firecracker informs VMs about MMIO devices by appending to the boot
> command line a string like this "virtio_mmio.device=4K@0xd000:5"
> ([virtio_mmio.]device=<size>@<baseaddr>:<irq>). So, from what I
> understand I'd need to write glue code that takes this information and
> uses it to setup the MMIO devices on the VM side the way Firecracker
> expects it to; and then attach virtio through that, right?

FYI I'm on it, based on Colin Percival's work here:
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c

I'm getting some results but Firecracker uses MMIO v2 and we only have v1,
so there's still quite some work to do.

--
Emile `iMil' Heitor | https://imil.net
Re: VirtIO MMIO for amd64
On 2/23/22 20:26, el16095 wrote:

> Firecracker informs VMs about MMIO devices by appending to the boot
> command line a string like this "virtio_mmio.device=4K@0xd000:5"
> ([virtio_mmio.]device=<size>@<baseaddr>:<irq>). So, from what I
> understand I'd need to write glue code that takes this information and
> uses it to setup the MMIO devices on the VM side the way Firecracker
> expects it to; and then attach virtio through that, right?

Was there any more work on this topic?

--
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On Mon, 11 Dec 2023, Emile `iMil' Heitor wrote:

> We still need to check if we didn't break anything on Xen side and test
> Firecracker. FYI qemu-system-x86_64 also works with the "microvm" machine
> type.

I am able to boot this patched NetBSD kernel using Colin Percival's
PVH-enabled Firecracker:
https://github.com/firecracker-microvm/firecracker/pull/3155

Nevertheless, Firecracker, like the microvm qemu machine type, uses vioblk
for block devices, which we don't have. Yet.

--
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On Mon, 11 Dec 2023, Manuel Bouyer wrote:

> Yes, right now GENERIC can be used on bare-metal, PVHVM and XENPVH.
> It would be good to have GENERIC working on GENPVH too.

Fair enough, I'll switch to this path then, thanks for the advice.

Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
Hi Manuel,

On Mon, 11 Dec 2023, Manuel Bouyer wrote:

> #ifndef GENPVH
>         /* get a page for HYPERVISOR_shared_info */
>         addl    $PAGE_SIZE, %ebx
>         addl    $PGOFSET,%ebx
>         andl    $~PGOFSET,%ebx
>         movl    $RELOC(HYPERVISOR_shared_info_pa),%ebp
>         movl    %ebx,(%ebp)
>         movl    $0,4(%ebp)
> #endif
>
> How can this work on Xen when GENPVH is defined ?
> Shouldn't this be made conditional on vm_guest == VM_GUEST_XENPVH ?

Well, the point is that you don't define GENPVH when using Xen; PVH with
qemu and friends needs neither HYPERVISOR_shared_info nor any of the
hypercall portion of the code.

A big chunk of Xen-related code is ifndef'ed on GENPVH in hypervisor.c, and
I was planning on isolating GENPVH so there are as few ifdefs as possible.

Or would you prefer the same kernel to be able to boot in both XENPVH and
GENPVH modes? I am focusing on making the resulting kernel smaller, but this
could be done too.

Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
Here is a clean(er) patch:
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:GENPVH

Rationale

As previously explained, locore.S expects the start_info structure, whose
address the calling hypervisor passes in %ebx, to be located at the end of
the symbol table. Qemu and Firecracker don't follow this rule, which is not
part of the official Xen ABI:
https://xenbits.xen.org/docs/unstable/misc/pvh.html

What our patch first does is make the memory mapping loops happy by copying
the start_info structure to where it's expected. After that, memory
locations and boot parameters are correctly found and boot can proceed.

Of course, the hypervisor not being Xen, a lot of Xen-related code is
useless, hence the new VM_GUEST_GENPVH (for Generic PVH) vm_guest type, as
first suggested by Manuel, and a new kernel option, GENPVH.

I kept the Xen code structure as it was and changed very little code, only
some || vm_guest == VM_GUEST_GENPVH and a couple of #ifndef GENPVH.

In order to build a Generic PVH kernel, the following options are needed:

    # Xen PV support for PVH and HVM guests
    options         XENPVHVM
    options         XEN
    # Generic PVH support (qemu, firecracker...)
    options         GENPVH

I've added
https://github.com/NetBSDfr/NetBSD-src/blob/GENPVH/sys/arch/amd64/conf/MICROVM
as an example config file. I'll probably end up ditching XENPVHVM and XEN
but there's still quite some work in there.

We still need to check that we didn't break anything on the Xen side and
test Firecracker. FYI qemu-system-x86_64 also works with the "microvm"
machine type.

Feedback very welcome.

--
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
I got it working. NetBSD/amd64 kernel booting in PVH mode straight from the
qemu -kernel flag.

It now needs a lot of cleaning as it's basically a PoC, but here's a WIP
patch if anyone's interested in hacking on it:
https://imil.net/NetBSD/qemu-pvh.patch

Let me rephrase: I *know* it is ugly at the moment. I *will* make it clean,
just wanted to share the joy ;)

Cheers,
Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On Wed, 29 Nov 2023, Manuel Bouyer wrote:

> Of course, this is *not* a Xen VM, so no surprise that start_xen32 isn't
> working.

I'm just sharing the progress here, in case someone is interested. If this
is annoying, I'll just keep it to myself until I post a (hypothetical)
final patch, and sorry for the noise.

Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On Thu, 23 Nov 2023, Emile `iMil' Heitor wrote:

> It seems we have a similar problem to the second bullet point Colin
> Percival noted here
> https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
>
> When removing the hvm_start_info address save portion, the sym mapping
> doesn't fall into an infinite loop anymore. Not yet sure how to fix that,
> I'll have a look at FreeBSD's commits on this matter.

And so it was. In locore.S:start_xen32, this assumption is wrong when the
entrypoint is called from qemu:

    /*
     * save addr of the hvm_start_info structure. This is also the end
     * of the symbol table
     */

This makes esym point to an address (%ebx + KERNBASE) which is not the end
of the symbol table. The same goes for eblob, which is calculated relative
to %ebx.

A friend of mine, Gregory in CC, found that setting those two (esym and
eblob) to 0 made the paging init go fine, as both tests (l.660 and 667)
will trigger jz 1f and keep %edi at __kernel_end.

This brings us to init_xen_early(), which is failing, but that's another
story.

Emile `iMil' Heitor | https://imil.net
Re: PVH boot with qemu
On Mon, 13 Nov 2023, Manuel Bouyer wrote:

> On Mon, Nov 13, 2023 at 06:37:01AM +0100, Emile `iMil' Heitor wrote:
>> The start_xen32 entrypoint is then found, and the kernel start, but
>> falls in an infinite loop in locore.S when mapping symbols and preloaded
>> modules, more precisely, in the fillkpt_nox macro. I assume %ecx is
>> wrong or the region corrupted for some reason.
>>
>> https://github.com/NetBSD/src/blob/trunk/sys/arch/amd64/amd64/locore.S#L738
>
> I don't think you can use start_xen32 as is, as it expects a Xen
> environment. You may need to write a new start routine, or make a
> difference between Xen vs non-Xen in the existing one.

It seems we have a similar problem to the second bullet point Colin
Percival noted here:
https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html

When removing the hvm_start_info address save portion, the sym mapping
doesn't fall into an infinite loop anymore. Not yet sure how to fix that,
I'll have a look at FreeBSD's commits on this matter.

Emile `iMil' Heitor | https://imil.net
PVH boot with qemu
I first asked for guidance on port-xen@ but the topic doesn't seem to have
had much success, so I'll try my chances here.

I am trying to make NetBSD/amd64 boot in PVH mode with qemu, using qemu's
-kernel flag.

The kernel does start executing thanks to the first step explained here
https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
i.e. adding the PVH entry point to the kernel ELF notes.

     #define ELFNOTE(name, type, desctype, descdata...) \
    -.pushsection .note.name; \
    +.pushsection .note.name, "a", @note; \
     .align 4 ; \
     .long 2f - 1f /* namesz */; \
     .long 4f - 3f /* descsz */; \
    @@ -588,6 +603,8 @@ next: pop %edi
     movl %eax,(%ebp)

The start_xen32 entrypoint is then found, and the kernel starts, but it
falls into an infinite loop in locore.S when mapping symbols and preloaded
modules, more precisely in the fillkpt_nox macro. I assume %ecx is wrong or
the region is corrupted for some reason.

https://github.com/NetBSD/src/blob/trunk/sys/arch/amd64/amd64/locore.S#L738

This is far from my comfort zone but I'm willing to go down the rabbit hole;
still, some advice on where to look and on possible reasons for this loop
would be greatly appreciated.

Note that this feature would also allow NetBSD to run on AWS's Firecracker,
a microvm hypervisor used in their Lambda product.

Thanks,
--
Emile `iMil' Heitor | https://imil.net
Disabling drivers from a NetBSD kernel without `userconf(4)` nor rebuild
Hi,

I've restarted working on a micro-NetBSD project I documented back in 2020
here:
https://imil.net/blog/posts/2020/fakecracker-netbsd-as-a-function-based-microvm/

The idea is to create a micro-service virtual machine that starts a NetBSD
kernel and a dedicated service in less than 200ms. The project works but
some elements could be improved.

First, in order to reduce boot time, the kernel is called directly from qemu
with the -kernel flag. This means the project relies on multiboot(8), so for
the time being it only works with an i386 kernel.

Second, in this proof of concept, it is mandatory to rebuild a very minimal
kernel in order to kick out every driver that's not absolutely necessary,
but this step would be really painful for someone not used to, and not
wanting to dig into, kernel builds.

My endgame is being able to build a NetBSD-based micro service with
something similar to:

    $ mksmolnb https://cdn.NetBSD.org 9.3 nginx

which will download and prepare all the pieces with a process similar to
what I described in the blog post.

I began digging for a method to disable kernel drivers in the same way
userconf(4) does, and realized, with the help of mlelstv@ on IRC, that I'd
only need to alter the autoconf tables to disable all the drivers I want.

So I found where cfdata was and how to overwrite the fstate with
gdb --write, and finally came to this very nasty hack:
https://gitlab.com/iMil/sailor/-/snippets/2491821
which basically disables every driver in the kernel except those needed,
directly in the kernel binary.

I am totally aware that this method could become obsolete as soon as struct
cfdata is modified. I just wanted to share this for anyone curious about how
to do this, and maybe get some more ideas to make it cleaner.

Cheers,
Emile `iMil' Heitor | https://imil.net
Re: x86 bootstrap features
Hi Kamil, Emmanuel & all,

On Tue, 24 Sep 2019, Kamil Rytarowski wrote:
On 24.09.2019 14:26, Emmanuel Dreyfus wrote:
On Tue, Sep 24, 2019 at 01:31:51PM +0200, Kamil Rytarowski wrote:

> My use-case is "qemu-system-x86_64 -kernel ./netbsd". Last I tried (with
> multiboot2 patches merged) it still did not work.
>
> I did not commit anything in multiboot support in the NetBSD kernel, I
> only worked on bootstraps for now, hence the steady failure you
> experience should come at no surprise.
>
> For now our kernel has support code for multiboot 1 for i386 only.
> qemu-system-i386 works, but -x86_64 not. Are there plans to add it to the
> amd64 kernel?

Is there any news on this front? Being able to boot an amd64 kernel directly
from kvm would give NetBSD the ability to be started by AWS Firecracker[1]
out of the box, which would be amazing.

[1]: https://github.com/firecracker-microvm/firecracker

--
Emile `iMil' Heitor
http://imil.net | http://www.NetBSD.org | http://gcu.info