Re: Creating a hackable kernel for AMD64

2024-04-24 Thread Emile `iMil' Heitor

On Tue, 23 Apr 2024, Taylor R Campbell wrote:


Cool, thanks!  The main thing you're trying to do -- pass a kernel on
the command line to qemu -- doesn't work on x86.


It does with these patches: 
https://github.com/NetBSDfr/NetBSD-src/tree/nbfr_master
They need to be reviewed before I merge them into current.

Cheers,


Emile `iMil' Heitor  | https://imil.net



Re: [GSOC][NEWBIE]

2024-03-28 Thread Emile 'iMil' Heitor



On 3/28/24 17:44, Jamgadepatil Shivraj Shashikant wrote:

I even tried the gzipped kernel image stored in the releasedir directory, so I 
guess it's not only PVH boot that was causing the problem. I suspected I 
might have built the kernel incorrectly, so I also tried booting from 
the daily kernel build provided on GitHub, but the KVM error persists.
The qemu command I am using to boot the system is:
qemu-system-x86_64 -m 2048 -kernel netbsd-GENERIC.gz -drive if=virtio,file=disk.qcow2,format=qcow2 -enable-kvm


Don't gzip the kernel, here's an example session:

$ git branch
* perf
$ ./build.sh -U -u -j4 -T obj/tooldir -m amd64 tools
$ ./build.sh -U -u -j4 -T obj/tooldir -m amd64 kernel=MICROVM
$ KERNEL=sys/arch/amd64/compile/obj/MICROVM/netbsd
$ IMG=NetBSD-10.99.10-amd64-live.img # fetch and gunzip this from
  # https://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202403200640Z/images/NetBSD-10.99.10-amd64-live.img.gz
$ qemu-system-x86_64 \
-M microvm,x-option-roms=off,rtc=on,acpi=off,pic=off \
-enable-kvm -m 512 -cpu host,+invtsc \
-append "root=ld0a console=com rw -v" -display none \
-device virtio-blk-device,drive=hd0 \
-drive file=${IMG},format=raw,id=hd0 \
-netdev user,id=net0,hostfwd=tcp::10022-:22 \
-device virtio-net-device,netdev=net0 \
-kernel ${KERNEL} -serial mon:stdio

FYI I'll be mostly AFK till next Tuesday.

Cheers,

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: [GSOC][NEWBIE]

2024-03-28 Thread Emile 'iMil' Heitor

Hi Shivraj,

On 3/27/24 18:27, Jamgadepatil Shivraj Shashikant wrote:
I am using Linux as my host OS and have cross-compiled the NetBSD kernel 
but loading the custom kernel results in a KVM emulation error. I am 


If you want to run a NetBSD/amd64 kernel with the qemu/kvm -kernel
option, I suggest you give this development branch a try:
https://github.com/NetBSDfr/NetBSD-src/tree/perf
The official NetBSD source tree doesn't have PVH boot merged yet; this
branch has it, along with performance improvements.

Cheers,

--

Emile `iMil' Heitor  | https://imil.net


NVMM sync up from DragonFlyBSD

2024-01-27 Thread Emile 'iMil' Heitor



I've synced up our NVMM code (kmod, lib and tool) with its current
state in DragonFlyBSD; if you're using NVMM on NetBSD as your
hypervisor you might want to give it a try:
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:nvmm


I've also added a VMware-compatible CPU frequency cpuid leaf which can
be used to get the CPU frequency from the host instead of spending
100ms in DELAY(). Qemu knows how to expose it via the -cpu host,+invtsc
flag.
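
For the curious, here is a minimal sketch of how a guest could read such a
leaf. It assumes the usual VMware convention (leaf 0x40000010, with EAX
holding the TSC frequency in kHz); the function and macro names are made up
for illustration and this is not the code from the branch.

```c
#include <stdint.h>

#define HV_CPUID_BASE	0x40000000u	/* hypervisor signature, EAX = max leaf */
#define HV_CPUID_TIMING	0x40000010u	/* VMware-style timing leaf (assumed) */

/*
 * Decode the timing leaf. max_leaf is EAX of leaf 0x40000000, tsc_khz is
 * EAX of leaf 0x40000010. Returns the TSC frequency in Hz, or 0 when the
 * hypervisor does not advertise the leaf.
 */
uint64_t
hv_tsc_freq_hz(uint32_t max_leaf, uint32_t tsc_khz)
{
	if (max_leaf < HV_CPUID_TIMING || tsc_khz == 0)
		return 0;
	return (uint64_t)tsc_khz * 1000;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the running CPU; only meaningful when executed inside a guest. */
uint64_t
hv_query_tsc_freq_hz(void)
{
	unsigned int eax, ebx, ecx, edx, max_leaf;

	__cpuid(1, eax, ebx, ecx, edx);
	if ((ecx & (1u << 31)) == 0)	/* "hypervisor present" bit */
		return 0;
	__cpuid(HV_CPUID_BASE, eax, ebx, ecx, edx);
	max_leaf = eax;
	if (max_leaf < HV_CPUID_TIMING)
		return 0;
	__cpuid(HV_CPUID_TIMING, eax, ebx, ecx, edx);
	return hv_tsc_freq_hz(max_leaf, eax);
}
#endif
```

When the function returns 0, the caller would fall back to the usual
DELAY()-based calibration.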

--

Emile `iMil' Heitor  | https://imil.net


NetBSD/amd64 current performance patch

2024-01-22 Thread Emile 'iMil' Heitor
ernel and minimal root disk at https://imil.net/NetBSD/netbsd-perf
and https://imil.net/NetBSD/disk.img

Feedback as always very welcome.

Cheers,

[1]: https://www.qemu.org/docs/master/system/i386/microvm.html

--

Emile `iMil' Heitor  | https://imil.net


Re: *oldlenp comes back with wrong value in helper sysctl_createv() function

2024-01-20 Thread Emile 'iMil' Heitor





 case CTLTYPE_STRING: {
 unsigned char buf[1024], *tbuf;
 tbuf = buf;
 sz = sizeof(buf);
 rc = prog_sysctl(&name[0], namelen, tbuf, &sz, NULL, 0);

The sysctl command first tries with a buffer of 1024 bytes
and retries with the right size when that was too small.


Got it, makes sense. Thanks for the reply!
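
For the archives, the try-then-retry logic described above can be sketched in
plain userland C. This is a simulation only: demo_sysctl() is a hypothetical
stand-in for the real call, and it assumes the node reports the needed size in
*oldlenp when it fails with ENOMEM.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string exported by the simulated node. */
static const char demo_value[] = "0x1 12345 ENTER main\n0x1 12400 EXIT main\n";

/*
 * Mimics the oldp/oldlenp contract: with oldp == NULL only the size is
 * reported; with a too-small buffer the call fails with ENOMEM and
 * *oldlenp is updated to the required size.
 */
int
demo_sysctl(void *oldp, size_t *oldlenp)
{
	size_t needed = sizeof(demo_value);	/* includes the trailing NUL */

	if (oldp == NULL) {
		*oldlenp = needed;
		return 0;
	}
	if (*oldlenp < needed) {
		*oldlenp = needed;
		errno = ENOMEM;
		return -1;
	}
	memcpy(oldp, demo_value, needed);
	*oldlenp = needed;
	return 0;
}

/*
 * Caller side: try a fixed-size buffer first, then retry with the size
 * reported by the node. Returns a malloc'ed string, or NULL on failure.
 */
char *
demo_read_string(size_t first_try)
{
	size_t sz = first_try;
	char *buf = malloc(sz);
	char *nbuf;

	if (buf == NULL)
		return NULL;
	if (demo_sysctl(buf, &sz) == 0)
		return buf;		/* first buffer was large enough */
	if (errno != ENOMEM) {
		free(buf);
		return NULL;
	}
	/* sz now holds the size reported by the node: retry with it. */
	nbuf = realloc(buf, sz);
	if (nbuf == NULL) {
		free(buf);
		return NULL;
	}
	if (demo_sysctl(nbuf, &sz) == -1) {
		free(nbuf);
		return NULL;
	}
	return nbuf;
}
```

Calling demo_read_string(1024) succeeds on the first attempt, while
demo_read_string(8) goes through the retry path, which is the case where the
helper sees a first *oldlenp that it never asked for.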

--

Emile `iMil' Heitor  | https://imil.net


Re: *oldlenp comes back with wrong value in helper sysctl_createv() function

2024-01-20 Thread Emile 'iMil' Heitor

The helper function produces the value that is returned in *oldlenp.

If you happen to use CTL_DESCRIBE (e.g. running sysctl -d), it's
not your helper function being called but the sysctl_describe helper
that returns a value of 1024.

Maybe you can show your helper routine and how you call sysctl ?


sure, I call sysctl this way: sysctl debug.tslog

here's the helper function code:

static int
sysctl_debug_tslog(SYSCTLFN_ARGS)
{
	char buf[LINE_MAX] = "";
	char *where = oldp;
	size_t slen, i, limit;
	int error = 0;
	static size_t needed = 0;

	/* Come back with the right size */
	/* here at second call, *oldlenp == 1024 */
	if (*oldlenp < needed) {
		*oldlenp = needed;
		return 0; /* will be back with correct size */
	}
	/* Add data logged within the kernel. */
	limit = MIN(nrecs, nitems(timestamps));
	for (i = 0; i < limit; i++) {
		snprintf(buf, LINE_MAX, "0x%x %llu",
		    timestamps[i].lid,
		    (unsigned long long)timestamps[i].tsc);
		switch (timestamps[i].type) {
		case TS_ENTER:
			strcat(buf, " ENTER");
			break;
		case TS_EXIT:
			strcat(buf, " EXIT");
			break;
		case TS_THREAD:
			strcat(buf, " THREAD");
			break;
		case TS_EVENT:
			strcat(buf, " EVENT");
			break;
		}
		snprintf(buf, LINE_MAX, "%s %s", buf,
		    timestamps[i].f ? timestamps[i].f : "(null)");
		if (timestamps[i].s)
			snprintf(buf, LINE_MAX, "%s %s\n", buf,
			    timestamps[i].s);
		else
			strcat(buf, "\n");

		slen = strlen(buf) + 1;

		if (where == NULL) /* 1st pass, calculate needed */
			needed += slen;
		else {
			if (i > 0)
				where--; /* overwrite last \0 */
			if ((error = copyout(buf, where, slen)))
				break;
			where += slen;
		}
	}
	/* Come back with an address */
	if (oldp == NULL)
		*oldlenp = needed;

	return error;
}

Here's the setup:

SYSCTL_SETUP(sysctl_tslog_setup, "tslog sysctl")
{
	sysctl_createv(NULL, 0, NULL, NULL,
	    CTLFLAG_PERMANENT|CTLFLAG_READONLY,
	    CTLTYPE_STRING, "tslog",
	    SYSCTL_DESCR("Dump recorded event timestamps"),
	    sysctl_debug_tslog, 0, NULL, 0,
	    CTL_DEBUG, CTL_CREATE, CTL_EOL);
}

--

Emile `iMil' Heitor  | https://imil.net


*oldlenp comes back with wrong value in helper sysctl_createv() function

2024-01-19 Thread Emile 'iMil' Heitor



I am creating a sysctl entry for TSLOG:
https://man.freebsd.org/cgi/man.cgi?query=tslog&sektion=4&manpath=freebsd-release


In every example I see in our source tree, I read that when oldp is
NULL, the helper should set *oldlenp to the desired size and return 0,
after which the sysctl_createv() helper function is called again with
the right *oldlenp value.
Except it is not: the first time the helper function is called back,
*oldlenp is 1024 no matter what I set it to before.
But if I return once again (either with ENOMEM or 0, doesn't matter),
the helper function is then called with the right *oldlenp value.

I fail to see where this behavior comes from and I'd like to
understand the logic behind it.

--

Emile `iMil' Heitor  | https://imil.net


Re: PVH boot with qemu

2024-01-08 Thread Emile 'iMil' Heitor




On 1/8/24 23:05, Manuel Bouyer wrote:

in consinit.c you have:

+#if defined(XENPVHVM) || defined(GENPVH)
+#ifndef GENPVH
 if (vm_guest == VM_GUEST_XENPVH) {
 if (xen_pvh_consinit() != 0)
 return;
 /* fallback to native console selection, usefull for dom0 PVH */
 }
+#endif

shouldn't the #ifndef GENPVH really be #ifdef XENPVHVM ?


oh absolutely


In the same way, the #ifndef GENPVH in xen_machdep.c should either be
#ifdef XENPVHVM or #ifdef XEN


Indeed.

I've changed those, thank you for the review!

--

Emile `iMil' Heitor  | https://imil.net


Re: PVH boot with qemu

2024-01-08 Thread Emile 'iMil' Heitor



This morning I was given the idea of having the possibility to build a
Xen-free kernel but still GENPVH capable.
This doesn't impact GENERIC which is still able to boot both Xen and
GENPVH with the following configuration:

options XENPVHVM
options XEN
hypervisor* at mainbus? # Xen hypervisor
xenbus* at hypervisor?  # Xen virtual bus
xencons* at hypervisor? # Xen virtual console
...

Now for GENPVH only we would have a unique kernel configuration
option:

options GENPVH

The only drawback I see is that it adds quite some ifn?def's GENPVH.

Here's the patch: https://imil.net/NetBSD/noxen.patch

Does this look reasonable to you?

--

Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2024-01-08 Thread Emile 'iMil' Heitor



Here's the full patch for GENPVH+cmdline MMIO+Firecracker support

https://imil.net/NetBSD/genpvh+mmio.patch

This patch includes:

- generic PVH boot for qemu, Firecracker etc... see the "PVH boot with
  qemu" thread for details
- a new bus, pvbus, used to attach devices that could be used in more
  than one hypervisor, such as the cmdline MMIO device or pvclock (not
  ported yet)
- MMIO cmdline support, used by qemu and FC to advertise virtio devices
  memory mappings to guests
- a "fix" for MP tables, or like Colin says "bug for bug compatibility",
  needed to detect IOAPIC on guests with no ACPI
- a fix for ld_virtio block device enabling FC to use it

The patch applies on the latest current.

kernel configuration:

pv* at pvbus?
virtio* at pv?
ld* at virtio?
vioif* at virtio?
viornd* at virtio?

This is how fast this thing boots: 
https://twitter.com/iMilnb/status/1743576957046931647


Feedback appreciated.

Cheers,

--
------------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2024-01-06 Thread Emile 'iMil' Heitor




On 1/6/24 08:33, Emile 'iMil' Heitor wrote:

To support this, I've attached `virtio*` to a newly created bus,
`cmdlinebus?` which is attached to mainbus in amd64_mainbus.c
I am not entirely sure this is the cleanest way to proceed and
would like to have your feedback.


I like the design of OpenBSD's pvbus; it would be a good fit
for this and more:

https://github.com/openbsd/src/tree/master/sys/dev/pv

Thoughts?

--

Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2024-01-05 Thread Emile 'iMil' Heitor



Update

My NetBSD/amd64 branch can now boot using PVH on qemu/x86_64,
qemu/microvm and Firecracker using MMIO backed block devices.
It also supports multiple VirtIO devices such as NICs.

The last blocking point was that Firecracker only supports one data
segment and as such needs DMA bouncing which was not enabled for that
case (thanks Jason for the hint!).

Now before opening a PR I'd like your advice on the "cmdline bus"
part. As I explained earlier in this thread, without PCI or ACPI, MMIO
devices are "detected" through a kernel parameter, for example:

root=ld0a console=com rw -v virtio_mmio.device=512@0xfeb00e00:12 
virtio_mmio.device=512@0xfeb00c00:11


To support this, I've attached `virtio*` to a newly created bus,
`cmdlinebus?` which is attached to mainbus in amd64_mainbus.c
I am not entirely sure this is the cleanest way to proceed and
would like to have your feedback.
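
To make the parameter format concrete, here is a minimal userland sketch of
parsing one such argument. The struct and function names are hypothetical and
this is not the code from the branch; it only handles the size@base:irq form
shown above, with an optional K/M/G suffix on the size.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one parsed "virtio_mmio.device=" value. */
struct mmio_args {
	uint64_t size;	/* size of the register window, in bytes */
	uint64_t base;	/* guest-physical base address */
	int irq;	/* interrupt line */
};

/*
 * Parse "<size>[K|M|G]@<base>:<irq>", e.g. "512@0xfeb00e00:12" or
 * "4K@0xd000:5". Returns 0 on success, -1 on malformed input.
 */
int
parse_mmio_device(const char *arg, struct mmio_args *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(arg, &end, 0);
	if (end == arg || errno != 0)
		return -1;
	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* FALLTHROUGH */
	case 'M': case 'm':
		v <<= 10;
		/* FALLTHROUGH */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	if (*end != '@')
		return -1;
	out->size = v;

	arg = end + 1;
	v = strtoull(arg, &end, 0);
	if (end == arg || errno != 0 || *end != ':')
		return -1;
	out->base = v;

	arg = end + 1;
	v = strtoull(arg, &end, 0);
	if (end == arg || errno != 0 || *end != '\0' || v > INT_MAX)
		return -1;
	out->irq = (int)v;
	return 0;
}
```

Each virtio_mmio.device= occurrence on the command line would yield one such
triple, and each triple one virtio device to attach.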

The working branch is here: 
https://github.com/NetBSDfr/NetBSD-src/tree/mmio_cmdline


Cheers,

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: PVH boot with qemu

2024-01-02 Thread Emile `iMil' Heitor



Here's a final patch https://imil.net/NetBSD/GENPVH.patch

It implements PVH boot from both qemu with the -kernel flag and
Firecracker with Colin Percival's PVH patches:
https://github.com/firecracker-microvm/firecracker/tree/feature/pvh
The patch should apply on current as-is:
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:GENPVH

I've already explained the rationale in this thread, this iteration is
mostly refinements from Gregory, especially regarding the assembly
part.
As previously requested, it works on GENERIC without modification of
the kernel's configuration.

Can we have a review before I do a PR?

Thanks,


Emile `iMil' Heitor  | https://imil.net



Re: pvclock (kvm_clock) support: where to attach

2023-12-31 Thread Emile 'iMil' Heitor




On 12/31/23 15:17, Taylor R Campbell wrote:

https://nxr.netbsd.org/xref/src/sys/arch/xen/xen/xen_clock.c


oh.


Unless there's a compelling reason that the pvclock and xenclock
interfaces are different enough to warrant having multiple copies of
the logic in src, I think we should adapt the existing code to work in
both settings.  I put a lot of work into the xen_clock.c driver to
record useful diagnostics about when the host's time is not behaving
right (vs when NetBSD itself has done something wrong), which we've
seen in practice on various hosts, and it would be a shame to lose
that.


Ok, the point is to have an interface that is able to expose
kvm_clock, which is used by Firecracker, I guess this could be
added without much pain into this existing Xen code.


If not, I think long-term we should introduce a new sys/dev/pv or
something, move the bulk of xen_clock.c to that (other than the
Xen-specific parts), and have both the Firecracker code and the Xen
code use it.


Actually that's what OpenBSD does, they do have a sys/dev/pv with a
pvbus and it's honestly a classy way of dealing with various
hypervisors.

On a side note:

I'm not used to these clock/RTC mechanisms, but something puzzles me:
when the virtual machine is started without MC146818 RTC support, it
hangs at todr_gettime_ymdhms, which is mapped to rtc_get_ymdhms in
sys/arch/x86/isa/rtc.c, which at the end of the day calls
mc146818_read(). Shouldn't sys/arch/x86/isa/clock.c:startrtclock()
return when mc146818_read() fails? There seems to be nothing but
MC146818 for RTC on x86.

--

Emile `iMil' Heitor  | https://imil.net


pvclock (kvm_clock) support: where to attach

2023-12-31 Thread Emile 'iMil' Heitor



I ported pvclock / kvm_clock from OpenBSD in order for Firecracker to
have an RTC. It's working but I'm not entirely sure where to attach it.

The device is x86 only so I added the source code in arch/x86/x86, and
for now I have it attached at cpufeaturebus, which felt the most natural.

Thoughts?

Here's the code: 
https://github.com/NetBSD/src/commit/c09440be5aca7e16ce845c3ccbdfb47bac03fb63


--

Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-28 Thread Emile 'iMil' Heitor



Update

I've got current/amd64 booting on an MMIO-backed block device
with qemu's microvm machine, in both ACPI and command-line-hack
(which passes the "device" address as a kernel parameter).

If anybody wants to give it a try:

$ qemu-system-x86_64 -M microvm,rtc=on -enable-kvm -m 256 -cpu host \
    -kernel netbsd.gdb -append "root=ld0a console=com rw" -serial stdio \
    -display none -device virtio-blk-device,drive=hd0 \
    -drive file=netbsd.img,format=raw,id=hd0 -no-acpi


Note the "rtc=on", without this init_main.c/inittodr(rootfstime)
hangs.

On the other hand, Firecracker loops on an init sig11 crash, I
suspect that might also be related to RTC, I guess I'll have to port
a virtio RTC.

The branch with this feature is here 
https://github.com/NetBSDfr/NetBSD-src/tree/mmio_cmdline

It includes the GENPVH mode to boot the NetBSD kernel from
qemu's -kernel flag.
You'll need at least those two:

virtio* at cmdlinebus?
#virtio* at acpi?
ld* at virtio?  # Virtio disk device

As previously said, virtio* at acpi? also works.

--
------------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-26 Thread Emile 'iMil' Heitor



On 12/26/23 11:35, Emile 'iMil' Heitor wrote:


intercepted (or so it seems); I thought the hang, which
actually happens in sys/kern/subr_disk.c:disk_read_sectors/biowait,


Found it. Wrong bus_space_write_4() in virtio_mmio_setup_queue(),
100% my fault, sorry for the noise.
IRQ is triggered, block device is recognized, we're close :)

--

Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-26 Thread Emile 'iMil' Heitor

Following up

I ported the "Linux MPtable bug"-feature found by Colin and
IOAPIC is now detected 
https://github.com/NetBSD/src/commit/37e57b3a0f464246611a005a350e01e25e091bfe


Nevertheless, the disk IRQ from Firecracker is still not
intercepted (or so it seems); I thought the hang, which
actually happens in sys/kern/subr_disk.c:disk_read_sectors/biowait,
might be related to bad/missing VirtIO device features but
I checked on FreeBSD and the features are correctly computed.

Ideas are welcome.

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-20 Thread Emile 'iMil' Heitor



On 12/20/23 16:56, Taylor R Campbell wrote:


My guess is that this is not going to be MSI -- it'll either be i8259
or ioapic.  In that case, it may work to do this:


Thanks a lot for that, here's the result, unfortunately it still
hangs at the first VOP_OPEN() from subr_disk_open.c and the
handler function never gets triggered:

static int
virtio_mmio_cmdline_alloc_interrupts(struct virtio_mmio_softc *msc)
{
	struct virtio_mmio_cmdline_softc *const sc =
	    (struct virtio_mmio_cmdline_softc *)msc;
	struct virtio_softc *const vsc = &msc->sc_sc;
	struct ioapic_softc *ioapic;
	struct pic *pic;
	int irq = sc->margs.irq;
	int pin = irq;

	ioapic = ioapic_find_bybase(irq);

	/* we don't enter here */
	if (ioapic != NULL) {
		pic = &ioapic->sc_pic;
		pin = irq - pic->pic_vecbase;
		irq = -1;
	} else
		pic = &i8259_pic;

	msc->sc_ih = intr_establish_xname(irq, pic, pin, IST_EDGE, IPL_BIO,
	    virtio_mmio_intr, msc, false, device_xname(vsc->sc_dev));
	if (msc->sc_ih == NULL) {
		aprint_error_dev(vsc->sc_dev,
		    "failed to establish interrupt\n");
		return -1;
	}
	aprint_normal_dev(vsc->sc_dev, "interrupting on %d\n", irq);

	return 0;
}

/* --- */

intr_establish_xname() does get through:

[   1.030] viommio: 4K@0xd000:5
[   1.030] virtio0: block device (id 2, rev. 0x02)
[   1.030] ld0 at virtio0: features: 0x1
[   1.030] virtio0: interrupting on 5 <---
[   1.030] ld0: 30720 KB, 60 cyl, 16 head, 63 sec, 512 bytes/sect x 61441 sectors
[   1.0107398] boot device: ld0

--
------------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-19 Thread Emile 'iMil' Heitor




On 12/20/23 06:55, Emile 'iMil' Heitor wrote:


Well that's the thing, I can't find where MMIO attaches on FreeBSD;
they have a very simple way of creating the resources:


After a bit of digging, their virtio_mmio device attaches to "nexus0",
which if I understand correctly, is our mainbus equivalent.

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-19 Thread Emile 'iMil' Heitor



On 12/19/23 13:11, Taylor R Campbell wrote:


to queue a task and then run it.  Normally softint_schedule is called
from a hard interrupt handler.  Here, you need something in the host
to get hard interrupts.

For example, on ACPI systems there are ACPI interrupt resources that
can be used with acpi_intr_establish; on FDT systems, device nodes
have interrupt properties that can be used with fdtbus_intr_establish.

How does FreeBSD get x86 mmio intrs on these systems?


Well that's the thing, I can't find where MMIO attaches on FreeBSD;
they have a very simple way of creating the resources:

https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c#L109

and even Colin doesn't know where this goes 
https://x.com/cperciva/status/173720959840762?s=20


What I can say: it's not ACPI nor PCI as they are both unsupported
on Firecracker.

--

Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-18 Thread Emile 'iMil' Heitor



On 12/16/23 10:25, Emile 'iMil' Heitor wrote:
FYI I'm on it, based on Colin Percival's work here 
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c

I'm getting some results but Firecracker uses MMIO v2 and we only have
v1 so there's still quite some work to do.


So far I get the block device and the correct geometry
is reported

[   1.030] virtio0 at mainbus0
[   1.030] kernel parameters: console=com root=ld0e virtio_mmio.device=4K@0xd000:5
[   1.030] viommio: 4K@0xd000:5
[   1.030] virtio0: block device (id 2, rev. 0x02)
[   1.030] ld0 at virtio0: features: 0x1
[   1.030] ld0: 30720 KB, 60 cyl, 16 head, 63 sec, 512 bytes/sect x 61441 sectors

[   1.0199904] boot device: ld0

but I get stuck in sys/kern/subr_disk_open.c in opendisk() at:

error = VOP_OPEN(tmpvn, FREAD | FSILENT, NOCRED);

ideas on what could be locking here?
Maybe related: the interrupt handler function I wrote uses
softint_establish() as there's no "real" hardware behind this
block device, is this the correct way to deal with it?

For anybody wanting to have a look: https://imil.net/NetBSD/mmio.patch
It applies over this branch:
https://github.com/NetBSDfr/NetBSD-src/tree/GENPVH


--
--------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-16 Thread Emile 'iMil' Heitor




On 2/23/22 20:26, el16095 wrote:
Firecracker informs VMs about MMIO devices by appending to the boot 
command line a string like this "virtio_mmio.device=4K@0xd000:5" 
([virtio_mmio.]device=<size>@<baseaddr>:<irq>). So, from what I 
understand I'd need to write glue code that takes this information and 
uses it to setup the MMIO devices on the VM side the way Firecracker 
expects it to; and then attach virtio through that, right?


FYI I'm on it, based on Colin Percival's work here 
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c

I'm getting some results but Firecracker uses MMIO v2 and we only have
v1 so there's still quite some work to do.

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: VirtIO MMIO for amd64

2023-12-12 Thread Emile 'iMil' Heitor



On 2/23/22 20:26, el16095 wrote:

Firecracker informs VMs about MMIO devices by appending to the boot 
command line a string like this "virtio_mmio.device=4K@0xd000:5" 
([virtio_mmio.]device=<size>@<baseaddr>:<irq>). So, from what I 
understand I'd need to write glue code that takes this information and 
uses it to setup the MMIO devices on the VM side the way Firecracker 
expects it to; and then attach virtio through that, right?


Was there any more work on this topic?

--
--------
Emile `iMil' Heitor  | https://imil.net


Re: PVH boot with qemu

2023-12-11 Thread Emile `iMil' Heitor

On Mon, 11 Dec 2023, Emile `iMil' Heitor wrote:


We still need to check if we didn't break anything on Xen side and test
Firecracker. FYI qemu-system-x86_64 also works with the "microvm"
machine type.


I am able to boot this patched NetBSD kernel using Colin Percival's PVH-enabled
Firecracker https://github.com/firecracker-microvm/firecracker/pull/3155

Nevertheless, Firecracker, like the qemu microvm machine type, uses vioblk for
block devices, which we don't have. Yet.

--------
Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-12-11 Thread Emile `iMil' Heitor

On Mon, 11 Dec 2023, Manuel Bouyer wrote:


Yes, right now GENERIC can be used on bare-metal, PVHVM and XENPVH.
It would be good to have GENERIC working on GENPVH too.


Fair enough, I'll switch to this path then, thanks for the advice.


Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-12-11 Thread Emile `iMil' Heitor



Hi Manuel,

On Mon, 11 Dec 2023, Manuel Bouyer wrote:


#ifndef GENPVH
	/* get a page for HYPERVISOR_shared_info */
	addl	$PAGE_SIZE, %ebx
	addl	$PGOFSET,%ebx
	andl	$~PGOFSET,%ebx
	movl	$RELOC(HYPERVISOR_shared_info_pa),%ebp
	movl	%ebx,(%ebp)
	movl	$0,4(%ebp)
#endif

How can this work on Xen when GENPVH is defined ?
Shouldn't this be made conditional on vm_guest == VM_GUEST_XENPVH ?


Well, the point is that you don't define GENPVH when using Xen. PVH under
qemu and friends needs neither HYPERVISOR_shared_info nor any of the
hypercall portion of the code. A big chunk of Xen-related code is
#ifndef'ed out for GENPVH in hypervisor.c, and I was planning on isolating
GENPVH so there are as few ifdefs as possible.

Or would you prefer the same kernel to be able to boot in both XENPVH and
GENPVH modes? I am focusing on making the resulting kernel smaller but this
could be done also.


Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-12-10 Thread Emile `iMil' Heitor



Here is a clean(er) patch 
https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:GENPVH

Rationale

As previously explained, locore.S expects the start_info structure, passed by
the calling hypervisor in %ebx, to be located at the end of the symbol table.
Qemu and Firecracker don't follow this rule, which is not part of the
official Xen ABI: https://xenbits.xen.org/docs/unstable/misc/pvh.html

What our patch first does is make memory mapping loops happy by copying
the start_info structure where it's expected.
After that, memory locations and boot parameters are correctly found and
boot can proceed.
Of course, the hypervisor not being Xen, a lot of Xen-related code is
useless, hence the new VM_GUEST_GENPVH (for Generic PVH) vm_guest type,
as first suggested by Manuel, and a new kernel option, GENPVH.
I kept the Xen code structure as it was and changed very little code, only
some || vm_guest == VM_GUEST_GENPVH and a couple #ifndef GENPVH.
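
To illustrate the copy step in C: the real work happens in locore.S assembly,
so take this as a sketch of the idea only. The struct layout follows the PVH
start_info definition from the Xen public headers; relocate_start_info() and
its "expected" destination are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HVM_START_MAGIC	0x336ec578u	/* XEN_HVM_START_MAGIC_VALUE */

/* Layout of the PVH start_info structure (version 1). */
struct hvm_start_info {
	uint32_t magic;
	uint32_t version;
	uint32_t flags;
	uint32_t nr_modules;
	uint64_t modlist_paddr;
	uint64_t cmdline_paddr;
	uint64_t rsdp_paddr;
	uint64_t memmap_paddr;
	uint32_t memmap_entries;
	uint32_t reserved;
};

/*
 * Copy the structure handed over by the hypervisor to the place the rest
 * of the early boot code expects it (here, hypothetically, the end of the
 * symbol table). Returns the new location, or NULL if the magic is wrong.
 */
struct hvm_start_info *
relocate_start_info(const void *from_hv, void *expected)
{
	struct hvm_start_info hdr;

	memcpy(&hdr, from_hv, sizeof(hdr));
	if (hdr.magic != HVM_START_MAGIC)
		return NULL;
	/* memmove: source and destination may overlap */
	memmove(expected, from_hv, sizeof(struct hvm_start_info));
	return expected;
}
```

Only the header is moved; the command line, module list and memory map are
referenced by physical address and stay where the hypervisor put them.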

In order to build a Generic PVH kernel, the following options are needed:

#Xen PV support for PVH and HVM guests
options XENPVHVM
options XEN
# Generic PVH support (qemu, firecracker...)
options GENPVH

I've added 
https://github.com/NetBSDfr/NetBSD-src/blob/GENPVH/sys/arch/amd64/conf/MICROVM
as an example config file.
I'll probably end up ditching XENPVHVM and XEN but there's still quite
some work in there.

We still need to check if we didn't break anything on Xen side and test
Firecracker. FYI qemu-system-x86_64 also works with the "microvm"
machine type.

Feedback very welcome.

--------
Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-12-06 Thread Emile `iMil' Heitor



I got it working.

NetBSD/amd64 kernel booting in PVH mode straight from qemu's -kernel flag.
It now needs a lot of cleaning as it's basically a PoC, but here's a
WIP patch if anyone's interested in hacking on it.

https://imil.net/NetBSD/qemu-pvh.patch

Let me rephrase: I *know* it is ugly at the moment. I *will* make it
clean, just wanted to share the joy ;)

Cheers,


Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-11-29 Thread Emile `iMil' Heitor

On Wed, 29 Nov 2023, Manuel Bouyer wrote:


Of course, this is *not* a Xen VM, so no surprise that start_xen32
isn't working.


I'm just sharing the progress here, in case someone is interested. If this
is annoying, I'll just keep it to myself until I post an -hypothetical-
final patch, and sorry for the noise.


Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-11-28 Thread Emile `iMil' Heitor

On Thu, 23 Nov 2023, Emile `iMil' Heitor wrote:


It seems we have a similar problem to the second bullet point Colin Percival
noted here 
https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html

When removing the hvm_start_info address save portion, the sym mapping
doesn't fall into an infinite loop anymore.
Not yet sure how to fix that, I'll have a look at FreeBSD's commits on this
matter.


And so it was, in locore.S:start_xen32, this assumption is wrong when the
entrypoint is called from qemu:

/*
 * save addr of the hvm_start_info structure. This is also the end
 * of the symbol table
 */

this makes esym point to an address (%ebx + KERNBASE) which is not the
end of the symbol table.
Same goes with eblob which is calculated relative to %ebx.
A friend of mine, Gregory in CC, found that setting those two (esym and eblob)
to 0 made the paging init go fine, as both tests (l.660 and 667) will trigger
jz 1f and keep %edi at __kernel_end.
This brings us to init_xen_early(), which is failing, but that's another story.


Emile `iMil' Heitor  | https://imil.net



Re: PVH boot with qemu

2023-11-23 Thread Emile `iMil' Heitor

On Mon, 13 Nov 2023, Manuel Bouyer wrote:


On Mon, Nov 13, 2023 at 06:37:01AM +0100, Emile `iMil' Heitor wrote:

The start_xen32 entrypoint is then found, and the kernel starts, but falls into
an infinite loop in locore.S when mapping symbols and preloaded modules,
more precisely, in the fillkpt_nox macro. I assume %ecx is wrong or the region
corrupted for some reason. 
https://github.com/NetBSD/src/blob/trunk/sys/arch/amd64/amd64/locore.S#L738


I don't think you can use start_xen32 as is, as it expects a Xen environment.
You may need to write a new start routine, or make a difference between Xen
vs non-Xen in the existing one.


It seems we have a similar problem to the second bullet point Colin Percival
noted here https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
When removing the hvm_start_info address save portion, the sym mapping
doesn't fall into an infinite loop anymore.
Not yet sure how to fix that, I'll have a look at FreeBSD's commits on this
matter.


Emile `iMil' Heitor  | https://imil.net



PVH boot with qemu

2023-11-12 Thread Emile `iMil' Heitor



I first asked for guidance on port-xen@ but the topic doesn't seem to have had
much success, so I'll try my luck here.

I am trying to make NetBSD/amd64 boot in PVH mode with qemu, using qemu's
-kernel flag. The kernel does start executing thanks to the first step
explained here 
https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
i.e. adding PVH entry point to the kernel ELF notes.

   #define ELFNOTE(name, type, desctype, descdata...) \
  -.pushsection .note.name;   \
  +.pushsection .note.name, "a", @note;   \
   .align 4 ;   \
   .long 2f - 1f /* namesz */;   \
   .long 4f - 3f /* descsz */;   \
  @@ -588,6 +603,8 @@ next:   pop %edi
   movl %eax,(%ebp)
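
For readers unfamiliar with ELF notes, here is a small C sketch of how a
loader could look up the PVH entry point in a note segment. It assumes the
note type XEN_ELFNOTE_PHYS32_ENTRY (18) with owner name "Xen"; the function
name is made up and this is not qemu's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18	/* 32-bit physical entry point */

/* Round up to the 4-byte alignment used inside ELF notes. */
#define NOTE_ALIGN(x)	(((size_t)(x) + 3u) & ~(size_t)3u)

/*
 * Walk a buffer holding raw ELF notes (namesz, descsz, type, name, desc)
 * and return 0 with *entry set if a "Xen" PHYS32_ENTRY note is found,
 * -1 otherwise. Assumes the buffer is in the host's byte order.
 */
int
find_pvh_entry(const uint8_t *buf, size_t len, uint32_t *entry)
{
	size_t off = 0;

	while (off + 12 <= len) {
		uint32_t hdr[3];	/* namesz, descsz, type */
		size_t name_off, desc_off, next;

		memcpy(hdr, buf + off, sizeof(hdr));
		name_off = off + 12;
		desc_off = name_off + NOTE_ALIGN(hdr[0]);
		next = desc_off + NOTE_ALIGN(hdr[1]);
		if (desc_off > len || next > len || next <= off)
			return -1;	/* truncated or corrupt note */
		if (hdr[2] == XEN_ELFNOTE_PHYS32_ENTRY && hdr[0] == 4 &&
		    hdr[1] >= 4 && memcmp(buf + name_off, "Xen", 4) == 0) {
			memcpy(entry, buf + desc_off, sizeof(*entry));
			return 0;
		}
		off = next;
	}
	return -1;
}
```

The "a" flag and @note type added in the diff above are what make the note
end up in a loadable PT_NOTE segment where such a lookup can find it.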

The start_xen32 entrypoint is then found, and the kernel starts, but falls into
an infinite loop in locore.S when mapping symbols and preloaded modules,
more precisely, in the fillkpt_nox macro. I assume %ecx is wrong or the region
corrupted for some reason. 
https://github.com/NetBSD/src/blob/trunk/sys/arch/amd64/amd64/locore.S#L738

This is far from my comfort zone but I'm willing to go down the rabbit hole, yet
some advice on where to look and possible reasons of this loop would be greatly
appreciated.

Note that this feature would also allow NetBSD to run on AWS's Firecracker, a
microvm hypervisor used in their Lambda product.

Thanks,

--------
Emile `iMil' Heitor  | https://imil.net



Disabling drivers from a NetBSD kernel without `userconf(4)` nor rebuild

2023-01-31 Thread Emile `iMil' Heitor



Hi,

I've restarted working on a micro-NetBSD project I documented back in 2020
here: 
https://imil.net/blog/posts/2020/fakecracker-netbsd-as-a-function-based-microvm/
The idea is to create a micro-service virtual machine that starts a NetBSD
kernel and a dedicated service in less than 200ms.
The project works but some elements could be improved. First, in order
to reduce boot time, the kernel is called directly from qemu with the
-kernel flag; this means the project relies on multiboot(8), so for the time
being, it only works with an i386 kernel.
Second, in this proof of concept, it is mandatory to rebuild a very minimal
kernel in order to kick out every driver that's not absolutely necessary, but
this step would be really painful for someone not used to, or not wanting to
dig into, kernel builds.
My endgame is being able to build a NetBSD-based micro service with something
similar to:

$ mksmolnb https://cdn.NetBSD.org 9.3 nginx

which will download and prepare all the pieces with a process similar to what
I described in the blog post.

I began digging for a method to disable kernel drivers in the same way
userconf(4) does, and realized with the help of mlelstv@ on IRC that I'd
only need to alter the autoconf tables to disable all the drivers I'd
want.
So I found where cfdata was, and how to overwrite the fstate with gdb --write,
and finally came to this very nasty hack: 
https://gitlab.com/iMil/sailor/-/snippets/2491821
which basically allows disabling every driver in the kernel, except those
needed, directly in the kernel binary.
I am totally aware that this method could be made obsolete as soon as struct
cfdata is modified.
I just wanted to share this for anyone curious how to do this, and maybe get
some more ideas to make it cleaner.

Cheers,


Emile `iMil' Heitor  | https://imil.net



Re: x86 bootstrap features

2020-01-20 Thread Emile `iMil' Heitor



Hi Kamil, Emmanuel & all,

On Tue, 24 Sep 2019, Kamil Rytarowski wrote:


On 24.09.2019 14:26, Emmanuel Dreyfus wrote:

On Tue, Sep 24, 2019 at 01:31:51PM +0200, Kamil Rytarowski wrote:

My use-case is "qemu-system-x86_64 -kernel ./netbsd". Last I tried (with
multiboot2 patches merged) it still did not work.


I did not commit anything in multiboot support in the NetBSD kernel,
I only worked on bootstraps for now, hence the steady failure you
experience should come as no surprise.

For now our kernel has support code for multiboot 1 for i386 only.


qemu-system-i386 works, but -x86_64 not.

Are there plans to add it to the amd64 kernel?


Is there any news on this front? Being able to boot an amd64 kernel directly
from kvm would give NetBSD the ability to be started by AWS Firecracker[1] out
of the box which would be amazing.

[1]: https://github.com/firecracker-microvm/firecracker

--------
Emile `iMil' Heitor * 
  _
| http://imil.net| ASCII ribbon campaign ( )
| http://www.NetBSD.org  |  - against HTML email  X
| http://gcu.info|  & vCards / \

