Re: rumpkernel and bhyve: triple faults
On 10 Mar 2018, at 23:46, Martin Lucina wrote:

> On Friday, 09.03.2018 at 18:45, Fabian Freyer wrote:
>> On 6 Mar 2018, at 7:45, Fabian Freyer wrote:
>>> I’m not sure where to go from here. Is this a bug in bhyve(4), should these
>>> values be initialised somehow, or should I patch rumpkernel(7) to skip this
>>> check when running on bhyve(4)?
>
> You probably want to use a serial console rather than VGA on bhyve in any
> case, so you'll want to add the appropriate checks to hypervisor.c and
> cons.c.

I’ve started on a patch, but the check should fail if bios_crtc_base is
unset anyways.

>> A build on Linux (which boots fine) shows [bios_com1_base, bios_crtc_base]
>> not to be uninitialised:
>> 003e3480 g O .bss 0002 bios_com1_base
>> 003e44a0 g O .bss 0002 bios_crtc_base
>
> When you write "which boots fine", I presume you're referring to booting on
> bhyve?

Yes. The rumprun kernel built on Linux boots fine (including serial output
by default on Linux). The one built on FreeBSD triple faults, due to
accessing 0x2.

>> Further down the rabbit hole, this goes on in rumprun.o:
>>
>> On Linux, bios_crtc_base is not a local symbol:
>> 0002 O *COM* 0002 bios_crtc_base
>> 0002 O *COM* 0002 bios_com1_base
>>
>> While on FreeBSD, they are marked as local:
>> 0002 l O *COM* 0002 bios_crtc_base
>> 0002 l O *COM* 0002 bios_com1_base
>
> That seems wrong. Can you try and force the toolchain to use the more
> recent GNU ld from devel/binutils and see if that fixes the problem?

I’m using GNU Binutils from devel/binutils:

pkg list binutils | grep /usr/local/bin/ld
/usr/local/bin/ld
/usr/local/bin/ld.bfd
/usr/local/bin/ld.gold

$(x86_64-rumprun-netbsd-gcc -print-prog-name=ld) -v
GNU ld (GNU Binutils) 2.30

Fabian
Re: rumpkernel and bhyve: triple faults
On 6 Mar 2018, at 7:45, Fabian Freyer wrote:

> Tracking down bios_crtc_base, I find that it’s loaded in
> rumprun/platform/hw/arch/amd64/locore.S:70:
>
> /* save BIOS data area values */
> movw BIOS_COM1_BASE, %bx
> movw %bx, bios_com1_base
> movw BIOS_CRTC_BASE, %bx
> movw %bx, bios_crtc_base
>
> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the bhyve
> device node in /dev/vmm with xxd(1), I find the words at these addresses to be
> Uninitialised:
>
> 0400: ..
> 0483: ..
>
> I’m not sure where to go from here. Is this a bug in bhyve(4), should these
> values be initialised somehow, or should I patch rumpkernel(7) to skip this
> check when running on bhyve(4)?

I’ve chased this bug down a bit further to what I believe is an issue with
the rumprun toolchain I am building on FreeBSD with the misc/rumprun port [1].

objdump -t helloer-rumprun.elf lists a number of symbols in the *COM*
section, which holds unallocated C external variables [2]:

objdump -t helloer-rumprun.elf | grep \*COM\*
0001 l O *COM* 0001 pic1mask
0004 l O *COM* 0004 pgalloc_totalkb
0004 l O *COM* 0004 pgalloc_usedkb
1000 l O *COM* 0020 multiboot_cmdline
0002 l O *COM* 0002 bios_crtc_base
0001 l O *COM* 0001 pic2mask
0002 l O *COM* 0002 bios_com1_base

As the pagetable in pagetable.s maps the first page as non-present,
accessing any of these will result in a fault. I’m pretty sure that these
shouldn’t be undefined.

A build on Linux (which boots fine) shows these not to be uninitialised:

003e3480 g O .bss 0002 bios_com1_base
003e44a0 g O .bss 0002 bios_crtc_base

Further down the rabbit hole, this goes on in rumprun.o:

On Linux, bios_crtc_base is not a local symbol:

0002 O *COM* 0002 bios_crtc_base
0002 O *COM* 0002 bios_com1_base

While on FreeBSD, they are marked as local:

0002 l O *COM* 0002 bios_crtc_base
0002 l O *COM* 0002 bios_com1_base

Fabian

[1] https://svnweb.freebsd.org/ports/head/misc/rumprun/Makefile?view=markup=459195
[2] http://man7.org/linux/man-pages/man5/elf.5.html / SHN_COMMON
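
The comparison above comes down to two things per `objdump -t` line: whether the symbol sits in *COM*, and whether its binding flag is `l`. A minimal sketch of that check follows, assuming the line starts with the value column and that the flag field comes right after the first space (as in the listings in this message). The function name `is_local_common` is hypothetical and not part of any toolchain.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if one line of `objdump -t` output describes a symbol
 * that is both in the *COM* section and flagged local ('l').
 * Assumes no leading whitespace before the value column.
 */
bool
is_local_common(const char *line)
{
        const char *sp;

        /* Only symbols still in the common section are of interest. */
        if (strstr(line, "*COM*") == NULL)
                return false;

        /* The binding flag is the first character after the value column. */
        sp = strchr(line, ' ');
        return sp != NULL && sp[1] == 'l' && sp[2] == ' ';
}
```

Fed the FreeBSD line for bios_crtc_base this reports a local common symbol, while the Linux line from rumprun.o does not match because its flag field starts with `O`.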
Re: rumpkernel and bhyve: triple faults
Hi Fabian,

>> For a page-fault, the virtual address that resulted in the fault will be in
>> the CR2 register.
>
> I don’t see a CR2 register in the output of bhyvectl --get-all, I was looking
> for that too.

Oops, I'll add that to bhyvectl.

> I’m pretty sure it’s tooling that’s displaying something off, since hopper is
> showing me this as
>
> 0x00102a56 cmp word [0x2], 0x0
>
> Which is very similar to what r2 is giving me:
>
> ;-- cons_init:
> 0x00102a50 53 push rbx ; /arch/x86:43
> 0x00102a51 e8ea0a call sym.hypervisor_detect ; /arch/x86:47
> 0x00102a56 66833da4d5ef. cmp word [0x0002], 0 ; /arch/x86:62

This is reading the 16-bit value from memory location 0x2. Hard to see why
this would generate a page-fault - the zero page is often mapped read-only.
Perhaps rumpkernel doesn't have a mapping for it, but then, the offset for
the access would be incorrect (maybe a linking issue with the location of
variables?).

> Maybe I’m off with my analysis of the actual fault here, but how I understand
> the source (assuming compilers work as I would expect, which is not always
> true) the values here are initialised from values in the bios data area
> (which is zeroed out on bhyve):

It shouldn't matter that those were zero. Loading them into a memory
location shouldn't be a problem.

> Here’s my full output from bhyvectl --get-all:
>
> ID Length Name
> 0 128MB sysmem
>
> Address Length Segment Offset Prot Flags
> 0 128MB sysmem 0 RWX
>
> efer[0] 0x0500

Ok, the guest is in 64-bit mode - the LMA bit is set. This implies that
rumpkernel has set up its own mappings, since the multiboot loader entered
the guest in flat 32-bit mode.

> cr0[0] 0x80010031

Paging is enabled (bit 31) as expected.

later,

Peter.
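
The two register readings above can be checked mechanically. The following is a minimal sketch, assuming the architectural bit positions from the Intel SDM (EFER.LME is bit 8, EFER.LMA is bit 10, CR0.PE is bit 0, CR0.PG is bit 31). The helper names are hypothetical and are not part of bhyvectl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM Vol. 3). */
#define EFER_LME        (1ULL << 8)     /* long mode enable */
#define EFER_LMA        (1ULL << 10)    /* long mode active */
#define CR0_PE          (1ULL << 0)     /* protected mode enable */
#define CR0_PG          (1ULL << 31)    /* paging enable */

/* True if the guest is currently executing in long mode. */
bool
guest_in_long_mode(uint64_t efer)
{
        return (efer & EFER_LMA) != 0;
}

/* True if the guest has paging turned on. */
bool
guest_paging_enabled(uint64_t cr0)
{
        return (cr0 & CR0_PG) != 0;
}
```

With the values from the dump, efer 0x0500 has both LME and LMA set, and cr0 0x80010031 has PG and PE set, which matches the reading given in this message.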
Re: rumpkernel and bhyve: triple faults
> On 6 Mar 2018, at 9:28, Rodney W. Grimes wrote:
>
> >> bios_crtc_base would be part of the isa legacy vga
> >> controller card. Bhyve does not, at this time, or
> >> in the near future expect to have, support for this
> >> legacy device.
> >
> > I am wrong on this, the framebuffer device does
> > in fact have support for the legacy i/o addresses
> > that this should point to. You should see the
> > "vgaconf" section of the FrameBuffer section
> > of the bhyve(8) manpage.
> >
> > I believe you need to be running bhyve with the
> > uefi bios options, with the CSM version, and
> > have vgaconf=on to get your code to work as is.
>
> For diskless multiboot kernels I'm going with a
> separate userboot.so-compatible loader. Specifying
> a UEFI bootrom implicitly resets the CPU.
> (See usr.sbin/bhyve/bhyverun.c:960)

Well in that case my original claim that there is not a legacy isa vga
device available would be correct for this environment, and you should
just eliminate anything that tries to use it, or make it understand
that this device may not exist.

I am not sure if bhyve maps any of these legacy addresses if you're not
using bhyveload or uefi bios code. 0x400 and 0x463 being unmapped could
lead to your triple fault.

>
> I think deciding to use the serial output (which is
> what most of rumpkernel's cons_init is doing) based
> on the hypervisor is probably the right way to go.
> Something similar is already done for XEN:
>
> /*
>  * If running under Xen use the serial console.
>  */
> if (hypervisor == HYPERVISOR_XEN)
>         prefer_serial = 1;
>
> >>> rumprun/platform/hw/arch/x86/cons.c:59:
> >>> 649 0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
> >>>
> >>> Therefore, I'm assuming this is the origin of the fault.
> >>>
> >>> Tracking down bios_crtc_base, I find that it's loaded in
> >>> rumprun/platform/hw/arch/amd64/locore.S:70:
> >>>
> >>> /* save BIOS data area values */
> >>> movw BIOS_COM1_BASE, %bx
> >>> movw %bx, bios_com1_base
> >>> movw BIOS_CRTC_BASE, %bx
> >>> movw %bx, bios_crtc_base
> >>>
> >>> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking
> >>> the bhyve device node in /dev/vmm with xxd(1), I find the words at
> >>> these addresses to be Uninitialised:
> >>>
> >>> 0400: ..
> >>> 0483: ..
> >> Typo here, should this be 0463?
> Yes, sorry about that.
>
> Fabian

-- Rod Grimes rgri...@freebsd.org
Re: rumpkernel and bhyve: triple faults
> Hello lists,
>
> I'm currently playing around with getting rump kernels built with
> rumpkernel(7) running on FreeBSD's bhyve(4). I'm using a custom boot loader
> [1] which builds on some patches to bhyveload / user boot [2].
>
> To test, I'm using a simple "helloer" unikernel from the tutorial [3]:
>

... excellent description of your debug process removed for brief reply ...

> Due to compiler optimisations, the check here isn't the
> (hypervisor == HYPERVISOR_XEN) check directly after the call to
> hypervisor_detect,
> but the check (bios_crtc_base == 0) in

bios_crtc_base would be part of the isa legacy vga controller card.
Bhyve does not, at this time, or in the near future expect to have,
support for this legacy device.

> rumprun/platform/hw/arch/x86/cons.c:59:
> 649 0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
>
> Therefore, I'm assuming this is the origin of the fault.
>
> Tracking down bios_crtc_base, I find that it's loaded in
> rumprun/platform/hw/arch/amd64/locore.S:70:
>
> /* save BIOS data area values */
> movw BIOS_COM1_BASE, %bx
> movw %bx, bios_com1_base
> movw BIOS_CRTC_BASE, %bx
> movw %bx, bios_crtc_base
>
> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the bhyve
> device node in /dev/vmm with xxd(1), I find the words at these addresses to be
> Uninitialised:
>
> 0400: ..
> 0483: ..

Typo here, should this be 0463?

> I'm not sure where to go from here. Is this a bug in bhyve(4),

No, it is not a bug, it is an unimplemented device.

> should these
> values be initialised somehow, or should I patch rumpkernel(7) to skip this
> check when running on bhyve(4)?

rumpkernel is assuming or requiring the presence of legacy isa hardware,
it should probably be taught that this may not exist. You could simply
skip this check, but I expect you would then have a harder to find
failure later when it tries to use the hardware it expects to be there.

>
> Fabian

... Full KTR trace removed for brief reply ...
-- Rod Grimes rgri...@freebsd.org
Re: rumpkernel and bhyve: triple faults
Hi Peter,

On 6 Mar 2018, at 16:15, Peter Grehan wrote:

> Exception 14 is a page fault (SDM Vol3 ch 6.15). The exception type is
> "fault" which means it is delivered at the address it was detected at.
>
> This cascaded very quickly into a triple-fault, so it looks like it could
> possibly be an issue with the stack. One debug tool you do have is to get a
> register dump on exit, with 'bhyvectl --get-all --vm='.
>
> For a page-fault, the virtual address that resulted in the fault will be in
> the CR2 register.

I don’t see a CR2 register in the output of bhyvectl --get-all, I was
looking for that too.

> From the code at the faulting address:
>
> > 00102a50 :
> > 102a50: push rbx
> > 102a51: call 103540
> > 102a56: cmp WORD PTR [rip-0x102a5c],0x0 # 2
>
> It's using RIP-relative addressing here, but objdump seems to think this may
> be an offset in the current_lwp structure - is it possible that may have an
> uninitialized value ?

I’m pretty sure it’s tooling that’s displaying something off, since hopper
is showing me this as

0x00102a56 cmp word [0x2], 0x0

Which is very similar to what r2 is giving me:

;-- cons_init:
0x00102a50 53 push rbx ; /arch/x86:43
0x00102a51 e8ea0a call sym.hypervisor_detect ; /arch/x86:47
0x00102a56 66833da4d5ef. cmp word [0x0002], 0 ; /arch/x86:62

> (I don't believe this has anything to do with VGA).

Maybe I’m off with my analysis of the actual fault here, but how I
understand the source (assuming compilers work as I would expect, which is
not always true) the values here are initialised from values in the bios
data area (which is zeroed out on bhyve):

#define BIOS_COM1_BASE 0x400
#define BIOS_CRTC_BASE 0x463
...
movw BIOS_COM1_BASE, %bx
movw %bx, bios_com1_base
movw BIOS_CRTC_BASE, %bx
movw %bx, bios_crtc_base
...
/*
 * If the BIOS says no CRTC is present use the serial console if
 * available.
 */
if (bios_crtc_base == 0)
        prefer_serial = 1;

Here’s my full output from bhyvectl --get-all:

ID Length Name
0 128MB sysmem

Address Length Segment Offset Prot Flags
0 128MB sysmem 0 RWX

efer[0]                     0x0500
cr0[0]                      0x80010031
cr3[0]                      0x0010b000
cr4[0]                      0x2620
dr7[0]                      0x0400
rsp[0]                      0x00100ff0
rip[0]                      0x00102a56
rax[0]                      0x
rbx[0]                      0x003eaa2b
rcx[0]                      0x68622065
rdx[0]                      0x20657679
rsi[0]                      0x00100fd0
rdi[0]                      0x4000
rbp[0]                      0x
r8[0]                       0x00100fdc
r9[0]                       0x00100fd8
r10[0]                      0x00100fd4
r11[0]                      0x
r12[0]                      0x
r13[0]                      0x
r14[0]                      0x
r15[0]                      0x
rflags[0]                   0x00010006
ds desc[0]                  0x/0x/0xc093
es desc[0]                  0x/0x/0xc093
fs desc[0]                  0x/0x/0x0001c001
gs desc[0]                  0x/0x/0x0001c001
ss desc[0]                  0x/0x/0xc093
cs desc[0]                  0x/0x/0xa09b
tr desc[0]                  0x/0x/0x008b
ldtr desc[0]                0x/0x/0x0082
gdtr[0]                     0x00378040/0x002f
idtr[0]                     0x/0x
cs[0]                       0x0008
ds[0]                       0x0018
es[0]                       0x0018
fs[0]                       0x
gs[0]                       0x
ss[0]                       0x0018
tr[0]                       0x
ldtr[0]                     0x
cr0_mask[0]                 0x6020
cr0_shadow[0]               0x0021
cr4_mask[0]                 0xffe8f800
cr4_shadow[0]               0x
cr3_target_count[0]         0x
cr3_target0[0]              0x
cr3_target1[0]              0x
cr3_target2[0]              0x
cr3_target3[0]              0x
pinbased_ctls[0]            0x003f
procbased_ctls[0]           0xf51865f2
procbased_ctls2[0]          0x10a2
gla[0]                      0xfec41000
gpa[0]                      0x
entry_interruption_info[0]  0x
tpr_threshold[0]            0x
instruction_error[0]        0x
exit_ctls[0]                0x0033efff
entry_ctls[0]               0x93ff
host_pat[0]                 0x0001050600070406
host_cr0[0]                 0x8005003b
host_cr3[0]                 0x38045054
host_cr4[0]                 0x001726e0
host_rip[0]                 0x81435290
host_rsp[0]                 0xfe003218d700
vmcs_pointer[0]
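
The three disassemblers agree once the RIP-relative operand is resolved by hand. A minimal sketch of that arithmetic follows. It assumes the instruction at 0x102a56 is 8 bytes long (66 83 3d, a 4-byte displacement, a 1-byte immediate) and that the displacement is 0xffefd5a4, that is, the r2 byte dump above with an assumed final byte of ff, consistent with objdump's rip-0x102a5c. The function name `rip_relative_target` is hypothetical.

```c
#include <stdint.h>

/*
 * Resolve an x86-64 RIP-relative memory operand.
 *   insn_addr: address of the first byte of the instruction
 *   insn_len:  total encoded length of the instruction in bytes
 *   disp32:    raw 32-bit displacement as encoded in the instruction
 * RIP-relative addressing is relative to the address of the NEXT
 * instruction, and the displacement is sign-extended to 64 bits.
 */
uint64_t
rip_relative_target(uint64_t insn_addr, unsigned insn_len, uint32_t disp32)
{
        uint64_t next_rip = insn_addr + insn_len;
        int64_t disp = (int32_t)disp32;

        return next_rip + (uint64_t)disp;
}
```

Under those assumptions, rip_relative_target(0x102a56, 8, 0xffefd5a4) gives 0x2, so the `[rip-0x102a5c]` form and the `[0x2]` form describe the same access.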
Re: rumpkernel and bhyve: triple faults
On 6 Mar 2018, at 9:28, Rodney W. Grimes wrote:

>> bios_crtc_base would be part of the isa legacy vga
>> controller card. Bhyve does not, at this time, or
>> in the near future expect to have, support for this
>> legacy device.
>
> I am wrong on this, the framebuffer device does
> in fact have support for the legacy i/o addresses
> that this should point to. You should see the
> "vgaconf" section of the FrameBuffer section
> of the bhyve(8) manpage.
>
> I believe you need to be running bhyve with the
> uefi bios options, with the CSM version, and
> have vgaconf=on to get your code to work as is.

For diskless multiboot kernels I’m going with a separate
userboot.so-compatible loader. Specifying a UEFI bootrom implicitly resets
the CPU. (See usr.sbin/bhyve/bhyverun.c:960)

I think deciding to use the serial output (which is what most of
rumpkernel’s cons_init is doing) based on the hypervisor is probably the
right way to go. Something similar is already done for XEN:

/*
 * If running under Xen use the serial console.
 */
if (hypervisor == HYPERVISOR_XEN)
        prefer_serial = 1;

>>> rumprun/platform/hw/arch/x86/cons.c:59:
>>> 649 0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
>>>
>>> Therefore, I’m assuming this is the origin of the fault.
>>>
>>> Tracking down bios_crtc_base, I find that it’s loaded in
>>> rumprun/platform/hw/arch/amd64/locore.S:70:
>>>
>>> /* save BIOS data area values */
>>> movw BIOS_COM1_BASE, %bx
>>> movw %bx, bios_com1_base
>>> movw BIOS_CRTC_BASE, %bx
>>> movw %bx, bios_crtc_base
>>>
>>> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the
>>> bhyve device node in /dev/vmm with xxd(1), I find the words at these
>>> addresses to be Uninitialised:
>>>
>>> 0400: ..
>>> 0483: ..
>> Typo here, should this be 0463?

Yes, sorry about that.

Fabian
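
A hypervisor-based console choice needs a way to recognise bhyve. A minimal sketch follows, assuming detection goes through the CPUID hypervisor leaf 0x40000000, where the 12-byte vendor string is returned in EBX, ECX and EDX ("bhyve bhyve " on bhyve, "XenVMMXenVMM" on Xen). The function names are hypothetical, this is not rumprun's hypervisor.c, and preferring serial on bhyve is only the suggestion made in this thread.

```c
#include <stdint.h>
#include <string.h>

/*
 * Assemble the vendor string from the three CPUID registers.
 * Bytes are taken least-significant first, as CPUID defines them,
 * so this does not depend on host byte order.
 */
void
hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
        uint32_t regs[3] = { ebx, ecx, edx };
        int r, i;

        for (r = 0; r < 3; r++)
                for (i = 0; i < 4; i++)
                        out[r * 4 + i] = (char)((regs[r] >> (8 * i)) & 0xff);
        out[12] = '\0';
}

/* Return 1 for hypervisors where a serial console is the safer default. */
int
hv_prefers_serial(const char *sig)
{
        return strcmp(sig, "XenVMMXenVMM") == 0 ||
            strcmp(sig, "bhyve bhyve ") == 0;
}
```

As a sanity check, the rcx and rdx values in the register dump earlier in the thread (0x68622065 and 0x20657679) decode to "e bh" and "yve ", which look like the tail of this string left over from the CPUID call in hypervisor_detect.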