Re: rumpkernel and bhyve: triple faults

2018-03-11 Thread Fabian Freyer
On 10 Mar 2018, at 23:46, Martin Lucina wrote:
> On Friday, 09.03.2018 at 18:45, Fabian Freyer wrote:
>> On 6 Mar 2018, at 7:45, Fabian Freyer wrote:
>>> I’m not sure where to go from here. Is this a bug in bhyve(4), should these
>>> values be initialised somehow, or should I patch rumpkernel(7) to skip this
>>> check when running on bhyve(4)?
>
> You probably want to use a serial console rather than VGA on bhyve in any
> case, so you'll want to add the appropriate checks to hypervisor.c and
> cons.c.

I’ve started on a patch, but the check should fail if bios_crtc_base is unset
anyways.

>> A build on Linux (which boots fine) shows [bios_com1_base, bios_crtc_base]
>> not to be uninitialised:
>> 003e3480 g O .bss   0002 bios_com1_base
>> 003e44a0 g O .bss   0002 bios_crtc_base
>
> When you write "which boots fine", I presume you're referring to booting on
> bhyve?

Yes. The rumprun kernel built on Linux boots fine (including serial output by
default). The one built on FreeBSD triple faults, due to accessing 0x2.

>> Further down the rabbit hole, this goes on in rumprun.o:
>>
>> On Linux, bios_crtc_base is not a local symbol:
>> 0002   O *COM*  0002 bios_crtc_base
>> 0002   O *COM*  0002 bios_com1_base
>>
>> While on FreeBSD, they are marked as local:
>> 0002 l O *COM*  0002 bios_crtc_base
>> 0002 l O *COM*  0002 bios_com1_base
>
> That seems wrong. Can you try and force the toolchain to use the more
> recent GNU ld from devel/binutils and see if that fixes the problem?

I’m using GNU Binutils from devel/binutils:

pkg list binutils | grep /usr/local/bin/ld
/usr/local/bin/ld
/usr/local/bin/ld.bfd
/usr/local/bin/ld.gold

$(x86_64-rumprun-netbsd-gcc -print-prog-name=ld) -v
GNU ld (GNU Binutils) 2.30

Fabian



Re: rumpkernel and bhyve: triple faults

2018-03-09 Thread Fabian Freyer
On 6 Mar 2018, at 7:45, Fabian Freyer wrote:
> Tracking down bios_crtc_base, I find that it’s loaded in
> rumprun/platform/hw/arch/amd64/locore.S:70:
>
>   /* save BIOS data area values */
>   movw BIOS_COM1_BASE, %bx
>   movw %bx, bios_com1_base
>   movw BIOS_CRTC_BASE, %bx
>   movw %bx, bios_crtc_base
>
> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the bhyve
> device node in /dev/vmm with xxd(1), I find the words at these addresses to be
> Uninitialised:
>
> 0400:  ..
> 0483:  ..
>
> I’m not sure where to go from here. Is this a bug in bhyve(4), should these
> values be initialised somehow, or should I patch rumpkernel(7) to skip this
> check when running on bhyve(4)?

I’ve chased this bug down a bit further to what I believe is an issue with the
rumprun toolchain I am building on FreeBSD with the misc/rumprun port [1].

objdump -t helloer-rumprun.elf lists a number of symbols in the *COM* section,
which holds unallocated C external variables [2]:

objdump -t helloer-rumprun.elf | grep \*COM\*
0001 l O *COM*   0001 pic1mask
0004 l O *COM* 0004 pgalloc_totalkb
0004 l O *COM* 0004 pgalloc_usedkb
1000 l O *COM* 0020 multiboot_cmdline
0002 l O *COM* 0002 bios_crtc_base
0001 l O *COM* 0001 pic2mask
0002 l O *COM* 0002 bios_com1_base

As the pagetable in pagetable.s maps the first page as non-present, accessing
any of these will result in a fault. I’m pretty sure that these shouldn’t be
undefined.

A build on Linux (which boots fine) shows these not to be uninitialised:
003e3480 g O .bss   0002 bios_com1_base
003e44a0 g O .bss   0002 bios_crtc_base

Further down the rabbit hole, this goes on in rumprun.o:

On Linux, bios_crtc_base is not a local symbol:
0002   O *COM*  0002 bios_crtc_base
0002   O *COM*  0002 bios_com1_base

While on FreeBSD, they are marked as local:
0002 l O *COM*  0002 bios_crtc_base
0002 l O *COM*  0002 bios_com1_base
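
For context on where *COM* entries come from: a file-scope C variable without an initialiser is a tentative definition, and a compiler running with -fcommon (the default for GCC before version 10) emits it as a common symbol that the linker is expected to allocate later. The sketch below is not taken from the rumprun sources; the `_demo` names are hypothetical and only illustrate three ways such a variable can be declared.

```c
#include <assert.h>
#include <stdint.h>

/* Tentative definition: with -fcommon this becomes a COMMON symbol,
 * listed under *COM* by objdump -t until the linker allocates it. */
uint16_t bios_crtc_base_demo;

/* Explicit initialiser: a full definition, so the object file already
 * places it in a real section instead of COMMON. */
uint16_t bios_com1_base_demo = 0;

/* GCC/Clang attribute that keeps a single variable out of COMMON
 * regardless of the -fcommon / -fno-common setting. */
__attribute__((nocommon)) uint16_t pic1mask_demo;

/* All three are zero-initialised as far as the C language is concerned;
 * only their placement in the object file differs. */
unsigned demo_symbols_sum(void)
{
    assert(sizeof bios_crtc_base_demo == 2);
    return (unsigned)bios_crtc_base_demo + bios_com1_base_demo + pic1mask_demo;
}
```

Compiling this with `cc -fcommon -c` and running `objdump -t` on the object file should show only the first variable under *COM*. In a fully linked image none of them would normally remain there.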

Fabian

[1] https://svnweb.freebsd.org/ports/head/misc/rumprun/Makefile?view=markup=459195
[2] http://man7.org/linux/man-pages/man5/elf.5.html / SHN_COMMON



Re: rumpkernel and bhyve: triple faults

2018-03-06 Thread Peter Grehan

Hi Fabian,


>> For a page-fault, the virtual address that resulted in the fault will
>> be in the CR2 register.
>
> I don’t see a CR2 register in the output of bhyvectl --get-all, I was
> looking for that too.


 Oops, I'll add that to bhyvectl.


> I’m pretty sure it’s tooling that’s displaying something off, since
> hopper is showing me this as
>
> 0x00102a56 cmp word [0x2], 0x0
>
> Which is very similar to what r2 is giving me:
>
> ;-- cons_init:
> 0x00102a50  53 push rbx; /arch/x86:43
> 0x00102a51  e8ea0a call sym.hypervisor_detect  ; /arch/x86:47
> 0x00102a56  66833da4d5ef.  cmp word [0x0002], 0; /arch/x86:62


 This is reading the 16-bit value from memory location 0x2. Hard to see 
why this would generate a page-fault - the zero page is often mapped 
read-only. Perhaps rumpkernel doesn't have a mapping for it, but then, 
the offset for the access would be incorrect (maybe a linking issue with 
the location of variables ?).



> Maybe I’m off with my analysis of the actual fault here, but how I understand
> the source (assuming compilers work as I would expect, which is not always true)
> the values here are initialised from values in the bios data area (which is
> zeroed out on bhyve):


 It shouldn't matter that those were zero. Loading them into a memory 
location shouldn't be a problem.



> Here’s my full output from bhyvectl --get-all:
>
> ID  Length  Name
> 0   128MB   sysmem
> Address Length  Segment Offset  Prot  Flags
> 0   128MB   sysmem  0   RWX
> efer[0] 0x0500


 Ok, the guest is in 64-bit mode - the LMA bit is set. This implies
that rumpkernel has set up its own mappings, since the multiboot loader
entered the guest in flat 32-bit mode.



> cr0[0]  0x80010031


 Paging is enabled (bit 31) as expected.
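
The two register readings above can be checked mechanically. The following is a small sketch that decodes the EFER and CR0 values from the dump using the architectural bit positions (EFER.LME is bit 8, EFER.LMA is bit 10, CR0.PE is bit 0, CR0.WP is bit 16, CR0.PG is bit 31). The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)   /* long mode enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* long mode active */
#define CR0_PE   (UINT64_C(1) << 0)   /* protected mode enable */
#define CR0_WP   (UINT64_C(1) << 16)  /* write protect */
#define CR0_PG   (UINT64_C(1) << 31)  /* paging */

static int bit_set(uint64_t reg, uint64_t mask)
{
    return (reg & mask) != 0;
}

/* The guest runs in long mode once LMA is set and paging is enabled. */
int guest_in_long_mode(uint64_t efer, uint64_t cr0)
{
    return bit_set(efer, EFER_LMA) && bit_set(cr0, CR0_PG);
}

/* Values copied from the bhyvectl --get-all output in this thread. */
void check_dump_values(void)
{
    const uint64_t efer = 0x0500, cr0 = 0x80010031;

    assert(bit_set(efer, EFER_LME));
    assert(bit_set(efer, EFER_LMA));
    assert(bit_set(cr0, CR0_PE));
    assert(bit_set(cr0, CR0_WP));
    assert(guest_in_long_mode(efer, cr0));
}
```

With these values every check passes, which matches the reading that the guest had already switched to 64-bit mode with its own page tables when it faulted.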

later,

PEter.



Re: rumpkernel and bhyve: triple faults

2018-03-06 Thread Rodney W. Grimes
> On 6 Mar 2018, at 9:28, Rodney W. Grimes wrote:
> 
> >> bios_crtc_base would be part of the isa legacy vga
> >> controller card.  Bhyve does not, at this time, or
> >> in the near future expect to have, support for this
> >> legacy device.
> >
> > I am wrong on this, the framebuffer device does
> > in fact have support for the legacy i/o addresses
> > that this should point to.  You should see the
> > "vgaconf" section of the FrameBuffer section
> > of the bhyve(8) manpage.
> >
> > I believe you need to be running bhyve with the
> > uefi bios options, the with CMS version, and
> > have vgaconf=on to get your code to work as is.
> 
> For diskless multiboot kernels I’m going with a
> separate userboot.so-compatible loader. Specifying
> a UEFI bootrom implicitly resets the CPU.
> (See usr.sbin/bhyve/bhyverun.c:960)

Well in that case my original claim that there
is not a legacy isa vga device available would
be correct for this environment, and you should
just eliminate anything that tries to use it,
or make it understand that this device may not
exist.  I am not sure if bhyve maps any of these
legacy addresses if you're not using bhyveload
or uefi bios code.  0x400 and 0x463 being
unmapped could lead to your triple fault.

> 
> I think deciding to use the serial output (which is
> what most of rumpkernel’s cons_init is doing) based
> on the hypervisor is probably the right way to go.
> Something similar is already done for XEN:
> 
>  /*
>   * If running under Xen use the serial console.
>   */
>  if (hypervisor == HYPERVISOR_XEN)
>  	prefer_serial = 1;
> 
> >>> rumprun/platform/hw/arch/x86/cons.c:59:
> >>>    649   0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
> >>>
> >>> Therefore, I’m assuming this is the origin of the fault.
> >>>
> >>> Tracking down bios_crtc_base, I find that it’s loaded in
> >>> rumprun/platform/hw/arch/amd64/locore.S:70:
> >>>
> >>>   /* save BIOS data area values */
> >>>   movw BIOS_COM1_BASE, %bx
> >>>   movw %bx, bios_com1_base
> >>>   movw BIOS_CRTC_BASE, %bx
> >>>   movw %bx, bios_crtc_base
> >>>
> >>> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the
> >>> bhyve device node in /dev/vmm with xxd(1), I find the words at these
> >>> addresses to be uninitialised:
> >>>
> >>> 0400:  ..
> >>> 0483:  ..
> >> Typo here, should this be 0463?
> Yes, sorry about that.
> 
> Fabian

-- 
Rod Grimes rgri...@freebsd.org



Re: rumpkernel and bhyve: triple faults

2018-03-06 Thread Rodney W. Grimes
> Hello lists,
> 
> I’m currently playing around with getting rump kernels built with 
> rumpkernel(7) running on FreeBSD’s bhyve(4). I’m using a custom boot loader 
> [1] which builds on some patches to bhyveload / user boot [2].
> 
> To test, I’m using a simple “helloer” unikernel from the tutorial [3]:
> 
... excellent description of your debug process removed for brief reply ...

> Due to compiler optimisations, the check here isn’t the
> (hypervisor == HYPERVISOR_XEN) check directly after the call to
> hypervisor_detect, but the check (bios_crtc_base == 0) in

bios_crtc_base would be part of the isa legacy vga
controller card.  Bhyve does not, at this time, or
in the near future expect to have, support for this
legacy device.


> rumprun/platform/hw/arch/x86/cons.c:59:
>    649   0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
> 
> Therefore, I’m assuming this is the origin of the fault.
> 
> Tracking down bios_crtc_base, I find that it’s loaded in
> rumprun/platform/hw/arch/amd64/locore.S:70:
> 
>   /* save BIOS data area values */
>   movw BIOS_COM1_BASE, %bx
>   movw %bx, bios_com1_base
>   movw BIOS_CRTC_BASE, %bx
>   movw %bx, bios_crtc_base
> 
> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the bhyve
> device node in /dev/vmm with xxd(1), I find the words at these addresses to be
> Uninitialised:
> 
> 0400:  ..
> 0483:  ..
Typo here, should this be 0463?


> I’m not sure where to go from here. Is this a bug in bhyve(4),
No, it is not a bug, it is an unimplemented device.

> should these
> values be initialised somehow, or should I patch rumpkernel(7) to skip this
> check when running on bhyve(4)?
rumpkernel is assuming or requiring the presence of legacy isa hardware,
it should probably be taught that this may not exist.  You could simply
skip this check, but I expect you would then have a harder to find
failure later when it tries to use the hardware it expects to be
there.

> 
> Fabian
 
... Full KTR trace removed for brief reply ...


-- 
Rod Grimes rgri...@freebsd.org



Re: rumpkernel and bhyve: triple faults

2018-03-06 Thread Fabian Freyer
Hi Peter,

On 6 Mar 2018, at 16:15, Peter Grehan wrote:
>  Exception 14 is a page fault (SDM Vol3 ch 6.15). The exception type is 
> "fault" which means it is delivered at the address it was detected at.
>
>  This cascaded very quickly into a triple-fault, so it looks like it could 
> possibly be an issue with the stack. One debug tool you do have is to get a 
> register dump on exit, with 'bhyvectl --get-all --vm='.
>
>  For a page-fault, the virtual address that resulted in the fault will be in 
> the CR2 register.

I don’t see a CR2 register in the output of bhyvectl --get-all, I was looking 
for that too.

>  From the code at the faulting address:
>
>  > 00102a50 :
>  >102a50:   push   rbx
>  >102a51:   call   103540 
>  >102a56:   cmp    WORD PTR [rip-0x102a5c],0x0    # 2
>
>  It's using RIP-relative addressing here, but objdump seems to think this may 
> be an offset in the current_lwp structure - is it possible that may have an 
> uninitialized value ?

I’m pretty sure it’s tooling that’s displaying something off, since hopper is
showing me this as

0x00102a56 cmp word [0x2], 0x0

Which is very similar to what r2 is giving me:

;-- cons_init:
0x00102a50  53 push rbx; /arch/x86:43
0x00102a51  e8ea0a call sym.hypervisor_detect  ; /arch/x86:47
0x00102a56  66833da4d5ef.  cmp word [0x0002], 0; /arch/x86:62
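
The objdump and r2 views can be reconciled by working out the RIP-relative operand by hand: the effective address is the address of the next instruction plus the signed 32-bit displacement. The sketch below assumes an instruction length of 8 bytes (prefix 66, opcode 83, ModRM 3d, a 4-byte displacement and a 1-byte immediate); that length is inferred from the encoding, since r2 truncates the byte listing. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a RIP-relative memory operand: RIP already points
 * at the following instruction when the displacement is applied. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* Address and displacement taken from the disassembly quoted above. */
void check_cons_init_operand(void)
{
    assert(rip_relative_target(0x102a56, 8, -0x102a5c) == 0x2);
}
```

Under that assumption 0x102a56 + 8 - 0x102a5c equals 0x2, so both disassemblers describe the same access to absolute address 0x2.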

>  (I don't believe this has anything to do with VGA).

Maybe I’m off with my analysis of the actual fault here, but how I understand
the source (assuming compilers work as I would expect, which is not always true)
the values here are initialised from values in the bios data area (which is
zeroed out on bhyve):

#define BIOS_COM1_BASE  0x400
#define BIOS_CRTC_BASE  0x463

...

movw BIOS_COM1_BASE, %bx
movw %bx, bios_com1_base
movw BIOS_CRTC_BASE, %bx
movw %bx, bios_crtc_base

...

/*
 * If the BIOS says no CRTC is present use the serial console if
 * available.
 */
if (bios_crtc_base == 0)
	prefer_serial = 1;
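
To make the intended flow concrete, here is a plain C model of the two steps quoted above: reading the 16-bit words from the BIOS data area and choosing the serial console when the CRTC base is zero. It is only a sketch run against an ordinary byte buffer standing in for low guest memory. `LOW_MEM_SIZE`, `read_le16` and `prefer_serial_for` are hypothetical names, not rumprun code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIOS_COM1_BASE 0x400
#define BIOS_CRTC_BASE 0x463
#define LOW_MEM_SIZE   0x500   /* hypothetical: just enough to cover the BDA */

/* Read a little-endian 16-bit word, as the movw instructions do. */
static uint16_t read_le16(const uint8_t *mem, size_t off)
{
    return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/* Returns 1 when the BDA reports no CRTC, 0 otherwise. */
int prefer_serial_for(const uint8_t *low_mem, size_t len)
{
    assert(len >= BIOS_CRTC_BASE + 2);
    return read_le16(low_mem, BIOS_CRTC_BASE) == 0;
}

void check_bda_model(void)
{
    uint8_t low_mem[LOW_MEM_SIZE];

    /* A zeroed BDA, as observed on bhyve in this thread. */
    memset(low_mem, 0, sizeof low_mem);
    assert(prefer_serial_for(low_mem, sizeof low_mem) == 1);

    /* 0x3d4 is the usual colour VGA CRTC index port. */
    low_mem[BIOS_CRTC_BASE] = 0xd4;
    low_mem[BIOS_CRTC_BASE + 1] = 0x03;
    assert(prefer_serial_for(low_mem, sizeof low_mem) == 0);
}
```

In this model a zeroed data area simply selects the serial console, which is consistent with the point that zero values alone should not cause a fault.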


Here’s my full output from bhyvectl --get-all:

ID  Length  Name
0   128MB   sysmem
Address Length  Segment Offset  Prot  Flags
0   128MB   sysmem  0   RWX
efer[0] 0x0500
cr0[0]  0x80010031
cr3[0]  0x0010b000
cr4[0]  0x2620
dr7[0]  0x0400
rsp[0]  0x00100ff0
rip[0]  0x00102a56
rax[0]  0x
rbx[0]  0x003eaa2b
rcx[0]  0x68622065
rdx[0]  0x20657679
rsi[0]  0x00100fd0
rdi[0]  0x4000
rbp[0]  0x
r8[0]   0x00100fdc
r9[0]   0x00100fd8
r10[0]  0x00100fd4
r11[0]  0x
r12[0]  0x
r13[0]  0x
r14[0]  0x
r15[0]  0x
rflags[0]   0x00010006
ds desc[0]  0x/0x/0xc093
es desc[0]  0x/0x/0xc093
fs desc[0]  0x/0x/0x0001c001
gs desc[0]  0x/0x/0x0001c001
ss desc[0]  0x/0x/0xc093
cs desc[0]  0x/0x/0xa09b
tr desc[0]  0x/0x/0x008b
ldtr desc[0]0x/0x/0x0082
gdtr[0] 0x00378040/0x002f
idtr[0] 0x/0x
cs[0]   0x0008
ds[0]   0x0018
es[0]   0x0018
fs[0]   0x
gs[0]   0x
ss[0]   0x0018
tr[0]   0x
ldtr[0] 0x
cr0_mask[0] 0x6020
cr0_shadow[0]   0x0021
cr4_mask[0] 0xffe8f800
cr4_shadow[0]   0x
cr3_target_count[0] 0x
cr3_target0[0]  0x
cr3_target1[0]  0x
cr3_target2[0]  0x
cr3_target3[0]  0x
pinbased_ctls[0]0x003f
procbased_ctls[0]   0xf51865f2
procbased_ctls2[0]  0x10a2
gla[0]  0xfec41000
gpa[0]  0x
entry_interruption_info[0]  0x
tpr_threshold[0]0x
instruction_error[0]0x
exit_ctls[0]0x0033efff
entry_ctls[0]   0x93ff
host_pat[0] 0x0001050600070406
host_cr0[0] 0x8005003b
host_cr3[0] 0x38045054
host_cr4[0] 0x001726e0
host_rip[0] 0x81435290
host_rsp[0] 0xfe003218d700
vmcs_pointer[0] 
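
One detail in the dump is worth decoding: the rcx and rdx values look like ASCII. A hypervisor's CPUID leaf 0x40000000 returns a 12-byte vendor signature in EBX, ECX and EDX, so if hypervisor_detect issued that leaf just before the fault, its tail would still be in these registers. The sketch below only decodes the two values shown above; the assumption that they are leftovers from CPUID is mine, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store the four bytes of a 32-bit register, least significant first. */
static void put_reg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Decode ECX followed by EDX into an 8-character string. */
void decode_sig_tail(uint32_t ecx, uint32_t edx, char out[9])
{
    put_reg(out, ecx);
    put_reg(out + 4, edx);
    out[8] = '\0';
}

/* rcx and rdx as printed by bhyvectl --get-all in this thread. */
void check_dump_signature(void)
{
    char tail[9];

    decode_sig_tail(0x68622065, 0x20657679, tail);
    assert(strcmp(tail, "e bhyve ") == 0);
}
```

The decoded bytes read "e bhyve ", which would be the last eight characters of a "bhyve bhyve " signature and suggests that hypervisor detection itself ran before the faulting compare.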

Re: rumpkernel and bhyve: triple faults

2018-03-06 Thread Fabian Freyer

On 6 Mar 2018, at 9:28, Rodney W. Grimes wrote:


>> bios_crtc_base would be part of the isa legacy vga
>> controller card.  Bhyve does not, at this time, or
>> in the near future expect to have, support for this
>> legacy device.
>
> I am wrong on this, the framebuffer device does
> in fact have support for the legacy i/o addresses
> that this should point to.  You should see the
> "vgaconf" section of the FrameBuffer section
> of the bhyve(8) manpage.
>
> I believe you need to be running bhyve with the
> uefi bios options, the with CMS version, and
> have vgaconf=on to get your code to work as is.


For diskless multiboot kernels I’m going with a
separate userboot.so-compatible loader. Specifying
a UEFI bootrom implicitly resets the CPU.
(See usr.sbin/bhyve/bhyverun.c:960)

I think deciding to use the serial output (which is
what most of rumpkernel’s cons_init is doing) based
on the hypervisor is probably the right way to go.
Something similar is already done for XEN:

/*
 * If running under Xen use the serial console.
 */
if (hypervisor == HYPERVISOR_XEN)
	prefer_serial = 1;
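
A hedged sketch of what extending that check could look like follows. The enum and its `_DEMO` constants are hypothetical stand-ins: the thread only confirms that HYPERVISOR_XEN exists, so a bhyve value would have to be added to hypervisor.c first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hypervisor identifiers in hypervisor.c. */
enum hypervisor_demo {
    HYPERVISOR_NONE_DEMO,
    HYPERVISOR_XEN_DEMO,
    HYPERVISOR_BHYVE_DEMO   /* assumed addition, not present in the thread */
};

/* Decide on the serial console before touching any legacy VGA state. */
int prefer_serial_demo(enum hypervisor_demo hv, uint16_t crtc_base)
{
    /* Hypervisors without a usable legacy VGA console go straight to serial. */
    if (hv == HYPERVISOR_XEN_DEMO || hv == HYPERVISOR_BHYVE_DEMO)
        return 1;

    /* Otherwise fall back to the existing BIOS data area check. */
    return crtc_base == 0;
}

void check_console_choice(void)
{
    assert(prefer_serial_demo(HYPERVISOR_BHYVE_DEMO, 0x3d4) == 1);
    assert(prefer_serial_demo(HYPERVISOR_XEN_DEMO, 0) == 1);
    assert(prefer_serial_demo(HYPERVISOR_NONE_DEMO, 0x3d4) == 0);
    assert(prefer_serial_demo(HYPERVISOR_NONE_DEMO, 0) == 1);
}
```

Checking the hypervisor first keeps the decision independent of whatever the BIOS data area happens to contain.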


>>> rumprun/platform/hw/arch/x86/cons.c:59:
>>>    649   0 350887182668 vm testing[0]: handled exception vmexit at 0x102a56
>>>
>>> Therefore, I’m assuming this is the origin of the fault.
>>>
>>> Tracking down bios_crtc_base, I find that it’s loaded in
>>> rumprun/platform/hw/arch/amd64/locore.S:70:
>>>
>>> /* save BIOS data area values */
>>> movw BIOS_COM1_BASE, %bx
>>> movw %bx, bios_com1_base
>>> movw BIOS_CRTC_BASE, %bx
>>> movw %bx, bios_crtc_base
>>>
>>> Where BIOS_CRTC_BASE is 0x463 and BIOS_COM1_BASE is 0x400. Checking the
>>> bhyve device node in /dev/vmm with xxd(1), I find the words at these
>>> addresses to be uninitialised:
>>>
>>> 0400:  ..
>>> 0483:  ..
>>
>> Typo here, should this be 0463?

Yes, sorry about that.

Fabian