Re: sparc64 boot issue on qemu
On 01/06/2020 21:08, Jason A. Donenfeld wrote:

> On Mon, Jun 1, 2020 at 1:54 PM Mark Cave-Ayland wrote:
>>
>> On 01/06/2020 08:23, Jason A. Donenfeld wrote:
>>
>>> On Sun, May 31, 2020 at 3:18 AM Mark Cave-Ayland wrote:
>>>>
>>>> AFAICT the problem here is the Forth being used at
>>>> https://github.com/openbsd/src/blob/master/sys/arch/sparc64/dev/fb.c#L511:
>>>> since the addr word isn't part of the IEEE-1275 specification, it is
>>>> currently unimplemented in OpenBIOS.
>>>
>>> Actually, it looks to me like after this line runs:
>>>
>>>     OF_interpret("stdout @ is my-self "
>>>         "addr char-height addr char-width "
>>>         "addr window-top addr window-left",
>>>         4, &windowleft, &windowtop, &romwidth, &romheight);
>>>
>>> windowleft and windowtop contain legit addresses, but romwidth and
>>> romheight have garbage in them. It might be possible to chalk this up
>>> to bogus QEMU firmware, in which case, whatever.
>>
>> Sadly I think that's more due to luck than anything else. If you have a
>> working boot loader then can you try booting qemu-system-sparc64 with
>> -prom-env 'auto-boot?=false' and then entering the following definition
>> of addr at the Forth prompt:
>>
>> : addr
>>   parse-word $find if
>>     cell +
>>   then
>> ;
>>
>> followed by:
>>
>> boot
>>
>> That should give you a definition of addr that will return the address
>> of a value type in Forth.
>
> Wow, that's magic, and works perfectly:
> https://data.zx2c4.com/openbsd-qemu-sparc64-pretty-vga-with-serif-font.png
> Pretty font too.
>
> It sounds like the issue we're facing here is that the addr function
> is missing from QEMU's firmware? Would it be quasi interesting to
> remove use of it from OpenBSD? Or should we take this over to QEMU
> instead and get it implemented?

Oh wow it looks great! I also have commit access to OpenBIOS so I can tidy that up and get it posted over on the OpenBIOS mailing list. Probably the main thing is to figure out what to do if the specified word doesn't exist.

I'll also try and find a few mins to fire up my Mac Mini to see if it exists there to work out if it should be restricted to SPARC only. Note that I did my last merge a few days ago so it will be a little while before it hits QEMU git master, but I can certainly get it added in time for the next official QEMU release.

ATB,

Mark.
Re: sparc64 boot issue on qemu
On 01/06/2020 08:23, Jason A. Donenfeld wrote:

> On Sun, May 31, 2020 at 3:18 AM Mark Cave-Ayland wrote:
>>
>> AFAICT the problem here is the Forth being used at
>> https://github.com/openbsd/src/blob/master/sys/arch/sparc64/dev/fb.c#L511:
>> since the addr word isn't part of the IEEE-1275 specification, it is
>> currently unimplemented in OpenBIOS.
>
> Actually, it looks to me like after this line runs:
>
>     OF_interpret("stdout @ is my-self "
>         "addr char-height addr char-width "
>         "addr window-top addr window-left",
>         4, &windowleft, &windowtop, &romwidth, &romheight);
>
> windowleft and windowtop contain legit addresses, but romwidth and
> romheight have garbage in them. It might be possible to chalk this up
> to bogus QEMU firmware, in which case, whatever.

Sadly I think that's more due to luck than anything else. If you have a working boot loader then can you try booting qemu-system-sparc64 with -prom-env 'auto-boot?=false' and then entering the following definition of addr at the Forth prompt:

: addr
  parse-word $find if
    cell +
  then
;

followed by:

boot

That should give you a definition of addr that will return the address of a value type in Forth.

ATB,

Mark.
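To illustrate what "cell +" does in the definition above, here is a minimal C sketch. It assumes that a compiled value word consists of one code-field cell followed directly by its data cell; that layout, struct value_word and value_addr() are hypothetical names for illustration and are not taken from the OpenBIOS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cell_t;	/* one 64-bit Forth cell, as on sparc64 */

/*
 * Hypothetical layout of a compiled "value" word: one cell for the
 * code field, immediately followed by one cell holding the data.
 */
struct value_word {
	cell_t code_field;
	cell_t data;
};

/* Equivalent of "cell +": step one cell past the execution token. */
static cell_t *
value_addr(struct value_word *xt)
{
	assert(xt != NULL);
	return (cell_t *)((unsigned char *)xt + sizeof(cell_t));
}
```

Under that assumption, a caller such as fb.c receives a pointer it can read the current value through, for example *value_addr(&w) for some struct value_word w.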
Re: sparc64 boot issue on qemu
On 31/05/2020 15:58, Theo de Raadt wrote:

>> AFAICT the problem here is the Forth being used at
>> https://github.com/openbsd/src/blob/master/sys/arch/sparc64/dev/fb.c#L511:
>> since the addr word isn't part of the IEEE-1275 specification, it is
>> currently unimplemented in OpenBIOS.
>>
>> Why is addr needed here? Does the fb.c driver try and change these
>> values rather than just read them?
>
> Why does that matter?
>
> sparc64 isn't a IEEE-1275 openfirmware.
>
> It is a Sun openfirmware, meaning it is more than the vague
> specification. An emulation must be able to emulate THE REAL HARDWARE.
>
> This should work.

Well there are plenty of SUN-isms already included in OpenBIOS to enable Solaris to boot as far as it does; I'm not against them, I was just commenting that this was the reason why it is currently unimplemented.

> For another 64-bit cell_t usage see dev/prtc.c.
>
> For another "addr" usage, see romgetcursoraddr()

A simple addr implementation for Forth values should be fairly easy to put together. Since I don't have access to any Sun hardware, can someone confirm the semantics of the addr word for me? In particular what does it return for:

- Values (presumably this is a pointer to a 64-bit value?)
- Defers (does it return a pointer to the deferred word?)
- Words (is it the same as the ' word?)

ATB,

Mark.
Re: sparc64 boot issue on qemu
On 30/05/2020 00:19, Jason A. Donenfeld wrote:

> Note that you need to run this with `-nographic`, because the kernel
> crashes when trying to use vgafb on sparc64/qemu. I've witnessed two
> varieties crashes:
>
> - https://data.zx2c4.com/openbsd-6.7-sparc64-vga-panic-miniroot67.png
>   This happens when booting up miniroot67.fs
>
> - https://data.zx2c4.com/openbsd-6.7-sparc64-vga-panic-after-installation.png
>   This happens after installation openbsd onto disk properly, and then
>   booting up into it.
>
> Passing `-nographic` prevents these from happening, since vgafb doesn't
> bind to anything.
>
> I don't have a bsd.gdb in order to addr2line this, but if the miniroot
> panic is related to the normal panic, and we then assume alignment
> issues in fb_get_console_metrics, then I wonder if the below patch would
> make a difference. On the other hand, a "data access fault" makes it
> seem more likely that OF_interpret is just getting bogus addresses from
> buggy qemu firmware.
>
> I probably have another two hours to go in waiting for this thing to
> build...
>
> Jason
>
> --- a/sys/arch/sparc64/dev/fb.c
> +++ b/sys/arch/sparc64/dev/fb.c
> @@ -507,6 +507,7 @@ int
>  fb_get_console_metrics(int *fontwidth, int *fontheight, int *wtop, int *wleft)
>  {
>      cell_t romheight, romwidth, windowtop, windowleft;
> +    uint64_t romheight_64, romwidth_64, windowtop_64, windowleft_64;
>
>      /*
>       * Get the PROM font metrics and address
> @@ -520,10 +521,15 @@ fb_get_console_metrics(int *fontwidth, int *fontheight, int *wtop, int *wleft)
>          windowtop == 0 || windowleft == 0)
>          return (1);
>
> -    *fontwidth = (int)*(uint64_t *)romwidth;
> -    *fontheight = (int)*(uint64_t *)romheight;
> -    *wtop = (int)*(uint64_t *)windowtop;
> -    *wleft = (int)*(uint64_t *)windowleft;
> +    memcpy(&romheight_64, (void *)romheight, sizeof(romheight_64));
> +    memcpy(&romwidth_64, (void *)romwidth, sizeof(romwidth_64));
> +    memcpy(&windowtop_64, (void *)windowtop, sizeof(windowtop_64));
> +    memcpy(&windowleft_64, (void *)windowleft, sizeof(windowleft_64));
> +
> +    *fontwidth = (int)romwidth_64;
> +    *fontheight = (int)romheight_64;
> +    *wtop = (int)windowtop_64;
> +    *wleft = (int)windowleft_64;
>
>      return (0);
>  }

AFAICT the problem here is the Forth being used at https://github.com/openbsd/src/blob/master/sys/arch/sparc64/dev/fb.c#L511: since the addr word isn't part of the IEEE-1275 specification, it is currently unimplemented in OpenBIOS.

Why is addr needed here? Does the fb.c driver try and change these values rather than just read them?

ATB,

Mark.
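For readers unfamiliar with the idiom used in the quoted patch, the following is a minimal, standalone C sketch of reading a 64-bit cell through memcpy() so that the access does not depend on the pointer being 8-byte aligned. The helper name read_cell_as_int() is hypothetical and does not appear in fb.c. As discussed above, this only addresses alignment; it would not help if the firmware hands back a bogus address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 64-bit cell from an address that may not be 8-byte aligned
 * and narrow it to int, as the patch does for each console metric.
 */
static int
read_cell_as_int(const void *p)
{
	uint64_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));	/* byte-wise copy, no alignment trap */
	return (int)v;
}
```

A caller would pass the address returned by the firmware, e.g. read_cell_as_int((void *)romwidth), instead of dereferencing a cast pointer directly.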
Re: sparc64 boot issue on qemu
On 30/05/2020 10:54, Otto Moerbeek wrote:

> https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
> contains the unpatched miniroot.
>
> https://www.drijf.net/openbsd/disk.qcow2
>
> is the disk image based on the miniroot containing the patch in the
> firts post in this thread.
>
> Thanks for looking into this.
>
> Note that we did *not* observe boot failure on any real sparc64
> hardware. The bootblock changes I did for the 6.7 release were tested
> on many different machines.

Thanks for the test case which enables me to reproduce the issue. With ?fcode-verbose enabled you see this at the end of the FCode execution:

...
...
5acf : [ 0x8b7 ]
5ad0 : b(lit) [ 0x10 ]
5ad6 : [ 0x81e ]
5ad7 : 0= [ 0x34 ]
5ad8 : swap [ 0x49 ]
5ad9 : drop [ 0x46 ]
5ada : b?branch [ 0x14 ] (offset) 5
5ade : (compile) [ 0x8c8 ]
5adf : (compile) b(>resolve) [ 0xb2 ]
OpenBSD IEEE 1275 Bootblock 2.0
Booting from device /pci@1fe,0/pci@1,1/ide@3/ide@1/cdrom@0
Try superblock read
FFS v1
ufs-open complete
.Looking for ofwboot in directory...
.
..
ofwboot
Found it
.Loading 1a1c8 bytes of file...
Copying 2000 bytes to 4000
Copying 2000 bytes to 6000
Copying 2000 bytes to 8000
Copying 2000 bytes to a000
Copying 2000 bytes to c000
Copying 2000 bytes to e000
Copying 2000 bytes to 10000
Copying 2000 bytes to 12000
Copying 2000 bytes to 14000
Copying 2000 bytes to 16000
Copying 2000 bytes to 18000
Copying 2000 bytes to 1a000
Copying 2000 bytes to 1c000
Copying 2000 bytes to 1e000
5ae0 : expect [ 0x8a ]

Now that 0x8a is completely wrong since according to https://github.com/openbsd/src/blob/master/sys/arch/sparc64/stand/bootblk/bootblk.fth the last instruction should be exit which is 0x33.

Since the FCode itself is located at load-base (0x4000) it looks to me from the above debug that you're loading ofwboot at the same address, overwriting the FCode. Once do-boot has finished executing, the FCode interpreter returns to execute the exit word which has now been overwritten: so instead of returning to the updated client context via exit to execute ofwboot, it executes expect which asks for input from the keyboard and then crashes because the stack is incorrect.

My recommendation would be to load ofwboot at 0x6000 instead of 0x4000 which I believe will fix the issue. It's interesting you mention that this works on real hardware, since it doesn't agree with my reading of the IEEE-1275 specification so you're certainly relying on some undocumented behaviour here.

ATB,

Mark.
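The overlap described above can be checked with a few lines of C. This is only a sketch: ranges_overlap() is a hypothetical helper, and the 6882-byte FCode size is an assumption borrowed from the "Loaded 6882 bytes" line in Otto's boot log elsewhere in this thread, on the guess that it is the same bootblock. The 0x1a1c8 figure is the ofwboot size from the debug output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when [a, a+alen) and [b, b+blen) share at least one byte. */
static bool
ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	assert(alen > 0 && blen > 0);
	return a < b + blen && b < a + alen;
}
```

With those numbers, ranges_overlap(0x4000, 6882, 0x4000, 0x1a1c8) is true, while moving the destination to 0x6000 gives false, because the assumed FCode image ends at 0x5ae2.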
Re: sparc64 boot issue on qemu
On 29/05/2020 23:56, Jason A. Donenfeld wrote:

> Oh that's a nice observation about `boot disk -V`. Doing so actually
> got me booting up entirely:
>
> $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> $ qemu-img resize disk.qcow2 20G
> $ qemu-system-sparc64 -m 1024 -drive file=disk.qcow2,if=ide -net
> nic,model=ne2k_pci -net user -boot a -nographic -monitor none -serial
> stdio

I think the problem here is that you're asking OpenBIOS to boot from the (empty) floppy disk with "-boot a" rather than the qcow2 image which is normally attached to the first hard disk "-boot c". As this is the default, then I would expect the command line above to work if you simply drop "-boot a".

Also is there a particular reason for using the ne2k_pci NIC instead of the default in-built sunhme device? I try and keep the documentation at https://wiki.qemu.org/Documentation/Platforms/SPARC as accurate as I can, so do look there for latest best practices and command line examples.

Finally the version of qemu-system-sparc64 you are running can also boot from a virtio-blk-pci device (again see the above wiki page for details) if you are looking for the best emulated disk performance.

ATB,

Mark.
Re: sparc64 boot issue on qemu
On 30/05/2020 10:03, Otto Moerbeek wrote:

> Hi,
>
> thanks for the hints, but an unpatched 6.7 miniroot still fails to
> boot for me
>
> qemu-system-sparc64 -machine sun4u -m 1024 -drive \
> file=miniroot67.img,format=raw -nographic -serial stdio -monitor none
>
> OpenBIOS for Sparc64
> Configuration device id QEMU version 1 machine id 0
> kernel cmdline
> CPUs: 1 x SUNW,UltraSPARC-IIi
> UUID: ----
> Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
> Type 'help' for detailed information
> Trying disk:a...
> Not a bootable ELF image
> Not a bootable a.out image
>
> Loading FCode image...
> Loaded 6882 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 2.0
> ..
>
> And then hangs
>
> While the patched bootblocks do boot (but hang later after
>
> scsibus1 at softraid0: 256 targets
>
> as before,
>
> -Otto

Hmmm odd. Is it possible for you to upload your miniroot somewhere for me to take a quick look? I don't have a great deal of time right now, but I can run it through a debugger to see if anything obvious shows up.

ATB,

Mark.
Re: Status of openbsd/macppc port?
On 17/08/18 14:27, Mark Kettenis wrote:

>> Obviously I can't categorically state that QEMU's emulation is perfect,
>> but it can now reliably run all of Linux, MacOS, NetBSD and FreeBSD in
>> my local tests which makes me suspect that OpenBSD is trying to do
>> something different here.
>
> Runs fairly stable as long as there is enough RAM. There is an
> (unknown) pmap bug that causes memory corruption as soon as the
> machine starts swapping.

Right, I wonder if this is related to the invalid memory accesses I'm seeing in QEMU? Fortunately it's fairly easy to boot different images within the VM, so let's go backwards in time...

OpenBSD 6.1
- Boots to userspace, but hangs quickly at the installer shell

OpenBSD 6.0
- Hangs on boot just after the USB controller initialises

OpenBSD 5.9
- Boots to userspace, but hangs quickly at the installer shell (qemu console logs attempt to execute a NULL opcode, so looks like we're jumping off somewhere strange?)

OpenBSD 5.8
- Hangs on boot just after the USB controller initialises (qemu console logs an attempt to execute an invalid/unsupported opcode: 00 - 1c - 17 - 0a (004ad5f8) 1)

OpenBSD 5.7
- Lots of "mac_intr_establish called, not yet inited" warnings in the kernel dmesg output
- However it boots to userspace and the installer shell seems stable

OpenBSD 5.6
- Panics with a stack smash warning:

OpenBSD 5.6 (RAMDISK) #163: Fri Aug 8 09:05:59 MDT 2014
    dera...@macppc.openbsd.org:/usr/src/sys/arch/macppc/compile/RAMDISK
real mem = 1073741824 (1024MB)
avail mem = 1029210112 (981MB)
warning: no entropy supplied by boot loader
mainbus0 at root: model PowerMac3,1
cpu0 at mainbus0: 7400 (Revision 0x209): 900 MHz: L2 cache not enabled
mem at mainbus0 not configured
mpcpcibr0 at mainbus0 pci: uni-north
pci0 at mpcpcibr0 bus 0
panic: smashed stack in ofw_enumerate_pcibus
Stopped at Debugger+0x10: lwz r0,36(r1)
00a00ae4: end+0x561cc fp a00ac0 nfp a00ae0
001ee6dc: panic+0xe0 fp a00ae0 nfp a00b40
001e235c: __stack_smash_handler+0x18 fp a00b40 nfp a00b60
0037ea18: ofw_enumerate_pcibus+0x1b0 fp a00b60 nfp a00bc0
0031bc90: pciattach+0xf0 fp a00bc0 nfp a00bf0
001e3e50: config_attach+0x1f0 fp a00bf0 nfp a00c40
0037dc0c: mpcpcibrattach+0x3b0 fp a00c40 nfp a00d60
001e3e50: config_attach+0x1f0 fp a00d60 nfp a00db0
003095f0: dbdma_flush+0x4d8 fp a00db0 nfp a00e90
001e3e50: config_attach+0x1f0 fp a00e90 nfp a00ee0
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
00a00ae4: end+0x561cc fp a00ac0 nfp a00ae0
001ee6dc: panic+0xe0 fp a00ae0 nfp a00b40
001e235c: __stack_smash_handler+0x18 fp a00b40 nfp a00b60
0037ea18: ofw_enumerate_pcibus+0x1b0 fp a00b60 nfp a00bc0
0031bc90: pciattach+0xf0 fp a00bc0 nfp a00bf0
001e3e50: config_attach+0x1f0 fp a00bf0 nfp a00c40
0037dc0c: mpcpcibrattach+0x3b0 fp a00c40 nfp a00d60
001e3e50: config_attach+0x1f0 fp a00d60 nfp a00db0
003095f0: dbdma_flush+0x4d8 fp a00db0 nfp a00e90
001e3e50: config_attach+0x1f0 fp a00e90 nfp a00ee0
002f63ec: cpu_configure+0x24 fp a00ee0 nfp a00f00
001c525c: main+0x3f0 fp a00f00 nfp a00f40
001001bc: kernel_text+0xa8 fp a00f40 nfp 0
ddb> ps
PID PPID PGRPUID S FLAGS WAIT COMMAND
*0 -1 0 0 7 0x10200swapper
ddb>

OpenBSD 5.5
- Lots of "mac_intr_establish called, not yet inited" warnings in the kernel dmesg output
- Panics on boot when initialising USB:

uhub0 at usb0 "Apple OHCI root hub" rev 1.00/1.00 addr 1
panic: trap type 600 at 2cf4a0 (mtx_enter+0x28) lr 2cf490
Stopped at Debugger+0x10: lwz r0,20(r1)
00fc: tlbdsmsize+0x14 fp 94ba70 nfp 94ba80
001cec40: panic+0xd0 fp 94ba80 nfp 94bae0
002ce8cc: trap+0x184 fp 94bae0 nfp 94bb60
00100900: ddblow+0x1ac fp 94bb60 nfp 94bc10
002cf48c: mtx_enter+0x14 fp 94bc10 nfp 94bc20
001c4a50: config_attach+0x200 fp 94bc20 nfp 94bc60
00351018: mpcpcibrattach+0x3b0 fp 94bc60 nfp 94bd80
001c4a40: config_attach+0x1f0 fp 94bd80 nfp 94bdc0
002e4af0: mb_matchname+0x4e8 fp 94bdc0 nfp 94beb0
001c4a40: config_attach+0x1f0 fp 94beb0 nfp 94bef0
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
00fc: tlbdsmsize+0x14 fp 94ba70 nfp 94ba80
001cec40: panic+0xd0 fp 94ba80 nfp 94bae0
002ce8cc: trap+0x184 fp 94bae0 nfp 94bb60
00100900: ddblow+0x1ac fp 94bb60 nfp 94bc10
002cf48c: mtx_enter+0x14 fp 94bc10 nfp 94bc20
001c4a50: config_attach+0x200 fp 94bc20 nfp 94bc60
00351018: mpcpcibrattach+0x3b0 fp 94bc60 nfp 94bd80
001c4a40: config_attach+0x1f0 fp 94bd80 nfp 94bdc0
002e4af0: mb_matchname+0x4e8 fp 94bdc0 nfp 94beb0
001c4a40: config_attach+0x1f0 fp 94beb0 nfp 94bef0
002d1d9c: cpu_configure+0x24 fp 94bef0 nfp 94bf00
001a7314: main+0x3cc fp 94bf00 nfp 94bf40
001001bc: kernel_text+0xa8 fp 94bf40 nfp 0
ddb> ps
PID PPID PGRPUID S FLAGS WAIT COMMAND
*0 -1 0 0 7
Re: Status of openbsd/macppc port?
On 17/08/18 13:55, Solene Rapenne wrote:

> I'm using the macppc port since 6.1 to -current and apart failing
> harware I don't have any issue while playing Doom or rebuilind ports :)

Hmmm. 6.1 is the latest version that I can boot to userspace, even if it faults quickly after a few keypresses (QEMU is generally really strict on invalid memory accesses which is basically what I see, but once the access is tracked down it would be possible to fix it).

I'd be interested to know if you are able to at least boot a 6.3 installation CDROM on the Mac Mini to the installer without hanging, which is probably the closest match to what I'm doing on real hardware.

ATB,

Mark.
Re: Status of openbsd/macppc port?
On 17/08/18 13:37, Solene Rapenne wrote:

> Mark Cave-Ayland wrote:
>> Hi all,
>>
>> I was just wondering what is the current state of the openbsd/macppc
>> port? As part of my recent work on qemu-system-ppc I now have a patch
>> that can boot OpenBSD macppc under the New World (-M mac99,via=pmu)
>> machine but I'm seeing quite a bit of instability in OpenBSD compared to
>> all my other test OSs.
>
> Hello
>
> I can't help you much with your qemu issue but I can confirm you that
> the OpenBSD macppc port works really well as I use 2 macppc devices (an
> mac mini and a powerbook) often. The sad state is that less and less
> ports are running on them.

Thanks for the response Solene. Can I ask which version of openbsd/macppc you are currently running?

ATB,

Mark.
Re: Status of openbsd/macppc port?
On 17/08/18 13:34, Jonathan Gray wrote:

> On Fri, Aug 17, 2018 at 12:15:10PM +0100, Mark Cave-Ayland wrote:
>> Hi all,
>>
>> I was just wondering what is the current state of the openbsd/macppc
>> port? As part of my recent work on qemu-system-ppc I now have a patch
>> that can boot OpenBSD macppc under the New World (-M mac99,via=pmu)
>> machine but I'm seeing quite a bit of instability in OpenBSD compared to
>> all my other test OSs.
>>
>> For those that are interested I have included screenshots below:
>>
>> OpenBSD 6.3
>> - Hangs just after USB detection
>> - https://www.ilande.co.uk/tmp/qemu/openbsd-6.3.png
>>
>> OpenBSD 6.2
>> - Panics just after USB detection
>> - https://www.ilande.co.uk/tmp/qemu/openbsd-6.2.png
>>
>> OpenBSD 6.1
>> - Boots all the way to the installer but causes qemu-system-ppc to
>> terminate fairly easily after pressing a few keys with "qemu: fatal:
>> ERROR: instruction should not need address translation"
>> - https://www.ilande.co.uk/tmp/qemu/openbsd-6.1.png
>>
>> Note I also get a constant stream of messages on the console related to
>> OpenPIC:
>>
>> qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
>> qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
>> qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
>> qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
>> qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
>> qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
>> qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
>> etc.
>>
>>
>> Obviously I can't categorically state that QEMU's emulation is perfect,
>> but it can now reliably run all of Linux, MacOS, NetBSD and FreeBSD in
>> my local tests which makes me suspect that OpenBSD is trying to do
>> something different here.
>
> Builds are done natively on real hardware (xserves). Your work on
> qemu-system-ppc would be improved by being able to compare to a real
> machine while it is still possible to find some that work. You could
> search bugs@ but I don't believe any of those problems have been reported
> running on actual macppc machines.

Thanks for the information. I guess there is a difference between being able to build and run the guest OS - for example do the builds get regularly tested on any Sawtooth-type PowerMac3,1 machines (which is effectively what QEMU is trying to emulate)?

FWIW from the screenshots above the "bad IRQs" being complained about can be shown to be macgpio1 (IRQ 47) and ohci0 (IRQ 28). Is there anything special about these interrupts at all, e.g. edge vs. level triggering?

ATB,

Mark.
Status of openbsd/macppc port?
Hi all,

I was just wondering what is the current state of the openbsd/macppc port? As part of my recent work on qemu-system-ppc I now have a patch that can boot OpenBSD macppc under the New World (-M mac99,via=pmu) machine but I'm seeing quite a bit of instability in OpenBSD compared to all my other test OSs.

For those that are interested I have included screenshots below:

OpenBSD 6.3
- Hangs just after USB detection
- https://www.ilande.co.uk/tmp/qemu/openbsd-6.3.png

OpenBSD 6.2
- Panics just after USB detection
- https://www.ilande.co.uk/tmp/qemu/openbsd-6.2.png

OpenBSD 6.1
- Boots all the way to the installer but causes qemu-system-ppc to terminate fairly easily after pressing a few keys with "qemu: fatal: ERROR: instruction should not need address translation"
- https://www.ilande.co.uk/tmp/qemu/openbsd-6.1.png

Note I also get a constant stream of messages on the console related to OpenPIC:

qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
qemu-system-ppc: openpic_iack: bad raised IRQ 47 ctpr 8 ivpr 0x4047002f
qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
qemu-system-ppc: openpic_iack: bad raised IRQ 28 ctpr 8 ivpr 0x4045001c
etc.

Obviously I can't categorically state that QEMU's emulation is perfect, but it can now reliably run all of Linux, MacOS, NetBSD and FreeBSD in my local tests which makes me suspect that OpenBSD is trying to do something different here.

ATB,

Mark.
Re: hme: incorrect register endian for PCI sun hme devices?
On 14/08/17 21:18, Mark Kettenis wrote:

>> So tracing through HME register writes it seems the difference between
>> OpenBSD and the other OSs is that OpenBSD appears to write to the
>> virtual address 0x40008098000 with a standard (0x80) primary ASI,
>> whereas the other OSs seem to write directly to the physical address
>> 0x1ff0400 with a physical LE ASI.
>>
>> Is this because in OpenBSD the memory is being allocated as DVMA memory
>> via the IOMMU?
>
> Ah, no. For memory mapped io it seems we create an actual
> little-endian memory mapping (i.e. with the IE bit set). That was
> probably done to support mapping framebuffers.

Ah yes, I bet that's it - thanks for the pointer! Not sure it's going to be the easiest job to implement though.

ATB,

Mark.
Re: hme: incorrect register endian for PCI sun hme devices?
On 14/08/17 14:25, Mark Kettenis wrote:

>> Great, thanks for the information - the fact that the nsphy0 has been
>> detected correctly means that the access still works. Looks like I'll
>> have to go digging deeper.
>
> The OpenBSD code uses %asi if necessary to let the hardware do the
> byteswapping. Howver, I think the psycho(4) host bridge also does an
> implicit byteswap. Always has been a bit confusing to me. But the
> code defenitely works correctly on real hardware.

So tracing through HME register writes it seems the difference between OpenBSD and the other OSs is that OpenBSD appears to write to the virtual address 0x40008098000 with a standard (0x80) primary ASI, whereas the other OSs seem to write directly to the physical address 0x1ff0400 with a physical LE ASI.

Is this because in OpenBSD the memory is being allocated as DVMA memory via the IOMMU?

ATB,

Mark.
Re: hme: incorrect register endian for PCI sun hme devices?
On 13/08/17 16:52, Kaashif Hymabaccus wrote:

> Hello Mark,
>
> I have a Sun Ultra 5 with the following dmesg:
>
> console is /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California. All rights reserved.
> Copyright (c) 1995-2017 OpenBSD. All rights reserved. https://www.OpenBSD.org
>
> OpenBSD 6.1-current (GENERIC) #225: Fri Aug 11 19:58:43 MDT 2017
> dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 536870912 (512MB)
> avail mem = 512393216 (488MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 1.3) @ 269.802 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64
> b/l)
> psycho0 at mainbus0 addr 0xfffc4000: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-2, PCI bus 0
> psycho0: dvma map c000-dfff
> pci0 at psycho0
> ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x11
> pci1 at ppb0 bus 1
> ebus0 at pci1 dev 1 function 0 "Sun PCIO EBus2" rev 0x01
> auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003,
> 72c000-72c003, 72f000-72f003
> power0 at ebus0 addr 724000-724003 ivec 0x25
> "SUNW,pll" at ebus0 addr 504000-504002 not configured
> sab0 at ebus0 addr 40-40007f ivec 0x2b: rev 3.2
> sabtty0 at sab0 port 0: console
> sabtty1 at sab0 port 1
> comkbd0 at ebus0 addr 3083f8-3083ff ivec 0x29: no keyboard
> comms0 at ebus0 addr 3062f8-3062ff ivec 0x2a
> wsmouse0 at comms0 mux 0
> lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 70-7f ivec 0x22:
> polled
> "fdthree" at ebus0 addr 3023f0-3023f7, 706000-70600f, 72-720003 ivec 0x27
> not configured
> clock1 at ebus0 addr 0-1fff: mk48t59
> "flashprom" at ebus0 addr 0-f not configured
> audioce0 at ebus0 addr 20-2000ff, 702000-70200f, 704000-70400f,
> 722000-722003 ivec 0x23 ivec 0x24: nvaddrs 0
> audio0 at audioce0
> hme0 at pci1 dev 1 function 1 "Sun HME" rev 0x01: ivec 0x7e1, address
> 08:00:20:19:39:20
> nsphy0 at hme0 phy 1: DP83840 10/100 PHY, rev. 1
> machfb0 at pci1 dev 2 function 0 "ATI Mach64" rev 0x9a
> machfb0: ATY,GT-B, 1152x900
> wsdisplay0 at machfb0 mux 1
> wsdisplay0: screen 0 added (std, sun emulation)
> pciide0 at pci1 dev 3 function 0 "CMD Technology PCI0646" rev 0x03: DMA,
> channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7e0 for native-PCI interrupt
> wd0 at pciide0 channel 0 drive 0:
> wd0: 16-sector PIO, LBA48, 117800MB, 241254720 sectors
> wd0(pciide0:0:0): using PIO mode 4, DMA mode 2
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0:ATAPI 5/cdrom
> removable
> wd1 at pciide0 channel 1 drive 1:
> wd1: 16-sector PIO, LBA, 19546MB, 40031712 sectors
> cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
> wd1(pciide0:1:1): using PIO mode 4, DMA mode 2
> ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x11
> pci2 at ppb1 bus 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/pci@1,1/ide@3,0/disk@0,0
> root on wd0a (f52f0bbc65e53556.a) swap on wd0b dump on wd0b
>
> It has a PCI hme card and it works great.
>
> I would be happy to help if you want to test some diff or program, but
> I am not knowledgeable enough to comment on the inner workings of the
> hme driver.

Great, thanks for the information - the fact that the nsphy0 has been detected correctly means that the access still works. Looks like I'll have to go digging deeper.

ATB,

Mark.
hme: incorrect register endian for PCI sun hme devices?
Hi all,

Does anyone have any real Sun hardware containing a PCI hme card running OpenBSD, and if so does it work with the current 6.1 release? I've been working on a virtual hme device for qemu-system-sparc64 in the hope of getting working networking on *BSD images and I have a driver that now works well for pretty much all OSs... except OpenBSD.

Looking at the hme driver register accesses on OpenBSD, the issue appears to be that all accesses to the hme register blocks defined in if_hme_pci.c (SEB, ETX, ERX, MAC, MIF) are done using big endian accesses, whereas for PCI devices these need to be done using little endian accesses.

Is there something I've missed in the hme device emulation or is there something amiss with the hme driver? I see that it is shared between PCI and SBus so perhaps that is part of the puzzle?

ATB,

Mark.
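As a small illustration of the big endian versus little endian distinction raised above, here is a generic C sketch of what a 32-bit register value looks like under each interpretation. It is not taken from the hme driver or from QEMU; swap32() and read_le32() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit register value. */
static uint32_t
swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Decode a 32-bit little-endian register from its raw bytes,
 * independent of the byte order of the host doing the decoding.
 */
static uint32_t
read_le32(const unsigned char *p)
{
	assert(p != 0);
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

If a device model and a guest driver disagree on the byte order of a register, each side sees the swap32() of what the other intended, which is the kind of mismatch being asked about here.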
Re: sparc64 pmap diff
On 17/04/16 20:42, Mark Kettenis wrote:

> Ran into an interesting problem with the sparc64 pmap bootstrapping
> code. Early on, we ask the firmware what physical memory is
> available. Later we use this memory to set up the kernel page tables,
> kernel stack and per-cpu data structures. We explicitly tell the
> firmware about the mappings of these data structure as the firmware is
> handling page faults for us at this stage. To store these mappings
> the firmware may need to allocate more memory. And if it happens to
> allocate memory that we're using for some other purpose, bad things
> will happen. In my case dmesg stopped working because its mappings
> were messed up.
>
> The following diff attempts to fix this issue by telling the firmware
> which pages we're stealing. It's not perfect as it doesn't prevent us
> from allocating the same pages as the firmware is allocating.
>
> Tests on a wide variety of sparc64 hardware would be welcome.
>
>
> Index: pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sparc64/sparc64/pmap.c,v
> retrieving revision 1.96
> diff -u -p -r1.96 pmap.c
> --- pmap.c    27 Nov 2015 15:34:01 -0000    1.96
> +++ pmap.c    17 Apr 2016 19:17:45 -0000
> @@ -2869,6 +2869,7 @@ pmap_get_page(paddr_t *pa, const char *w
>          *pa = VM_PAGE_TO_PHYS(pg);
>      } else {
>          uvm_page_physget(pa);
> +        prom_claim_phys(*pa, PAGE_SIZE);
>          pmap_zero_phys(*pa);
>      }

This patch feels wrong - essentially it is just hiding the fact there is a missing prom_claim_phys() or prom_alloc_phys() somewhere at the point of allocation. Can you give more information about the particular case you describe above?

ATB,

Mark.
SPARC64: suggested fixes for OF interface
Hi all, From my work on running OpenBSD under OpenBIOS/QEMU, I found a couple of bugs in the NetBSD OF bindings for SPARC64 which also seem to be relevant to OpenBSD. I've applied patches to OpenBIOS to compensate for these bugs which allows OpenBSD to boot under QEMU, but thought that as there is interest here it would be worth documenting them for the sake of correctness. 1) OF_close has the wrong number of return arguments src/sys/arch/sparc64/stand/ofwboot/Locore.c specifies the OF_close has args.nreturns == 1. From the IEEE1275 specification we can see that the close word doesn't return any arguments, and so args.nreturns should be set to 0. OpenBIOS currently compensates for this and issues a warning when debugging is enabled. 2) OF_test_method takes a phandle not an ihandle, and also returns 0 on success src/sys/arch/sparc64/sparc64/ofw_machdep.c calls OF_test_method with an ihandle instead of an phandle as detailed in the Open Firmware working group proposal at http://www.openfirmware.org/1275/proposals/Closed/Accepted/270-it.txt (WARNING: the above link is currently down, however Google still has a cached version available). Similarly the Forth word signature looks like this: test-method ( method-cstr phandle -- missing-flag? ) This means that missing-flag? should be true if the method is missing and false if it is present, which indicates that the check to determine the existence of SUNW,retain in ofw_machdep.c is the wrong way around, i.e. the result comparison should be == 0 rather than != 0. What happens at the moment is that calling OF_test_method with an ihandle causes an exception and so the client inferface returns -1 to indicate failure. However since the result is checked for != 0 then this is taken to indicate that SUNW,retain exists which is why this currently works on some real PROMs. 
It's worth mentioning that this fixes test-method on E250/E450 systems and so the NetBSD folks were able to remove the is_e250 hack after testing on real hardware. For interested parties the corresponding NetBSD diff can be found at http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/sparc64/sparc64/ofw_machdep.c.diff?r1=1.41&r2=1.42&f=h. ATB, Mark.
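To make the two points above concrete, here is a minimal sketch in C. It is not the actual Locore.c or ofw_machdep.c code: it assumes the usual IEEE 1275 client interface layout (service name, nargs, nreturns, then the argument cells), and the names cell_t, of_close_args, of_close_args_init and of_method_present are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cell_t;        /* assumption: one 64-bit cell on sparc64 */

/* Hypothetical argument block for the "close" client service. */
struct of_close_args {
    cell_t name;                /* pointer to the service name "close" */
    cell_t nargs;               /* one input cell: the ihandle */
    cell_t nreturns;            /* close returns nothing, so this is 0 */
    cell_t ihandle;             /* instance to close */
};

/* Fill in the block with nreturns == 0, as point 1) suggests. */
static void of_close_args_init(struct of_close_args *a, cell_t ihandle)
{
    static const char service[] = "close";

    assert(a != NULL);
    a->name = (cell_t)(uintptr_t)service;
    a->nargs = 1;
    a->nreturns = 0;
    a->ihandle = ihandle;
}

/*
 * test-method ( method-cstr phandle -- missing-flag? )
 * The flag is zero when the method exists. Any non-zero value,
 * including the -1 seen when the call itself fails, is treated
 * here as "method not present".
 */
static int of_method_present(int64_t missing_flag)
{
    return missing_flag == 0;
}
```

With a helper like this, the existence check for SUNW,retain reads as "present if the flag is 0", which is the == 0 comparison described in point 2).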
sparc64: fledgling QEMU support
Hi all, Following up from my posts at the beginning of the summer, I'm pleased to announce that as of today, qemu-system-sparc64 built from QEMU git master will successfully install OpenBSD from an .iso and boot back into it in serial mode with its default sun4u emulation: $ ./qemu-system-sparc64 -cdrom install55.iso -boot d -nographic OpenBIOS for Sparc64 Configuration device id QEMU version 1 machine id 0 kernel cmdline CPUs: 1 x SUNW,UltraSPARC-IIi UUID: ---- Welcome to OpenBIOS v1.1 built on Aug 26 2014 12:48 Type 'help' for detailed information Trying cdrom:f... Not a bootable ELF image Not a bootable a.out image Loading FCode image... Loaded 4829 bytes entry point is 0x4000 OpenBSD IEEE 1275 Bootblock 1.3 .. Jumping to entry point 0010 for type 0001... switching to new context: entry point 0x10 stack 0xffe8aa09 OpenBSD BOOT 1.6 Trying bsd... open /pci@1fe,0/pci-ata@5/ide1@2200/cdrom@0:f/etc/random.seed: No such file or directory Booting /pci@1fe,0/pci-ata@5/ide1@2200/cdrom@0:f/bsd 3901336@0x100+6248@0x13b8798+3261984@0x180+932320@0x1b1c620 symbols @ 0xffc5a300 119 start=0x100 Unexpected client interface exception: -1 console is /pci@1fe,0/ebus@3/su Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2014 OpenBSD. All rights reserved. 
http://www.OpenBSD.org OpenBSD 5.5 (RAMDISK) #153: Tue Mar 4 15:12:10 MST 2014 dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK real mem = 134217728 (128MB) avail mem = 122011648 (116MB) mainbus0 at root: OpenBiosTeam,OpenBIOS cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 9.1) @ 100 MHz cpu0: physical 256K instruction (64 b/l), 16K data (32 b/l), 256K external (64 b/l) psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0 psycho0: bus range 0-2, PCI bus 0 psycho0: dvma map c000-dfff pci0 at psycho0 ppb0 at pci0 dev 1 function 0 Sun Simba rev 0x11 pci1 at ppb0 bus 1 ppb1 at pci0 dev 1 function 1 Sun Simba rev 0x11 pci2 at ppb1 bus 2 unknown vendor 0x1234 product 0x (class display subclass VGA, rev 0x00) at pci0 dev 2 function 0 not configured ebus0 at pci0 dev 3 function 0 Sun PCIO EBus2 rev 0x01 fdthree at ebus0 addr 0- not configured com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo com0: console kb_ps2 at ebus0 addr 60-67 not configured Realtek 8029 rev 0x00 at pci0 dev 4 function 0 not configured pciide0 at pci0 dev 5 function 0 CMD Technology PCI0646 rev 0x07: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI pciide0: using ivec 0x7d4 for native-PCI interrupt pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus0 at atapiscsi0: 2 targets cd0 at scsibus0 targ 0 lun 0: QEMU, QEMU DVD-ROM, 2.1. ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 prtc0 at mainbus0 softraid0 at root scsibus1 at softraid0: 256 targets bootpath: /pci@1fe,0/pci-ata@5,0/ide1@2200,0/cdrom@0,0:f root on rd0a swap on rd0b dump on rd0b unix-gettod:interpret: exception -13 caught interpret h# 01c099fc unix-gettod failed with error ffed WARNING: bad date in battery clock -- CHECK AND RESET THE DATE! erase ^?, werase ^W, kill ^U, intr ^C, status ^T Welcome to the OpenBSD/sparc64 5.5 installation program. (I)nstall, (U)pgrade, (A)utoinstall or (S)hell? 
I At any prompt except password prompts you can escape to a shell by typing '!'. Default answers are shown in []'s and are selected by pressing RETURN. You can exit this program at any time by pressing Control-C, but this can leave your system in an inconsistent state. Terminal type? [sun] System hostname? (short form, e.g. 'foo') openbsd Available network interfaces are: vlan0. Which network interface do you wish to configure? (or 'done') [vlan0] done DNS domain name? (e.g. 'bar.com') [my.domain] DNS nameservers? (IP address list or 'none') [none] Password for root account? (will not echo) Password for root account? (again) Start sshd(8) by default? [yes] Start ntpd(8) by default? [no] Do you expect to run the X Window System? [no] Setup a user? (enter a lower-case loginname, or 'no') [no] ... etc. There are still some issues with the device tree to work out; in particular NVRAM and networking (I'd guess that the OpenBSD sparc64 kernel doesn't contain the Realtek device driver so at some point I'll need to create a virtual hme device) but it's good enough to install/boot an OS on different hardware for testing - what could be more fun than that? ATB, Mark.
Re: sparc64: fledgling QEMU support
On 09/09/14 19:54, Mark Kettenis wrote:

Sweet. The RealTek 8129 should be supported by the rl(4) driver, and is AFAICT included in the RAMDISK kernel. Not sure why it doesn't attach. If it is easy to hook up QEMU's e1000 hardware emulation to the emulated sparc64 hardware, that should be supported as well on the OpenBSD side. OpenBSD expects the device tree node for the PS/2 keyboard to be named 8042. That's how it is named on the Ultra AXi boards.

Thanks for the information. I've had some interest from the NetBSD folk too and it seems that they don't build 8042 support into their default sparc64 kernel, so it looks like I'd have to switch over to su serial ports instead like the real thing (the QEMU sun4u model is fairly close to an Ultra 5). My aim is to try and provide an environment that mostly just works for as many OSs as possible.

The NVRAM is supposed to be described by a node named eeprom under ebus. Proper emulation of this device will get rid of the unix-gettod:interpret: exception -13 caught interpret h# 01c099fc unix-gettod failed with error ffed WARNING: bad date in battery clock -- CHECK AND RESET THE DATE! spam.

Brilliant - very useful. The one issue I am aware of is that currently the NVRAM chip is wired up as ioport rather than MMIO so that will need to change. I believe Artyom posted some patches for this a year or so ago, however they will likely need a bit of work to get them suitable for upstream QEMU.

ATB,

Mark.
Re: sparc64: fledgling QEMU support
On 09/09/14 19:57, Brad Smith wrote: The Realtek hardware in that dmesg is an NE2000 PCI adapter which the sparc64 kernel config indeed does not have a driver for at the very moment, although it could be added. Having a QEMU driver for the Happy Meal MAC would provide the best level of compatibility with other OS's as that is what comes with a lot of Sun systems. Agreed. Once I've sorted out the NVRAM issues in theory QEMU should be able to run some older 64-bit Solaris 9-10 kernels, and I suspect I'll need to implement a virtual hme device in order for that to work. It seems like people on this list have quite a bit of SPARC experience, so would it be okay to ask questions about hme drivers on this list? Or would somewhere else be more appropriate? But for OpenBSD and sparc64 there are other options that could be used from QEMU's perspective such as the e1000 [em(4)], i82551 / i82559er [fxp(4)] and rtl8139 [re(4)] drivers that should work well. Interesting. Longer term the aim of the QEMU project is to move the hardwired machine types into pluggable devices, e.g. you can build a whole machine on the command line from multiple -device parameters or preload the default machine types such as sun4u using instructions from a file. So while this is not practical now without source hacks, it is likely to become possible in the future. ATB, Mark.
Re: sparc64: fledgling QEMU support
On 09/09/14 20:04, Bryan Steele wrote:

Neat! :-) It seems the GENERIC sparc64 kernel already has PCMCIA/CardBus ne(4), so adding 'ne* at pci?' might just work. OpenBSD/sparc64 already supports sun4v LDOMS, so there's drivers implementing the virtual protocols (..vnet(4)/vdsk(4)). Does QEMU support this? Could the PCI virtio stuff be adapted to non-x86 architectures?

QEMU already has a virtio PCI device that can be plugged into qemu-system-sparc64 (see Artyom's blog at http://tyom.blogspot.co.uk/2013/03/debiansparc64-wheezy-under-qemu-how-to.html for an example of how to do this with Linux). This could be an amusing project; in theory it would be possible to work on an x86 laptop to test/debug big-endian virtio support with the help of QEMU's virtual hardware. You can do this by plugging in a standard virtual cdrom/hd along with an additional virtio hd/nic, booting from the standard devices and then testing the drivers accessing the extra devices as required.

I should probably add that there may still be some CPU bugs lying around, and also you'd need a power source since I don't believe the UIIi processor has any power-saving instructions (or at least QEMU doesn't emulate them), which causes qemu-system-sparc64 to take a lot of CPU...

ATB,

Mark.
Re: sparc64: fledgling QEMU support
On 09/09/14 21:26, Miod Vallat wrote: Interesting. Longer term the aim of the QEMU project is to move the hardwired machine types into pluggable devices, e.g. you can build a whole machine on the command line from multiple -device parameters or preload the default machine types such as sun4u using instructions from a file. So while this is not practical now without source hacks, it is likely to become possible in the future. Do not expect any support for the fanciest device combinations. While most sparc64 systems will probably be able to cope with whatever five-feet sheeps you can build, sparc32 qemu will happily attempt to emulate systems which make no sense, physically, and dismissing reports that BSD does not run on such artificial setups is annoying, to say the least. Oh sure. It was more to make a point that at some point the QEMU machine will become ultimately more flexible, which I see as something useful for development rather than production use. As I mentioned in one of my earlier emails, my aim is to get the basic sun4u Ultra 5 machine good enough to be able to run the main Linux/*BSD/Solaris OSs out of the box so the final choices of hardware for the virtual device model will be quite limited. ATB, Mark.
Re: sparc64: problem after trap table takeover under QEMU
On 08/05/14 20:28, Mark Kettenis wrote:

Hi Mark, Interesting to see sparc64 support in QEMU.

Yeah, it's been a work in progress for quite a while now. There seem to be two main areas of interest: firstly for people who are now migrating away from SPARC but need to keep a legacy application(s), and secondly for open source projects interested in testing across multiple architectures.

As soon as I step into address 0x1001804 then this is where things start to go wrong; the TLB (TTE) entry for 0x180 which is accessed by %sp is marked as privileged, but ASI 0x11 is user access only. QEMU's current behaviour for this is to generate a datafault for the page at 0x180 which seems to get all the way through to the retry at the end of winfixsave, but then hits the breakpoint trap above when executing the retry.

I've finally located the source of this bug thanks to more testing, which showed that OpenBSD 4.9 was surprisingly also able to boot (something I missed in my original bisection). This allowed me to track down what was happening fairly easily.

The problem is caused by the fact that 0x180 has *two* mappings in the TLB and the way in which QEMU resolves them. Compare the state of the TLB when the fill_0_normal trap occurs on OpenBSD 5.5 (faults, incorrect) and OpenBSD 4.9 (no fault, correct):

OpenBSD 5.5:

(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
...
[14] VA: 180, PA: f40, 4M, priv, RW, locked, ctx 0 local
...
[42] VA: 180, PA: f40, 8k, user, RW, unlocked, ctx 0 local
...

OpenBSD 4.9:

(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
...
[08] VA: 180, PA: f40, 8k, user, RW, unlocked, ctx 0 local
...
[14] VA: 180, PA: f40, 4M, priv, RW, locked, ctx 0 local
...

The bug occurs because the QEMU TLB algorithm currently searches the TLB *in order* starting from entry 0 until it finds a VA match.
In the OpenBSD 5.5 case, the first mapping it finds is the 4M privileged mapping, and so the fill_0_normal trap which uses user ASI 0x11 faults due to not being privileged. This is in contrast to the OpenBSD 4.9 case where the first mapping it finds is the 8K unprivileged mapping, hence the fill_0_normal trap succeeds and we proceed to boot.

Does anyone know how real hardware resolves conflicts between multiple TLB entries with the same VA? My guess would be that the smaller 8K mapping should take priority, but the documentation in relation to address aliasing is fairly non-existent so I am wondering if there are any other rules relating to whether privileged mappings should take priority or not? Once the behaviour is known, it will be fairly easy to fix up QEMU to match.

It seems that this first hypothesis was incorrect; after some help from the NetBSD guys we found out that all PROM mappings should default to privileged. So the issue is no longer to do with the difference between privileged/unprivileged mappings, but why does the fault occur in the first place?

I don't know how the real hardware behaves. But it certainly is the intention that the 4M locked mapping gets used as soon as we've taken over the trap table. Not sure where the 8K mapping is coming from.

Finally it does raise an eyebrow that the first window trap taken when the kernel takes over the trap table is a fill_0_normal *user* trap, particularly when it's against an *unlocked* TLB entry which could potentially have been evicted beforehand. It might be worth double-checking as to whether this is the intended behaviour or not.

Right. It certainly isn't the intention that we end up in a fill_0_normal at this point. Perhaps %wstate is initialized differently in QEMU than on real hardware? The OpenBSD bootstrap code does set %wstate appropriately immediately after taking over the trap table.
We can't really do this earlier since we don't know the conventions used by the spill and fill handlers provided by the firmware. But it looks like a Sun Fire T2000 actually initializes %wstate to 0. So perhaps we're just getting lucky on real hardware that the prom code doesn't spill our trap frame and therefore we don't have to fill it again.

After more work, I believe that your theory here is correct. Take a look at cpu_initialize() in locore.S:

/*
 * Initialize a CPU. This is used both for bootstrapping the first CPU
 * and spinning up each subsequent CPU. Basically:
 *
 *	Install trap table.
 *	Switch to the initial stack.
 *	Call the routine passed in in cpu_info->ci_spinup.
 */

_C_LABEL(cpu_initialize):

	wrpr	%g0, 0, %tl		! Make sure we're not in NUCLEUS mode
	flushw

	/* Change the trap base register */
	set	_C_LABEL(trapbase), %l1
#ifdef SUN4V
	sethi	%hi(_C_LABEL(cputyp)), %l0
	ld	[%l0 + %lo(_C_LABEL(cputyp))], %l0
	cmp	%l0, CPU_SUN4V
	bne,pt	%icc, 1f
	 nop
	set	_C_LABEL(trapbase_sun4v), %l1
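For reference, the way %wstate and %otherwin select the window fill handler can be sketched in C as below. This is only a sketch based on the SPARC V9 trap table layout as I understand it (fill_n_normal starting at trap type 0x0C0, fill_n_other at 0x0E0, four trap types per handler, 32 bytes per trap table entry); it is not code from OpenBSD or QEMU. The function names are hypothetical, and the trap table base of 0x1000000 used in the usage note is an assumption picked to be consistent with the 0x10018xx handler addresses quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define TT_FILL_NORMAL_BASE 0x0C0u  /* fill_0_normal */
#define TT_FILL_OTHER_BASE  0x0E0u  /* fill_0_other */

/*
 * Pick the fill trap type from %wstate and %otherwin:
 * WSTATE.NORMAL (bits 2:0) is used when OTHERWIN is zero,
 * WSTATE.OTHER (bits 5:3) otherwise. Each handler spans 4 trap types.
 */
static unsigned fill_trap_type(unsigned wstate, unsigned otherwin)
{
    unsigned normal = wstate & 0x7u;
    unsigned other = (wstate >> 3) & 0x7u;

    if (otherwin != 0)
        return TT_FILL_OTHER_BASE + other * 4u;
    return TT_FILL_NORMAL_BASE + normal * 4u;
}

/*
 * Compute the handler address: trap base address (bits 63:15),
 * one bit for "trap taken at TL > 0" (bit 14), trap type (bits 13:5).
 */
static uint64_t trap_vector(uint64_t tba, int tl_gt_0, unsigned tt)
{
    assert(tt < 0x200u);
    return (tba & ~UINT64_C(0x7fff)) |
           ((uint64_t)(tl_gt_0 ? 1 : 0) << 14) |
           ((uint64_t)tt << 5);
}
```

Under these assumptions, trap_vector(0x1000000, 0, fill_trap_type(0, 0)) gives 0x1001800: with %wstate and %otherwin both 0, a fill at TL = 0 lands in fill_0_normal.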
Re: sparc64: problem after trap table takeover under QEMU
On 06/05/14 19:18, Mark Cave-Ayland wrote:

(cut)

As soon as I step into address 0x1001804 then this is where things start to go wrong; the TLB (TTE) entry for 0x180 which is accessed by %sp is marked as privileged, but ASI 0x11 is user access only. QEMU's current behaviour for this is to generate a datafault for the page at 0x180 which seems to get all the way through to the retry at the end of winfixsave, but then hits the breakpoint trap above when executing the retry.

I've finally located the source of this bug thanks to more testing, which showed that OpenBSD 4.9 was surprisingly also able to boot (something I missed in my original bisection). This allowed me to track down what was happening fairly easily.

The problem is caused by the fact that 0x180 has *two* mappings in the TLB and the way in which QEMU resolves them. Compare the state of the TLB when the fill_0_normal trap occurs on OpenBSD 5.5 (faults, incorrect) and OpenBSD 4.9 (no fault, correct):

OpenBSD 5.5:

(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
...
[14] VA: 180, PA: f40, 4M, priv, RW, locked, ctx 0 local
...
[42] VA: 180, PA: f40, 8k, user, RW, unlocked, ctx 0 local
...

OpenBSD 4.9:

(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
...
[08] VA: 180, PA: f40, 8k, user, RW, unlocked, ctx 0 local
...
[14] VA: 180, PA: f40, 4M, priv, RW, locked, ctx 0 local
...

The bug occurs because the QEMU TLB algorithm currently searches the TLB *in order* starting from entry 0 until it finds a VA match. In the OpenBSD 5.5 case, the first mapping it finds is the 4M privileged mapping, and so the fill_0_normal trap which uses user ASI 0x11 faults due to not being privileged. This is in contrast to the OpenBSD 4.9 case where the first mapping it finds is the 8K unprivileged mapping, hence the fill_0_normal trap succeeds and we proceed to boot.

Does anyone know how real hardware resolves conflicts between multiple TLB entries with the same VA?
My guess would be that the smaller 8K mapping should take priority, but the documentation in relation to address aliasing is fairly non-existent so I am wondering if there are any other rules relating to whether privileged mappings should take priority or not? Once the behaviour is known, it will be fairly easy to fix up QEMU to match.

Finally it does raise an eyebrow that the first window trap taken when the kernel takes over the trap table is a fill_0_normal *user* trap, particularly when it's against an *unlocked* TLB entry which could potentially have been evicted beforehand. It might be worth double-checking as to whether this is the intended behaviour or not.

Kind regards,

Mark.
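As an illustration of the in-order lookup described above, here is a minimal sketch in C. It is not QEMU's actual MMU code: the tlb_entry struct and both function names are hypothetical, and the virtual address 0x1800000 used in the usage note is an assumption (the addresses in the TLB dumps above appear truncated, so the real value may differ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry. */
struct tlb_entry {
    uint64_t va;    /* base virtual address of the mapping */
    uint64_t size;  /* mapping size in bytes (power of two) */
    bool priv;      /* true if only privileged accesses are allowed */
    bool valid;
};

/*
 * Search the TLB in order from entry 0 and return the index of the
 * first valid entry that covers va, or -1 if nothing matches.
 */
static int tlb_first_match(const struct tlb_entry *tlb, size_t n, uint64_t va)
{
    assert(tlb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && (va & ~(tlb[i].size - 1)) == tlb[i].va)
            return (int)i;
    }
    return -1;
}

/*
 * A user-ASI access faults if there is no match, or if the first
 * matching entry is privileged; later matching entries are never seen.
 */
static bool user_access_faults(const struct tlb_entry *tlb, size_t n,
                               uint64_t va)
{
    int i = tlb_first_match(tlb, n, va);

    return i < 0 || tlb[i].priv;
}
```

With a 4M privileged entry placed before an 8k user entry for the same address (the OpenBSD 5.5 ordering), user_access_faults() returns true; with the two entries swapped (the OpenBSD 4.9 ordering) it returns false, even though the set of mappings is identical.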
sparc64: problem after trap table takeover under QEMU
Hi all, I'm currently working on a set of patches for OpenBIOS (the OF implementation for QEMU) in order to get the various *BSD kernels to boot under QEMU SPARC64 with some success, but I'm struggling with a privilege violation trap which occurs on the first window fill trap after OpenBSD takes over the trap table. This is with the latest OpenBSD 5.5 and with my current patchset the console output looks like this: Loading FCode image... Loaded 4829 bytes entry point is 0x4000 OpenBSD IEEE 1275 Bootblock 1.3 .. Jumping to entry point 0010 for type 0001... switching to new context: entry point 0x10 stack 0xffe8aa09 OpenBSD BOOT 1.6 Trying bsd... open /pci@1fe,0/pci-ata@5/ide1@600/cdrom@0:f/etc/random.seed: No such file or directory Booting /pci@1fe,0/pci-ata@5/ide1@600/cdrom@0:f/bsd 3901336@0x100+6248@0x13b8798+3261984@0x180+932320@0x1b1c620 symbols @ 0xffc5a300 119 start=0x100 Unexpected client interface exception: -1 panic: trap type 0x101 (breakpoint): pc=1010254 npc=1010258 pstate=99110414MG,PEF,PRIV halted EXIT I asked around on IRC and it was suggested that I post the information here in order to get some further input on this. My feeling is that QEMU SPARC64 may be doing something different to real hardware but I don't have any to play with and this is my first dig into OpenBSD, so I'd really appreciate some pointers from interested parties. The privilege violation trap I experience occurs just after OpenBSD invokes the OF SUNW,set-trap-table call and occurs in the epilogue of openfirmware() in locore.S at the final restore: ... 
	rdpr	%pstate, %l0
	jmpl	%i4, %o7
	 wrpr	%g0, PSTATE_PROM|PSTATE_IE, %pstate
	wrpr	%l0, %g0, %pstate
	mov	%l1, %g1
	mov	%l2, %g2
	mov	%l3, %g3
	mov	%l4, %g4
	mov	%l5, %g5
	mov	%l6, %g6
	mov	%l7, %g7
	wrpr	%i2, 0, %pil
	ret
	 restore	%o0, %g0, %o0

What happens here is that when the final restore is executed in the delay slot, a fill_0_normal trap is generated which vectors into 0x1001800 here:

(gdb) disas 0x1001800, 0x100185c
Dump of assembler code from 0x1001800 to 0x100185c:
=> 0x01001800:	wr  %g0, 0x11, %asi
   0x01001804:	ldxa  [ %sp + 0x7ff ] %asi, %l0
   0x01001808:	ldxa  [ %sp + 0x807 ] %asi, %l1
   0x0100180c:	ldxa  [ %sp + 0x80f ] %asi, %l2
   0x01001810:	ldxa  [ %sp + 0x817 ] %asi, %l3
   0x01001814:	ldxa  [ %sp + 0x81f ] %asi, %l4
   0x01001818:	ldxa  [ %sp + 0x827 ] %asi, %l5
   0x0100181c:	ldxa  [ %sp + 0x82f ] %asi, %l6
   0x01001820:	ldxa  [ %sp + 0x837 ] %asi, %l7
   0x01001824:	ldxa  [ %sp + 0x83f ] %asi, %i0
   0x01001828:	ldxa  [ %sp + 0x847 ] %asi, %i1
   0x0100182c:	ldxa  [ %sp + 0x84f ] %asi, %i2
   0x01001830:	ldxa  [ %sp + 0x857 ] %asi, %i3
   0x01001834:	ldxa  [ %sp + 0x85f ] %asi, %i4
   0x01001838:	ldxa  [ %sp + 0x867 ] %asi, %i5
   0x0100183c:	ldxa  [ %sp + 0x86f ] %asi, %fp
   0x01001840:	ldxa  [ %sp + 0x877 ] %asi, %i7
   0x01001844:	nop
   0x01001848:	sethi  %hi(0xe0018000), %g5
   0x0100184c:	ldx  [ %g5 + 0x10 ], %g5	! 0xe0018010
   0x01001850:	ldx  [ %g5 + 0x28 ], %g5
   0x01001854:	xor  %g5, %i7, %i7
   0x01001858:	restored
End of assembler dump.
(gdb) info regi sp
sp 0x1800671 0x1800671

As soon as I step into address 0x1001804 then this is where things start to go wrong; the TLB (TTE) entry for 0x180 which is accessed by %sp is marked as privileged, but ASI 0x11 is user access only. QEMU's current behaviour for this is to generate a datafault for the page at 0x180 which seems to get all the way through to the retry at the end of winfixsave, but then hits the breakpoint trap above when executing the retry.

Based on this I have a couple of questions about what is happening here:

1) Is the fill_0_normal (user-level) trap the correct one?
Or does OpenBIOS need to do something with %otherwin to invoke a supervisor-level trap?

2) Is the QEMU SPARC64 behaviour of invoking a data_access_exception when accessing supervisor memory with a user ASI correct?

FWIW I also tried some older OpenBSD ISOs and found that this behaviour was introduced between the 4.3 and 4.4 releases, and older releases don't exhibit this problem. Repeating the same test in 4.3, which is the last release that doesn't trap with the breakpoint error above, shows that the fill_0_normal trap is still invoked in the openfirmware() epilogue, however the stack pointer is now different:

(gdb) info regi sp
sp 0x1c09621 0x1c09621

And I can confirm that page 0x1c08000 exists in the TLB but compared to the current release above *isn't* marked as privileged, so no fault occurs