Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-21 Thread David Gibson
On Wed, Apr 21, 2021 at 04:50:12PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 4/21/21 15:27, David Gibson wrote:
> > On Tue, Apr 20, 2021 at 07:16:35PM +1000, Alexey Kardashevskiy wrote:
> > > On 20/04/2021 13:14, David Gibson wrote:
[snip]
> > > > > diff --git a/pc-bios/vof/Makefile b/pc-bios/vof/Makefile
> > > > > new file mode 100644
> > > > > index ..1451e0551818
> > > > > --- /dev/null
> > > > > +++ b/pc-bios/vof/Makefile
> > > > > @@ -0,0 +1,18 @@
> > > > > +all: build-all
> > > > > +
> > > > > +build-all: vof.bin
> > > > > +
> > > > > +%.o: %.S
> > > > > + cc -m32 -mbig-endian -c -o $@ $<
> > > > 
> > > > Should probably use a $(CC) variable to make it easier for people to
> > > > point this at a cross-compiler.
> > > 
> > > 
> > > 
> > > CROSS ?=
> > > CC = $(CROSS)gcc
> > > LD = $(CROSS)ld
> > > OBJCOPY = $(CROSS)objcopy
> > > 
> > > 
> > > ?
> > > 
> > > Works with
> > > 
> > > make CROSS=/opt/cross/gcc-10.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-
> > 
> > I was just thinking "CC = cc" etc. so someone can override it from the
> > command line, but your suggestion is even better.
> 
> 
> I am not sure why (there is no "?" in "CC="/etc) but this works too with the
> change above:

The command line overrides variables in the Makefile by default, using
?= just lets environment variables override them as well.

[snip]
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +g_array_sort(claimed, of_claimed_compare_func);
> > > > > +vof_claimed_dump(claimed);
> > > > > +
> > > > > +/*
> > > > > + * VOF resides in the first page so we do not need to check if there is
> > > > > + * available memory before the first claimed block
> > > > > + */
> > > > > +g_assert(claimed->len && (g_array_index(claimed, OfClaimed, 0).start == 0));
> > > > > +
> > > > > +avail = g_malloc0(sizeof(avail[0]) * claimed->len);
> > > > > +for (i = 0, n = 0; i < claimed->len; ++i) {
> > > > > +OfClaimed c = g_array_index(claimed, OfClaimed, i);
> > > > > +uint64_t start, size;
> > > > > +
> > > > > +start = c.start + c.size;
> > > > > +if (i < claimed->len - 1) {
> > > > > +OfClaimed cn = g_array_index(claimed, OfClaimed, i + 1);
> > > > > +
> > > > > +size = cn.start - start;
> > > > > +} else {
> > > > > +size = be64_to_cpu(mem0_reg[1]) - start;
> > > > 
> > > > Don't you have vof->top_addr for the end of the ram you care about, so
> > > > you don't need to go poking at the memory node?
> > > 
> > > 
> > > top_addr is limited by 4GB but memory@0 is not and I'd like "available" to
> > > report free memory till the end of the memory@0 node part of which
> > > "available" is.
> > 
> > Hmmm.  AIUI the purpose of 'available' is so the client can know what
> > things it can claim, but IIUC claim only works in the 32-bit arena up
> > to top_addr.  So, does it really make sense to have it include stuff
> > beyond that?
> 
> 
> I am really not sure. The format uses 2 cells for an address. The client
> cannot claim memory above 4GB as the CLI ABI returns only cells but the
> firmware may run 64bit, use some memory above 4GB and report this use to the
> client so the client would have to avoid using that memory until ... I do
> not know... quiesce?

Huh.. yeah, that's pretty confusing.

> It is all very theoretical of course but still feels safer to stretch
> "available" till the end of the node.

Ok, you've convinced me.
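
To make the outcome of this exchange concrete, here is a minimal sketch of how
the "available" ranges fall out of a sorted list of claimed blocks. It is not
the patch's code: it uses plain C arrays instead of GArray, and the names
Claimed, Avail, build_available and mem_end are made up for illustration. As
agreed above, the last range runs to the end of the memory node (mem_end)
rather than stopping at top_addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's OfClaimed: one claimed block. */
typedef struct {
    uint64_t start;
    uint64_t size;
} Claimed;

/* One free range, as it would be reported in "available". */
typedef struct {
    uint64_t start;
    uint64_t size;
} Avail;

/*
 * Fill avail[] with the gaps that follow each claimed block.
 *
 * claimed[] must be sorted by start, must not overlap, and its first
 * block must begin at address 0 (the firmware lives there, so there is
 * no free memory before it). avail[] needs room for len entries.
 * mem_end is the end of the memory node (an assumption of this sketch).
 * Returns the number of ranges written.
 */
size_t build_available(const Claimed *claimed, size_t len,
                       uint64_t mem_end, Avail *avail)
{
    size_t i, n = 0;

    assert(len > 0 && claimed[0].start == 0);

    for (i = 0; i < len; ++i) {
        uint64_t start = claimed[i].start + claimed[i].size;
        /* The gap ends at the next claimed block, or at the end of memory. */
        uint64_t end = (i + 1 < len) ? claimed[i + 1].start : mem_end;

        if (end > start) { /* skip empty gaps between adjacent blocks */
            avail[n].start = start;
            avail[n].size = end - start;
            ++n;
        }
    }
    return n;
}
```

For claimed blocks at 0..0x1000, 0x8000..0x10000 and 0x400000..0x500000 with
mem_end = 0x20000000, this yields three ranges starting at 0x1000, 0x10000 and
0x500000.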

[snip]
> > > > > +void vof_build_dt(void *fdt, Vof *vof)
> > > > > +{
> > > > > +uint32_t phandle;
> > > > > +int i, offset, proplen = 0;
> > > > > +const void *prop;
> > > > > +bool found = false;
> > > > > +GArray *phandles = g_array_new(false, false, sizeof(uint32_t));
> > > > > +
> > > > > +/* Find all predefined phandles */
> > > > > +for (offset = fdt_next_node(fdt, -1, NULL);
> > > > > + offset >= 0;
> > > > > + offset = fdt_next_node(fdt, offset, NULL)) {
> > > > > +prop = fdt_getprop(fdt, offset, "phandle", &proplen);
> > > > > +if (prop && proplen == sizeof(uint32_t)) {
> > > > > +phandle = fdt32_ld(prop);
> > > > > +g_array_append_val(phandles, phandle);
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +/* Assign phandles skipping the predefined ones */
> > > > > +for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
> > > > > + offset >= 0;
> > > > > + offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
> > > > > +prop = fdt_getprop(fdt, offset, "phandle", &proplen);
> > > > > +if (prop) {
> > > > > +continue;
> > > > > +}
> > > > > +/* Check if the current phandle is not allocated already */
> > > > > +for ( ; ; ++phandle) {
> > > > > +for (i = 0, found = false; i < phandles->len; ++i) {
> > > > > +if (phandle == g_array_index(phandles, uint32_t, 
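
The quoted loop is cut off by the archive, so as a rough illustration of the
idea only: nodes that already carry a phandle keep it, and every other node
gets the next value not used by a predefined one. The sketch below models the
tree as a flat array where 0 means "no phandle yet"; the array model and the
names phandle_in_use and assign_phandles are assumptions for illustration, not
the patch's libfdt-based code, and the numbering it produces may differ from
the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if value already appears in used[0..len). */
static bool phandle_in_use(const uint32_t *used, size_t len, uint32_t value)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (used[i] == value) {
            return true;
        }
    }
    return false;
}

/*
 * nodes[i] holds the phandle of node i, with 0 meaning "none yet".
 * Nodes with a predefined phandle are left alone; the others receive
 * increasing values that skip every predefined one.
 */
void assign_phandles(uint32_t *nodes, size_t len)
{
    uint32_t next = 1;
    size_t i;

    assert(nodes != NULL || len == 0);

    for (i = 0; i < len; ++i) {
        if (nodes[i] != 0) {
            continue; /* predefined phandle, keep it */
        }
        /* Skip candidates that collide with an existing phandle. */
        while (phandle_in_use(nodes, len, next)) {
            ++next;
        }
        nodes[i] = next++;
    }
}
```

With nodes = {0, 2, 0, 1, 0}, the predefined values 2 and 1 stay in place and
the three empty slots become 3, 4 and 5.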

Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-21 Thread Alexey Kardashevskiy




On 4/21/21 15:27, David Gibson wrote:

On Tue, Apr 20, 2021 at 07:16:35PM +1000, Alexey Kardashevskiy wrote:



On 20/04/2021 13:14, David Gibson wrote:


Overall, looking good.  I'm pretty much happy to take it into 6.1.  I
do have quite a few comments below, but they're basically all just
polish.

On Wed, Mar 31, 2021 at 01:53:08PM +1100, Alexey Kardashevskiy wrote:

The PAPR platform which describes an OS environment that's presented by


Nit: remove "which" and this will become a sentence.


a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.


The above is a really good description of the rationale, thanks.


The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..1 - stack 
40.. - kernel

3ea.. - initramdisk


This memory map info would probably be more useful in a comment
somewhere in the code than in the commit message.


This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initramdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which is especially
crucial with fully emulated early CPU bring up environments. Also this
may come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This is coded in assumption that later on we might be adding support for
booting from QEMU backends (blockdev is the first candidate) without
devices/drivers in between as OF1275 does not require that and
it is quite easy to do so.

Signed-off-by: Alexey Kardashevskiy 
---

The example command line is:

/home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 8G \
-machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd pb/rootfs.cpio.xz \
-drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control

---
Changes:
v18:
* fixed top addr (max address for "claim") on radix - it equals to ram_size
and vof->top_addr was uint32_t
* fixed "available" property which got broken in v14 but it is only visible
to clients which care (== grub)
* reshuffled vof_dt_memory_available() calls, added vof_init() to allow
vof_claim() before rendering the FDT

v17:
* mv hw/ppc/vof.h include/hw/ppc/vof.h
* VofMachineIfClass -> 

Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-20 Thread David Gibson
On Tue, Apr 20, 2021 at 07:16:35PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 20/04/2021 13:14, David Gibson wrote:
> > 
> > Overall, looking good.  I'm pretty much happy to take it into 6.1.  I
> > do have quite a few comments below, but they're basically all just
> > polish.
> > 
> > On Wed, Mar 31, 2021 at 01:53:08PM +1100, Alexey Kardashevskiy wrote:
> > > The PAPR platform which describes an OS environment that's presented by
> > 
> > Nit: remove "which" and this will become a sentence.
> > 
> > > a combination of a hypervisor and firmware. The features it specifies
> > > require collaboration between the firmware and the hypervisor.
> > > 
> > > Since the beginning, the runtime component of the firmware (RTAS) has
> > > been implemented as a 20 byte shim which simply forwards it to
> > > a hypercall implemented in qemu. The boot time firmware component is
> > > SLOF - but a build that's specific to qemu, and has always needed to be
> > > updated in sync with it. Even though we've managed to limit the amount
> > > of runtime communication we need between qemu and SLOF, there's some,
> > > and it has become increasingly awkward to handle as we've implemented
> > > new features.
> > > 
> > > This implements a boot time OF client interface (CI) which is
> > > enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> > > > Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> > > which implements Open Firmware Client Interface (OF CI). This allows
> > > using a smaller stateless firmware which does not have to manage
> > > the device tree.
> > 
> > The above is a really good description of the rationale, thanks.
> > 
> > > The new "vof.bin" firmware image is included with source code under
> > > pc-bios/. It also includes RTAS blob.
> > > 
> > > This implements a handful of CI methods just to get -kernel/-initrd
> > > working. In particular, this implements the device tree fetching and
> > > simple memory allocator - "claim" (an OF CI memory allocator) and updates
> > > "/memory@0/available" to report the client about available memory.
> > > 
> > > This implements changing some device tree properties which we know how
> > > to deal with, the rest is ignored. To allow changes, this skips
> > > fdt_pack() when x-vof=on as not packing the blob leaves some room for
> > > appending.
> > > 
> > > In absence of SLOF, this assigns phandles to device tree nodes to make
> > > device tree traversing work.
> > > 
> > > When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> > > 
> > > This adds basic instances support which are managed by a hash map
> > > ihandle -> [phandle].
> > > 
> > > Before the guest started, the used memory is:
> > > 0..e60 - the initial firmware
> > > 8000..1 - stack
> > > 40.. - kernel
> > > 3ea.. - initramdisk
> > 
> > This memory map info would probably be more useful in a comment
> > somewhere in the code than in the commit message.
> > 
> > > This OF CI does not implement "interpret".
> > > 
> > > Unlike SLOF, this does not format uninitialized nvram. Instead, this
> > > includes a disk image with pre-formatted nvram.
> > > 
> > > With this basic support, this can only boot into kernel directly.
> > > > However this is just enough for the petitboot kernel and initramdisk to
> > > boot from any possible source. Note this requires reasonably recent guest
> > > kernel with:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> > > 
> > > > The immediate benefit is much faster booting time which is especially
> > > > crucial with fully emulated early CPU bring up environments. Also this
> > > > may come in handy when/if GRUB-in-the-userspace sees the light of day.
> > > 
> > > This separates VOF and sPAPR in a hope that VOF bits may be reused by
> > > other POWERPC boards which do not support pSeries.
> > > 
> > > This is coded in assumption that later on we might be adding support for
> > > booting from QEMU backends (blockdev is the first candidate) without
> > > devices/drivers in between as OF1275 does not require that and
> > > > it is quite easy to do so.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > > 
> > > The example command line is:
> > > 
> > > /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> > > -nodefaults \
> > > -chardev stdio,id=STDIO0,signal=off,mux=on \
> > > -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> > > -mon id=MON0,chardev=STDIO0,mode=readline \
> > > -nographic \
> > > -vga none \
> > > -enable-kvm \
> > > -m 8G \
> > > > -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
> > > > -kernel pbuild/kernel-le-guest/vmlinux \
> > > > -initrd pb/rootfs.cpio.xz \
> > > > -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
> > > -global spapr-nvram.drive=DRIVE0 \
> > > -snapshot \
> > > -smp 8,threads=8 \
> > > -L 

Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-20 Thread Alexey Kardashevskiy




On 20/04/2021 13:14, David Gibson wrote:


Overall, looking good.  I'm pretty much happy to take it into 6.1.  I
do have quite a few comments below, but they're basically all just
polish.

On Wed, Mar 31, 2021 at 01:53:08PM +1100, Alexey Kardashevskiy wrote:

The PAPR platform which describes an OS environment that's presented by


Nit: remove "which" and this will become a sentence.


a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.


The above is a really good description of the rationale, thanks.


The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..1 - stack
40.. - kernel
3ea.. - initramdisk


This memory map info would probably be more useful in a comment
somewhere in the code than in the commit message.


This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initramdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which is especially
crucial with fully emulated early CPU bring up environments. Also this
may come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This is coded in assumption that later on we might be adding support for
booting from QEMU backends (blockdev is the first candidate) without
devices/drivers in between as OF1275 does not require that and
it is quite easy to do so.

Signed-off-by: Alexey Kardashevskiy 
---

The example command line is:

/home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 8G \
-machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd pb/rootfs.cpio.xz \
-drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control

---
Changes:
v18:
* fixed top addr (max address for "claim") on radix - it equals to ram_size
and vof->top_addr was uint32_t
* fixed "available" property which got broken in v14 but it is only visible
to clients which care (== grub)
* reshuffled vof_dt_memory_available() calls, added vof_init() to allow
vof_claim() before rendering the FDT

v17:
* mv hw/ppc/vof.h include/hw/ppc/vof.h
* VofMachineIfClass -> VofMachineClass; it is not VofMachineInterface as
nobody used this scheme, usually "Interface" is dropped, a couple of times
it 

Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-19 Thread David Gibson

Overall, looking good.  I'm pretty much happy to take it into 6.1.  I
do have quite a few comments below, but they're basically all just
polish.

On Wed, Mar 31, 2021 at 01:53:08PM +1100, Alexey Kardashevskiy wrote:
> The PAPR platform which describes an OS environment that's presented by

Nit: remove "which" and this will become a sentence.

> a combination of a hypervisor and firmware. The features it specifies
> require collaboration between the firmware and the hypervisor.
> 
> Since the beginning, the runtime component of the firmware (RTAS) has
> been implemented as a 20 byte shim which simply forwards it to
> a hypercall implemented in qemu. The boot time firmware component is
> SLOF - but a build that's specific to qemu, and has always needed to be
> updated in sync with it. Even though we've managed to limit the amount
> of runtime communication we need between qemu and SLOF, there's some,
> and it has become increasingly awkward to handle as we've implemented
> new features.
> 
> This implements a boot time OF client interface (CI) which is
> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> which implements Open Firmware Client Interface (OF CI). This allows
> using a smaller stateless firmware which does not have to manage
> the device tree.

The above is a really good description of the rationale, thanks.

> The new "vof.bin" firmware image is included with source code under
> pc-bios/. It also includes RTAS blob.
> 
> This implements a handful of CI methods just to get -kernel/-initrd
> working. In particular, this implements the device tree fetching and
> simple memory allocator - "claim" (an OF CI memory allocator) and updates
> "/memory@0/available" to report the client about available memory.
> 
> This implements changing some device tree properties which we know how
> to deal with, the rest is ignored. To allow changes, this skips
> fdt_pack() when x-vof=on as not packing the blob leaves some room for
> appending.
> 
> In absence of SLOF, this assigns phandles to device tree nodes to make
> device tree traversing work.
> 
> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> 
> This adds basic instances support which are managed by a hash map
> ihandle -> [phandle].
> 
> Before the guest started, the used memory is:
> 0..e60 - the initial firmware
> 8000..1 - stack
> 40.. - kernel
> 3ea.. - initramdisk

This memory map info would probably be more useful in a comment
somewhere in the code than in the commit message.


> This OF CI does not implement "interpret".
> 
> Unlike SLOF, this does not format uninitialized nvram. Instead, this
> includes a disk image with pre-formatted nvram.
> 
> With this basic support, this can only boot into kernel directly.
> However this is just enough for the petitboot kernel and initramdisk to
> boot from any possible source. Note this requires reasonably recent guest
> kernel with:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> 
> The immediate benefit is much faster booting time which is especially
> crucial with fully emulated early CPU bring up environments. Also this
> may come in handy when/if GRUB-in-the-userspace sees the light of day.
> 
> This separates VOF and sPAPR in a hope that VOF bits may be reused by
> other POWERPC boards which do not support pSeries.
> 
> This is coded in assumption that later on we might be adding support for
> booting from QEMU backends (blockdev is the first candidate) without
> devices/drivers in between as OF1275 does not require that and
> it is quite easy to do so.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> The example command line is:
> 
> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -enable-kvm \
> -m 8G \
> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
> -kernel pbuild/kernel-le-guest/vmlinux \
> -initrd pb/rootfs.cpio.xz \
> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
> -global spapr-nvram.drive=DRIVE0 \
> -snapshot \
> -smp 8,threads=8 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors \
> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
> -mon chardev=SOCKET0,mode=control
> 
> ---
> Changes:
> v18:
> * fixed top addr (max address for "claim") on radix - it equals to ram_size
> and vof->top_addr was uint32_t
> * fixed "available" property which got broken in v14 but it is only visible
> to clients which care (== grub)
> * reshuffled vof_dt_memory_available() calls, added vof_init() to allow
> vof_claim() before rendering the FDT
> 
> v17:
> * mv hw/ppc/vof.h 

Re: [PATCH qemu v18] spapr: Implement Open Firmware client interface

2021-04-08 Thread Alexey Kardashevskiy




On 31/03/2021 13:53, Alexey Kardashevskiy wrote:

The PAPR platform which describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..1 - stack
40.. - kernel
3ea.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initramdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which is especially
crucial with fully emulated early CPU bring up environments. Also this
may come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This is coded in assumption that later on we might be adding support for
booting from QEMU backends (blockdev is the first candidate) without
devices/drivers in between as OF1275 does not require that and
it is quite easy to do so.

Signed-off-by: Alexey Kardashevskiy 


[...]


diff --git a/hw/ppc/spapr_vof.c b/hw/ppc/spapr_vof.c
new file mode 100644
index ..9d22e230e3c0
--- /dev/null
+++ b/hw/ppc/spapr_vof.c


[...]


+
+void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt)
+{
+char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
+
+vof_build_dt(fdt, spapr->vof);
+
+/*
+ * SLOF-less setup requires an open instance of stdout for early
+ * kernel printk. By now all phandles are settled so we can open
+ * the default serial console.
+ */
+if (stdout_path) {
+_FDT(vof_client_open_store(fdt, spapr->vof, "/chosen", "stdout",
+   stdout_path));
+}
+}
+
+void spapr_vof_reset(SpaprMachineState *spapr, void *fdt,
+ target_ulong *stack_ptr, Error **errp)
+{
+Vof *vof = spapr->vof;
+
+vof_init(vof, spapr->rma_size);
+
+if (vof_claim(vof, 0, spapr->fw_size, 0) == -1) {
+error_setg(errp, "Memory for firmware is in use");
+return;
+}
+
+*stack_ptr = vof_claim(vof, 0, OF_STACK_SIZE, OF_STACK_SIZE);
+if (*stack_ptr == -1) {
+error_setg(errp, "Memory allocation for stack failed");
+return;
+}
+/* Stack grows downwards plus reserve space for the minimum stack frame */
+*stack_ptr += OF_STACK_SIZE - 0x20;
+
+if (spapr->kernel_size &&
+vof_claim(vof, spapr->kernel_addr, spapr->kernel_size, 0) == -1) {
+error_setg(errp, "Memory for kernel is in use");
+return;
+}
+
+if (spapr->initrd_size &&
+vof_claim(vof, spapr->initrd_base, spapr->initrd_size, 0) == -1) {
+error_setg(errp, "Memory for initramdisk is in use");
+return;
+}
+
+spapr_vof_client_dt_finalize(spapr, fdt);
+
+/*
+ * We skip