Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map

2016-10-15 Thread David Gibson
On Fri, Oct 14, 2016 at 09:25:58AM +0200, Laurent Vivier wrote:
> On 14/10/2016 01:29, David Gibson wrote:
> > From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
> > From: David Gibson 
> > Date: Fri, 14 Oct 2016 10:21:00 +1100
> > Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory map
> > 
> > Currently, the MMIO space for accessing PCI on pseries guests begins at
> > 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> > chunk of address space in which it places its outbound PIO and 32-bit and
> > 64-bit MMIO windows.
> > 
> > This scheme has several problems:
> >   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> > now)
> >   - It limits the total MMIO window to 64 GiB.  This is not always enough
> > for some of the large nVidia GPGPU cards
> >   - Putting all the windows into a single 64 GiB area means that naturally
> > aligning things within there will waste more address space.
> > In addition there was a miscalculation in some of the defaults, which meant
> > that the MMIO windows for each PHB actually slightly overran the 64 GiB
> > region for that PHB.  We got away without nasty consequences because
> > the overrun fit within an unused area at the beginning of the next PHB's
> > region, but it's not pretty.
> > 
> > This patch implements a new scheme which addresses those problems, and is
> > also closer to what bare metal hardware and pHyp guests generally use.
> > 
> > Because some guest versions (including most current distro kernels) can't
> > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> > 64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
> > the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> > subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
> > one PHB.
> > 
> > This reduces the number of allowed PHBs (without full manual configuration
> > of all the windows) from 256 to 31, but this should still be plenty in
> > practice.
> > 
> > We also change some of the default window sizes for manually configured
> > PHBs to saner values.
> > 
> > Finally we adjust some tests and libqos so that they correctly use the new
> > default locations.  Ideally it would parse the device tree given to the
> > guest, but that's a more complex problem for another time.
> > 
> > Signed-off-by: David Gibson 
> 
> 
> I really like this new version.
> 
> Reviewed-by: Laurent Vivier 

Great!  I've merged it into ppc-for-2.8.

> 
> Laurent
> 
> > ---
> >  hw/ppc/spapr.c  | 121 +---
> >  hw/ppc/spapr_pci.c  |   5 +-
> >  include/hw/pci-host/spapr.h |   8 ++-
> >  tests/endianness-test.c |   3 +-
> >  tests/libqos/pci-spapr.c|   9 ++--
> >  tests/spapr-phb-test.c  |   2 +-
> >  6 files changed, 108 insertions(+), 40 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 8db3657..4bdf15b 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState 
> > *spapr, uint32_t index,
> >  hwaddr *mmio32, hwaddr *mmio64,
> >  unsigned n_dma, uint32_t *liobns, Error 
> > **errp)
> >  {
> > +/*
> > + * New-style PHB window placement.
> > + *
> > + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> > + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> > + * windows.
> > + *
> > + * Some guest kernels can't work with MMIO windows above 1<<46
> > + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> > + *
> > + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> > + * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> > + * has the 64-bit window for PHB1 and so forth.
> > + */
> >  const uint64_t base_buid = 0x800000020000000ULL;
> > -const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > -const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > -const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > -const uint32_t max_index = 255;
> > -const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> > -
> > -uint64_t ram_top = MACHINE(spapr)->ram_size;
> > -hwaddr phb0_base, phb_base;
> > +const int max_phbs =
> > +(SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> >  int i;
> >  
> > -/* Do we have hotpluggable memory? */
> > -if (MACHINE(spapr)->maxram_size > ram_top) {
> > -/* Can't just use maxram_size, because there may be an
> > - * alignment gap between normal and hotpluggable memory
> > - * regions */
> > -ram_top = spapr->hotplug_memory.base +
> > -memory_region_size(&spapr->hotplug_memory.mr);

Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map

2016-10-14 Thread Laurent Vivier
On 14/10/2016 01:29, David Gibson wrote:
> From 98f07c09c6d72218800d6cfbf44b973a88ece2aa Mon Sep 17 00:00:00 2001
> From: David Gibson 
> Date: Fri, 14 Oct 2016 10:21:00 +1100
> Subject: [PATCH] spapr: Improved placement of PCI host bridges in guest memory map
> 
> Currently, the MMIO space for accessing PCI on pseries guests begins at
> 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> chunk of address space in which it places its outbound PIO and 32-bit and
> 64-bit MMIO windows.
> 
> This scheme has several problems:
>   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> now)
>   - It limits the total MMIO window to 64 GiB.  This is not always enough
> for some of the large nVidia GPGPU cards
>   - Putting all the windows into a single 64 GiB area means that naturally
> aligning things within there will waste more address space.
> In addition there was a miscalculation in some of the defaults, which meant
> that the MMIO windows for each PHB actually slightly overran the 64 GiB
> region for that PHB.  We got away without nasty consequences because
> the overrun fit within an unused area at the beginning of the next PHB's
> region, but it's not pretty.
> 
> This patch implements a new scheme which addresses those problems, and is
> also closer to what bare metal hardware and pHyp guests generally use.
> 
> Because some guest versions (including most current distro kernels) can't
> access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> 64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
> the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
> one PHB.
> 
> This reduces the number of allowed PHBs (without full manual configuration
> of all the windows) from 256 to 31, but this should still be plenty in
> practice.
> 
> We also change some of the default window sizes for manually configured
> PHBs to saner values.
> 
> Finally we adjust some tests and libqos so that they correctly use the new
> default locations.  Ideally it would parse the device tree given to the
> guest, but that's a more complex problem for another time.
> 
> Signed-off-by: David Gibson 


I really like this new version.

Reviewed-by: Laurent Vivier 

Laurent

> ---
>  hw/ppc/spapr.c  | 121 +---
>  hw/ppc/spapr_pci.c  |   5 +-
>  include/hw/pci-host/spapr.h |   8 ++-
>  tests/endianness-test.c |   3 +-
>  tests/libqos/pci-spapr.c|   9 ++--
>  tests/spapr-phb-test.c  |   2 +-
>  6 files changed, 108 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8db3657..4bdf15b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,31 +2375,37 @@ static void spapr_phb_placement(sPAPRMachineState 
> *spapr, uint32_t index,
>  hwaddr *mmio32, hwaddr *mmio64,
>  unsigned n_dma, uint32_t *liobns, Error 
> **errp)
>  {
> +/*
> + * New-style PHB window placement.
> + *
> + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> + * windows.
> + *
> + * Some guest kernels can't work with MMIO windows above 1<<46
> + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> + *
> + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> + * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> + * has the 64-bit window for PHB1 and so forth.
> + */
>  const uint64_t base_buid = 0x800000020000000ULL;
> -const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> -const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> -const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> -const uint32_t max_index = 255;
> -const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> -
> -uint64_t ram_top = MACHINE(spapr)->ram_size;
> -hwaddr phb0_base, phb_base;
> +const int max_phbs =
> +(SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
>  int i;
>  
> -/* Do we have hotpluggable memory? */
> -if (MACHINE(spapr)->maxram_size > ram_top) {
> -/* Can't just use maxram_size, because there may be an
> - * alignment gap between normal and hotpluggable memory
> - * regions */
> -ram_top = spapr->hotplug_memory.base +
> -memory_region_size(&spapr->hotplug_memory.mr);
> -}
> -
> -phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +/* Sanity check natural alignments */
> +QEMU_BUILD_BUG_ON((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> +QEMU_BUILD_BUG_ON((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) != 0);
> +

Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map

2016-10-13 Thread David Gibson
On Thu, Oct 13, 2016 at 10:40:32AM +0200, Laurent Vivier wrote:
> 
> 
> On 13/10/2016 01:57, David Gibson wrote:
> > Currently, the MMIO space for accessing PCI on pseries guests begins at
> > 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> > chunk of address space in which it places its outbound PIO and 32-bit and
> > 64-bit MMIO windows.
> > 
> > This scheme has several problems:
> >   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> > now)
> >   - It limits the total MMIO window to 64 GiB.  This is not always enough
> > for some of the large nVidia GPGPU cards
> >   - Putting all the windows into a single 64 GiB area means that naturally
> > aligning things within there will waste more address space.
> > In addition there was a miscalculation in some of the defaults, which meant
> > that the MMIO windows for each PHB actually slightly overran the 64 GiB
> > region for that PHB.  We got away without nasty consequences because
> > the overrun fit within an unused area at the beginning of the next PHB's
> > region, but it's not pretty.
> > 
> > This patch implements a new scheme which addresses those problems, and is
> > also closer to what bare metal hardware and pHyp guests generally use.
> > 
> > Because some guest versions (including most current distro kernels) can't
> > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> > 64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
> > the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> > subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
> > one PHB.
> > 
> > This reduces the number of allowed PHBs (without full manual configuration
> > of all the windows) from 256 to 31, but this should still be plenty in
> > practice.
> > 
> > We also change some of the default window sizes for manually configured
> > PHBs to saner values.
> > 
> > Finally we adjust some tests and libqos so that they correctly use the new
> > default locations.  Ideally it would parse the device tree given to the
> > guest, but that's a more complex problem for another time.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/spapr.c  | 126 +---
> >  hw/ppc/spapr_pci.c  |   5 +-
> >  include/hw/pci-host/spapr.h |   8 ++-
> >  tests/endianness-test.c |   3 +-
> >  tests/libqos/pci-spapr.c|   9 ++--
> >  tests/spapr-phb-test.c  |   2 +-
> >  6 files changed, 113 insertions(+), 40 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 8db3657..2d952a8 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState 
> > *spapr, uint32_t index,
> >  hwaddr *mmio32, hwaddr *mmio64,
> >  unsigned n_dma, uint32_t *liobns, Error 
> > **errp)
> >  {
> > +/*
> > + * New-style PHB window placement.
> > + *
> > + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> > + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> > + * windows.
> > + *
> > + * Some guest kernels can't work with MMIO windows above 1<<46
> > + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> > + *
> > + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> > + * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> > + * has the 64-bit window for PHB1 and so forth.
> > + */
> >  const uint64_t base_buid = 0x800000020000000ULL;
> > -const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> > -const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> > -const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> > -const uint32_t max_index = 255;
> > -const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
> >  
> > -uint64_t ram_top = MACHINE(spapr)->ram_size;
> > -hwaddr phb0_base, phb_base;
> > +int max_phbs =
> > +(SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> > +hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
> 
> The result is the same but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET
> instead of SPAPR_PCI_MEM32_WIN_SIZE.
> 
> As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) -
> SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET
> is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO.

No, not quite.

> This is what we have below with:
> 
> *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
> *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
> 
> Perhaps we can see *mmio32 as
> 
> SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE

That's the intended effect.  Basically we take 32..64TiB and divide it
into 1TiB regions.  Regions 1..31 are the mmio64 windows, one per PHB,
while region 0 holds the PIO and 32-bit MMIO windows for all PHBs.
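
For illustration, a standalone sketch of that arithmetic, using the window
sizes described in the patch (abbreviated local macro names, not QEMU's
actual headers, and not part of the patch):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Window sizes as described in the patch (abbreviated names). */
#define PCI_BASE            (1ULL << 45)   /* 32 TiB */
#define PCI_MEM64_WIN_SIZE  (1ULL << 40)   /* 1 TiB 64-bit window per PHB */
#define PCI_MEM32_WIN_SIZE  (2ULL << 30)   /* 2 GiB 32-bit window per PHB */
#define PCI_IO_WIN_SIZE     (64ULL << 10)  /* 64 kiB PIO window per PHB */

int main(void)
{
    /* Region 0 (32..33 TiB) holds every PHB's PIO and 32-bit MMIO
     * windows; regions 1..31 hold one 64-bit MMIO window each. */
    uint64_t mmio32_base = PCI_BASE + PCI_MEM32_WIN_SIZE;
    uint64_t mmio64_base = PCI_BASE + PCI_MEM64_WIN_SIZE;

    for (int index = 0; index < 3; index++) {
        uint64_t pio    = PCI_BASE    + index * PCI_IO_WIN_SIZE;
        uint64_t mmio32 = mmio32_base + index * PCI_MEM32_WIN_SIZE;
        uint64_t mmio64 = mmio64_base + index * PCI_MEM64_WIN_SIZE;

        /* PHB0: pio=0x200000000000 mmio32=0x200080000000 mmio64=0x210000000000 */
        printf("PHB%d: pio=0x%" PRIx64 " mmio32=0x%" PRIx64 " mmio64=0x%" PRIx64 "\n",
               index, pio, mmio32, mmio64);
    }
    return 0;
}

So PHB0's 64-bit window starts at 33 TiB, PHB1's at 34 TiB, and so on.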

Re: [Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map

2016-10-13 Thread Laurent Vivier


On 13/10/2016 01:57, David Gibson wrote:
> Currently, the MMIO space for accessing PCI on pseries guests begins at
> 1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
> chunk of address space in which it places its outbound PIO and 32-bit and
> 64-bit MMIO windows.
> 
> This scheme has several problems:
>   - It limits guest RAM to 1 TiB (though we have a limited fix for this
> now)
>   - It limits the total MMIO window to 64 GiB.  This is not always enough
> for some of the large nVidia GPGPU cards
>   - Putting all the windows into a single 64 GiB area means that naturally
> aligning things within there will waste more address space.
> In addition there was a miscalculation in some of the defaults, which meant
> that the MMIO windows for each PHB actually slightly overran the 64 GiB
> region for that PHB.  We got away without nasty consequences because
> the overrun fit within an unused area at the beginning of the next PHB's
> region, but it's not pretty.
> 
> This patch implements a new scheme which addresses those problems, and is
> also closer to what bare metal hardware and pHyp guests generally use.
> 
> Because some guest versions (including most current distro kernels) can't
> access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
> 64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
> the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
> subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
> one PHB.
> 
> This reduces the number of allowed PHBs (without full manual configuration
> of all the windows) from 256 to 31, but this should still be plenty in
> practice.
> 
> We also change some of the default window sizes for manually configured
> PHBs to saner values.
> 
> Finally we adjust some tests and libqos so that they correctly use the new
> default locations.  Ideally it would parse the device tree given to the
> guest, but that's a more complex problem for another time.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr.c  | 126 +---
>  hw/ppc/spapr_pci.c  |   5 +-
>  include/hw/pci-host/spapr.h |   8 ++-
>  tests/endianness-test.c |   3 +-
>  tests/libqos/pci-spapr.c|   9 ++--
>  tests/spapr-phb-test.c  |   2 +-
>  6 files changed, 113 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 8db3657..2d952a8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState 
> *spapr, uint32_t index,
>  hwaddr *mmio32, hwaddr *mmio64,
>  unsigned n_dma, uint32_t *liobns, Error 
> **errp)
>  {
> +/*
> + * New-style PHB window placement.
> + *
> + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
> + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
> + * windows.
> + *
> + * Some guest kernels can't work with MMIO windows above 1<<46
> + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
> + *
> + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
> + * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
> + * has the 64-bit window for PHB1 and so forth.
> + */
>  const uint64_t base_buid = 0x800000020000000ULL;
> -const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
> -const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
> -const hwaddr pio_offset = 0x80000000; /* 2 GiB */
> -const uint32_t max_index = 255;
> -const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> -uint64_t ram_top = MACHINE(spapr)->ram_size;
> -hwaddr phb0_base, phb_base;
> +int max_phbs =
> +(SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
> +hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;

The result is the same but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET
instead of SPAPR_PCI_MEM32_WIN_SIZE.

As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) -
SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET
is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO.

This is what we have below with:

*pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
*mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;

Perhaps we can see *mmio32 as

SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE

But in this case you should directly define SPAPR_PCI_MEM32_WIN_SIZE as
0x80000000, not as "SIZE - OFFSET". It's confusing...
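
A minimal standalone check of that equivalence, with local stand-ins for
the two macros (hypothetical names, values as quoted above):

#include <stdint.h>

/* Local stand-ins for the two constants under discussion. */
#define MEM_WIN_BUS_OFFSET 0x80000000ULL                       /* 2 GiB */
#define MEM32_WIN_SIZE     ((1ULL << 32) - MEM_WIN_BUS_OFFSET) /* also 2 GiB */

/* Adding either constant to SPAPR_PCI_BASE gives the same mmio32_base. */
_Static_assert(MEM32_WIN_SIZE == MEM_WIN_BUS_OFFSET,
               "32-bit MMIO window size equals the bus offset");

int main(void)
{
    return 0;
}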

If it is only a misunderstanding from my side, you can add my:

Reviewed-by: Laurent Vivier 

Thanks,
Laurent



[Qemu-devel] [PATCHv3 7/7] spapr: Improved placement of PCI host bridges in guest memory map

2016-10-12 Thread David Gibson
Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme has several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
now)
  - It limits the total MMIO window to 64 GiB.  This is not always enough
for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that naturally
aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB.  We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB chunk contains
the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.
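
As a cross-check of that limit, here is a standalone sketch of the
arithmetic, using local names rather than QEMU's macros (it is not part
of the patch):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    const uint64_t tib       = 1ULL << 40;
    const uint64_t pci_base  = 1ULL << 45;   /* 32 TiB */
    const uint64_t pci_limit = 1ULL << 46;   /* 64 TiB */
    const uint64_t mmio32_sz = 2ULL << 30;   /* 2 GiB 32-bit window per PHB */
    const uint64_t pio_sz    = 64ULL << 10;  /* 64 kiB PIO window per PHB */

    /* 32 one-TiB chunks between 32 TiB and 64 TiB; the first is reserved
     * for PIO and 32-bit MMIO, leaving 31 chunks, hence 31 PHBs. */
    int max_phbs = (pci_limit - pci_base) / tib - 1;
    assert(max_phbs == 31);

    /* All 31 PIO windows end below the 2 GiB offset where the 32-bit
     * MMIO windows begin ... */
    assert(max_phbs * pio_sz <= mmio32_sz);

    /* ... and the 32-bit MMIO windows end at 2 GiB + 31 * 2 GiB = 64 GiB,
     * well inside the first 1 TiB chunk. */
    assert(mmio32_sz + max_phbs * mmio32_sz <= tib);
    return 0;
}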

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that they correctly use the new
default locations.  Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 126 +---
 hw/ppc/spapr_pci.c  |   5 +-
 include/hw/pci-host/spapr.h |   8 ++-
 tests/endianness-test.c |   3 +-
 tests/libqos/pci-spapr.c|   9 ++--
 tests/spapr-phb-test.c  |   2 +-
 6 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8db3657..2d952a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState 
*spapr, uint32_t index,
 hwaddr *mmio32, hwaddr *mmio64,
 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
+/*
+ * New-style PHB window placement.
+ *
+ * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
+ * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
+ * windows.
+ *
+ * Some guest kernels can't work with MMIO windows above 1<<46
+ * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
+ *
+ * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
+ * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
+ * has the 64-bit window for PHB1 and so forth.
+ */
 const uint64_t base_buid = 0x800000020000000ULL;
-const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
-const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
-const hwaddr pio_offset = 0x80000000; /* 2 GiB */
-const uint32_t max_index = 255;
-const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
 
-uint64_t ram_top = MACHINE(spapr)->ram_size;
-hwaddr phb0_base, phb_base;
+int max_phbs =
+(SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1;
+hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
+hwaddr mmio64_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM64_WIN_SIZE;
 int i;
 
-/* Do we have hotpluggable memory? */
-if (MACHINE(spapr)->maxram_size > ram_top) {
-/* Can't just use maxram_size, because there may be an
- * alignment gap between normal and hotpluggable memory
- * regions */
-ram_top = spapr->hotplug_memory.base +
-memory_region_size(&spapr->hotplug_memory.mr);
-}
-
-phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
-
-if (index > max_index) {
+/* Sanity check natural alignments */
+assert((SPAPR_PCI_BASE % SPAPR_PCI_MEM64_WIN_SIZE) == 0);
+assert((SPAPR_PCI_LIMIT % SPAPR_PCI_MEM64_WIN_SIZE) == 0);
+assert((SPAPR_PCI_MEM64_WIN_SIZE % SPAPR_PCI_MEM32_WIN_SIZE) == 0);
+assert((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) == 0);
+/* Sanity check bounds */
+assert((SPAPR_PCI_BASE + max_phbs * SPAPR_PCI_IO_WIN_SIZE)
+   <= mmio32_base);
+assert(mmio32_base + max_phbs * SPAPR_PCI_MEM32_WIN_SIZE
+   <= mmio64_base);
+
+if (index >= max_phbs) {
 error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
-