Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-18 Thread Marcel Apfelbaum

On 01/13/2017 01:14 AM, David Gibson wrote:

On Thu, Jan 12, 2017 at 11:03:05AM -0500, Laine Stump wrote:

On 01/05/2017 12:46 AM, David Gibson wrote:

There was a discussion back in November on the qemu list which spilled
onto the libvirt list about how to add support for PCIe devices to
POWER VMs, specifically 'pseries' machine type PAPR guests.

Here's a more concrete proposal for how to handle part of this in
future from the libvirt side.  Strictly speaking what I'm suggesting
here isn't intrinsically linked to PCIe: it will make adding PCIe
support sanely easier, as well as having a number of advantages for
both PCIe and plain-PCI devices on PAPR guests.

Background:

  * Currently the pseries machine type only supports vanilla PCI
    buses.
     * This is a qemu limitation, not something inherent - PAPR guests
       running under PowerVM (the IBM hypervisor) can use passthrough
       PCIe devices (PowerVM doesn't emulate devices though).
     * In fact the way PCI access is para-virtualized in PAPR makes the
       usual distinctions between PCI and PCIe largely disappear
  * Presentation of PCIe devices to PAPR guests is unusual
     * Unlike x86 and other "bare metal" platforms, root ports are
       not made visible to the guest, i.e. all devices (typically)
       appear as though they were integrated devices on x86
     * In terms of topology all devices will appear in a way similar to
       a vanilla PCI bus, even PCIe devices
        * However PCIe extended config space is accessible
     * This means libvirt's usual placement of PCIe devices is not
       suitable for PAPR guests
  * PAPR has its own hotplug mechanism
     * This is used instead of standard PCIe hotplug
     * This mechanism works for both PCIe and vanilla-PCI devices
     * This can hotplug/unplug devices even without a root port or P2P
       bridge between it and the root bus
  * Multiple independent host bridges are routine on PAPR
     * Unlike PC (where all host bridges have multiplexed access to
       configuration space), PCI host bridges (PHBs) are truly
       independent for PAPR guests (disjoint MMIO regions in system
       address space)
     * PowerVM typically presents a separate PHB to the guest for each
       host slot passed through

The Proposal:

I suggest that libvirt implement a new default algorithm for placing
(i.e. assigning addresses to) both PCI and PCIe devices for (only)
PAPR guests.

The short summary is that by default it should assign each device to a
separate vPHB, creating vPHBs as necessary.

   * For passthrough sometimes a group of host devices can't be safely
 isolated from each other - this is known as a (host) Partitionable
 Endpoint (PE).  In this case, if any device in the PE is passed
 through to a guest, the whole PE must be passed through to the
 same vPHB in the guest.  From the guest POV, each vPHB has exactly
 one (guest) PE.
   * To allow for hotplugged devices, libvirt should also add a number
 of additional, empty vPHBs (the PAPR spec allows for hotplug of
 PHBs, but this is not yet implemented in qemu).  When hotplugging
 a new device (or PE) libvirt should locate a vPHB which doesn't
 currently contain anything.
   * libvirt should only (automatically) add PHBs - never root ports or
 other PCI to PCI bridges
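
The default algorithm above is simple enough to sketch. A minimal illustration (the helper name `assign_vphbs` is made up here, and this is not libvirt's actual code): devices sharing a host IOMMU group form one PE and land on one vPHB, every other device gets its own vPHB, and spare empty vPHBs are appended for hotplug:

```python
def assign_vphbs(devices, iommu_group, spare_phbs=1):
    """Assign each guest PE (host IOMMU group) its own vPHB.

    devices      -- iterable of device names
    iommu_group  -- dict mapping device -> host IOMMU group id;
                    devices in the same group form one PE and must
                    share a vPHB
    spare_phbs   -- number of empty vPHBs to leave for hotplug

    Returns (placement, total_phbs): placement maps device -> vPHB
    index; total_phbs includes the spare, empty vPHBs.
    """
    group_to_phb = {}
    placement = {}
    next_phb = 0
    for dev in devices:
        group = iommu_group[dev]
        if group not in group_to_phb:
            # First device of this PE: allocate a fresh vPHB for it.
            group_to_phb[group] = next_phb
            next_phb += 1
        placement[dev] = group_to_phb[group]
    return placement, next_phb + spare_phbs
```

For example, two functions of one host adapter in the same PE land on the same vPHB, a NIC in its own PE gets a separate vPHB, and one extra empty vPHB remains available for a future hotplug.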



It's a bit unconventional to leave all but one slot of a controller unused,


Unconventional for x86, maybe.  It's been SOP on IBM Power for a
decade or more.  Both for PAPR guests and in some cases on the
physical hardware (AIUI many, though not all, Power systems used a
separate host bridge for each physical slot to ensure better isolation
between devices).


but your thinking makes sense. I don't think this will be as
large/disruptive of a change as you might be expecting - we already have
different addressing rules for automatically addressed vs. manually
addressed devices, as well as a framework in place to behave differently for
different PCI controllers (e.g. some support hotplug and others don't), and
to modify behavior based on machinetype / root bus model, so it should be
straightforward to make things behave as you outline above.


Actually, I had that impression, so I was hoping it wouldn't be too
bad to implement.  I'd really like to get this underway ASAP, so we
can build the PCIe support (both qemu and Power) around that.


(The first item in your list sounds exactly like VFIO iommu groups. Is that
how it's exposed on PPC?


Yes, for Power hosts and guests there's a 1-1 correspondence between
PEs and IOMMU groups.  Technically speaking, I believe the PE provides
more isolation guarantees than the IOMMU group, but they're generally
close enough in practice.
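
On a Linux host those PEs surface as IOMMU groups under /sys/kernel/iommu_groups. A short sketch of enumerating them (the sysfs layout is the real kernel interface; the function name is invented for illustration):

```python
from pathlib import Path

def host_iommu_groups(sysfs="/sys/kernel/iommu_groups"):
    """Return a mapping of IOMMU group id -> PCI addresses in the group.

    Each /sys/kernel/iommu_groups/<id>/devices/ directory holds one
    entry per device in the group; all of them must be assigned to the
    same guest (and, per the proposal, the same vPHB).
    """
    groups = {}
    root = Path(sysfs)
    if root.is_dir():
        for group_dir in root.iterdir():
            devs = group_dir / "devices"
            if devs.is_dir():
                groups[group_dir.name] = sorted(p.name for p in devs.iterdir())
    return groups
```

Devices that show up together here are exactly the ones libvirt already refuses to split between a guest and the host.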


If so, libvirt already takes care of guaranteeing
that any devices in the same group aren't used by other guests or the host
during the time a guest is using a device.


Yes, I'm aware of that, that's not an aspect I was concerned about.

Although that said, 

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-18 Thread Marcel Apfelbaum

On 01/12/2017 07:53 PM, Laine Stump wrote:

On 01/12/2017 11:35 AM, Michael Roth wrote:

Quoting Laine Stump (2017-01-12 08:52:10)


[...]




Yeah you're right, I'm probably remembering the wrong problem and wrong reason 
for the problem. I just remember there was *some* issue about hotplugging new 
PCI controllers. Possibly the internal
representation of the bus hierarchy wasn't updated unless you forced a rescan 
of all the devices or something? My memory of it is vague, I just remember 
being told it wasn't just a case of the
controller itself being initialized.

Alex or Marcel - since whatever it was I likely heard it from one of you (or 
imagined it in a dream), can you straighten me out?



Hi Laine,

Indeed, hot-plugging a QEMU pci-bridge on x86 is somewhat problematic, since
the bridge comes with some bits in the ACPI tables, which are already loaded by
hot-plug time. We need those bits because x86 PCI hotplug is ACPI based. For
the Q35 machine it might be easier, but it is not implemented yet as far as
I know.

Thanks,
Marcel

[...]

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-13 Thread Greg Kurz
On Fri, 13 Jan 2017 09:57:36 +1100
David Gibson  wrote:

> On Thu, Jan 12, 2017 at 11:31:35AM +0100, Andrea Bolognani wrote:
> > On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:  
> > > > >* To allow for hotplugged devices, libvirt should also add a number
> > > > >  of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > > > >  PHBs, but this is not yet implemented in qemu).  
> > > > 
> > > > "A number" here will have to mean "one", same number of
> > > > empty PCIe Root Ports libvirt will add to a newly-defined
> > > > q35 guest.  
> > > 
> > > Umm.. why?  
> > 
> > Because some applications using libvirt would inevitably
> > start relying on the fact that such spare PHBs are
> > available, locking us into providing at least the same
> > number forever. In other words, increasing the amount at
> > a later time is always possible, but decreasing it isn't.
> > We did the same when we started automatically adding PCIe
> > Root Ports to q35 machines.
> > 
> > The rationale is that having a single spare hotpluggable
> > slot is extremely convenient for basic usage, eg. a simple
> > guest created by someone who's not necessarily very
> > familiar with virtualization; on the other hand, if you
> > are actually deploying in production you ought to conduct
> > proper capacity planning and figure out in advance how
> > many devices you're likely to need to hotplug throughout
> > the guest's life.  
> 
> Hm, ok.  Well I guess the limitation is the same as on x86, so it
> shouldn't surprise people.
> 
> > Of course this all will be moot once we can hotplug PHBs :)  
> 
> Yes.  Unfortunately, nobody's actually working on that at present.
> 

Well, there might be someone now :)

Michael Roth had posted a RFC patchset back in 2015:

https://lists.gnu.org/archive/html/qemu-ppc/2015-04/msg00275.html

I'll start from here.

Cheers.

--
Greg



Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-13 Thread Greg Kurz
On Fri, 13 Jan 2017 15:48:31 +1100
David Gibson  wrote:

> On Thu, Jan 12, 2017 at 10:09:03AM +0100, Greg Kurz wrote:
> > On Thu, 12 Jan 2017 17:19:40 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> > > On 12/01/17 14:52, David Gibson wrote:  
> > > > On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
> > > >> On Thu, 5 Jan 2017 16:46:18 +1100
> > > >> David Gibson  wrote:
> > > >>
> > > >>> [...]
> > > >>>
> > > >>> There are still some details I need to figure out w.r.t. handling PCIe
> > > >>> devices (on both the qemu and libvirt sides).  However the fact that  
> > > >>>   
> > > >>
> > > >> One such detail may be that PCIe devices should have the
> > > >> "ibm,pci-config-space-type" property set to 1 in the DT,
> > > >> for the driver to be able to access the extended config
> > > >> space.
> > > > 
> > > > So, we have a bit of an oddity here.  It looks like we currently set
> > > > 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> > > > device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Alexey Kardashevskiy
On 13/01/17 15:48, David Gibson wrote:
> On Thu, Jan 12, 2017 at 10:09:03AM +0100, Greg Kurz wrote:
>> On Thu, 12 Jan 2017 17:19:40 +1100
>> Alexey Kardashevskiy  wrote:
>>
>>> On 12/01/17 14:52, David Gibson wrote:
 On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:  
> On Thu, 5 Jan 2017 16:46:18 +1100
> David Gibson  wrote:
>  
>> [...]
>>
>> There are still some details I need to figure out w.r.t. handling PCIe
>> devices (on both the qemu and libvirt sides).  However the fact that  
>
> One such detail may be that PCIe devices should have the
> "ibm,pci-config-space-type" property set to 1 in the DT,
> for the driver to be able to access the extended config
> space.  

 So, we have a bit of an oddity here.  It looks like we currently set
 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
 device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.  
>>>
>>>
>>> I asked Paul how to read the spec and this is rather correct but not enough
>>> - having type=1 on a PHB means that extended access requests can go behind
>>> it but underlying devices and bridges still need to have type=1 if they
>

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread David Gibson
On Thu, Jan 12, 2017 at 10:09:03AM +0100, Greg Kurz wrote:
> On Thu, 12 Jan 2017 17:19:40 +1100
> Alexey Kardashevskiy  wrote:
> 
> > On 12/01/17 14:52, David Gibson wrote:
> > > On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:  
> > >> On Thu, 5 Jan 2017 16:46:18 +1100
> > >> David Gibson  wrote:
> > >>  
> > >>> [...]
> > >>>
> > >>> There are still some details I need to figure out w.r.t. handling PCIe
> > >>> devices (on both the qemu and libvirt sides).  However the fact that  
> > >>
> > >> One such detail may be that PCIe devices should have the
> > >> "ibm,pci-config-space-type" property set to 1 in the DT,
> > >> for the driver to be able to access the extended config
> > >> space.  
> > > 
> > > So, we have a bit of an oddity here.  It looks like we currently set
> > > 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> > > device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.  
> > 
> > 
> > I asked Paul how to read the spec and this is rather correct but not enough
> > - having type=1 on a PHB means that extended access requests can go behind
> > it but underlying devices an

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread David Gibson
On Thu, Jan 12, 2017 at 11:31:35AM +0100, Andrea Bolognani wrote:
> On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:
> > > >* To allow for hotplugged devices, libvirt should also add a number
> > > >  of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > > >  PHBs, but this is not yet implemented in qemu).
> > > 
> > > "A number" here will have to mean "one", same number of
> > > empty PCIe Root Ports libvirt will add to a newly-defined
> > > q35 guest.
> > 
> > Umm.. why?
> 
> Because some applications using libvirt would inevitably
> start relying on the fact that such spare PHBs are
> available, locking us into providing at least the same
> number forever. In other words, increasing the amount at
> a later time is always possible, but decreasing it isn't.
> We did the same when we started automatically adding PCIe
> Root Ports to q35 machines.
> 
> The rationale is that having a single spare hotpluggable
> slot is extremely convenient for basic usage, eg. a simple
> guest created by someone who's not necessarily very
> familiar with virtualization; on the other hand, if you
> are actually deploying in production you ought to conduct
> proper capacity planning and figure out in advance how
> many devices you're likely to need to hotplug throughout
> the guest's life.

Hm, ok.  Well I guess the limitation is the same as on x86, so it
shouldn't surprise people.

> Of course this all will be moot once we can hotplug PHBs :)

Yes.  Unfortunately, nobody's actually working on that at present.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread David Gibson
On Thu, Jan 12, 2017 at 12:53:28PM -0500, Laine Stump wrote:
> On 01/12/2017 11:35 AM, Michael Roth wrote:
> > Quoting Laine Stump (2017-01-12 08:52:10)
> > > On 01/12/2017 05:31 AM, Andrea Bolognani wrote:
> > > > On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:
> > > > > > > * To allow for hotplugged devices, libvirt should also add a 
> > > > > > > number
> > > > > > >   of additional, empty vPHBs (the PAPR spec allows for 
> > > > > > > hotplug of
> > > > > > >   PHBs, but this is not yet implemented in qemu).
> > > > > > 
> > > > > > "A number" here will have to mean "one", same number of
> > > > > > empty PCIe Root Ports libvirt will add to a newly-defined
> > > > > > q35 guest.
> > > > > 
> > > > > Umm.. why?
> > > > 
> > > > Because some applications using libvirt would inevitably
> > > > start relying on the fact that such spare PHBs are
> > > > available, locking us into providing at least the same
> > > > number forever. In other words, increasing the amount at
> > > > a later time is always possible, but decreasing it isn't.
> > > > We did the same when we started automatically adding PCIe
> > > > Root Ports to q35 machines.
> > > > 
> > > > The rationale is that having a single spare hotpluggable
> > > > slot is extremely convenient for basic usage, eg. a simple
> > > > guest created by someone who's not necessarily very
> > > > familiar with virtualization; on the other hand, if you
> > > > are actually deploying in production you ought to conduct
> > > > proper capacity planning and figure out in advance how
> > > > many devices you're likely to need to hotplug throughout
> > > > the guest's life.
> > > 
> > > And of course the reason we don't want to add "too many" extra
> > > controllers by default is so that we don't end up with *all* guests
> > > burdened with extra hardware they don't need or want. The libguestfs
> > > appliance is one example of a libvirt consumer that definitely doesn't
> > > want extra baggage in its guests - guest startup time is very important
> > > to libguestfs, so any addition to the hardware list is looked upon with
> > > disappointment.
> > > 
> > > > 
> > > > Of course this all will be moot once we can hotplug PHBs :)
> > > 
> > > Will the guest OSes handle that properly? I remember being told that
> > 
> > I believe on pseries we *do* scan for devices on the PHB as part of
> > bringing the PHB online in the hotplug path. But I'm not sure that
> > matters (see below).
> > 
> > > Linux, for example, doesn't scan the new bus for devices when a new
> > > controller is added, making it pointless to hotplug a PCI controller (as
> > > usual, it could be that I'm remembering incorrectly...)
> > > 
> > 
> > Wouldn't that only be an issue if we hotplugged a PHB that already had
> > PCI devices on the bus?
> 
> 
> Yeah you're right, I'm probably remembering the wrong problem and wrong
> reason for the problem. I just remember there was *some* issue about
> hotplugging new PCI controllers. Possibly the internal representation of the
> bus hierarchy wasn't updated unless you forced a rescan of all the devices
> or something? My memory of it is vague, I just remember being told it wasn't
> just a case of the controller itself being initialized.
> 
> Alex or Marcel - since whatever it was I likely heard it from one of you (or
> imagined it in a dream), can you straighten me out?

Regardless, I'm pretty sure it won't be relevant for Power guests.
PHB hotplug has its own protocol in PAPR, and is used routinely for
Linux guests under PowerVM.

> 
> > That only seems possible if we had a way to
> > signal phb hotplug *after* we've hotplugged some PCI devices on the bus,
> > which means we'd need some interface to trigger hotplug  beyond the
> > standard 'device_add' calls, e.g.:
> > 
> >   device_add spapr-pci-host-bridge,hotplug-deferred=true,id=phb2,index=2
> >   device_add virtio-net-pci,bus=phb2.0,...,hotplug-deferred=true
> >   device_signal_hotplug phb2
> > 
> > That's actually akin to how it's normally done on pHyp (not only for PHB
> > hotplug, but for PCI hotplug in general, which is why this could be
> > reasonably expected to work on pseries guests), but it seems quite a bit
> > different from how we'd normally handle this on kvm, which I think would
> > be something more like:
> > 
> >   device_add spapr-pci-host-bridge,id=phb2,index=2
> >   
> >   device_add virtio-net-pci,bus=phb2.0,...
> > 
> > In which case it doesn't really matter if the guest scans the bus at
> > hotplug time or not. Is there some other scenario where this might
> > arise?
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread David Gibson
On Thu, Jan 12, 2017 at 11:03:05AM -0500, Laine Stump wrote:
> On 01/05/2017 12:46 AM, David Gibson wrote:
> > [...]
> 
> 
> It's a bit unconventional to leave all but one slot of a controller unused,

Unconventional for x86, maybe.  It's been SOP on IBM Power for a
decade or more.  Both for PAPR guests and in some cases on the
physical hardware (AIUI many, though not all, Power systems used a
separate host bridge for each physical slot to ensure better isolation
between devices).

> but your thinking makes sense. I don't think this will be as
> large/disruptive of a change as you might be expecting - we already have
> different addressing rules for automatically addressed vs. manually
> addressed devices, as well as a framework in place to behave differently
> for different PCI controllers (e.g. some support hotplug and others
> don't), and to modify behavior based on machinetype / root bus model, so
> it should be straightforward to make things behave as you outline above.

Actually, I had that impression, so I was hoping it wouldn't be too
bad to implement.  I'd really like to get this underway ASAP, so we
can build the PCIe support (both qemu and Power) around that.

> (The first item in your list sounds exactly like VFIO iommu groups. Is that
> how it's exposed on PPC?

Yes, for Power hosts and guests there's a 1-1 correspondence between
PEs and IOMMU groups.  Technically speaking, I believe the PE provides
more isolation guarantees than the IOMMU group, but they're generally
close enough in practice.

> If so, libvirt already takes care 
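The default placement policy proposed in this thread can be sketched in a few lines of Python (a model of the proposal with hypothetical helper names, not actual libvirt code): each partitionable endpoint gets its own vPHB, and empty spare vPHBs are kept around so hotplug works before PHB hotplug lands in qemu.

```python
# Sketch of the proposed default vPHB placement policy for pseries
# guests.  Hypothetical data model -- not libvirt's real implementation.

def place_devices(pes, spare_phbs=1):
    """Assign each partitionable endpoint (a list of devices that must
    stay together) to its own vPHB, then append empty spare vPHBs so
    devices can be hotplugged later without adding a controller."""
    phbs = [list(pe) for pe in pes]    # one vPHB per PE (often one device)
    for _ in range(spare_phbs):
        phbs.append([])                # empty vPHB reserved for hotplug
    return phbs

def hotplug(phbs, device):
    """Hotplug locates a vPHB which doesn't currently contain anything."""
    for i, phb in enumerate(phbs):
        if not phb:
            phb.append(device)
            return i
    raise RuntimeError("no empty vPHB left (PHB hotplug is not yet "
                       "implemented in qemu)")
```

With one spare PHB, the first hotplugged device lands on the spare and a management layer would then want to create a fresh spare at the next opportunity (e.g. a reboot), since qemu cannot yet hotplug PHBs themselves.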

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Laine Stump

On 01/12/2017 11:35 AM, Michael Roth wrote:

Quoting Laine Stump (2017-01-12 08:52:10)

On 01/12/2017 05:31 AM, Andrea Bolognani wrote:

On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:

* To allow for hotplugged devices, libvirt should also add a number
  of additional, empty vPHBs (the PAPR spec allows for hotplug of
  PHBs, but this is not yet implemented in qemu).


"A number" here will have to mean "one", same number of
empty PCIe Root Ports libvirt will add to a newly-defined
q35 guest.


Umm.. why?


Because some applications using libvirt would inevitably
start relying on the fact that such spare PHBs are
available, locking us into providing at least the same
number forever. In other words, increasing the amount at
a later time is always possible, but decreasing it isn't.
We did the same when we started automatically adding PCIe
Root Ports to q35 machines.

The rationale is that having a single spare hotpluggable
slot is extremely convenient for basic usage, eg. a simple
guest created by someone who's not necessarily very
familiar with virtualization; on the other hand, if you
are actually deploying in production you ought to conduct
proper capacity planning and figure out in advance how
many devices you're likely to need to hotplug throughout
the guest's life.


And of course the reason we don't want to add "too many" extra
controllers by default is so that we don't end up with *all* guests
burdened with extra hardware they don't need or want. The libguestfs
appliance is one example of a libvirt consumer that definitely doesn't
want extra baggage in its guests - guest startup time is very important
to libguestfs, so any addition to the hardware list is looked upon with
disappointment.



Of course this all will be moot once we can hotplug PHBs :)


Will the guest OSes handle that properly? I remember being told that


I believe on pseries we *do* scan for devices on the PHB as part of
bringing the PHB online in the hotplug path. But I'm not sure that
matters (see below).


Linux, for example, doesn't scan the new bus for devices when a new
controller is added, making it pointless to hotplug a PCI controller (as
usual, it could be that I'm remembering incorrectly...)



Wouldn't that only be an issue if we hotplugged a PHB that already had
PCI devices on the bus?



Yeah you're right, I'm probably remembering the wrong problem and wrong 
reason for the problem. I just remember there was *some* issue about 
hotplugging new PCI controllers. Possibly the internal representation of 
the bus hierarchy wasn't updated unless you forced a rescan of all the 
devices or something? My memory of it is vague, I just remember being 
told it wasn't just a case of the controller itself being initialized.


Alex or Marcel - since whatever it was I likely heard it from one of you 
(or imagined it in a dream), can you straighten me out?



That only seems possible if we had a way to
signal phb hotplug *after* we've hotplugged some PCI devices on the bus,
which means we'd need some interface to trigger hotplug beyond the
standard 'device_add' calls, e.g.:

  device_add spapr-pci-host-bridge,hotplug-deferred=true,id=phb2,index=2
  device_add virtio-net-pci,bus=phb2.0,...,hotplug-deferred=true
  device_signal_hotplug phb2

That's actually akin to how it's normally done on pHyp (not only for PHB
hotplug, but for PCI hotplug in general, which is why this could be
reasonably expected to work on pseries guests), but it seems quite a bit
different from how we'd normally handle this on kvm, which I think would
be something more like:

  device_add spapr-pci-host-bridge,id=phb2,index=2
  
  device_add virtio-net-pci,bus=phb2.0,...

In which case it doesn't really matter if the guest scans the bus at
hotplug time or not. Is there some other scenario where this might
arise?



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Michael Roth
Quoting Laine Stump (2017-01-12 08:52:10)
> On 01/12/2017 05:31 AM, Andrea Bolognani wrote:
> > On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:
>  * To allow for hotplugged devices, libvirt should also add a number
>    of additional, empty vPHBs (the PAPR spec allows for hotplug of
>    PHBs, but this is not yet implemented in qemu).
> >>>
> >>> "A number" here will have to mean "one", same number of
> >>> empty PCIe Root Ports libvirt will add to a newly-defined
> >>> q35 guest.
> >>
> >> Umm.. why?
> >
> > Because some applications using libvirt would inevitably
> > start relying on the fact that such spare PHBs are
> > available, locking us into providing at least the same
> > number forever. In other words, increasing the amount at
> > a later time is always possible, but decreasing it isn't.
> > We did the same when we started automatically adding PCIe
> > Root Ports to q35 machines.
> >
> > The rationale is that having a single spare hotpluggable
> > slot is extremely convenient for basic usage, eg. a simple
> > guest created by someone who's not necessarily very
> > familiar with virtualization; on the other hand, if you
> > are actually deploying in production you ought to conduct
> > proper capacity planning and figure out in advance how
> > many devices you're likely to need to hotplug throughout
> > the guest's life.
> 
> And of course the reason we don't want to add "too many" extra 
> controllers by default is so that we don't end up with *all* guests 
> burdened with extra hardware they don't need or want. The libguestfs 
> appliance is one example of a libvirt consumer that definitely doesn't 
> want extra baggage in its guests - guest startup time is very important 
> to libguestfs, so any addition to the hardware list is looked upon with 
> disappointment.
> 
> >
> > Of course this all will be moot once we can hotplug PHBs :)
> 
> Will the guest OSes handle that properly? I remember being told that 

I believe on pseries we *do* scan for devices on the PHB as part of
bringing the PHB online in the hotplug path. But I'm not sure that
matters (see below).

> Linux, for example, doesn't scan the new bus for devices when a new 
> controller is added, making it pointless to hotplug a PCI controller (as 
> usual, it could be that I'm remembering incorrectly...)
> 

Wouldn't that only be an issue if we hotplugged a PHB that already had
PCI devices on the bus? That only seems possible if we had a way to
signal phb hotplug *after* we've hotplugged some PCI devices on the bus,
which means we'd need some interface to trigger hotplug beyond the
standard 'device_add' calls, e.g.:

  device_add spapr-pci-host-bridge,hotplug-deferred=true,id=phb2,index=2
  device_add virtio-net-pci,bus=phb2.0,...,hotplug-deferred=true
  device_signal_hotplug phb2

That's actually akin to how it's normally done on pHyp (not only for PHB
hotplug, but for PCI hotplug in general, which is why this could be
reasonably expected to work on pseries guests), but it seems quite a bit
different from how we'd normally handle this on kvm, which I think would
be something more like:

  device_add spapr-pci-host-bridge,id=phb2,index=2
  
  device_add virtio-net-pci,bus=phb2.0,...

In which case it doesn't really matter if the guest scans the bus at
hotplug time or not. Is there some other scenario where this might
arise?
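The standard (non-deferred) kvm-style flow above could be driven over QMP roughly as follows. This is a hedged sketch: `qmp_device_add` and the `send` callable are illustrative wrappers around a QMP transport, while `spapr-pci-host-bridge` and its `index` property are real qemu options. The `hotplug-deferred` property and `device_signal_hotplug` command discussed above are hypothetical and are not modelled here.

```python
# Sketch: add a PHB, wait for the guest to bring it online, then
# hotplug a device onto it, using ordinary QMP device_add commands.

def qmp_device_add(send, driver, **props):
    """Issue a QMP device_add.  'send' is any callable that transmits a
    QMP command dict and returns the parsed reply (e.g. over qemu's
    -qmp unix socket, after capabilities negotiation)."""
    return send({"execute": "device_add",
                 "arguments": dict(driver=driver, **props)})

def plug_phb_then_device(send, index):
    """Standard (non-deferred) flow from the thread: add the PHB first,
    then hotplug the device onto the new bus.  The ids used here are
    illustrative."""
    phb_id = "phb%d" % index
    qmp_device_add(send, "spapr-pci-host-bridge", id=phb_id, index=index)
    # ... guest scans the new PHB as part of bringing it online ...
    qmp_device_add(send, "virtio-net-pci", id="net%d" % index,
                   bus="%s.0" % phb_id)
```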




Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Laine Stump

On 01/05/2017 12:46 AM, David Gibson wrote:

There was a discussion back in November on the qemu list which spilled
onto the libvirt list about how to add support for PCIe devices to
POWER VMs, specifically 'pseries' machine type PAPR guests.

Here's a more concrete proposal for how to handle part of this in
future from the libvirt side.  Strictly speaking what I'm suggesting
here isn't intrinsically linked to PCIe: it will make adding PCIe
support sanely easier, as well as having a number of advantages for
both PCIe and plain-PCI devices on PAPR guests.

Background:

  * Currently the pseries machine type only supports vanilla PCI
buses.
 * This is a qemu limitation, not something inherent - PAPR guests
   running under PowerVM (the IBM hypervisor) can use passthrough
   PCIe devices (PowerVM doesn't emulate devices though).
 * In fact the way PCI access is para-virtualized in PAPR makes the
   usual distinctions between PCI and PCIe largely disappear
  * Presentation of PCIe devices to PAPR guests is unusual
 * Unlike x86 and other "bare metal" platforms, root ports are
   not made visible to the guest. i.e. all devices (typically)
   appear as though they were integrated devices on x86
 * In terms of topology all devices will appear in a way similar to
   a vanilla PCI bus, even PCIe devices
* However PCIe extended config space is accessible
 * This means libvirt's usual placement of PCIe devices is not
   suitable for PAPR guests
  * PAPR has its own hotplug mechanism
 * This is used instead of standard PCIe hotplug
 * This mechanism works for both PCIe and vanilla-PCI devices
 * This can hotplug/unplug devices even without a root port P2P
   bridge between it and the root bus
  * Multiple independent host bridges are routine on PAPR
 * Unlike PC (where all host bridges have multiplexed access to
   configuration space) PCI host bridges (PHBs) are truly
   independent for PAPR guests (disjoint MMIO regions in system
   address space)
 * PowerVM typically presents a separate PHB to the guest for each
   host slot passed through

The Proposal:

I suggest that libvirt implement a new default algorithm for placing
(i.e. assigning addresses to) both PCI and PCIe devices for (only)
PAPR guests.

The short summary is that by default it should assign each device to a
separate vPHB, creating vPHBs as necessary.

   * For passthrough sometimes a group of host devices can't be safely
 isolated from each other - this is known as a (host) Partitionable
 Endpoint (PE).  In this case, if any device in the PE is passed
 through to a guest, the whole PE must be passed through to the
 same vPHB in the guest.  From the guest POV, each vPHB has exactly
 one (guest) PE.
   * To allow for hotplugged devices, libvirt should also add a number
 of additional, empty vPHBs (the PAPR spec allows for hotplug of
 PHBs, but this is not yet implemented in qemu).  When hotplugging
 a new device (or PE) libvirt should locate a vPHB which doesn't
 currently contain anything.
   * libvirt should only (automatically) add PHBs - never root ports or
 other PCI to PCI bridges



It's a bit unconventional to leave all but one slot of a controller 
unused, but your thinking makes sense. I don't think this will be as 
large/disruptive of a change as you might be expecting - we already have
different addressing rules for automatically addressed vs. manually
addressed devices, as well as a framework in place to behave differently
for different PCI controllers (e.g. some support hotplug and others
don't), and to modify behavior based on machinetype / root bus model, so
it should be straightforward to make things behave as you outline above.


(The first item in your list sounds exactly like VFIO iommu groups. Is 
that how it's exposed on PPC? If so, libvirt already takes care of 
guaranteeing that any devices in the same group aren't used by other 
guests or the host during the time a guest is using a device. It doesn't 
automatically assign the other devices to the guest though, since this 
could have unexpected effects on host operation (the example that kept 
coming up when this was originally discussed wrt vfio device assignment 
was the case where a disk device in use on the host was attached to a 
controller in the same iommu group as a USB controller that was going to 
be assigned to a guest - silently assigning the disk controller to the 
guest would cause the host's disk to suddenly become unusable).)
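On Linux hosts the group membership Laine refers to is visible in sysfs under the standard `/sys/kernel/iommu_groups` layout. A small sketch (the root path is parameterized purely so the helper can be exercised against a fake tree):

```python
from pathlib import Path

def iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses, read from
    the standard Linux layout /sys/kernel/iommu_groups/<N>/devices/.
    On Power hosts a group corresponds (roughly) to a Partitionable
    Endpoint, so every device listed together here has to land on the
    same guest vPHB if any of them is passed through."""
    groups = {}
    for grp in Path(sysfs_root).iterdir():
        groups[int(grp.name)] = sorted(
            d.name for d in (grp / "devices").iterdir())
    return groups
```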



In order to handle migration, the vPHBs will need to be represented in
the domain XML, which will also allow the user to override this
topology if they want.

Advantages:

There are still some details I need to figure out w.r.t. handling PCIe
devices (on both the qemu and libvirt sides).  However the fact that
PAPR guests don't typically see PCIe root ports means that the normal
libvirt PCIe allocat
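One plausible XML shape for the user-visible vPHBs mentioned in the proposal, sketched with ElementTree. The `pci-root` controller with a per-PHB target index mirrors how pseries PHBs were later modelled in libvirt, but treat the exact element and attribute names here as assumptions rather than a schema reference.

```python
import xml.etree.ElementTree as ET

# Hedged sketch: build a <devices> fragment holding N independent
# vPHBs, one pci-root controller per PHB, each with its own index.

def phb_controllers(count):
    devices = ET.Element("devices")
    for i in range(count):
        ctrl = ET.SubElement(devices, "controller",
                             type="pci", index=str(i), model="pci-root")
        ET.SubElement(ctrl, "target", index=str(i))
    return devices

# ET.tostring(phb_controllers(2), encoding="unicode") yields the
# fragment a user could hand-edit to override the default topology.
```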

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Laine Stump

On 01/12/2017 05:31 AM, Andrea Bolognani wrote:

On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:

* To allow for hotplugged devices, libvirt should also add a number
  of additional, empty vPHBs (the PAPR spec allows for hotplug of
  PHBs, but this is not yet implemented in qemu).


"A number" here will have to mean "one", same number of
empty PCIe Root Ports libvirt will add to a newly-defined
q35 guest.


Umm.. why?


Because some applications using libvirt would inevitably
start relying on the fact that such spare PHBs are
available, locking us into providing at least the same
number forever. In other words, increasing the amount at
a later time is always possible, but decreasing it isn't.
We did the same when we started automatically adding PCIe
Root Ports to q35 machines.

The rationale is that having a single spare hotpluggable
slot is extremely convenient for basic usage, eg. a simple
guest created by someone who's not necessarily very
familiar with virtualization; on the other hand, if you
are actually deploying in production you ought to conduct
proper capacity planning and figure out in advance how
many devices you're likely to need to hotplug throughout
the guest's life.


And of course the reason we don't want to add "too many" extra 
controllers by default is so that we don't end up with *all* guests 
burdened with extra hardware they don't need or want. The libguestfs 
appliance is one example of a libvirt consumer that definitely doesn't 
want extra baggage in its guests - guest startup time is very important 
to libguestfs, so any addition to the hardware list is looked upon with 
disappointment.




Of course this all will be moot once we can hotplug PHBs :)


Will the guest OSes handle that properly? I remember being told that 
Linux, for example, doesn't scan the new bus for devices when a new 
controller is added, making it pointless to hotplug a PCI controller (as 
usual, it could be that I'm remembering incorrectly...)




Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Andrea Bolognani
On Mon, 2017-01-09 at 10:46 +1100, David Gibson wrote:
> > >* To allow for hotplugged devices, libvirt should also add a number
> > >  of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > >  PHBs, but this is not yet implemented in qemu).
> > 
> > "A number" here will have to mean "one", same number of
> > empty PCIe Root Ports libvirt will add to a newly-defined
> > q35 guest.
> 
> Umm.. why?

Because some applications using libvirt would inevitably
start relying on the fact that such spare PHBs are
available, locking us into providing at least the same
number forever. In other words, increasing the amount at
a later time is always possible, but decreasing it isn't.
We did the same when we started automatically adding PCIe
Root Ports to q35 machines.

The rationale is that having a single spare hotpluggable
slot is extremely convenient for basic usage, eg. a simple
guest created by someone who's not necessarily very
familiar with virtualization; on the other hand, if you
are actually deploying in production you ought to conduct
proper capacity planning and figure out in advance how
many devices you're likely to need to hotplug throughout
the guest's life.

Of course this all will be moot once we can hotplug PHBs :)

-- 
Andrea Bolognani / Red Hat / Virtualization


Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-12 Thread Greg Kurz
On Thu, 12 Jan 2017 17:19:40 +1100
Alexey Kardashevskiy  wrote:

> On 12/01/17 14:52, David Gibson wrote:
> > On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:  
> >> On Thu, 5 Jan 2017 16:46:18 +1100
> >> David Gibson  wrote:
> >>  
> >>> There was a discussion back in November on the qemu list which spilled
> >>> onto the libvirt list about how to add support for PCIe devices to
> >>> POWER VMs, specifically 'pseries' machine type PAPR guests.
> >>>
> >>> Here's a more concrete proposal for how to handle part of this in
> >>> future from the libvirt side.  Strictly speaking what I'm suggesting
> >>> here isn't intrinsically linked to PCIe: it will make adding PCIe
> >>> support sanely easier, as well as having a number of advantages for
> >>> both PCIe and plain-PCI devices on PAPR guests.
> >>>
> >>> Background:
> >>>
> >>>  * Currently the pseries machine type only supports vanilla PCI
> >>>buses.
> >>> * This is a qemu limitation, not something inherent - PAPR guests
> >>>   running under PowerVM (the IBM hypervisor) can use passthrough
> >>>   PCIe devices (PowerVM doesn't emulate devices though).
> >>> * In fact the way PCI access is para-virtualized in PAPR makes the
> >>>   usual distinctions between PCI and PCIe largely disappear
> >>>  * Presentation of PCIe devices to PAPR guests is unusual
> >>> * Unlike x86 and other "bare metal" platforms, root ports are
> >>>   not made visible to the guest. i.e. all devices (typically)
> >>>   appear as though they were integrated devices on x86
> >>> * In terms of topology all devices will appear in a way similar to
> >>>   a vanilla PCI bus, even PCIe devices
> >>>* However PCIe extended config space is accessible
> >>> * This means libvirt's usual placement of PCIe devices is not
> >>>   suitable for PAPR guests
> >>>  * PAPR has its own hotplug mechanism
> >>> * This is used instead of standard PCIe hotplug
> >>> * This mechanism works for both PCIe and vanilla-PCI devices
> >>> * This can hotplug/unplug devices even without a root port P2P
> >>>   bridge between it and the root bus
> >>>  * Multiple independent host bridges are routine on PAPR
> >>> * Unlike PC (where all host bridges have multiplexed access to
> >>>   configuration space) PCI host bridges (PHBs) are truly
> >>>   independent for PAPR guests (disjoint MMIO regions in system
> >>>   address space)
> >>> * PowerVM typically presents a separate PHB to the guest for each
> >>>   host slot passed through
> >>>
> >>> The Proposal:
> >>>
> >>> I suggest that libvirt implement a new default algorithm for placing
> >>> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> >>> PAPR guests.
> >>>
> >>> The short summary is that by default it should assign each device to a
> >>> separate vPHB, creating vPHBs as necessary.
> >>>
> >>>   * For passthrough sometimes a group of host devices can't be safely
> >>> isolated from each other - this is known as a (host) Partitionable
> >>> Endpoint (PE).  In this case, if any device in the PE is passed
> >>> through to a guest, the whole PE must be passed through to the
> >>> same vPHB in the guest.  From the guest POV, each vPHB has exactly
> >>> one (guest) PE.
> >>>   * To allow for hotplugged devices, libvirt should also add a number
> >>> of additional, empty vPHBs (the PAPR spec allows for hotplug of
> >>> PHBs, but this is not yet implemented in qemu).  When hotplugging
> >>> a new device (or PE) libvirt should locate a vPHB which doesn't
> >>> currently contain anything.
> >>>   * libvirt should only (automatically) add PHBs - never root ports or
> >>> other PCI to PCI bridges
> >>>
> >>> In order to handle migration, the vPHBs will need to be represented in
> >>> the domain XML, which will also allow the user to override this
> >>> topology if they want.
> >>>
> >>> Advantages:
> >>>
> >>> There are still some details I need to figure out w.r.t. handling PCIe
> >>> devices (on both the qemu and libvirt sides).  However the fact that  
> >>
> >> One such detail may be that PCIe devices should have the
> >> "ibm,pci-config-space-type" property set to 1 in the DT,
> >> for the driver to be able to access the extended config
> >> space.  
> > 
> > So, we have a bit of an oddity here.  It looks like we currently set
> > 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> > device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.  
> 
> 
> I asked Paul how to read the spec and this is rather correct but not enough
> - having type=1 on a PHB means that extended access requests can go behind
> it but underlying devices and bridges still need to have type=1 if they
> support extended space. Having type set to 0 (or none at all) on a PHB
> would mean that extended config space is not available on anything under
> this PHB.
> 

I have the very same understanding of t

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-11 Thread Alexey Kardashevskiy
On 12/01/17 14:52, David Gibson wrote:
> On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
>> On Thu, 5 Jan 2017 16:46:18 +1100
>> David Gibson  wrote:
>>
>>> There was a discussion back in November on the qemu list which spilled
>>> onto the libvirt list about how to add support for PCIe devices to
>>> POWER VMs, specifically 'pseries' machine type PAPR guests.
>>>
>>> Here's a more concrete proposal for how to handle part of this in
>>> future from the libvirt side.  Strictly speaking what I'm suggesting
>>> here isn't intrinsically linked to PCIe: it will make adding PCIe
>>> support sanely easier, as well as having a number of advantages for
>>> both PCIe and plain-PCI devices on PAPR guests.
>>>
>>> Background:
>>>
>>>  * Currently the pseries machine type only supports vanilla PCI
>>>buses.
>>> * This is a qemu limitation, not something inherent - PAPR guests
>>>   running under PowerVM (the IBM hypervisor) can use passthrough
>>>   PCIe devices (PowerVM doesn't emulate devices though).
>>> * In fact the way PCI access is para-virtualized in PAPR makes the
>>>   usual distinctions between PCI and PCIe largely disappear
>>>  * Presentation of PCIe devices to PAPR guests is unusual
>>> * Unlike x86 and other "bare metal" platforms, root ports are
>>>   not made visible to the guest. i.e. all devices (typically)
>>>   appear as though they were integrated devices on x86
>>> * In terms of topology all devices will appear in a way similar to
>>>   a vanilla PCI bus, even PCIe devices
>>>* However PCIe extended config space is accessible
>>> * This means libvirt's usual placement of PCIe devices is not
>>>   suitable for PAPR guests
>>>  * PAPR has its own hotplug mechanism
>>> * This is used instead of standard PCIe hotplug
>>> * This mechanism works for both PCIe and vanilla-PCI devices
>>> * This can hotplug/unplug devices even without a root port P2P
>>>   bridge between it and the root bus
>>>  * Multiple independent host bridges are routine on PAPR
>>> * Unlike PC (where all host bridges have multiplexed access to
>>>   configuration space) PCI host bridges (PHBs) are truly
>>>   independent for PAPR guests (disjoint MMIO regions in system
>>>   address space)
>>> * PowerVM typically presents a separate PHB to the guest for each
>>>   host slot passed through
>>>
>>> The Proposal:
>>>
>>> I suggest that libvirt implement a new default algorithm for placing
>>> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
>>> PAPR guests.
>>>
>>> The short summary is that by default it should assign each device to a
>>> separate vPHB, creating vPHBs as necessary.
>>>
>>>   * For passthrough sometimes a group of host devices can't be safely
>>> isolated from each other - this is known as a (host) Partitionable
>>> Endpoint (PE).  In this case, if any device in the PE is passed
>>> through to a guest, the whole PE must be passed through to the
>>> same vPHB in the guest.  From the guest POV, each vPHB has exactly
>>> one (guest) PE.
>>>   * To allow for hotplugged devices, libvirt should also add a number
>>> of additional, empty vPHBs (the PAPR spec allows for hotplug of
>>> PHBs, but this is not yet implemented in qemu).  When hotplugging
>>> a new device (or PE) libvirt should locate a vPHB which doesn't
>>> currently contain anything.
>>>   * libvirt should only (automatically) add PHBs - never root ports or
>>> other PCI to PCI bridges
>>>
>>> In order to handle migration, the vPHBs will need to be represented in
>>> the domain XML, which will also allow the user to override this
>>> topology if they want.
>>>
>>> Advantages:
>>>
>>> There are still some details I need to figure out w.r.t. handling PCIe
>>> devices (on both the qemu and libvirt sides).  However the fact that
>>
>> One such detail may be that PCIe devices should have the
>> "ibm,pci-config-space-type" property set to 1 in the DT,
>> for the driver to be able to access the extended config
>> space.
> 
> So, we have a bit of an oddity here.  It looks like we currently set
> 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.


I asked Paul how to read the spec and this is rather correct but not enough
- having type=1 on a PHB means that extended access requests can go behind
it but underlying devices and bridges still need to have type=1 if they
support extended space. Having type set to 0 (or none at all) on a PHB
would mean that extended config space is not available on anything under
this PHB.
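Alexey's reading of `ibm,pci-config-space-type` can be stated as a tiny predicate (a model of the rule as described above, not qemu or firmware code; a missing property is treated as type 0):

```python
def extended_config_space_visible(phb_type, node_types_on_path):
    """Extended config space reaches a device only if the PHB
    advertises type 1 (so extended accesses can go behind it at all)
    AND every device/bridge node on the path is itself type 1."""
    if phb_type != 1:
        return False    # nothing under this PHB gets extended access
    return all(t == 1 for t in node_types_on_path)
```

This matches the observation earlier in the thread that setting the property only on the PHB is necessary but not sufficient per PAPR.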



-- 
Alexey




Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-11 Thread David Gibson
On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
> On Thu, 5 Jan 2017 16:46:18 +1100
> David Gibson  wrote:
> 
> > There was a discussion back in November on the qemu list which spilled
> > onto the libvirt list about how to add support for PCIe devices to
> > POWER VMs, specifically 'pseries' machine type PAPR guests.
> > 
> > Here's a more concrete proposal for how to handle part of this in
> > future from the libvirt side.  Strictly speaking what I'm suggesting
> > here isn't intrinsically linked to PCIe: it will make adding PCIe
> > support sanely easier, as well as having a number of advantages for
> > both PCIe and plain-PCI devices on PAPR guests.
> > 
> > Background:
> > 
> >  * Currently the pseries machine type only supports vanilla PCI
> >buses.
> > * This is a qemu limitation, not something inherent - PAPR guests
> >   running under PowerVM (the IBM hypervisor) can use passthrough
> >   PCIe devices (PowerVM doesn't emulate devices though).
> > * In fact the way PCI access is para-virtualized in PAPR makes the
> >   usual distinctions between PCI and PCIe largely disappear
> >  * Presentation of PCIe devices to PAPR guests is unusual
> > * Unlike x86 and other "bare metal" platforms, root ports are
> >   not made visible to the guest. i.e. all devices (typically)
> >   appear as though they were integrated devices on x86
> > * In terms of topology all devices will appear in a way similar to
> >   a vanilla PCI bus, even PCIe devices
> >* However PCIe extended config space is accessible
> > * This means libvirt's usual placement of PCIe devices is not
> >   suitable for PAPR guests
> >  * PAPR has its own hotplug mechanism
> > * This is used instead of standard PCIe hotplug
> > * This mechanism works for both PCIe and vanilla-PCI devices
> > * This can hotplug/unplug devices even without a root port P2P
> >   bridge between it and the root bus
> >  * Multiple independent host bridges are routine on PAPR
> > * Unlike PC (where all host bridges have multiplexed access to
> >   configuration space) PCI host bridges (PHBs) are truly
> >   independent for PAPR guests (disjoint MMIO regions in system
> >   address space)
> > * PowerVM typically presents a separate PHB to the guest for each
> >   host slot passed through
> > 
> > The Proposal:
> > 
> > I suggest that libvirt implement a new default algorithm for placing
> > (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> > PAPR guests.
> > 
> > The short summary is that by default it should assign each device to a
> > separate vPHB, creating vPHBs as necessary.
> > 
> >   * For passthrough sometimes a group of host devices can't be safely
> > isolated from each other - this is known as a (host) Partitionable
> > Endpoint (PE).  In this case, if any device in the PE is passed
> > through to a guest, the whole PE must be passed through to the
> > same vPHB in the guest.  From the guest POV, each vPHB has exactly
> > one (guest) PE.
> >   * To allow for hotplugged devices, libvirt should also add a number
> > of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > PHBs, but this is not yet implemented in qemu).  When hotplugging
> > a new device (or PE) libvirt should locate a vPHB which doesn't
> > currently contain anything.
> >   * libvirt should only (automatically) add PHBs - never root ports or
> > other PCI to PCI bridges
> > 
> > In order to handle migration, the vPHBs will need to be represented in
> > the domain XML, which will also allow the user to override this
> > topology if they want.
> > 
> > Advantages:
> > 
> > There are still some details I need to figure out w.r.t. handling PCIe
> > devices (on both the qemu and libvirt sides).  However the fact that
> 
> One such detail may be that PCIe devices should have the
> "ibm,pci-config-space-type" property set to 1 in the DT,
> for the driver to be able to access the extended config
> space.

So, we have a bit of an oddity here.  It looks like we currently set
'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.

I'm not actually sure if we need to set this dependent on whether the
device actually has extended config space, or if we could just set it
to 1 on every device on a PCIe capable PHB.
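
The two options can be sketched as a tiny decision function (purely
illustrative Python, not QEMU code; all names here are made up):

```python
def config_space_type(device_is_pcie, phb_is_pcie_capable, per_device=True):
    """Illustrative sketch of the two policies discussed above for the
    "ibm,pci-config-space-type" device tree property (1 = extended
    config space reachable, 0 = plain config space only).

    per_device=True models setting the property per device, based on
    whether the device itself has extended config space;
    per_device=False models the simpler option of setting it to 1 for
    every device behind a PCIe-capable PHB.
    """
    if per_device:
        return 1 if device_is_pcie else 0
    return 1 if phb_is_pcie_capable else 0

# The policies only differ for a plain-PCI device behind a PCIe PHB:
assert config_space_type(False, True, per_device=True) == 0
assert config_space_type(False, True, per_device=False) == 1
```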

> > PAPR guests don't typically see PCIe root ports means that the normal
> > libvirt PCIe allocation scheme won't work.  This scheme has several
> > advantages with or without support for PCIe devices:
> > 
> >  * Better performance for 32-bit devices
> > 
> > With multiple devices on a single vPHB they all must share a (fairly
> > small) 32-bit DMA/IOMMU window.  With separate PHBs they each have a
> > separate window.  PAPR guests have an always-on guest visible IOMMU.
> > 
> >  * Better EEH handling for passthrough devices
> > 
> > EEH is an IBM hardware-assisted mechanism for isolating and safely
> > resetting devices experiencing hardware faults so they don't bring
> > down other devices or the system at large.  It's roughly similar to
> > PCIe AER in concept, but has a different IBM specific interface, and
> > works on both PCI and PCIe devices.

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-08 Thread David Gibson
On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
> On Thu, 5 Jan 2017 16:46:18 +1100
> David Gibson  wrote:
> 
> > There was a discussion back in November on the qemu list which spilled
> > onto the libvirt list about how to add support for PCIe devices to
> > POWER VMs, specifically 'pseries' machine type PAPR guests.
> > 
> > Here's a more concrete proposal for how to handle part of this in
> > future from the libvirt side.  Strictly speaking what I'm suggesting
> > here isn't intrinsically linked to PCIe: it will make adding PCIe
> > support sanely easier, as well as having a number of advantages for
> > both PCIe and plain-PCI devices on PAPR guests.
> > 
> > Background:
> > 
> >  * Currently the pseries machine type only supports vanilla PCI
> >buses.
> > * This is a qemu limitation, not something inherent - PAPR guests
> >   running under PowerVM (the IBM hypervisor) can use passthrough
> >   PCIe devices (PowerVM doesn't emulate devices though).
> > * In fact the way PCI access is para-virtualized in PAPR makes the
> >   usual distinctions between PCI and PCIe largely disappear
> >  * Presentation of PCIe devices to PAPR guests is unusual
> > * Unlike x86 and other "bare metal" platforms, root ports are
> >   not made visible to the guest. i.e. all devices (typically)
> >   appear as though they were integrated devices on x86
> > * In terms of topology all devices will appear in a way similar to
> >   a vanilla PCI bus, even PCIe devices
> >* However PCIe extended config space is accessible
> > * This means libvirt's usual placement of PCIe devices is not
> >   suitable for PAPR guests
> >  * PAPR has its own hotplug mechanism
> > * This is used instead of standard PCIe hotplug
> > * This mechanism works for both PCIe and vanilla-PCI devices
> > * This can hotplug/unplug devices even without a root port P2P
> >   bridge between it and the root "bus"
> >  * Multiple independent host bridges are routine on PAPR
> > * Unlike PC (where all host bridges have multiplexed access to
> >   configuration space) PCI host bridges (PHBs) are truly
> >   independent for PAPR guests (disjoint MMIO regions in system
> >   address space)
> > * PowerVM typically presents a separate PHB to the guest for each
> >   host slot passed through
> > 
> > The Proposal:
> > 
> > I suggest that libvirt implement a new default algorithm for placing
> > (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> > PAPR guests.
> > 
> > The short summary is that by default it should assign each device to a
> > separate vPHB, creating vPHBs as necessary.
> > 
> >   * For passthrough sometimes a group of host devices can't be safely
> > isolated from each other - this is known as a (host) Partitionable
> > Endpoint (PE).  In this case, if any device in the PE is passed
> > through to a guest, the whole PE must be passed through to the
> > same vPHB in the guest.  From the guest POV, each vPHB has exactly
> > one (guest) PE.
> >   * To allow for hotplugged devices, libvirt should also add a number
> > of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > PHBs, but this is not yet implemented in qemu).  When hotplugging
> > a new device (or PE) libvirt should locate a vPHB which doesn't
> > currently contain anything.
> >   * libvirt should only (automatically) add PHBs - never root ports or
> > other PCI to PCI bridges
> > 
> > In order to handle migration, the vPHBs will need to be represented in
> > the domain XML, which will also allow the user to override this
> > topology if they want.
> > 
> > Advantages:
> > 
> > There are still some details I need to figure out w.r.t. handling PCIe
> > devices (on both the qemu and libvirt sides).  However the fact that
> 
> One such detail may be that PCIe devices should have the
> "ibm,pci-config-space-type" property set to 1 in the DT,
> for the driver to be able to access the extended config
> space.


Right.

> > PAPR guests don't typically see PCIe root ports means that the normal
> > libvirt PCIe allocation scheme won't work.  This scheme has several
> > advantages with or without support for PCIe devices:
> > 
> >  * Better performance for 32-bit devices
> > 
> > With multiple devices on a single vPHB they all must share a (fairly
> > small) 32-bit DMA/IOMMU window.  With separate PHBs they each have a
> > separate window.  PAPR guests have an always-on guest visible IOMMU.
> > 
> >  * Better EEH handling for passthrough devices
> > 
> > EEH is an IBM hardware-assisted mechanism for isolating and safely
> > resetting devices experiencing hardware faults so they don't bring
> > down other devices or the system at large.  It's roughly similar to
> > PCIe AER in concept, but has a different IBM specific interface, and
> > works on both PCI and PCIe devices.
> > 
> > Currently the kernel interfaces for handling EEH events on passthrough
> > devices will only work if there is a single (host) iommu group in the
> > vfio container.  While lifting that restriction would be nice, it's
> > quite difficult to do so (it requires keeping state synchronized
> > between multiple host groups).

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-08 Thread David Gibson
On Fri, Jan 06, 2017 at 06:34:29PM +0100, Andrea Bolognani wrote:
> [Added Laine to CC, fixed qemu-devel address]
> 
> On Thu, 2017-01-05 at 16:46 +1100, David Gibson wrote:
> [...]
> >   * To allow for hotplugged devices, libvirt should also add a number
> > of additional, empty vPHBs (the PAPR spec allows for hotplug of
> > PHBs, but this is not yet implemented in qemu).
> 
> "A number" here will have to mean "one", same number of
> empty PCIe Root Ports libvirt will add to a newly-defined
> q35 guest.

Umm.. why?

> > When hotplugging
> > a new device (or PE) libvirt should locate a vPHB which doesn't
> > currently contain anything.
> 
> This will need to be a PHB-specific behavior, because at the
> moment libvirt will happily pick one of the empty slots in
> an existing PHB.

Exactly.  Well, whether it's PHB model specific or machine type
specific is up to you really.  We can only have PAPR PHBs on a PAPR
machine type, so it's kind of arbitrary.
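
For concreteness, the PHB-specific selection being described could look
something like this (a hypothetical sketch, not libvirt code):

```python
def pick_hotplug_phb(phbs):
    """Sketch of the PAPR-specific hotplug policy: a device may only be
    hotplugged into a vPHB that currently has nothing on it, never into
    a free slot of an occupied vPHB (unlike the current generic libvirt
    behavior).

    phbs: mapping of vPHB index -> list of attached device names.
    """
    for index, devices in sorted(phbs.items()):
        if not devices:
            return index  # first completely empty vPHB wins
    return None  # caller must create (or have pre-created) another vPHB

# An occupied vPHB still has free slots, but is skipped anyway.
assert pick_hotplug_phb({0: ["ethernet0"], 1: [], 2: []}) == 1
assert pick_hotplug_phb({0: ["ethernet0"]}) is None
```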

> 
> >   * libvirt should only (automatically) add PHBs - never root ports or
> > other PCI to PCI bridges
> > 
> > In order to handle migration, the vPHBs will need to be represented in
> > the domain XML, which will also allow the user to override this
> > topology if they want.
> 
> We'll have to decide how to represent them in the XML, but
> that's basically your average bikeshedding.

Right.  Maybe we'd best get started with it, in the hopes of finishing
it in the foreseeable future.
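
As a starting point for that discussion, one conceivable shape (purely
illustrative; the element names and attributes below are not a settled
libvirt syntax) would be one controller entry per vPHB:

```xml
<!-- Hypothetical sketch only: not an agreed-upon libvirt syntax. -->
<controller type='pci' index='0' model='pci-root'/>   <!-- vPHB 0 -->
<controller type='pci' index='1' model='pci-root'/>   <!-- vPHB 1 -->
<controller type='pci' index='2' model='pci-root'/>   <!-- empty spare vPHB -->
```

Devices would then pick a vPHB through the bus attribute of their
existing <address type='pci'> element, much as they select a bus today.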

> Overall, the plan seems entirely reasonable to me.
> 
> It's pretty clear at this point that pseries guests are
> different enough in their handling of PCI that none of
> the address allocation algorithms currently implemented
> in libvirt could be readily adapted to work with them, so
> a custom one is in order.

Yes, that was my conclusion as well.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-06 Thread Andrea Bolognani
[Added Laine to CC, fixed qemu-devel address]

On Thu, 2017-01-05 at 16:46 +1100, David Gibson wrote:
[...]
>   * To allow for hotplugged devices, libvirt should also add a number
> of additional, empty vPHBs (the PAPR spec allows for hotplug of
> PHBs, but this is not yet implemented in qemu).

"A number" here will have to mean "one", same number of
empty PCIe Root Ports libvirt will add to a newly-defined
q35 guest.

> When hotplugging
> a new device (or PE) libvirt should locate a vPHB which doesn't
> currently contain anything.

This will need to be a PHB-specific behavior, because at the
moment libvirt will happily pick one of the empty slots in
an existing PHB.

>   * libvirt should only (automatically) add PHBs - never root ports or
> other PCI to PCI bridges
> 
> In order to handle migration, the vPHBs will need to be represented in
> the domain XML, which will also allow the user to override this
> topology if they want.

We'll have to decide how to represent them in the XML, but
that's basically your average bikeshedding.


Overall, the plan seems entirely reasonable to me.

It's pretty clear at this point that pseries guests are
different enough in their handling of PCI that none of
the address allocation algorithms currently implemented
in libvirt could be readily adapted to work with them, so
a custom one is in order.
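
The core of such a custom allocator is simple enough to sketch
(illustrative Python, not libvirt code): one vPHB per device or PE,
plus pre-created empty vPHBs for hotplug.

```python
def assign_devices_to_phbs(devices, spare_phbs=1):
    """Sketch of the proposed default placement for PAPR guests
    (hypothetical helper, not libvirt code): every device, or
    partitionable endpoint, gets a vPHB of its own, and a few extra
    empty vPHBs are created up front for future hotplug.

    Returns a mapping of vPHB index -> list of devices on that vPHB.
    """
    topology = {}
    for index, dev in enumerate(devices):
        topology[index] = [dev]      # one device (or PE) per vPHB
    for index in range(len(devices), len(devices) + spare_phbs):
        topology[index] = []         # empty vPHB reserved for hotplug
    return topology

topo = assign_devices_to_phbs(["scsi0", "net0"], spare_phbs=2)
assert topo == {0: ["scsi0"], 1: ["net0"], 2: [], 3: []}
```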

-- 
Andrea Bolognani / Red Hat / Virtualization


Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-06 Thread Greg Kurz
Resending because of bad qemu-devel address...

On Thu, 5 Jan 2017 16:46:18 +1100
David Gibson  wrote:

> There was a discussion back in November on the qemu list which spilled
> onto the libvirt list about how to add support for PCIe devices to
> POWER VMs, specifically 'pseries' machine type PAPR guests.
> 
> Here's a more concrete proposal for how to handle part of this in
> future from the libvirt side.  Strictly speaking what I'm suggesting
> here isn't intrinsically linked to PCIe: it will make adding PCIe
> support sanely easier, as well as having a number of advantages for
> both PCIe and plain-PCI devices on PAPR guests.
> 
> Background:
> 
>  * Currently the pseries machine type only supports vanilla PCI
>buses.
> * This is a qemu limitation, not something inherent - PAPR guests
>   running under PowerVM (the IBM hypervisor) can use passthrough
>   PCIe devices (PowerVM doesn't emulate devices though).
> * In fact the way PCI access is para-virtualized in PAPR makes the
>   usual distinctions between PCI and PCIe largely disappear
>  * Presentation of PCIe devices to PAPR guests is unusual
> * Unlike x86 and other "bare metal" platforms, root ports are
>   not made visible to the guest. i.e. all devices (typically)
>   appear as though they were integrated devices on x86
> * In terms of topology all devices will appear in a way similar to
>   a vanilla PCI bus, even PCIe devices
>* However PCIe extended config space is accessible
> * This means libvirt's usual placement of PCIe devices is not
>   suitable for PAPR guests
>  * PAPR has its own hotplug mechanism
> * This is used instead of standard PCIe hotplug
> * This mechanism works for both PCIe and vanilla-PCI devices
> * This can hotplug/unplug devices even without a root port P2P
>   bridge between it and the root "bus"
>  * Multiple independent host bridges are routine on PAPR
> * Unlike PC (where all host bridges have multiplexed access to
>   configuration space) PCI host bridges (PHBs) are truly
>   independent for PAPR guests (disjoint MMIO regions in system
>   address space)
> * PowerVM typically presents a separate PHB to the guest for each
>   host slot passed through
> 
> The Proposal:
> 
> I suggest that libvirt implement a new default algorithm for placing
> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> PAPR guests.
> 
> The short summary is that by default it should assign each device to a
> separate vPHB, creating vPHBs as necessary.
> 
>   * For passthrough sometimes a group of host devices can't be safely
> isolated from each other - this is known as a (host) Partitionable
> Endpoint (PE).  In this case, if any device in the PE is passed
> through to a guest, the whole PE must be passed through to the
> same vPHB in the guest.  From the guest POV, each vPHB has exactly
> one (guest) PE.
>   * To allow for hotplugged devices, libvirt should also add a number
> of additional, empty vPHBs (the PAPR spec allows for hotplug of
> PHBs, but this is not yet implemented in qemu).  When hotplugging
> a new device (or PE) libvirt should locate a vPHB which doesn't
> currently contain anything.
>   * libvirt should only (automatically) add PHBs - never root ports or
> other PCI to PCI bridges
> 
> In order to handle migration, the vPHBs will need to be represented in
> the domain XML, which will also allow the user to override this
> topology if they want.
> 
> Advantages:
> 
> There are still some details I need to figure out w.r.t. handling PCIe
> devices (on both the qemu and libvirt sides).  However the fact that

One such detail may be that PCIe devices should have the
"ibm,pci-config-space-type" property set to 1 in the DT,
for the driver to be able to access the extended config
space.

> PAPR guests don't typically see PCIe root ports means that the normal
> libvirt PCIe allocation scheme won't work.  This scheme has several
> advantages with or without support for PCIe devices:
> 
>  * Better performance for 32-bit devices
> 
> With multiple devices on a single vPHB they all must share a (fairly
> small) 32-bit DMA/IOMMU window.  With separate PHBs they each have a
> separate window.  PAPR guests have an always-on guest visible IOMMU.
> 
>  * Better EEH handling for passthrough devices
> 
> EEH is an IBM hardware-assisted mechanism for isolating and safely
> resetting devices experiencing hardware faults so they don't bring
> down other devices or the system at large.  It's roughly similar to
> PCIe AER in concept, but has a different IBM specific interface, and
> works on both PCI and PCIe devices.
> 
> Currently the kernel interfaces for handling EEH events on passthrough
> devices will only work if there is a single (host) iommu group in the
> vfio container.  While lifting that restriction would be nice, it's
> quite difficult to do so (it requires keeping state synchronized
> between multiple host groups).

Re: [libvirt] Proposal PCI/PCIe device placement on PAPR guests

2017-01-06 Thread Greg Kurz
On Thu, 5 Jan 2017 16:46:18 +1100
David Gibson  wrote:

> There was a discussion back in November on the qemu list which spilled
> onto the libvirt list about how to add support for PCIe devices to
> POWER VMs, specifically 'pseries' machine type PAPR guests.
> 
> Here's a more concrete proposal for how to handle part of this in
> future from the libvirt side.  Strictly speaking what I'm suggesting
> here isn't intrinsically linked to PCIe: it will make adding PCIe
> support sanely easier, as well as having a number of advantages for
> both PCIe and plain-PCI devices on PAPR guests.
> 
> Background:
> 
>  * Currently the pseries machine type only supports vanilla PCI
>buses.
> * This is a qemu limitation, not something inherent - PAPR guests
>   running under PowerVM (the IBM hypervisor) can use passthrough
>   PCIe devices (PowerVM doesn't emulate devices though).
> * In fact the way PCI access is para-virtualized in PAPR makes the
>   usual distinctions between PCI and PCIe largely disappear
>  * Presentation of PCIe devices to PAPR guests is unusual
> * Unlike x86 and other "bare metal" platforms, root ports are
>   not made visible to the guest. i.e. all devices (typically)
>   appear as though they were integrated devices on x86
> * In terms of topology all devices will appear in a way similar to
>   a vanilla PCI bus, even PCIe devices
>* However PCIe extended config space is accessible
> * This means libvirt's usual placement of PCIe devices is not
>   suitable for PAPR guests
>  * PAPR has its own hotplug mechanism
> * This is used instead of standard PCIe hotplug
> * This mechanism works for both PCIe and vanilla-PCI devices
> * This can hotplug/unplug devices even without a root port P2P
>   bridge between it and the root "bus"
>  * Multiple independent host bridges are routine on PAPR
> * Unlike PC (where all host bridges have multiplexed access to
>   configuration space) PCI host bridges (PHBs) are truly
>   independent for PAPR guests (disjoint MMIO regions in system
>   address space)
> * PowerVM typically presents a separate PHB to the guest for each
>   host slot passed through
> 
> The Proposal:
> 
> I suggest that libvirt implement a new default algorithm for placing
> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> PAPR guests.
> 
> The short summary is that by default it should assign each device to a
> separate vPHB, creating vPHBs as necessary.
> 
>   * For passthrough sometimes a group of host devices can't be safely
> isolated from each other - this is known as a (host) Partitionable
> Endpoint (PE).  In this case, if any device in the PE is passed
> through to a guest, the whole PE must be passed through to the
> same vPHB in the guest.  From the guest POV, each vPHB has exactly
> one (guest) PE.
>   * To allow for hotplugged devices, libvirt should also add a number
> of additional, empty vPHBs (the PAPR spec allows for hotplug of
> PHBs, but this is not yet implemented in qemu).  When hotplugging
> a new device (or PE) libvirt should locate a vPHB which doesn't
> currently contain anything.
>   * libvirt should only (automatically) add PHBs - never root ports or
> other PCI to PCI bridges
> 
> In order to handle migration, the vPHBs will need to be represented in
> the domain XML, which will also allow the user to override this
> topology if they want.
> 
> Advantages:
> 
> There are still some details I need to figure out w.r.t. handling PCIe
> devices (on both the qemu and libvirt sides).  However the fact that

One such detail may be that PCIe devices should have the
"ibm,pci-config-space-type" property set to 1 in the DT,
for the driver to be able to access the extended config
space.

> PAPR guests don't typically see PCIe root ports means that the normal
> libvirt PCIe allocation scheme won't work.  This scheme has several
> advantages with or without support for PCIe devices:
> 
>  * Better performance for 32-bit devices
> 
> With multiple devices on a single vPHB they all must share a (fairly
> small) 32-bit DMA/IOMMU window.  With separate PHBs they each have a
> separate window.  PAPR guests have an always-on guest visible IOMMU.
> 
>  * Better EEH handling for passthrough devices
> 
> EEH is an IBM hardware-assisted mechanism for isolating and safely
> resetting devices experiencing hardware faults so they don't bring
> down other devices or the system at large.  It's roughly similar to
> PCIe AER in concept, but has a different IBM specific interface, and
> works on both PCI and PCIe devices.
> 
> Currently the kernel interfaces for handling EEH events on passthrough
> devices will only work if there is a single (host) iommu group in the
> vfio container.  While lifting that restriction would be nice, it's
> quite difficult to do so (it requires keeping state synchronized
> between multiple host groups).  That