Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-19 Thread Andrea Bolognani
On Wed, 2016-12-14 at 20:26 +0200, Marcel Apfelbaum wrote:
> > > > > > Maybe I just don't quite get the relationship between Root
> > > > > > Complexes and Root Buses, but I guess my question is: what
> > > > > > is preventing us from simply doing whatever a
> > > > > > spapr-pci-host-bridge is doing in order to expose a legacy
> > > > > > PCI Root Bus (pci.*) to the guest, and create a new
> > > > > > spapr-pcie-host-bridge that exposes a PCIe Root Bus (pcie.*)
> > > > > > instead?
> > > > > 
> > > > > Hrm, the suggestion of providing both a vanilla-PCI and PCI-E host
> > > > > bridge came up before.  I think one of us spotted a problem with that,
> > > > > but I don't recall what it was now.  I guess one is how libvirt would
> > > > > > map its stupid-fake-domain-numbers to which root bus to use.
> > > 
> > > This would be a weird configuration, I never heard of something like that
> > > on a bare metal machine, but I never worked on pseries, who knows...
> > 
> > Which aspect?  Having multiple independent host bridges is perfectly
> > reasonable - x86 just doesn't do it well for rather stupid historical
> > reasons.
> 
> I agree about the multiple host-bridges, that's actually what pxb/pxb-pcie
> devices (kind of) do.
> 
> I was talking about having one PCI PHB and another PHB which is PCI Express.
> I was referring to one system having both PCI and PCIe PHBs.

Sorry this confused you: what I was talking about was just
having both a legacy PCI PHB and a PCI Express PHB available
in QEMU, not using both for the same guest. The user would
pick one or the other and stick with it.
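As a rough sketch (the spapr-pcie-host-bridge below is the device being
proposed here, it does not exist yet), picking one flavour or the other
would boil down to something like:

    -device spapr-pci-host-bridge,index=1     # legacy PCI PHB
    -device spapr-pcie-host-bridge,index=1    # PCIe PHB (hypothetical)

with libvirt never mixing the two flavours within the same guest.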

> > Again, one possible option here is to continue to treat pseries as
> > having a vanilla-PCI bus, but with a special flag saying that it's
> > magically able to connect PCI-E devices.
> 
> A PCIe bus supporting PCI devices is strange (QEMU allows it ...),
> but a PCI bus supporting PCIe devices is hard to "swallow".

I agree.

> I would say maybe make it a special case of a PCIe bus with different rules.
> It can derive from the PCIe bus class and override the usual behavior
> with PAPR-specific rules which happen to be similar to the PCI bus rules.

It's pretty clear by now that libvirt will need to have some
special handling of pseries guests. Having to choose between
"weird legacy PCI" and "weird PCI Express", I think I'd rather
go with the latter :)

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-15 Thread Benjamin Herrenschmidt
On Wed, 2016-12-14 at 20:26 +0200, Marcel Apfelbaum wrote:
> > > The Root complex includes the PCI bus, some configuration registers if
> > > needed, provides access to the configuration space, translates relevant
> > > CPU reads/writes to PCI(e) transactions...
> > 
> > Do those configuration registers appear within PCI space, or outside
> > it (e.g. raw MMIO or PIO registers)?
> > 
> 
> Root Complexes use MMIO to expose the PCI configuration space,
> they call it ECAM (enhanced configuration access mechanism) or
> MMConfig.

Not all of them do. There are plenty of PCIe RCs out there that
do config space using different mechanisms, such as the good old
indirect address/data method; ours is one of them. I.e., even in real "bare
metal" powernv, where the root port is visible to Linux, we still
go through firmware for config space to get through to those
registers (among other things).
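For reference, with ECAM every config-space register simply gets its own MMIO
address, roughly

    config_address = ecam_base + bus * 2^20 + device * 2^15 + function * 2^12 + register_offset

whereas the indirect address/data method mentioned above funnels every access
through a single address register / data register pair.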

My PHB3 implementation (not upstream yet and a bit bitrotted by now)
exposed PCIe that way, including extended config space, and that was
working fine.

Cheers,
Ben.




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-14 Thread Marcel Apfelbaum

On 12/14/2016 04:46 AM, David Gibson wrote:

On Tue, Dec 13, 2016 at 02:25:44PM +0200, Marcel Apfelbaum wrote:

On 12/07/2016 06:42 PM, Andrea Bolognani wrote:

[Added Marcel to CC]




Hi,

Sorry for the late reply.


On Wed, 2016-12-07 at 15:11 +1100, David Gibson wrote:

Is the difference between q35 and pseries guests with
respect to PCIe only relevant when it comes to assigned
devices, or in general? I'm asking this because you seem to
focus entirely on assigned devices.


Well, in a sense that's up to us.  The only existing model we have is
PowerVM, and PowerVM only does device passthrough, no emulated
devices.  PAPR doesn't really distinguish one way or the other, but
it's written from the perspective of assuming that all PCI devices
correspond to physical devices on the host


Okay, that makes sense.


On q35,
you'd generally expect physically separate (different slot) devices to
appear under separate root complexes.


This part I don't get at all, so please bear with me.

The way I read it you're claiming that eg. a SCSI controller
and a network adapter, being physically separate and assigned
to separate PCI slots, should have a dedicated PCIe Root
Complex each on a q35 guest.




Not a PCIe Root Complex, but a PCIe Root port.


Ah, sorry.  As I said, I've been pretty confused by all the terminology.


Right, my understanding was that if the devices were slotted, rather
than integrated, each one would sit under a separate root complex, the
root complex being a pseudo PCI to PCI bridge.


I assume "slotted" means "plugged into a slot that's not one
of those provided by pcie.0" or something along those lines.

More on the root complex bit later.


That doesn't match with my experience, where you would simply
assign them to separate slots of the default PCIe Root Bus
(pcie.0), eg. 00:01.0 and 00:02.0.


The qemu default, or the libvirt default?


I'm talking about the libvirt default, which is supposed to
follow Marcel's PCIe Guidelines.


I think this represents
treating the devices as though they were integrated devices in the
host bridge.  I believe on q35 they would not be hotpluggable




Correct. Please have a look at the new document
regarding pcie: docs/pcie.txt and the corresponding presentations.



Yeah, that's indeed not quite what libvirt would do by
default: in reality, there would be an ioh3420 between the
pcie.0 slots and each device exactly to enable hotplug.


but on
pseries they would be (because we don't use the standard hot plug
controller).


We can account for that in libvirt and avoid adding the
extra ioh3420s (or rather the upcoming generic PCIe Root
Ports) for pseries guests.


Maybe you're referring to the fact that you might want to
create multiple PCIe Root Complexes in order to assign the
host devices to separate guest NUMA nodes? How is creating
multiple PCIe Root Complexes on q35 using pxb-pcie different
than creating multiple PHBs using spapr-pci-host-bridge on
pseries?


Uh.. AIUI the root complex is the PCI to PCI bridge under which PCI-E
slots appear.  PXB is something different - essentially different host
bridges as you say (though with some weird hacks to access config
space, which make it dependent on the primary bus in a way which spapr
PHBs are not).

I'll admit I'm pretty confused myself about the exact distinction
between root complex, root port and upstream and downstream ports.


I think we both need to get our terminology straight :)
I'm sure Marcel will be happy to point us in the right
direction.

My understanding is that the PCIe Root Complex is the piece
of hardware that exposes a PCIe Root Bus (pcie.0 in QEMU);


right


Oh.. I wasn't as clear as I'd like to be on what the root complex is.
But I thought the root complex did have some guest visible presence in
the PCI tree.  What you're describing here seems equivalent to what
I'd call the PCI Host Bridge (== PHB).



Yes, a Root Complex is a type of Host Bridge in the sense that it bridges
between the CPU/memory controller and the PCI subsystem.


PXBs can be connected to slots in pcie.0 to create more buses
that behave, for the most part, like pcie.0 and are mostly
useful to bind devices to specific NUMA nodes.


right

 Same applies

to legacy PCI with the pxb (instead of pxb-pcie) device.



pxb should not be used for PCIe machines, only for legacy PCI ones.


Noted.  And not for pseries at all.  Note that because we have a
para-virtualized platform (all PCI config access goes via hypercalls)
the distinction between PCI and PCI-E is much blurrier than in the x86
case.



OK


In a similar fashion, PHBs are the hardware thingies that
expose a PCI Root Bus (pci.0 and so on), the main difference
being that they are truly independent: so a q35 guest will
always have a "primary" PCIe Root Bus and (optionally) a
bunch of "secondary" ones, but the same will not be the case
for pseries guests.


OK


I don't think the difference is that important though, at
least from libvirt's point of view: whether you're 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-14 Thread Marcel Apfelbaum

On 12/13/2016 05:15 PM, Benjamin Herrenschmidt wrote:

On Tue, 2016-12-13 at 14:25 +0200, Marcel Apfelbaum wrote:

Hrm, the suggestion of providing both a vanilla-PCI and PCI-E host
bridge came up before.  I think one of us spotted a problem with that,
but I don't recall what it was now.  I guess one is how libvirt would
map its stupid-fake-domain-numbers to which root bus to use.


This would be a weird configuration, I never heard of something like that
on a bare metal machine, but I never worked on pseries, who knows...


That's the thing though, it's *not* bare metal ;-)

On bare metal, we use the "powernv" platform for which we haven't yet upstreamed
the PCIE support, but there we have real PCIe root ports with all that's
expected on them.

They come on physically different PHBs, one root complex per PHB, but they are
"proper" PCIe. Hotplug is a bit weird still because it has to go through some
FW interactions as the HW doesn't use the PCIe standard slot control bits (and
our qemu model doesn't handle it yet) but otherwise it's standard.



Thanks for the explanations, so bare metal is standard minus hotplug.


"pseries" on the other hand is a paravirtualized platform. It's the environment
presented to guests by our proprietary hypervisor (PowerVM, aka pHyp) and
KVM/qemu. I think there's a public spec these days but to cut the story short,
it doesn't expose PCIe "properly" for bad historical reasons.

It tends to hide the parent, i.e., the root port for slots connected directly to
a PHB, or the entire hierarchy from the root to (and including) the downstream
switch port for slots under a switch.

So you end up with PCIe devices devoid of a proper "parent" which is a bit
weird.



OK, now it's clearer. I can get back to reading the thread from the start :)

Thanks,
Marcel


Hotplug is implemented using specific firmware interfaces that are identical
to those on pre-PCIe (i.e. PCI/PCI-X) systems. The FW has interfaces for supporting 3
kinds of hotplug:

   - Entire PHBs
   - P2P Bridges
   - Individual devices on an existing PHB or P2P bridge

In practice these days mostly the first one is used for everything under pHyp,
though I think we have a mix of 1 and 3 for KVM/qemu.

Cheers,
Ben.






Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-13 Thread David Gibson
On Tue, Dec 13, 2016 at 02:25:44PM +0200, Marcel Apfelbaum wrote:
> On 12/07/2016 06:42 PM, Andrea Bolognani wrote:
> > [Added Marcel to CC]
> > 
> 
> 
> Hi,
> 
> Sorry for the late reply.
> 
> > On Wed, 2016-12-07 at 15:11 +1100, David Gibson wrote:
> > > > Is the difference between q35 and pseries guests with
> > > > respect to PCIe only relevant when it comes to assigned
> > > > devices, or in general? I'm asking this because you seem to
> > > > focus entirely on assigned devices.
> > > 
> > > Well, in a sense that's up to us.  The only existing model we have is
> > > PowerVM, and PowerVM only does device passthrough, no emulated
> > > devices.  PAPR doesn't really distinguish one way or the other, but
> > > it's written from the perspective of assuming that all PCI devices
> > > correspond to physical devices on the host
> > 
> > Okay, that makes sense.
> > 
> > > > > On q35,
> > > > > you'd generally expect physically separate (different slot) devices to
> > > > > appear under separate root complexes.
> > > > 
> > > > This part I don't get at all, so please bear with me.
> > > > 
> > > > The way I read it you're claiming that eg. a SCSI controller
> > > > and a network adapter, being physically separate and assigned
> > > > to separate PCI slots, should have a dedicated PCIe Root
> > > > Complex each on a q35 guest.
> > > 
> 
> Not a PCIe Root Complex, but a PCIe Root port.

Ah, sorry.  As I said, I've been pretty confused by all the terminology.

> > > Right, my understanding was that if the devices were slotted, rather
> > > than integrated, each one would sit under a separate root complex, the
> > > root complex being a pseudo PCI to PCI bridge.
> > 
> > I assume "slotted" means "plugged into a slot that's not one
> > of those provided by pcie.0" or something along those lines.
> > 
> > More on the root complex bit later.
> > 
> > > > That doesn't match with my experience, where you would simply
> > > > assign them to separate slots of the default PCIe Root Bus
> > > > (pcie.0), eg. 00:01.0 and 00:02.0.
> > > 
> > > The qemu default, or the libvirt default?
> > 
> > I'm talking about the libvirt default, which is supposed to
> > follow Marcel's PCIe Guidelines.
> > 
> > > I think this represents
> > > treating the devices as though they were integrated devices in the
> > > host bridge.  I believe on q35 they would not be hotpluggable
> > 
> 
> Correct. Please have a look at the new document
> regarding pcie: docs/pcie.txt and the corresponding presentations.
> 
> 
> > Yeah, that's indeed not quite what libvirt would do by
> > default: in reality, there would be an ioh3420 between the
> > pcie.0 slots and each device exactly to enable hotplug.
> > 
> > > but on
> > > pseries they would be (because we don't use the standard hot plug
> > > controller).
> > 
> > We can account for that in libvirt and avoid adding the
> > extra ioh3420s (or rather the upcoming generic PCIe Root
> > Ports) for pseries guests.
> > 
> > > > Maybe you're referring to the fact that you might want to
> > > > create multiple PCIe Root Complexes in order to assign the
> > > > host devices to separate guest NUMA nodes? How is creating
> > > > multiple PCIe Root Complexes on q35 using pxb-pcie different
> > > > than creating multiple PHBs using spapr-pci-host-bridge on
> > > > pseries?
> > > 
> > > Uh.. AIUI the root complex is the PCI to PCI bridge under which PCI-E
> > > slots appear.  PXB is something different - essentially different host
> > > bridges as you say (though with some weird hacks to access config
> > > space, which make it dependent on the primary bus in a way which spapr
> > > PHBs are not).
> > > 
> > > I'll admit I'm pretty confused myself about the exact distinction
> > > between root complex, root port and upstream and downstream ports.
> > 
> > I think we both need to get our terminology straight :)
> > I'm sure Marcel will be happy to point us in the right
> > direction.
> > 
> > My understanding is that the PCIe Root Complex is the piece
> > of hardware that exposes a PCIe Root Bus (pcie.0 in QEMU);
> 
> right

Oh.. I wasn't as clear as I'd like to be on what the root complex is.
But I thought the root complex did have some guest visible presence in
the PCI tree.  What you're describing here seems equivalent to what
I'd call the PCI Host Bridge (== PHB).

> > PXBs can be connected to slots in pcie.0 to create more buses
> > that behave, for the most part, like pcie.0 and are mostly
> > useful to bind devices to specific NUMA nodes.
> 
> right
> 
>  Same applies
> > to legacy PCI with the pxb (instead of pxb-pcie) device.
> > 
> 
> pxb should not be used for PCIe machines, only for legacy PCI ones.

Noted.  And not for pseries at all.  Note that because we have a
para-virtualized platform (all PCI config access goes via hypercalls)
the distinction between PCI and PCI-E is much blurrier than in the x86
case.

> > In a similar fashion, PHBs are the hardware thingies that
> > expose a PCI Root Bus (pci.0 and 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-13 Thread David Gibson
On Tue, Dec 13, 2016 at 09:15:37AM -0600, Benjamin Herrenschmidt wrote:
> On Tue, 2016-12-13 at 14:25 +0200, Marcel Apfelbaum wrote:
> > > > Hrm, the suggestion of providing both a vanilla-PCI and PCI-E host
> > > > bridge came up before.  I think one of us spotted a problem with that,
> > > > but I don't recall what it was now.  I guess one is how libvirt would
> > > > map its stupid-fake-domain-numbers to which root bus to use.
> > 
> > This would be a weird configuration, I never heard of something like that
> > on a bare metal machine, but I never worked on pseries, who knows...
> 
> That's the thing though, it's *not* bare metal ;-)
> 
> On bare metal, we use the "powernv" platform for which we haven't yet upstreamed
> the PCIE support, but there we have real PCIe root ports with all that's
> expected on them.
> 
> They come on physically different PHBs, one root complex per PHB, but they are
> "proper" PCIe. Hotplug is a bit weird still because it has to go through some
> FW interactions as the HW doesn't use the PCIe standard slot control bits (and
> our qemu model doesn't handle it yet) but otherwise it's standard.
> 
> "pseries" on the other hand is a paravirtualized platform. It's the 
> environment
> presented to guests by our propritary hypervisor (PowerVM, aka pHyp) and
> KVM/qemu. I think there's a public spec these days but to cut the story short,
> it doesn't expose PCIe "properly" for bad historical reasons.
> 
> It tends to hide the parent, i.e., the root port for slots connected directly to
> a PHB, or the entire hierarchy from the root to (and including) the downstream
> switch port for slots under a switch.
> 
> So you end up with PCIe devices devoid of a proper "parent" which is a bit
> weird.
> 
> Hotplug is implemented using specific firmware interfaces that are identical
> to those on pre-PCIe (i.e. PCI/PCI-X) systems. The FW has interfaces for supporting 3
> kinds of hotplug:
> 
>- Entire PHBs
>- P2P Bridges
>- Individual devices on an existing PHB or P2P bridge
> 
> In practice these days mostly the first one is used for everything under pHyp,
> though I think we have a mix of 1 and 3 for KVM/qemu.

Alas, no, we only have 3 on KVM/qemu.  Well, possibly you could do (2)
by plugging a bridge device using (3) then a device onto the bridge.
Whole PHB hotplug has never been implemented for qemu/KVM.. but we
probably want to.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-13 Thread Benjamin Herrenschmidt
On Tue, 2016-12-13 at 14:25 +0200, Marcel Apfelbaum wrote:
> > > Hrm, the suggestion of providing both a vanilla-PCI and PCI-E host
> > > bridge came up before.  I think one of us spotted a problem with that,
> > > but I don't recall what it was now.  I guess one is how libvirt would
> > > map its stupid-fake-domain-numbers to which root bus to use.
> 
> This would be a weird configuration, I never heard of something like that
> on a bare metal machine, but I never worked on pseries, who knows...

That's the thing though, it's *not* bare metal ;-)

On bare metal, we use the "powernv" platform for which we haven't yet upstreamed
the PCIE support, but there we have real PCIe root ports with all that's
expected on them.

They come on physically different PHBs, one root complex per PHB, but they are
"proper" PCIe. Hotplug is a bit weird still because it has to go through some
FW interactions as the HW doesn't use the PCIe standard slot control bits (and
our qemu model doesn't handle it yet) but otherwise it's standard.

"pseries" on the other hand is a paravirtualized platform. It's the environment
presented to guests by our proprietary hypervisor (PowerVM, aka pHyp) and
KVM/qemu. I think there's a public spec these days but to cut the story short,
it doesn't expose PCIe "properly" for bad historical reasons.

It tends to hide the parent, i.e., the root port for slots connected directly to
a PHB, or the entire hierarchy from the root to (and including) the downstream
switch port for slots under a switch.

So you end up with PCIe devices devoid of a proper "parent" which is a bit
weird.

Hotplug is implemented using specific firmware interfaces that are identical
to those on pre-PCIe (i.e. PCI/PCI-X) systems. The FW has interfaces for supporting 3
kinds of hotplug:

   - Entire PHBs
   - P2P Bridges
   - Individual devices on an existing PHB or P2P bridge

In practice these days mostly the first one is used for everything under pHyp,
though I think we have a mix of 1 and 3 for KVM/qemu.

Cheers,
Ben.




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-13 Thread Greg Kurz
On Tue, 13 Dec 2016 14:25:44 +0200
Marcel Apfelbaum  wrote:

> >> Now... from what Laine was saying it sounds like more of the
> >> differences between PCI-E placement and PCI placement may be
> >> implemented by libvirt than qemu than I realized.  So possibly we do
> >> want to make the bus be PCI-E on the qemu side, but have libvirt use
> >> the vanilla-PCI placement guidelines rather than PCI-E for pseries.  
> >
> > Basically the special casing I was mentioning earlier.  
> 
> That looks complicated.. I wish I knew more about the pseries
> PCIe stuff, does anyone know where I can get the info? (besides 'google
> it'...)

This is described in the LoPAPR specification, available at:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

Cheers.

--
Greg



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-13 Thread Marcel Apfelbaum

On 12/07/2016 06:42 PM, Andrea Bolognani wrote:

[Added Marcel to CC]




Hi,

Sorry for the late reply.


On Wed, 2016-12-07 at 15:11 +1100, David Gibson wrote:

Is the difference between q35 and pseries guests with
respect to PCIe only relevant when it comes to assigned
devices, or in general? I'm asking this because you seem to
focus entirely on assigned devices.


Well, in a sense that's up to us.  The only existing model we have is
PowerVM, and PowerVM only does device passthrough, no emulated
devices.  PAPR doesn't really distinguish one way or the other, but
it's written from the perspective of assuming that all PCI devices
correspond to physical devices on the host


Okay, that makes sense.


On q35,
you'd generally expect physically separate (different slot) devices to
appear under separate root complexes.


This part I don't get at all, so please bear with me.

The way I read it you're claiming that eg. a SCSI controller
and a network adapter, being physically separate and assigned
to separate PCI slots, should have a dedicated PCIe Root
Complex each on a q35 guest.




Not a PCIe Root Complex, but a PCIe Root port.


Right, my understanding was that if the devices were slotted, rather
than integrated, each one would sit under a separate root complex, the
root complex being a pseudo PCI to PCI bridge.


I assume "slotted" means "plugged into a slot that's not one
of those provided by pcie.0" or something along those lines.

More on the root complex bit later.


That doesn't match with my experience, where you would simply
assign them to separate slots of the default PCIe Root Bus
(pcie.0), eg. 00:01.0 and 00:02.0.


The qemu default, or the libvirt default?


I'm talking about the libvirt default, which is supposed to
follow Marcel's PCIe Guidelines.


I think this represents
treating the devices as though they were integrated devices in the
host bridge.  I believe on q35 they would not be hotpluggable




Correct. Please have a look at the new document
regarding pcie: docs/pcie.txt and the corresponding presentations.



Yeah, that's indeed not quite what libvirt would do by
default: in reality, there would be an ioh3420 between the
pcie.0 slots and each device exactly to enable hotplug.


but on
pseries they would be (because we don't use the standard hot plug
controller).


We can account for that in libvirt and avoid adding the
extra ioh3420s (or rather the upcoming generic PCIe Root
Ports) for pseries guests.


Maybe you're referring to the fact that you might want to
create multiple PCIe Root Complexes in order to assign the
host devices to separate guest NUMA nodes? How is creating
multiple PCIe Root Complexes on q35 using pxb-pcie different
than creating multiple PHBs using spapr-pci-host-bridge on
pseries?


Uh.. AIUI the root complex is the PCI to PCI bridge under which PCI-E
slots appear.  PXB is something different - essentially different host
bridges as you say (though with some weird hacks to access config
space, which make it dependent on the primary bus in a way which spapr
PHBs are not).

I'll admit I'm pretty confused myself about the exact distinction
between root complex, root port and upstream and downstream ports.


I think we both need to get our terminology straight :)
I'm sure Marcel will be happy to point us in the right
direction.

My understanding is that the PCIe Root Complex is the piece
of hardware that exposes a PCIe Root Bus (pcie.0 in QEMU);


right


PXBs can be connected to slots in pcie.0 to create more buses
that behave, for the most part, like pcie.0 and are mostly
useful to bind devices to specific NUMA nodes.


right

 Same applies

to legacy PCI with the pxb (instead of pxb-pcie) device.



pxb should not be used for PCIe machines, only for legacy PCI ones.


In a similar fashion, PHBs are the hardware thingies that
expose a PCI Root Bus (pci.0 and so on), the main difference
being that they are truly independent: so a q35 guest will
always have a "primary" PCIe Root Bus and (optionally) a
bunch of "secondary" ones, but the same will not be the case
for pseries guests.



OK


I don't think the difference is that important though, at
least from libvirt's point of view: whether you're creating
a pseries guest with two PHBs, or a q35 guest with its
built-in PCIe Root Complex and an extra PCIe Expander Bus,
you will end up with two "top level" buses that you can plug
more devices into.


I agree

 If we had spapr-pcie-host-bridge, we

could treat them mostly the same - with caveats such as the
one described above, of course.


Whereas on pseries they'll
appear as siblings on a virtual bus (which makes no physical sense for
point-to-point PCI-E).


What is the virtual bus in question? Why would it matter
that they're siblings?


On pseries it won't.  But my understanding is that libvirt won't
create them that way on q35 - instead it will insert the RCs / P2P
bridges to allow them to be hotplugged.  Inserting that bridge may
confuse pseries guests which 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-07 Thread Andrea Bolognani
[Added Marcel to CC]

On Wed, 2016-12-07 at 15:11 +1100, David Gibson wrote:
> > Is the difference between q35 and pseries guests with
> > respect to PCIe only relevant when it comes to assigned
> > devices, or in general? I'm asking this because you seem to
> > focus entirely on assigned devices.
> 
> Well, in a sense that's up to us.  The only existing model we have is
> PowerVM, and PowerVM only does device passthrough, no emulated
> devices.  PAPR doesn't really distinguish one way or the other, but
> it's written from the perspective of assuming that all PCI devices
> correspond to physical devices on the host

Okay, that makes sense.

> > > On q35,
> > > you'd generally expect physically separate (different slot) devices to
> > > appear under separate root complexes.
> > 
> > This part I don't get at all, so please bear with me.
> > 
> > The way I read it you're claiming that eg. a SCSI controller
> > and a network adapter, being physically separate and assigned
> > to separate PCI slots, should have a dedicated PCIe Root
> > Complex each on a q35 guest.
> 
> Right, my understanding was that if the devices were slotted, rather
> than integrated, each one would sit under a separate root complex, the
> root complex being a pseudo PCI to PCI bridge.

I assume "slotted" means "plugged into a slot that's not one
of those provided by pcie.0" or something along those lines.

More on the root complex bit later.

> > That doesn't match with my experience, where you would simply
> > assign them to separate slots of the default PCIe Root Bus
> > (pcie.0), eg. 00:01.0 and 00:02.0.
> 
> The qemu default, or the libvirt default?

I'm talking about the libvirt default, which is supposed to
follow Marcel's PCIe Guidelines.

> I think this represents
> treating the devices as though they were integrated devices in the
> host bridge.  I believe on q35 they would not be hotpluggable

Yeah, that's indeed not quite what libvirt would do by
default: in reality, there would be an ioh3420 between the
pcie.0 slots and each device exactly to enable hotplug.

> but on
> pseries they would be (because we don't use the standard hot plug
> controller).

We can account for that in libvirt and avoid adding the
extra ioh3420s (or rather the upcoming generic PCIe Root
Ports) for pseries guests.
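Roughly, and with made-up IDs, the difference would be something like:

    # q35: a root port per hotpluggable device
    -device ioh3420,id=rp1,bus=pcie.0,chassis=1,slot=1 \
    -device virtio-net-pci,bus=rp1

    # pseries: the device can sit directly on the PHB's root bus
    -device virtio-net-pci,bus=pci.0

since PAPR hotplug doesn't go through the standard slot controller anyway.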

> > Maybe you're referring to the fact that you might want to
> > create multiple PCIe Root Complexes in order to assign the
> > host devices to separate guest NUMA nodes? How is creating
> > multiple PCIe Root Complexes on q35 using pxb-pcie different
> > than creating multiple PHBs using spapr-pci-host-bridge on
> > pseries?
> 
> Uh.. AIUI the root complex is the PCI to PCI bridge under which PCI-E
> slots appear.  PXB is something different - essentially different host
> bridges as you say (though with some weird hacks to access config
> space, which make it dependent on the primary bus in a way which spapr
> PHBs are not).
> 
> I'll admit I'm pretty confused myself about the exact distinction
> between root complex, root port and upstream and downstream ports.

I think we both need to get our terminology straight :)
I'm sure Marcel will be happy to point us in the right
direction.

My understanding is that the PCIe Root Complex is the piece
of hardware that exposes a PCIe Root Bus (pcie.0 in QEMU);
PXBs can be connected to slots in pcie.0 to create more buses
that behave, for the most part, like pcie.0 and are mostly
useful to bind devices to specific NUMA nodes. Same applies
to legacy PCI with the pxb (instead of pxb-pcie) device.

In a similar fashion, PHBs are the hardware thingies that
expose a PCI Root Bus (pci.0 and so on), the main difference
being that they are truly independent: so a q35 guest will
always have a "primary" PCIe Root Bus and (optionally) a
bunch of "secondary" ones, but the same will not be the case
for pseries guests.
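A minimal sketch of the two approaches, with made-up bus numbers and NUMA
nodes:

    # q35: an extra "top level" bus hanging off pcie.0 via a PCIe Expander Bridge
    -device pxb-pcie,id=pcie.1,bus=pcie.0,bus_nr=32,numa_node=1

    # pseries: an extra, fully independent PHB
    -device spapr-pci-host-bridge,index=1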

I don't think the difference is that important though, at
least from libvirt's point of view: whether you're creating
a pseries guest with two PHBs, or a q35 guest with its
built-in PCIe Root Complex and an extra PCIe Expander Bus,
you will end up with two "top level" buses that you can plug
more devices into. If we had spapr-pcie-host-bridge, we
could treat them mostly the same - with caveats such as the
one described above, of course.

> > > Whereas on pseries they'll
> > > appear as siblings on a virtual bus (which makes no physical sense for
> > > point-to-point PCI-E).
> > 
> > What is the virtual bus in question? Why would it matter
> > that they're siblings?
> 
> On pseries it won't.  But my understanding is that libvirt won't
> create them that way on q35 - instead it will insert the RCs / P2P
> bridges to allow them to be hotplugged.  Inserting that bridge may
> confuse pseries guests which aren't expecting it.

libvirt will automatically add PCIe Root Ports to make the
devices hotpluggable on q35 guests, yes. But, as mentioned
above, we can teach it not to.

> > I'm possibly missing the 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-06 Thread David Gibson
On Tue, Dec 06, 2016 at 06:30:47PM +0100, Andrea Bolognani wrote:
> On Fri, 2016-12-02 at 15:18 +1100, David Gibson wrote:
> > > So, would the PCIe Root Bus in a pseries guest behave
> > > differently than the one in a q35 or mach-virt guest?
> > 
> > Yes.  I had a long discussion with BenH and got a somewhat better idea
> > about this.
> 
> Sorry, but I'm afraid you're going to have to break this
> down even further for me :(
> 
> > If only a single host PE (== iommu group) is passed through and there
> > are no emulated devices, the difference isn't too bad: basically on
> > pseries you'll see the subtree that would be below the root complex on
> > q35.
> > 
> > But if you pass through multiple groups, things get weird.
> 
> Is the difference between q35 and pseries guests with
> respect to PCIe only relevant when it comes to assigned
> devices, or in general? I'm asking this because you seem to
> focus entirely on assigned devices.

Well, in a sense that's up to us.  The only existing model we have is
PowerVM, and PowerVM only does device passthrough, no emulated
devices.  PAPR doesn't really distinguish one way or the other, but
it's written from the perspective of assuming that all PCI devices
correspond to physical devices on the host

> > On q35,
> > you'd generally expect physically separate (different slot) devices to
> > appear under separate root complexes.
> 
> This part I don't get at all, so please bear with me.
> 
> The way I read it you're claiming that eg. a SCSI controller
> and a network adapter, being physically separate and assigned
> to separate PCI slots, should have a dedicated PCIe Root
> Complex each on a q35 guest.

Right, my understanding was that if the devices were slotted, rather
than integrated, each one would sit under a separate root complex, the
root complex being a pseudo PCI to PCI bridge.

> That doesn't match with my experience, where you would simply
> assign them to separate slots of the default PCIe Root Bus
> (pcie.0), eg. 00:01.0 and 00:02.0.

The qemu default, or the libvirt default?  I think this represents
treating the devices as though they were integrated devices in the
host bridge.  I believe on q35 they would not be hotpluggable - but on
pseries they would be (because we don't use the standard hot plug
controller).

> Maybe you're referring to the fact that you might want to
> create multiple PCIe Root Complexes in order to assign the
> host devices to separate guest NUMA nodes? How is creating
> multiple PCIe Root Complexes on q35 using pxb-pcie different
> than creating multiple PHBs using spapr-pci-host-bridge on
> pseries?

Uh.. AIUI the root complex is the PCI to PCI bridge under which PCI-E
slots appear.  PXB is something different - essentially different host
bridges as you say (though with some weird hacks to access config
space, which make it dependent on the primary bus in a way which spapr
PHBs are not).

I'll admit I'm pretty confused myself about the exact distinction
between root complex, root port and upstream and downstream ports.

> > Whereas on pseries they'll
> > appear as siblings on a virtual bus (which makes no physical sense for
> > point-to-point PCI-E).
> 
> What is the virtual bus in question? Why would it matter
> that they're siblings?

On pseries it won't.  But my understanding is that libvirt won't
create them that way on q35 - instead it will insert the RCs / P2P
bridges to allow them to be hotplugged.  Inserting that bridge may
confuse pseries guests which aren't expecting it.

> I'm possibly missing the point entirely, but so far it
> looks to me like there are different configurations you
> might want to use depending on your goal, and both q35
> and pseries give you comparable tools to achieve such
> configurations.
> 
> > I suppose we could try treating all devices on pseries as though they
> > were chipset builtin devices on q35, which will appear on the root
> > PCI-E bus without root complex.  But I suspect that's likely to cause
> > trouble with hotplug, and it will certainly need different address
> > allocation from libvirt.
> 
> PCIe Integrated Endpoint Devices are not hotpluggable on
> q35, that's why libvirt will follow QEMU's PCIe topology
> recommendations and place a PCIe Root Port between them;
> I assume the same could be done for pseries guests as
> soon as QEMU grows support for generic PCIe Root Ports,
> something Marcel has already posted patches for.

Here you've hit on it.  No, we should not do that for pseries,
AFAICT.  PAPR doesn't really have the concept of integrated endpoint
devices, and all devices can be hotplugged via the PAPR mechanisms
(and none can via the PCI-E standard hotplug mechanism).

> Again, sorry for clearly misunderstanding your explanation,
> but I'm still not seeing the issue here. I'm sure it's very
> clear in your mind, but I'm afraid you're going to have to
> walk me through it :(

I wish it were entirely clear in my mind.  Like I say I'm still pretty
confused by exactly 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-06 Thread Andrea Bolognani
On Fri, 2016-12-02 at 15:18 +1100, David Gibson wrote:
> > So, would the PCIe Root Bus in a pseries guest behave
> > differently than the one in a q35 or mach-virt guest?
> 
> Yes.  I had a long discussion with BenH and got a somewhat better idea
> about this.

Sorry, but I'm afraid you're going to have to break this
down even further for me :(

> If only a single host PE (== iommu group) is passed through and there
> are no emulated devices, the difference isn't too bad: basically on
> pseries you'll see the subtree that would be below the root complex on
> q35.
> 
> But if you pass through multiple groups, things get weird.

Is the difference between q35 and pseries guests with
respect to PCIe only relevant when it comes to assigned
devices, or in general? I'm asking this because you seem to
focus entirely on assigned devices.

> On q35,
> you'd generally expect physically separate (different slot) devices to
> appear under separate root complexes.

This part I don't get at all, so please bear with me.

The way I read it you're claiming that eg. a SCSI controller
and a network adapter, being physically separate and assigned
to separate PCI slots, should have a dedicated PCIe Root
Complex each on a q35 guest.

That doesn't match with my experience, where you would simply
assign them to separate slots of the default PCIe Root Bus
(pcie.0), eg. 00:01.0 and 00:02.0.

Maybe you're referring to the fact that you might want to
create multiple PCIe Root Complexes in order to assign the
host devices to separate guest NUMA nodes? How is creating
multiple PCIe Root Complexes on q35 using pxb-pcie different
than creating multiple PHBs using spapr-pci-host-bridge on
pseries?

> Whereas on pseries they'll
> appear as siblings on a virtual bus (which makes no physical sense for
> point-to-point PCI-E).

What is the virtual bus in question? Why would it matter
that they're siblings?

I'm possibly missing the point entirely, but so far it
looks to me like there are different configurations you
might want to use depending on your goal, and both q35
and pseries give you comparable tools to achieve such
configurations.

> I suppose we could try treating all devices on pseries as though they
> were chipset builtin devices on q35, which will appear on the root
> PCI-E bus without root complex.  But I suspect that's likely to cause
> trouble with hotplug, and it will certainly need different address
> allocation from libvirt.

PCIe Integrated Endpoint Devices are not hotpluggable on
q35, that's why libvirt will follow QEMU's PCIe topology
recommendations and place a PCIe Root Port between them;
I assume the same could be done for pseries guests as
soon as QEMU grows support for generic PCIe Root Ports,
something Marcel has already posted patches for.

Again, sorry for clearly misunderstanding your explanation,
but I'm still not seeing the issue here. I'm sure it's very
clear in your mind, but I'm afraid you're going to have to
walk me through it :(

> > Regardless of how we decide to move forward with the
> > PCIe-enabled pseries machine type, libvirt will have to
> > know about this so it can behave appropriately.
> 
> So there are kind of two extremes of how to address this.  There are a
> variety of options in between, but I suspect they're going to be even
> more muddled and hideous than the extremes.
> 
> 1) Give up.  You said there's already a flag that says a PCI-E bus is
> able to accept vanilla-PCI devices.  We add a hack flag that says a
> vanilla-PCI bus is able to accept PCI-E devices.  We keep address
> allocation as it is now - the pseries topology really does resemble
> vanilla-PCI much better than it does PCI-E. But, we allow PCI-E
> devices, and PAPR has mechanisms for accessing the extended config
> space.  PCI-E standard hotplug and error reporting will never work,
> but PAPR provides its own mechanisms for those, so that should be ok.

We can definitely special-case pseries guests and take
the "anything goes" approach to PCI vs PCIe, but it would
certainly be nicer if we could avoid presenting our users
the head-scratching situation of PCIe devices being plugged
into legacy PCI slots and still showing up as PCIe in the
guest.

What about virtio devices, which present themselves either
as legacy PCI or PCIe depending on the kind of slot they
are plugged into? Would they show up as PCIe or legacy PCI
on a PCIe-enabled pseries guest?

> 2) Start exposing the PCI-E heirarchy for pseries guests much more
> like q35, root complexes and all.  It's not clear that PAPR actually
> *forbids* exposing the root complex, it just doesn't require it and
> that's not what PowerVM does.  But.. there are big questions about
> whether existing guests will cope with this or not.  When you start
> adding in multiple passed through devices and particularly virtual
> functions as well, things could get very ugly - we might need to
> construct multiple emulated virtual root complexes or other messes.
> 
> In the short to medium 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-02 Thread Alexey Kardashevskiy
On 03/12/16 08:41, Benjamin Herrenschmidt wrote:
> On Fri, 2016-12-02 at 16:50 +1100, David Gibson wrote:
>>
>> Uh.. I don't entirely follow you.  From the host point of view there
>> are multiple iommu groups (PEs), but from the guest point of view
>> there's only one.  On the guest side iommu granularity is always
>> per-vPHB.
> 
> Ok so the H_PUT_TCE calls affect all the underlying tables ?

Yes, it updates all hardware tables attached to the same guest's LIOBN
(which in reality can only be seen on power7 as power8 shares tables
between PEs).
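For context, the hypercall in question takes, per LoPAPR, roughly this form:

    H_PUT_TCE(liobn, ioba, tce)

i.e. the guest only ever names the target table by LIOBN, and it's up to the
hypervisor to propagate that single update to however many hardware TCE tables
happen to back that LIOBN.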


-- 
Alexey



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-02 Thread Benjamin Herrenschmidt
On Fri, 2016-12-02 at 16:50 +1100, David Gibson wrote:
> 
> Uh.. I don't entirely follow you.  From the host point of view there
> are multiple iommu groups (PEs), but from the guest point of view
> there's only one.  On the guest side iommu granularity is always
> per-vPHB.

Ok so the H_PUT_TCE calls affect all the underlying tables ?

Cheers,
Ben.




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-02 Thread David Gibson
On Fri, Dec 02, 2016 at 04:17:50PM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2016-12-02 at 15:18 +1100, David Gibson wrote:
> > But if you pass through multiple groups, things get weird.  On q35,
> > you'd generally expect physically separate (different slot) devices to
> > appear under separate root complexes.  Whereas on pseries they'll
> > appear as siblings on a virtual bus (which makes no physical sense for
> > point-to-point PCI-E).
> 
> It's also somewhat broken if they aren't in the same iommu domain
> because the way we pass the iommu buid is via the parent node, so
> a given iommu domain must reside below a common parent and not share
> it.

Uh.. I don't entirely follow you.  From the host point of view there
are multiple iommu groups (PEs), but from the guest point of view
there's only one.  On the guest side iommu granularity is always
per-vPHB.

> > I suppose we could try treating all devices on pseries as though they
> > were chipset builtin devices on q35, which will appear on the root
> > PCI-E bus without root complex.  But I suspect that's likely to cause
> > trouble with hotplug, and it will certainly need different address
> > allocation from libvirt.
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-01 Thread Benjamin Herrenschmidt
On Fri, 2016-12-02 at 15:18 +1100, David Gibson wrote:
> But if you pass through multiple groups, things get weird.  On q35,
> you'd generally expect physically separate (different slot) devices to
> appear under separate root complexes.  Whereas on pseries they'll
> appear as siblings on a virtual bus (which makes no physical sense for
> point-to-point PCI-E).

It's also somewhat broken if they aren't in the same iommu domain
because the way we pass the iommu buid is via the parent node, so
a given iommu domain must reside below a common parent and not share
it.

> I suppose we could try treating all devices on pseries as though they
> were chipset builtin devices on q35, which will appear on the root
> PCI-E bus without root complex.  But I suspect that's likely to cause
> trouble with hotplug, and it will certainly need different address
> allocation from libvirt.



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-01 Thread David Gibson
On Fri, Nov 25, 2016 at 03:36:25PM +0100, Andrea Bolognani wrote:
> On Wed, 2016-11-23 at 16:02 +1100, David Gibson wrote:
> > > > The change from OHCI to XHCI only affected the *default* USB
> > > > controller, which libvirt tries its best not to use anyway:
> > > > instead, it will prefer to use '-M ...,usb=off' along with
> > > > '-device ...' and set both the controller model and its PCI
> > > > address explicitly, partially to shield its users from such
> > > > changes in QEMU.
> > > 
> > > Ok. Always liked this approach really. We should start moving in this
> > > direction with PHB - stop adding the default PHB at all when -nodefaults is
> > > passed (or -machine pseries,pci=off ?) and let libvirt manage PHBs itself
> > > (and provide another spapr-phb type like spapr-pcie-host-bridge or add a
> > > "pcie_root_bus_type" property to the existing PHB type).
> > > 
> > > What will be wrong with this approach?
> > 
> > Hm, that's a good point.  If we were removing the default PHB entirely,
> > that I would consider a possible case for a new machine type.  Putting
> > construction of the PHBs into libvirt's hand could make life simpler
> > there.  Although it would make it a bit of a pain for people starting
> > qemu by hand.
> > 
> > I think this option is worth some thought.
> 
> Note that libvirt always runs QEMU with -nodefaults, so we
> could just remove the default PHB if that flag is present,
> and leave it in if that's not the case.
> 
> The idea itself sounds good, but of course it will require
> more work from the libvirt side than simply making the PCIe
> machine type behave like q35 and mach-virt.

Yeah, but the latter basically just won't work.

> Moreover, we already have an RFE for supporting additional
> PHBs, we could solve both issues in one fell swoop and have
> the libvirt guest XML be a more faithful representation of
> the actual virtual hardware, which is always a win in my
> book.

Right.  And the general recommendation for PAPR guests is that each
passed through device (or rather, each passed through iommu group)
have a separate virtual PHB on the guest.  With this rework libvirt
could switch over to implementing that recommendation.

> That will be even more important if it turns out that the
> rules for PCIe device assignment (eg. what device/controller
> can be plugged into which slot) are different for PCIe PHBs
> than they are for q35/mach-virt PCIe Root Bus. I've asked
> for clarifications about this elsewhere in the thread.
> 
> -- 
> Andrea Bolognani / Red Hat / Virtualization
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-01 Thread David Gibson
On Fri, Nov 25, 2016 at 02:46:21PM +0100, Andrea Bolognani wrote:
> On Wed, 2016-11-23 at 16:00 +1100, David Gibson wrote:
> > > Existing libvirt versions assume that pseries guests have
> > > a legacy PCI root bus, and will base their PCI address
> > > allocation / PCI topology decisions on that fact: they
> > > will, for example, use legacy PCI bridges.
> > 
> > Um.. yeah.. trouble is libvirt's PCI-E address allocation probably
> > won't work for spapr PCI-E either, because of the weird PCI-E without
> > root complex presentation we get in PAPR.
> 
> So, would the PCIe Root Bus in a pseries guest behave
> differently than the one in a q35 or mach-virt guest?

Yes.  I had a long discussion with BenH and got a somewhat better idea
about this.

If only a single host PE (== iommu group) is passed through and there
are no emulated devices, the difference isn't too bad: basically on
pseries you'll see the subtree that would be below the root complex on
q35.

But if you pass through multiple groups, things get weird.  On q35,
you'd generally expect physically separate (different slot) devices to
appear under separate root complexes.  Whereas on pseries they'll
appear as siblings on a virtual bus (which makes no physical sense for
point-to-point PCI-E).

I suppose we could try treating all devices on pseries as though they
were chipset builtin devices on q35, which will appear on the root
PCI-E bus without root complex.  But I suspect that's likely to cause
trouble with hotplug, and it will certainly need different address
allocation from libvirt.

> Does it have a different number of slots, do we have to
> plug different controllers into them, ...?
> 
> Regardless of how we decide to move forward with the
> PCIe-enabled pseries machine type, libvirt will have to
> know about this so it can behave appropriately.

So there are kind of two extremes of how to address this.  There are a
variety of options in between, but I suspect they're going to be even
more muddled and hideous than the extremes.

1) Give up.  You said there's already a flag that says a PCI-E bus is
able to accept vanilla-PCI devices.  We add a hack flag that says a
vanilla-PCI bus is able to accept PCI-E devices.  We keep address
allocation as it is now - the pseries topology really does resemble
vanilla-PCI much better than it does PCI-E. But, we allow PCI-E
devices, and PAPR has mechanisms for accessing the extended config
space.  PCI-E standard hotplug and error reporting will never work,
but PAPR provides its own mechanisms for those, so that should be ok.

2) Start exposing the PCI-E heirarchy for pseries guests much more
like q35, root complexes and all.  It's not clear that PAPR actually
*forbids* exposing the root complex, it just doesn't require it and
that's not what PowerVM does.  But.. there are big questions about
whether existing guests will cope with this or not.  When you start
adding in multiple passed through devices and particularly virtual
functions as well, things could get very ugly - we might need to
construct multiple emulated virtual root complexes or other messes.

In the short to medium term, I'm thinking option (1) seems pretty
compelling.

> > > > I believe after we introduced the very first
> > > > pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> > > 
> > > Isn't i440fx still being updated despite the fact that q35
> > > exists? Granted, there are a lot more differences between
> > > those two machine types than just the root bus type.
> > 
> > Right, there are heaps of differences between i440fx and q35, and
> > reasons to keep both updated.  For pseries we have neither the impetus
> > nor the resources to maintain two different machine type variant,
> > where the only difference is between legacy PCI and weirdly presented
> > PCI-E.
> 
> Calling the PCIe machine type either pseries-2.8 or
> pseries-pcie-2.8 would result in the very same amount of
> work, and in both cases it would be understood that the
> legacy PCI machine type is no longer going to be updated,
> but can still be used to run existing guests.

So, I'm not sure if the idea of a new machine type has legs or not,
but let's think it through a bit further.  Suppose we have a new
machine type, let's call it 'papr'.  I'm thinking it would be (at
least with -nodefaults) basically a super-minimal version of pseries:
so each PHB would have to be explicitly created, the VIO bridge would
have to be explicitly created, likewise the NVRAM.  Not sure about the
"devices" which really represent firmware features - the RTC, RNG,
hypervisor event source and so forth.
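Purely hypothetically, an invocation could then look something like:

    qemu-system-ppc64 -machine papr -nodefaults \
        -device spapr-pci-host-bridge,index=0 \
        ...

with the VIO bridge, NVRAM and so on likewise spelled out explicitly, assuming
they all grow user-creatable device types.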

Might have some advantages.  Then again, it doesn't really solve the
specific problem here.  It means libvirt (or the user) has to
explicitly choose a PCI or PCI-E PHB to put things on, but libvirt's
PCI-E address allocation will still be wrong in all probability.

Guh.


As an aside, here's a RANT.

libvirt address allocation.  Seriously, W. T. F!

libvirt insists on owning address 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-25 Thread Andrea Bolognani
On Wed, 2016-11-23 at 16:02 +1100, David Gibson wrote:
> > > The change from OHCI to XHCI only affected the *default* USB
> > > controller, which libvirt tries its best not to use anyway:
> > > instead, it will prefer to use '-M ...,usb=off' along with
> > > '-device ...' and set both the controller model and its PCI
> > > address explicitly, partially to shield its users from such
> > > changes in QEMU.
> > 
> > Ok. Always liked this approach really. We should start moving in this
> > direction with PHB - stop adding the default PHB at all when -nodefaults is
> > passed (or -machine pseries,pci=off ?) and let libvirt manage PHBs itself
> > (and provide another spapr-phb type like spapr-pcie-host-bridge or add a
> > "pcie_root_bus_type" property to the existing PHB type).
> > 
> > What will be wrong with this approach?
> 
> Hm, that's a good point.  If we were removing the default PHB entirely,
> that I would consider a possible case for a new machine type.  Putting
> construction of the PHBs into libvirt's hand could make life simpler
> there.  Although it would make it a bit of a pain for people starting
> qemu by hand.
> 
> I think this option is worth some thought.

Note that libvirt always runs QEMU with -nodefaults, so we
could just remove the default PHB if that flag is present,
and leave it in if that's not the case.

The idea itself sounds good, but of course it will require
more work from the libvirt side than simply making the PCIe
machine type behave like q35 and mach-virt.

Moreover, we already have an RFE for supporting additional
PHBs, we could solve both issues in one fell swoop and have
the libvirt guest XML be a more faithful representation of
the actual virtual hardware, which is always a win in my
book.

That will be even more important if it turns out that the
rules for PCIe device assignment (eg. what device/controller
can be plugged into which slot) are different for PCIe PHBs
than they are for q35/mach-virt PCIe Root Bus. I've asked
for clarifications about this elsewhere in the thread.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-25 Thread Andrea Bolognani
On Wed, 2016-11-23 at 16:00 +1100, David Gibson wrote:
> > Existing libvirt versions assume that pseries guests have
> > a legacy PCI root bus, and will base their PCI address
> > allocation / PCI topology decisions on that fact: they
> > will, for example, use legacy PCI bridges.
> 
> Um.. yeah.. trouble is libvirt's PCI-E address allocation probably
> won't work for spapr PCI-E either, because of the weird PCI-E without
> root complex presentation we get in PAPR.

So, would the PCIe Root Bus in a pseries guest behave
differently than the one in a q35 or mach-virt guest?

Does it have a different number of slots, do we have to
plug different controllers into them, ...?

Regardless of how we decide to move forward with the
PCIe-enabled pseries machine type, libvirt will have to
know about this so it can behave appropriately.

> > > I believe after we introduced the very first
> > > pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> > 
> > Isn't i440fx still being updated despite the fact that q35
> > exists? Granted, there are a lot more differences between
> > those two machine types than just the root bus type.
> 
> Right, there are heaps of differences between i440fx and q35, and
> reasons to keep both updated.  For pseries we have neither the impetus
> nor the resources to maintain two different machine type variants,
> where the only difference is between legacy PCI and weirdly presented
> PCI-E.

Calling the PCIe machine type either pseries-2.8 or
pseries-pcie-2.8 would result in the very same amount of
work, and in both cases it would be understood that the
legacy PCI machine type is no longer going to be updated,
but can still be used to run existing guests.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-22 Thread David Gibson
On Tue, Nov 22, 2016 at 01:26:49PM +1100, Alexey Kardashevskiy wrote:
> On 22/11/16 00:08, Andrea Bolognani wrote:
> > On Mon, 2016-11-21 at 13:12 +1100, Alexey Kardashevskiy wrote:
> > 1) switch to PCI Express on newer machine types, and
> >expose some sort of capability through QMP so that
> >libvirt can know about the switch
> >  
> > [...]
> > Option 1) would break horribly with existing libvirt
> > versions, and so would Option 2) if we default to using
>   
>  How exactly 1) will break libvirt? Migrating from pseries-2.7 to
>  pseries-2.8 does not work anyway, and machines are allowed to behave
>  different from version to version, what distinct difference will using
>  "pseries-pcie-X.Y" make?
> >>>  
> >>> Existing libvirt versions assume that pseries guests have
> >>  
> >> If libvirt is using just "pseries" (without a version), then having a
> >> virtual PCIe-PCI bridge (and "pci.0" always available by default) will do 
> >> it.
> > 
> > Please don't. Any device that is included in the guest
> > by default and can't be turned off makes libvirt's life
> > significantly more difficult (see below).
> >
> >> If libvirt is using a specific version of pseries, then it already knows
> >> that <=2.7 has pci.0 as a root, pcie.0 otherwise. libvirt has knowledge of
> >> what QEMU version has what, right?
> > 
> > It doesn't yet, that's the point :)
> > 
> > We *could* add such knowledge to libvirt[1], but *existing*
> > libvirt versions would still not know about it, which means
> > that upgrading QEMU without upgrading libvirt will result
> > in failure to create new guests.
> >
> > 
> >> In what scenario will an additional machine type help?
> > 
> > Because then libvirt could learn that
> > 
> >   pseries-x.y   <->  pci.0
> >   pseries-pcie-x.y  <->  pcie.0
> > 
> > the same way it already knows that
> > 
> >   pc-i440fx-x.y <->  pci.0
> >   pc-q35-x.y<->  pcie.0
> > 
> > and choosing between one or the other would be, I think,
> > much easier for the user as well.
> >
> >>> a legacy PCI root bus, and will base their PCI address
> >>> allocation / PCI topology decisions on that fact: they
> >>> will, for example, use legacy PCI bridges.
> >>>  
> >>> So if you used a new QEMU binary with a libvirt version
> >>> that doesn't know about the change, new guests would end up
> >>> using the wrong controllers. Existing guests would not be
> >>> affected as they would stick with the older machine types,
> >>> of course.
> >>>  
>  I believe after we introduced the very first
>  pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> >>>  
> >>> Isn't i440fx still being updated despite the fact that q35
> >>> exists? Granted, there are a lot more differences between
> >>> those two machine types than just the root bus type.
> >>  
> >> I do not know about i440<->q35 but in pseries the difference is going to be
> >> very simple.
> >>  
> >> For example, we did not change the machine type when we switched from
> >> default OHCI to XHCI, switching from PCI to PCIe does not sound like we
> >> need a whole new machine type for this either.
> > 
> > The change from OHCI to XHCI only affected the *default* USB
> > controller, which libvirt tries its best not to use anyway:
> > instead, it will prefer to use '-M ...,usb=off' along with
> > '-device ...' and set both the controller model and its PCI
> > address explicitly, partially to shield its users from such
> > changes in QEMU.
> 
> 
> Ok. Always liked this approach, really. We should start moving to this
> direction with PHB - stop adding the default PHB at all when -nodefaults is
> passed (or -machine pseries,pci=off ?) and let libvirt manage PHBs itself
> (and provide another spapr-phb type like spapr-pcie-host-bridge or add a
> "pcie_root_bus_type" property to the existing PHB type).
> 
> What will be wrong with this approach?

Hm, that's a good point.  If we were removing the default PHB entirely,
then I would consider that a possible case for a new machine type.  Putting
construction of the PHBs into libvirt's hand could make life simpler
there.  Although it would make it a bit of a pain for people starting
qemu by hand.

I think this option is worth some thought.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-22 Thread David Gibson
On Fri, Nov 18, 2016 at 09:17:22AM +0100, Andrea Bolognani wrote:
> On Thu, 2016-11-17 at 13:02 +1100, Alexey Kardashevskiy wrote:
> > > That said, considering that a big part of the PCI address
> > > allocation logic is based off whether the specific machine
> > > type exposes a legacy PCI Root Bus or a PCI Express Root Bus,
> > > libvirt will need a way to be able to tell which one is which.
> > > 
> > > Version checks are pretty much out of the question, as they
> > > fail as soon as downstream releases enter the picture. A
> > > few ways we could deal with the situation:
> > > 
> > >   1) switch to PCI Express on newer machine types, and
> > >  expose some sort of capability through QMP so that
> > >  libvirt can know about the switch
> 
> [...]
> > > Option 1) would break horribly with existing libvirt
> > > versions, and so would Option 2) if we default to using
> > 
> > How exactly 1) will break libvirt? Migrating from pseries-2.7 to
> > pseries-2.8 does not work anyway, and machines are allowed to behave
> > different from version to version, what distinct difference will using
> > "pseries-pcie-X.Y" make?
> 
> Existing libvirt versions assume that pseries guests have
> a legacy PCI root bus, and will base their PCI address
> allocation / PCI topology decisions on that fact: they
> will, for example, use legacy PCI bridges.

Um.. yeah.. trouble is libvirt's PCI-E address allocation probably
won't work for spapr PCI-E either, because of the weird PCI-E without
root complex presentation we get in PAPR.

> So if you used a new QEMU binary with a libvirt version
> that doesn't know about the change, new guests would end up
> using the wrong controllers. Existing guests would not be
> affected as they would stick with the older machine types,
> of course.
> 
> > I believe after we introduced the very first
> > pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> 
> Isn't i440fx still being updated despite the fact that q35
> exists? Granted, there are a lot more differences between
> those two machine types than just the root bus type.

Right, there are heaps of differences between i440fx and q35, and
reasons to keep both updated.  For pseries we have neither the impetus
nor the resources to maintain two different machine type variants,
where the only difference is between legacy PCI and weirdly presented
PCI-E.

> Even if no newer pseries-x.y were to be added after
> introducing pseries-pcie, you could still easily create
> guests that use either root bus, so no loss in functionality.
> 
> -- 
> Andrea Bolognani / Red Hat / Virtualization
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-21 Thread Alexey Kardashevskiy
On 22/11/16 00:08, Andrea Bolognani wrote:
> On Mon, 2016-11-21 at 13:12 +1100, Alexey Kardashevskiy wrote:
> 1) switch to PCI Express on newer machine types, and
>expose some sort of capability through QMP so that
>libvirt can know about the switch
>  
> [...]
> Option 1) would break horribly with existing libvirt
> versions, and so would Option 2) if we default to using
  
 How exactly 1) will break libvirt? Migrating from pseries-2.7 to
 pseries-2.8 does not work anyway, and machines are allowed to behave
 different from version to version, what distinct difference will using
 "pseries-pcie-X.Y" make?
>>>  
>>> Existing libvirt versions assume that pseries guests have
>>  
>> If libvirt is using just "pseries" (without a version), then having a
>> virtual PCIe-PCI bridge (and "pci.0" always available by default) will do it.
> 
> Please don't. Any device that is included in the guest
> by default and can't be turned off makes libvirt's life
> significantly more difficult (see below).
>
>> If libvirt is using a specific version of pseries, then it already knows
>> that <=2.7 has pci.0 as a root, pcie.0 otherwise. libvirt has knowledge of
>> what QEMU version has what, right?
> 
> It doesn't yet, that's the point :)
> 
> We *could* add such knowledge to libvirt[1], but *existing*
> libvirt versions would still not know about it, which means
> that upgrading QEMU without upgrading libvirt will result
> in failure to create new guests.
>
> 
>> In what scenario will an additional machine type help?
> 
> Because then libvirt could learn that
> 
>   pseries-x.y   <->  pci.0
>   pseries-pcie-x.y  <->  pcie.0
> 
> the same way it already knows that
> 
>   pc-i440fx-x.y <->  pci.0
>   pc-q35-x.y<->  pcie.0
> 
> and choosing between one or the other would be, I think,
> much easier for the user as well.
>
>>> a legacy PCI root bus, and will base their PCI address
>>> allocation / PCI topology decisions on that fact: they
>>> will, for example, use legacy PCI bridges.
>>>  
>>> So if you used a new QEMU binary with a libvirt version
>>> that doesn't know about the change, new guests would end up
>>> using the wrong controllers. Existing guests would not be
>>> affected as they would stick with the older machine types,
>>> of course.
>>>  
 I believe after we introduced the very first
 pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
>>>  
>>> Isn't i440fx still being updated despite the fact that q35
>>> exists? Granted, there are a lot more differences between
>>> those two machine types than just the root bus type.
>>  
>> I do not know about i440<->q35 but in pseries the difference is going to be
>> very simple.
>>  
>> For example, we did not change the machine type when we switched from
>> default OHCI to XHCI, switching from PCI to PCIe does not sound like we
>> need a whole new machine type for this either.
> 
> The change from OHCI to XHCI only affected the *default* USB
> controller, which libvirt tries its best not to use anyway:
> instead, it will prefer to use '-M ...,usb=off' along with
> '-device ...' and set both the controller model and its PCI
> address explicitly, partially to shield its users from such
> changes in QEMU.


Ok. Always liked this approach, really. We should start moving to this
direction with PHB - stop adding the default PHB at all when -nodefaults is
passed (or -machine pseries,pci=off ?) and let libvirt manage PHBs itself
(and provide another spapr-phb type like spapr-pcie-host-bridge or add a
"pcie_root_bus_type" property to the existing PHB type).

What will be wrong with this approach?



> 
> Let's not forget that libvirt is a management layer, and as
> such it needs to have as much control as possible over the
> virtual hardware presented to the guest, mostly for guest ABI
> compatibility purposes. Defaulting to XHCI instead of OHCI,
> same as pretty much all other defaults, is good for people who
> run QEMU directly, but libvirt needs to be able to control
> such knobs itself in order to effectively perform the task it
> was designed for.
> 
> Moreover, we're talking about a more fundamental change here:
> the PCI Root Bus is not just any random device, it's the one
> foundation upon which the entire PCI hierarchy is built. Based
> on whether the machine exposes a PCI Express Root Bus or a
> legacy PCI Root Bus, libvirt will create very
> different PCI hierarchies, eg. using several ioh3420 devices
> instead of a single pci-bridge device.
> 
> I'm still not sure if that's enough to warrant an entirely new
> machine type, but it definitely has a more far-reaching impact
> than simply flipping the default USB controller from OHCI to
> XHCI.
> 
>>> Even if no newer pseries-x.y were to be added after
>>> introducing pseries-pcie, you could still easily create
>>> guests that use either root bus, so no loss in functionality.
>>  
>> I could do this with the existing 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-21 Thread Andrea Bolognani
On Mon, 2016-11-21 at 13:12 +1100, Alexey Kardashevskiy wrote:
> > > >1) switch to PCI Express on newer machine types, and
> > > >   expose some sort of capability through QMP so that
> > > >   libvirt can know about the switch
> > > > 
> > > > [...]
> > > > Option 1) would break horribly with existing libvirt
> > > > versions, and so would Option 2) if we default to using
> > > 
> > > How exactly 1) will break libvirt? Migrating from pseries-2.7 to
> > > pseries-2.8 does not work anyway, and machines are allowed to behave
> > > different from version to version, what distinct difference will using
> > > "pseries-pcie-X.Y" make?
> > 
> > Existing libvirt versions assume that pseries guests have
> 
> If libvirt is using just "pseries" (without a version), then having a
> virtual PCIe-PCI bridge (and "pci.0" always available by default) will do it.

Please don't. Any device that is included in the guest
by default and can't be turned off makes libvirt's life
significantly more difficult (see below).

> If libvirt is using a specific version of pseries, then it already knows
> that <=2.7 has pci.0 as a root, pcie.0 otherwise. libvirt has knowledge of
> what QEMU version has what, right?

It doesn't yet, that's the point :)

We *could* add such knowledge to libvirt[1], but *existing*
libvirt versions would still not know about it, which means
that upgrading QEMU without upgrading libvirt will result
in failure to create new guests.

> In what scenario will an additional machine type help?

Because then libvirt could learn that

  pseries-x.y   <->  pci.0
  pseries-pcie-x.y  <->  pcie.0

the same way it already knows that

  pc-i440fx-x.y <->  pci.0
  pc-q35-x.y<->  pcie.0

and choosing between one or the other would be, I think,
much easier for the user as well.

> > a legacy PCI root bus, and will base their PCI address
> > allocation / PCI topology decisions on that fact: they
> > will, for example, use legacy PCI bridges.
> > 
> > So if you used a new QEMU binary with a libvirt version
> > that doesn't know about the change, new guests would end up
> > using the wrong controllers. Existing guests would not be
> > affected as they would stick with the older machine types,
> > of course.
> > 
> > > I believe after we introduced the very first
> > > pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> > 
> > Isn't i440fx still being updated despite the fact that q35
> > exists? Granted, there are a lot more differences between
> > those two machine types than just the root bus type.
> 
> I do not know about i440<->q35 but in pseries the difference is going to be
> very simple.
> 
> For example, we did not change the machine type when we switched from
> default OHCI to XHCI, switching from PCI to PCIe does not sound like we
> need a whole new machine type for this either.

The change from OHCI to XHCI only affected the *default* USB
controller, which libvirt tries its best not to use anyway:
instead, it will prefer to use '-M ...,usb=off' along with
'-device ...' and set both the controller model and its PCI
address explicitly, partially to shield its users from such
changes in QEMU.
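
Concretely, instead of leaving the machine default in place, libvirt will
put something along these lines on the command line (the controller model
and PCI address shown here are only illustrative):

    -machine pseries,usb=off \
    -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x3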

Let's not forget that libvirt is a management layer, and as
such it needs to have as much control as possible over the
virtual hardware presented to the guest, mostly for guest ABI
compatibility purposes. Defaulting to XHCI instead of OHCI,
same as pretty much all other defaults, is good for people who
run QEMU directly, but libvirt needs to be able to control
such knobs itself in order to effectively perform the task it
was designed for.

Moreover, we're talking about a more fundamental change here:
the PCI Root Bus is not just any random device, it's the one
foundation upon which the entire PCI hierarchy is built. Based
on whether the machine exposes a PCI Express Root Bus or a
legacy PCI Root Bus, libvirt will create very
different PCI hierarchies, eg. using several ioh3420 devices
instead of a single pci-bridge device.

I'm still not sure if that's enough to warrant an entirely new
machine type, but it definitely has a more far-reaching impact
than simply flipping the default USB controller from OHCI to
XHCI.

> > Even if no newer pseries-x.y were to be added after
> > introducing pseries-pcie, you could still easily create
> > guests that use either root bus, so no loss in functionality.
> 
> I could do this with the existing pseries if the machine had a "root bus
> type" property.

That was indeed one of my original proposals ;)

Just note, once again, that the default for this property
would have to be "off" or we would run into the same issues
described above.


[1] Even though, as I mentioned earlier in the thread,
performing version checks on machine types is frowned
upon as it turns into a minefield as soon as backports
and downstream machine types enter the picture... It's
much better to expose some flag through QMP for libvirt
to introspect
-- 
Andrea 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-20 Thread Alexey Kardashevskiy
On 18/11/16 19:17, Andrea Bolognani wrote:
> On Thu, 2016-11-17 at 13:02 +1100, Alexey Kardashevskiy wrote:
>>> That said, considering that a big part of the PCI address
>>> allocation logic is based off whether the specific machine
>>> type exposes a legacy PCI Root Bus or a PCI Express Root Bus,
>>> libvirt will need a way to be able to tell which one is which.
>>>  
>>> Version checks are pretty much out of the question, as they
>>> fail as soon as downstream releases enter the picture. A
>>> few ways we could deal with the situation:
>>>  
>>>1) switch to PCI Express on newer machine types, and
>>>   expose some sort of capability through QMP so that
>>>   libvirt can know about the switch
> 
> [...]
>>> Option 1) would break horribly with existing libvirt
>>> versions, and so would Option 2) if we default to using
>>  
>> How exactly 1) will break libvirt? Migrating from pseries-2.7 to
>> pseries-2.8 does not work anyway, and machines are allowed to behave
>> different from version to version, what distinct difference will using
>> "pseries-pcie-X.Y" make?
> 
> Existing libvirt versions assume that pseries guests have

If libvirt is using just "pseries" (without a version), then having a
virtual PCIe-PCI bridge (and "pci.0" always available by default) will do it.

If libvirt is using a specific version of pseries, then it already knows
that <=2.7 has pci.0 as a root, pcie.0 otherwise. libvirt has knowledge of
what QEMU version has what, right?

In what scenario will an additional machine type help?

> a legacy PCI root bus, and will base their PCI address
> allocation / PCI topology decisions on that fact: they
> will, for example, use legacy PCI bridges.
> 
> So if you used a new QEMU binary with a libvirt version
> that doesn't know about the change, new guests would end up
> using the wrong controllers. Existing guests would not be
> affected as they would stick with the older machine types,
> of course.
> 
>> I believe after we introduced the very first
>> pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.
> 
> Isn't i440fx still being updated despite the fact that q35
> exists? Granted, there are a lot more differences between
> those two machine types than just the root bus type.

I do not know about i440<->q35 but in pseries the difference is going to be
very simple.

For example, we did not change the machine type when we switched from
default OHCI to XHCI, switching from PCI to PCIe does not sound like we
need a whole new machine type for this either.

> Even if no newer pseries-x.y were to be added after
> introducing pseries-pcie, you could still easily create
> guests that use either root bus, so no loss in functionality.

I could do this with the existing pseries if the machine had a "root bus
type" property.


-- 
Alexey



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-18 Thread David Gibson
On Thu, Nov 17, 2016 at 01:02:57PM +1100, Alexey Kardashevskiy wrote:
> On 16/11/16 01:02, Andrea Bolognani wrote:
> > On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
> >> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
> >>>  
> >>> On 31/10/16 13:53, David Gibson wrote:
>   
>  On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
> >  
> > On Fri, 28 Oct 2016 18:56:40 +1100
> > Alexey Kardashevskiy  wrote:
> >  
> >>  
> >> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> >> This means that vfio-pci devices attached to it (and this is
> >> a default behaviour) hide PCIe extended capabilities as
> >> the bus does not pass a pci_bus_is_express(pdev->bus) check.
> >>  
> >> This change adds a default PCI bus type property to sPAPR PHB
> >> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
> >> for backward compatibility as a bus type is used in the bus name
> >> so the root bus name becomes "pcie.0" instead of "pci.0".
> >>  
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>  
> >> What can possibly go wrong with such change of a name?
> >> From the devices' perspective, I cannot see any.
> >>  
> >> libvirt might get upset as "pci.0" will not be available,
> >> will it make sense to create pcie.0 as a root bus and always
> >> add a PCIe->PCI bridge and name its bus "pci.0"?
> >>  
> >> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
> >> pci_register_bus() can do this.
> >>  
> >>  
> >> ---
> >>   hw/ppc/spapr.c  | 5 +
> >>   hw/ppc/spapr_pci.c  | 5 -
> >>   include/hw/pci-host/spapr.h | 1 +
> >>   3 files changed, 10 insertions(+), 1 deletion(-)
> >>  
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 0b3820b..a268511 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> >>   .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> >>   .property = "mem64_win_size",   \
> >>   .value= "0",\
> >> +},  \
> >> +{   \
> >> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> >> +.property = "root_bus_type",\
> >> +.value= TYPE_PCI_BUS,   \
> >>   },
> >>   
> >>   static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t 
> >> index,
> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >> index 7cde30e..2fa1f22 100644
> >> --- a/hw/ppc/spapr_pci.c
> >> +++ b/hw/ppc/spapr_pci.c
> >> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, 
> >> Error **errp)
> >>   bus = pci_register_bus(dev, NULL,
> >>  pci_spapr_set_irq, pci_spapr_map_irq, 
> >> sphb,
> >>  >memspace, >iospace,
> >> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, 
> >> TYPE_PCI_BUS);
> >> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
> >> +   sphb->root_bus_type ? sphb->root_bus_type :
> >> +   TYPE_PCIE_BUS);
> >  
> > Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> > TYPE_PCI_BUS ?
>   
>  Yes, I think so.  In fact, I think it would be better to make the
>  property a boolean that just selects PCI-E, rather than this which
>  exposes qemu (semi-)internal type names on the command line.
> >>>  
> >>> Sure, a "pcie-root" boolean property should do.
> >>>  
> >>> However this is not my main concern, I rather wonder if we have to have
> >>> pci.0 when we pick PCIe for the root.
> >>  
> >> Right.
> >>  
> >> I've added Andrea Bologna to the CC list to get a libvirt perspective.
> > 
> > Thanks for doing so: changes such as this one can have quite
> > an impact on the upper layers of the stack, so the earliest
> > libvirt is involved in the discussion the better.
> > 
> > I'm going to go a step further and cross-post to libvir-list
> > in order to give other libvirt contributors a chance to chime
> > in too.
> > 
> >> Andrea,
> >>  
> >> To summarise the issue here:
> >> * As I've said before the PAPR spec kinda-sorta abstracts the
> >>   difference between vanilla PCI and PCI-E
> >> * However, because within qemu we're declaring the bus as PCI that
> >>   means some PCI-E devices aren't working right
> >> * In particular it means that PCI-E extended config space isn't
> >>   available
> >>  
> >> The proposal is to change (on newer machine types) the spapr PHB code
> >> to declare a PCI-E bus 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-18 Thread Andrea Bolognani
On Thu, 2016-11-17 at 13:02 +1100, Alexey Kardashevskiy wrote:
> > That said, considering that a big part of the PCI address
> > allocation logic is based off whether the specific machine
> > type exposes a legacy PCI Root Bus or a PCI Express Root Bus,
> > libvirt will need a way to be able to tell which one is which.
> > 
> > Version checks are pretty much out of the question, as they
> > fail as soon as downstream releases enter the picture. A
> > few ways we could deal with the situation:
> > 
> >   1) switch to PCI Express on newer machine types, and
> >  expose some sort of capability through QMP so that
> >  libvirt can know about the switch

[...]
> > Option 1) would break horribly with existing libvirt
> > versions, and so would Option 2) if we default to using
> 
> How exactly 1) will break libvirt? Migrating from pseries-2.7 to
> pseries-2.8 does not work anyway, and machines are allowed to behave
> different from version to version, what distinct difference will using
> "pseries-pcie-X.Y" make?

Existing libvirt versions assume that pseries guests have
a legacy PCI root bus, and will base their PCI address
allocation / PCI topology decisions on that fact: they
will, for example, use legacy PCI bridges.

So if you used a new QEMU binary with a libvirt version
that doesn't know about the change, new guests would end up
using the wrong controllers. Existing guests would not be
affected as they would stick with the older machine types,
of course.

> I believe after we introduced the very first
> pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.

Isn't i440fx still being updated despite the fact that q35
exists? Granted, there are a lot more differences between
those two machine types than just the root bus type.

Even if no newer pseries-x.y were to be added after
introducing pseries-pcie, you could still easily create
guests that use either root bus, so no loss in functionality.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-16 Thread Alexey Kardashevskiy
On 16/11/16 01:02, Andrea Bolognani wrote:
> On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
>> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
>>>  
>>> On 31/10/16 13:53, David Gibson wrote:
  
 On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
>  
> On Fri, 28 Oct 2016 18:56:40 +1100
> Alexey Kardashevskiy  wrote:
>  
>>  
>> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
>> This means that vfio-pci devices attached to it (and this is
>> a default behaviour) hide PCIe extended capabilities as
>> the bus does not pass a pci_bus_is_express(pdev->bus) check.
>>  
>> This change adds a default PCI bus type property to sPAPR PHB
>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
>> for backward compatibility as a bus type is used in the bus name
>> so the root bus name becomes "pcie.0" instead of "pci.0".
>>  
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  
>> What can possibly go wrong with such change of a name?
>> From the devices' perspective, I cannot see any.
>>  
>> libvirt might get upset as "pci.0" will not be available,
>> will it make sense to create pcie.0 as a root bus and always
>> add a PCIe->PCI bridge and name its bus "pci.0"?
>>  
>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
>> pci_register_bus() can do this.
>>  
>>  
>> ---
>>   hw/ppc/spapr.c  | 5 +
>>   hw/ppc/spapr_pci.c  | 5 -
>>   include/hw/pci-host/spapr.h | 1 +
>>   3 files changed, 10 insertions(+), 1 deletion(-)
>>  
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0b3820b..a268511 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>>   .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>>   .property = "mem64_win_size",   \
>>   .value= "0",\
>> +},  \
>> +{   \
>> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>> +.property = "root_bus_type",\
>> +.value= TYPE_PCI_BUS,   \
>>   },
>>   
>>   static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 7cde30e..2fa1f22 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, 
>> Error **errp)
>>   bus = pci_register_bus(dev, NULL,
>>  pci_spapr_set_irq, pci_spapr_map_irq, sphb,
>>  >memspace, >iospace,
>> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
>> +   sphb->root_bus_type ? sphb->root_bus_type :
>> +   TYPE_PCIE_BUS);
>  
> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> TYPE_PCI_BUS ?
  
 Yes, I think so.  In fact, I think it would be better to make the
 property a boolean that just selects PCI-E, rather than this which
 exposes qemu (semi-)internal type names on the command line.
>>>  
>>> Sure, a "pcie-root" boolean property should do.
>>>  
>>> However this is not my main concern, I rather wonder if we have to have
>>> pci.0 when we pick PCIe for the root.
>>  
>> Right.
>>  
>> I've added Andrea Bologna to the CC list to get a libvirt perspective.
> 
> Thanks for doing so: changes such as this one can have quite
> an impact on the upper layers of the stack, so the earliest
> libvirt is involved in the discussion the better.
> 
> I'm going to go a step further and cross-post to libvir-list
> in order to give other libvirt contributors a chance to chime
> in too.
> 
>> Andrea,
>>  
>> To summarise the issue here:
>> * As I've said before the PAPR spec kinda-sorta abstracts the
>>   difference between vanilla PCI and PCI-E
>> * However, because within qemu we're declaring the bus as PCI that
>>   means some PCI-E devices aren't working right
>> * In particular it means that PCI-E extended config space isn't
>>   available
>>  
>> The proposal is to change (on newer machine types) the spapr PHB code
>> to declare a PCI-E bus instead.  AIUI this still won't make the root
>> complex guest visible (which it's not supposed to be under PAPR), and
>> the guest shouldn't see a difference in most cases - it will still see
>> the PAPR abstracted PCIish bus, but will now be able to get extended
>> config space.
>>  
>> The possible problem from 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-15 Thread Andrea Bolognani
On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
> > 
> > On 31/10/16 13:53, David Gibson wrote:
> > > 
> > > On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
> > > > 
> > > > On Fri, 28 Oct 2016 18:56:40 +1100
> > > > Alexey Kardashevskiy  wrote:
> > > > 
> > > > > 
> > > > > At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> > > > > This means that vfio-pci devices attached to it (and this is
> > > > > a default behaviour) hide PCIe extended capabilities as
> > > > > the bus does not pass a pci_bus_is_express(pdev->bus) check.
> > > > > 
> > > > > This change adds a default PCI bus type property to sPAPR PHB
> > > > > and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
> > > > > for backward compatibility as a bus type is used in the bus name
> > > > > so the root bus name becomes "pcie.0" instead of "pci.0".
> > > > > 
> > > > > Signed-off-by: Alexey Kardashevskiy 
> > > > > ---
> > > > > 
> > > > > What can possibly go wrong with such change of a name?
> > > > > From the devices' perspective, I cannot see any.
> > > > > 
> > > > > libvirt might get upset as "pci.0" will not be available,
> > > > > will it make sense to create pcie.0 as a root bus and always
> > > > > add a PCIe->PCI bridge and name its bus "pci.0"?
> > > > > 
> > > > > Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
> > > > > pci_register_bus() can do this.
> > > > > 
> > > > > 
> > > > > ---
> > > > >  hw/ppc/spapr.c  | 5 +
> > > > >  hw/ppc/spapr_pci.c  | 5 -
> > > > >  include/hw/pci-host/spapr.h | 1 +
> > > > >  3 files changed, 10 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 0b3820b..a268511 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> > > > >  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > > > >  .property = "mem64_win_size",   \
> > > > >  .value= "0",\
> > > > > +},  \
> > > > > +{   \
> > > > > +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > > > > +.property = "root_bus_type",\
> > > > > +.value= TYPE_PCI_BUS,   \
> > > > >  },
> > > > >  
> > > > >  static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t 
> > > > >index,
> > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > > > index 7cde30e..2fa1f22 100644
> > > > > --- a/hw/ppc/spapr_pci.c
> > > > > +++ b/hw/ppc/spapr_pci.c
> > > > > @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, 
> > > > > Error **errp)
> > > > >  bus = pci_register_bus(dev, NULL,
> > > > > pci_spapr_set_irq, pci_spapr_map_irq, 
> > > > >sphb,
> > > > > >memspace, >iospace,
> > > > > -   PCI_DEVFN(0, 0), PCI_NUM_PINS, 
> > > > > TYPE_PCI_BUS);
> > > > > +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
> > > > > +   sphb->root_bus_type ? sphb->root_bus_type 
> > > > > :
> > > > > +   TYPE_PCIE_BUS);
> > > > 
> > > > Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> > > > TYPE_PCI_BUS ?
> > > 
> > > Yes, I think so.  In fact, I think it would be better to make the
> > > property a boolean that just selects PCI-E, rather than this which
> > > exposes qemu (semi-)internal type names on the command line.
> > 
> > Sure, a "pcie-root" boolean property should do.
> > 
> > However this is not my main concern, I rather wonder if we have to have
> > pci.0 when we pick PCIe for the root.
> 
> Right.
> 
> I've added Andrea Bologna to the CC list to get a libvirt perspective.

Thanks for doing so: changes such as this one can have quite
an impact on the upper layers of the stack, so the earliest
libvirt is involved in the discussion the better.

I'm going to go a step further and cross-post to libvir-list
in order to give other libvirt contributors a chance to chime
in too.

> Andrea,
> 
> To summarise the issue here:
>* As I've said before the PAPR spec kinda-sorta abstracts the
>  difference between vanilla PCI and PCI-E
>* However, because within qemu we're declaring the bus as PCI that
>  means some PCI-E devices aren't working right
>* In particular it means that PCI-E extended config space isn't
>  available
> 
> The proposal is to change (on newer machine types) the spapr PHB code
> to declare a PCI-E bus instead.  AIUI this still won't make the root
> complex guest visible (which it's not supposed to be under PAPR), and
> the guest shouldn't see a difference in most cases - it will still see
> 

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-10-31 Thread David Gibson
On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
> On 31/10/16 13:53, David Gibson wrote:
> > On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
> >> On Fri, 28 Oct 2016 18:56:40 +1100
> >> Alexey Kardashevskiy  wrote:
> >>
> >>> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> >>> This means that vfio-pci devices attached to it (and this is
> >>> a default behaviour) hide PCIe extended capabilities as
> >>> the bus does not pass a pci_bus_is_express(pdev->bus) check.
> >>>
> >>> This change adds a default PCI bus type property to sPAPR PHB
> >>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
> >>> for backward compatibility as a bus type is used in the bus name
> >>> so the root bus name becomes "pcie.0" instead of "pci.0".
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy 
> >>> ---
> >>>
> >>> What can possibly go wrong with such change of a name?
> >>> From the devices' perspective, I cannot see any.
> >>>
> >>> libvirt might get upset as "pci.0" will not be available,
> >>> will it make sense to create pcie.0 as a root bus and always
> >>> add a PCIe->PCI bridge and name its bus "pci.0"?
> >>>
> >>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
> >>> pci_register_bus() can do this.
> >>>
> >>>
> >>> ---
> >>>  hw/ppc/spapr.c  | 5 +
> >>>  hw/ppc/spapr_pci.c  | 5 -
> >>>  include/hw/pci-host/spapr.h | 1 +
> >>>  3 files changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>> index 0b3820b..a268511 100644
> >>> --- a/hw/ppc/spapr.c
> >>> +++ b/hw/ppc/spapr.c
> >>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> >>>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> >>>  .property = "mem64_win_size",   \
> >>>  .value= "0",\
> >>> +},  \
> >>> +{   \
> >>> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> >>> +.property = "root_bus_type",\
> >>> +.value= TYPE_PCI_BUS,   \
> >>>  },
> >>>  
> >>>  static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> >>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >>> index 7cde30e..2fa1f22 100644
> >>> --- a/hw/ppc/spapr_pci.c
> >>> +++ b/hw/ppc/spapr_pci.c
> >>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, 
> >>> Error **errp)
> >>>  bus = pci_register_bus(dev, NULL,
> >>> pci_spapr_set_irq, pci_spapr_map_irq, sphb,
> >>> >memspace, >iospace,
> >>> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> >>> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
> >>> +   sphb->root_bus_type ? sphb->root_bus_type :
> >>> +   TYPE_PCIE_BUS);
> >>
> >> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> >> TYPE_PCI_BUS ?
> > 
> > Yes, I think so.  In fact, I think it would be better to make the
> > property a boolean that just selects PCI-E, rather than this which
> > exposes qemu (semi-)internal type names on the command line.
> 
> 
> Sure, a "pcie-root" boolean property should do.
> 
> However this is not my main concern, I rather wonder if we have to have
> pci.0 when we pick PCIe for the root.

Right.

I've added Andrea Bologna to the CC list to get a libvirt perspective.

Andrea,

To summarise the issue here:
   * As I've said before the PAPR spec kinda-sorta abstracts the
 difference between vanilla PCI and PCI-E
   * However, because within qemu we're declaring the bus as PCI that
 means some PCI-E devices aren't working right
   * In particular it means that PCI-E extended config space isn't
 available

The proposal is to change (on newer machine types) the spapr PHB code
to declare a PCI-E bus instead.  AIUI this still won't make the root
complex guest visible (which it's not supposed to be under PAPR), and
the guest shouldn't see a difference in most cases - it will still see
the PAPR abstracted PCIish bus, but will now be able to get extended
config space.

The possible problem from a libvirt perspective is that doing this in
the simplest way in qemu would change the name of the default bus from
pci.0 to pcie.0.  We have two suggested ways to mitigate this:
   1) Automatically create a PCI-E to PCI bridge, so that new machine
  types will have both a pcie.0 and pci.0 bus
   2) Force the name of the bus to be pci.0, even though it's treated
  as PCI-E in other ways.

We're trying to work out exactly what will and won't cause trouble for
libvirt.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
  

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-10-30 Thread Alexey Kardashevskiy
On 31/10/16 13:53, David Gibson wrote:
> On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
>> On Fri, 28 Oct 2016 18:56:40 +1100
>> Alexey Kardashevskiy  wrote:
>>
>>> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
>>> This means that vfio-pci devices attached to it (and this is
>>> a default behaviour) hide PCIe extended capabilities as
>>> the bus does not pass a pci_bus_is_express(pdev->bus) check.
>>>
>>> This change adds a default PCI bus type property to sPAPR PHB
>>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
>>> for backward compatibility as a bus type is used in the bus name
>>> so the root bus name becomes "pcie.0" instead of "pci.0".
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>
>>> What can possibly go wrong with such change of a name?
>>> From the devices' perspective, I cannot see any.
>>>
>>> libvirt might get upset as "pci.0" will not be available,
>>> will it make sense to create pcie.0 as a root bus and always
>>> add a PCIe->PCI bridge and name its bus "pci.0"?
>>>
>>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
>>> pci_register_bus() can do this.
>>>
>>>
>>> ---
>>>  hw/ppc/spapr.c  | 5 +
>>>  hw/ppc/spapr_pci.c  | 5 -
>>>  include/hw/pci-host/spapr.h | 1 +
>>>  3 files changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 0b3820b..a268511 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>>>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>>>  .property = "mem64_win_size",   \
>>>  .value= "0",\
>>> +},  \
>>> +{   \
>>> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>>> +.property = "root_bus_type",\
>>> +.value= TYPE_PCI_BUS,   \
>>>  },
>>>  
>>>  static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index 7cde30e..2fa1f22 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error 
>>> **errp)
>>>  bus = pci_register_bus(dev, NULL,
>>> pci_spapr_set_irq, pci_spapr_map_irq, sphb,
>>> >memspace, >iospace,
>>> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>>> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
>>> +   sphb->root_bus_type ? sphb->root_bus_type :
>>> +   TYPE_PCIE_BUS);
>>
>> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
>> TYPE_PCI_BUS ?
> 
> Yes, I think so.  In fact, I think it would be better to make the
> property a boolean that just selects PCI-E, rather than this which
> exposes qemu (semi-)internal type names on the command line.


Sure, a "pcie-root" boolean property should do.

However this is not my main concern, I rather wonder if we have to have
pci.0 when we pick PCIe for the root.



>> Otherwise, if we pass something like:
>>
>> -global spapr-pci-host-bridge.root_bus_type=pcie
>>
>> we get the following not really explicit error:
>>
>> ERROR:/home/greg/Work/qemu/qemu-spapr/qom/object.c:474:object_new_with_type: 
>> assertion failed: (type != NULL)
>>
>>>  phb->bus = bus;
>>>  qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
>>>  
>>> @@ -1590,6 +1592,7 @@ static Property spapr_phb_properties[] = {
>>>  DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
>>> (1ULL << 12) | (1ULL << 16)),
>>>  DEFINE_PROP_UINT32("numa_node", sPAPRPHBState, numa_node, -1),
>>> +DEFINE_PROP_STRING("root_bus_type", sPAPRPHBState, root_bus_type),
>>>  DEFINE_PROP_END_OF_LIST(),
>>>  };
>>>  
>>> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
>>> index b92c1b5..a4ee873 100644
>>> --- a/include/hw/pci-host/spapr.h
>>> +++ b/include/hw/pci-host/spapr.h
>>> @@ -79,6 +79,7 @@ struct sPAPRPHBState {
>>>  uint64_t dma64_win_addr;
>>>  
>>>  uint32_t numa_node;
>>> +char *root_bus_type;
>>>  };
>>>  
>>>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL
>>
> 


-- 
Alexey





Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-10-30 Thread David Gibson
On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
> On Fri, 28 Oct 2016 18:56:40 +1100
> Alexey Kardashevskiy  wrote:
> 
> > At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> > This means that vfio-pci devices attached to it (and this is
> > a default behaviour) hide PCIe extended capabilities as
> > the bus does not pass a pci_bus_is_express(pdev->bus) check.
> > 
> > This change adds a default PCI bus type property to sPAPR PHB
> > and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
> > for backward compatibility as a bus type is used in the bus name
> > so the root bus name becomes "pcie.0" instead of "pci.0".
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > 
> > What can possibly go wrong with such change of a name?
> > From the devices' perspective, I cannot see any.
> > 
> > libvirt might get upset as "pci.0" will not be available,
> > will it make sense to create pcie.0 as a root bus and always
> > add a PCIe->PCI bridge and name its bus "pci.0"?
> > 
> > Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
> > pci_register_bus() can do this.
> > 
> > 
> > ---
> >  hw/ppc/spapr.c  | 5 +
> >  hw/ppc/spapr_pci.c  | 5 -
> >  include/hw/pci-host/spapr.h | 1 +
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0b3820b..a268511 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> >  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> >  .property = "mem64_win_size",   \
> >  .value= "0",\
> > +},  \
> > +{   \
> > +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > +.property = "root_bus_type",\
> > +.value= TYPE_PCI_BUS,   \
> >  },
> >  
> >  static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 7cde30e..2fa1f22 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> > **errp)
> >  bus = pci_register_bus(dev, NULL,
> > pci_spapr_set_irq, pci_spapr_map_irq, sphb,
> > >memspace, >iospace,
> > -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> > +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
> > +   sphb->root_bus_type ? sphb->root_bus_type :
> > +   TYPE_PCIE_BUS);
> 
> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> TYPE_PCI_BUS ?

Yes, I think so.  In fact, I think it would be better to make the
property a boolean that just selects PCI-E, rather than this which
exposes qemu (semi-)internal type names on the command line.
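
Purely as a sketch of that alternative (the "pcie_root" field name is made
up here, and none of this has been posted as a patch):

    /* In spapr_phb_properties[]: a flag instead of a type-name string. */
    DEFINE_PROP_BOOL("pcie-root", sPAPRPHBState, pcie_root, true),

    /* In spapr_phb_realize(): pick the bus type from the flag. */
    bus = pci_register_bus(dev, NULL,
                           pci_spapr_set_irq, pci_spapr_map_irq, sphb,
                           &sphb->memspace, &sphb->iospace,
                           PCI_DEVFN(0, 0), PCI_NUM_PINS,
                           sphb->pcie_root ? TYPE_PCIE_BUS : TYPE_PCI_BUS);

Older machine types would then flip the default back with a compat entry
setting "pcie-root" to "off" instead of forcing root_bus_type to pci.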

> Otherwise, if we pass something like:
> 
> -global spapr-pci-host-bridge.root_bus_type=pcie
> 
> we get the following not really explicit error:
> 
> ERROR:/home/greg/Work/qemu/qemu-spapr/qom/object.c:474:object_new_with_type: 
> assertion failed: (type != NULL)
> 
> >  phb->bus = bus;
> >  qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
> >  
> > @@ -1590,6 +1592,7 @@ static Property spapr_phb_properties[] = {
> >  DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
> > (1ULL << 12) | (1ULL << 16)),
> >  DEFINE_PROP_UINT32("numa_node", sPAPRPHBState, numa_node, -1),
> > +DEFINE_PROP_STRING("root_bus_type", sPAPRPHBState, root_bus_type),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > index b92c1b5..a4ee873 100644
> > --- a/include/hw/pci-host/spapr.h
> > +++ b/include/hw/pci-host/spapr.h
> > @@ -79,6 +79,7 @@ struct sPAPRPHBState {
> >  uint64_t dma64_win_addr;
> >  
> >  uint32_t numa_node;
> > +char *root_bus_type;
> >  };
> >  
> >  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-10-28 Thread Greg Kurz
On Fri, 28 Oct 2016 18:56:40 +1100
Alexey Kardashevskiy  wrote:

> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> This means that vfio-pci devices attached to it (and this is
> a default behaviour) hide PCIe extended capabilities as
> the bus does not pass a pci_bus_is_express(pdev->bus) check.
> 
> This change adds a default PCI bus type property to sPAPR PHB
> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
> for backward compatibility as a bus type is used in the bus name
> so the root bus name becomes "pcie.0" instead of "pci.0".
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> What can possibly go wrong with such change of a name?
> From the devices' perspective, I cannot see any.
> 
> libvirt might get upset as "pci.0" will not be available,
> will it make sense to create pcie.0 as a root bus and always
> add a PCIe->PCI bridge and name its bus "pci.0"?
> 
> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
> pci_register_bus() can do this.
> 
> 
> ---
>  hw/ppc/spapr.c  | 5 +
>  hw/ppc/spapr_pci.c  | 5 -
>  include/hw/pci-host/spapr.h | 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0b3820b..a268511 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>  .property = "mem64_win_size",   \
>  .value= "0",\
> +},  \
> +{   \
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> +.property = "root_bus_type",\
> +.value= TYPE_PCI_BUS,   \
>  },
>  
>  static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7cde30e..2fa1f22 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  bus = pci_register_bus(dev, NULL,
> pci_spapr_set_irq, pci_spapr_map_irq, sphb,
> >memspace, >iospace,
> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
> +   sphb->root_bus_type ? sphb->root_bus_type :
> +   TYPE_PCIE_BUS);

Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
TYPE_PCI_BUS ?

Otherwise, if we pass something like:

-global spapr-pci-host-bridge.root_bus_type=pcie

we get the following not really explicit error:

ERROR:/home/greg/Work/qemu/qemu-spapr/qom/object.c:474:object_new_with_type: 
assertion failed: (type != NULL)
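
As an illustration only (not part of the posted patch), the check being
asked for could look roughly like this at the top of spapr_phb_realize(),
so bad values fail with a readable message instead of the assertion above:

    if (sphb->root_bus_type &&
        strcmp(sphb->root_bus_type, TYPE_PCI_BUS) != 0 &&
        strcmp(sphb->root_bus_type, TYPE_PCIE_BUS) != 0) {
        error_setg(errp, "root_bus_type must be '%s' or '%s'",
                   TYPE_PCI_BUS, TYPE_PCIE_BUS);
        return;
    }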

>  phb->bus = bus;
>  qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
>  
> @@ -1590,6 +1592,7 @@ static Property spapr_phb_properties[] = {
>  DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
> (1ULL << 12) | (1ULL << 16)),
>  DEFINE_PROP_UINT32("numa_node", sPAPRPHBState, numa_node, -1),
> +DEFINE_PROP_STRING("root_bus_type", sPAPRPHBState, root_bus_type),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index b92c1b5..a4ee873 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -79,6 +79,7 @@ struct sPAPRPHBState {
>  uint64_t dma64_win_addr;
>  
>  uint32_t numa_node;
> +char *root_bus_type;
>  };
>  
>  #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL