Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-30 Thread Cédric Le Goater
On 07/27/2018 11:19 AM, Cédric Le Goater wrote:
> On 07/27/2018 10:43 AM, Benjamin Herrenschmidt wrote:
>> On Fri, 2018-07-27 at 10:25 +0200, Cédric Le Goater wrote:
>>> Each PHB creates a pci-bridge device and the PCI bus that comes with it. 
>>> It makes things easier to define PCI devices. 
>>>
>>> It is still quite complex ... Here is a sample :
>>>
>>> qemu-system-ppc64 -m 2G -machine powernv \
>>>   -cpu POWER8 -smp 2,cores=2,threads=1 -accel tcg,thread=multi \
>>>   -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
>>>   \
>>>   -device megasas,id=scsi0,bus=pci.0,addr=0x1 \
>>>   -drive 
>>> file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>>>  \
>>>   -device 
>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
>>>  \
>>>   \
>>>   -device ich9-ahci,id=sata0,bus=pci.1,addr=0x1 \
>>>   -drive 
>>> file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
>>>   -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
>>>   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.1,addr=0x2 \
>>>   -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
>>>   -device nec-usb-xhci,bus=pci.1,addr=0x7 \
>>
>> I don't understand why. That means you can't put emulated (or real)
>> PCIe device below it ?
> 
> Well, skiboot does seem to find them. But that's not a good reason.
> I will dig in.

Nothing is wrong in skiboot, that was my misunderstanding of the overall 
device layout.

Each PHB3 device exposes a single PCIE root port on which a *single* device 
can be plugged. It can be a any adapter or a bridge/switch. The QEMU device 
hierarchy is rather deep if you want to have multiple devices on a single 
PHB3 :

dev: pnv-phb3
  bus: phb3-root-bus
dev: phb3-root-port
  bus: pcie 
 dev: pcie-bridge (only one device allowed) 
bus: pcie 
   dev: e1000e
   dev: ...


So it makes the command line rather heavy :/ or you need to have 
more phb3s to define an adapter on each. How about 3 per chip ?


Here is a quick survey,

On OpenPower systems: 

  palmetto:  3 PHBs - one 16x, one 8x, one 4x backed by a switch
 with the BMC devices 

  habanero:  3 PHBs - one 16x, one 8x, one 8x backed by a switch 
 with the BMC devices 

  firestone: socket 0 : 2 16x PHBs
 socket 1 : 3 PHBs - one 16x, one 8x, one 8x backed  
by a switch with the BMC devices  

  garrison:  socket 0 : 1 16x PHB, 2 8x PHB linked to the GPUs, 1 8x PHB, 
 socket 1 : 1 16x PHB, 2 8x PHB linked to the GPUs, 1 8x PHB
backed by a switch with the BMC devices  


On IBM systems :

  tuletas:   2 16x PHB, 2 8x PHB, backed with a switch. 
  

Thanks,

C. 



Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Benjamin Herrenschmidt
On Fri, 2018-07-27 at 15:32 +1000, David Gibson wrote:
> 
> > > What is this pci bridge representing?  I know PCI-e PHBs typically
> > > have a pseudo P2P bridge right under them, but isn't that represnted
> > > by the root complex above?
> > 
> > This is the legacy pci bridge under the pcie bus.
> 
> Ah, ok.  Didn't realise there was a vanilla PCI bridge built in.

There isn't. That should probably be created by the machine.

> > Here is is the qdev hierarchy :
> > 
> >   dev: pnv-phb3, id ""
> > index = 0 (0x0)
> > chip-id = 0 (0x0)
> > bus: phb3-root-bus
> >   type pnv-phb3-root-bus
> >   dev: pnv-phb3-rc, id ""
> > power_controller_present = true
> > chassis = 0 (0x0)
> > slot = 1 (0x1)
> > port = 0 (0x0)
> > aer_log_max = 8 (0x8)
> > addr = 00.0
> > romfile = ""
> > rombar = 1 (0x1)
> > multifunction = false
> > command_serr_enable = true
> > x-pcie-lnksta-dllla = true
> > x-pcie-extcap-init = true
> > class PCI bridge, addr 00:00.0, pci id 1014:03dc (sub :)
> > bus: pcie.0
> >   type PCIE
> >   dev: pci-bridge, id ""
> > chassis_nr = 128 (0x80)
> > msi = "off"
> > shpc = false
> > addr = 00.0
> > romfile = ""
> > rombar = 1 (0x1)
> > multifunction = false
> > command_serr_enable = true
> > x-pcie-lnksta-dllla = true
> > x-pcie-extcap-init = true
> > class PCI bridge, addr 00:00.0, pci id 1b36:0001 (sub :)
> > bus: pci.0
> >   type PCI
> > 
> > [ ... ]
> > 
> > >> +static const PropertyInfo pnv_phb3_phb_id_propinfo = {
> > >> +.name = "irq",
> > >> +.get = pnv_phb3_get_phb_id,
> > >> +.set = pnv_phb3_set_phb_id,
> > >> +};
> > > 
> > > Can't you use a static DeviceProps style property for this, which is a
> > > bit simpler?
> > 
> > OK. We will address user creatable PHBs in some other way. Most
> > certainly in the realize routine like you suggested.
> > 
> > Thanks,
> > 
> > C. 
> > 
> 




Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Cédric Le Goater
On 07/27/2018 10:43 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2018-07-27 at 10:25 +0200, Cédric Le Goater wrote:
>> Each PHB creates a pci-bridge device and the PCI bus that comes with it. 
>> It makes things easier to define PCI devices. 
>>
>> It is still quite complex ... Here is a sample :
>>
>> qemu-system-ppc64 -m 2G -machine powernv \
>>   -cpu POWER8 -smp 2,cores=2,threads=1 -accel tcg,thread=multi \
>>   -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
>>   \
>>   -device megasas,id=scsi0,bus=pci.0,addr=0x1 \
>>   -drive 
>> file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>>  \
>>   -device 
>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
>>  \
>>   \
>>   -device ich9-ahci,id=sata0,bus=pci.1,addr=0x1 \
>>   -drive 
>> file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
>>   -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
>>   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.1,addr=0x2 \
>>   -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
>>   -device nec-usb-xhci,bus=pci.1,addr=0x7 \
> 
> I don't understand why. That means you can't put emulated (or real)
> PCIe device below it ?

Well, skiboot does seem to find them. But that's not a good reason.
I will dig in.

> Why force them down the path of having a bridge to legacy PCI always ?
>
> My original intent was to have one such bridge in a machine for the
> various default PCI devices qemu has that aren't (yet) PCIe (and also
> for testing :-) but I never thought we'd throw one onto every PCIe bus.

OK. Let's get of rid it and let's find why the FW doesn't see the PCIe
devices.

Thanks,

C.



Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Benjamin Herrenschmidt
On Fri, 2018-07-27 at 10:25 +0200, Cédric Le Goater wrote:
> Each PHB creates a pci-bridge device and the PCI bus that comes with it. 
> It makes things easier to define PCI devices. 
> 
> It is still quite complex ... Here is a sample :
> 
> qemu-system-ppc64 -m 2G -machine powernv \
>   -cpu POWER8 -smp 2,cores=2,threads=1 -accel tcg,thread=multi \
>   -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
>   \
>   -device megasas,id=scsi0,bus=pci.0,addr=0x1 \
>   -drive 
> file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>  \
>   -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
>  \
>   \
>   -device ich9-ahci,id=sata0,bus=pci.1,addr=0x1 \
>   -drive 
> file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
>   -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
>   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.1,addr=0x2 \
>   -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
>   -device nec-usb-xhci,bus=pci.1,addr=0x7 \

I don't understand why. That means you can't put emulated (or real)
PCIe device below it ? Why force them down the path of having a bridge
to legacy PCI always ?

My original intent was to have one such bridge in a machine for the
various default PCI devices qemu has that aren't (yet) PCIe (and also
for testing :-) but I never thought we'd throw one onto every PCIe bus.

Ben.





Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Cédric Le Goater
On 07/27/2018 10:08 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2018-07-27 at 15:32 +1000, David Gibson wrote:
>>
 What is this pci bridge representing?  I know PCI-e PHBs typically
 have a pseudo P2P bridge right under them, but isn't that represnted
 by the root complex above?
>>>
>>> This is the legacy pci bridge under the pcie bus.
>>
>> Ah, ok.  Didn't realise there was a vanilla PCI bridge built in.
> 
> There isn't. That should probably be created by the machine.

Each PHB creates a pci-bridge device and the PCI bus that comes with it. 
It makes things easier to define PCI devices. 

It is still quite complex ... Here is a sample :

qemu-system-ppc64 -m 2G -machine powernv \
  -cpu POWER8 -smp 2,cores=2,threads=1 -accel tcg,thread=multi \
  -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
  \
  -device megasas,id=scsi0,bus=pci.0,addr=0x1 \
  -drive 
file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none 
\
  -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
 \
  \
  -device ich9-ahci,id=sata0,bus=pci.1,addr=0x1 \
  -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
  -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
  -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.1,addr=0x2 \
  -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
  -device nec-usb-xhci,bus=pci.1,addr=0x7 \


 


>>> Here is is the qdev hierarchy :
>>>
>>>   dev: pnv-phb3, id ""
>>> index = 0 (0x0)
>>> chip-id = 0 (0x0)
>>> bus: phb3-root-bus
>>>   type pnv-phb3-root-bus
>>>   dev: pnv-phb3-rc, id ""
>>> power_controller_present = true
>>> chassis = 0 (0x0)
>>> slot = 1 (0x1)
>>> port = 0 (0x0)
>>> aer_log_max = 8 (0x8)
>>> addr = 00.0
>>> romfile = ""
>>> rombar = 1 (0x1)
>>> multifunction = false
>>> command_serr_enable = true
>>> x-pcie-lnksta-dllla = true
>>> x-pcie-extcap-init = true
>>> class PCI bridge, addr 00:00.0, pci id 1014:03dc (sub :)
>>> bus: pcie.0
>>>   type PCIE
>>>   dev: pci-bridge, id ""
>>> chassis_nr = 128 (0x80)
>>> msi = "off"
>>> shpc = false
>>> addr = 00.0
>>> romfile = ""
>>> rombar = 1 (0x1)
>>> multifunction = false
>>> command_serr_enable = true
>>> x-pcie-lnksta-dllla = true
>>> x-pcie-extcap-init = true
>>> class PCI bridge, addr 00:00.0, pci id 1b36:0001 (sub :)
>>> bus: pci.0
>>>   type PCI
>>>
>>> [ ... ]
>>>
> +static const PropertyInfo pnv_phb3_phb_id_propinfo = {
> +.name = "irq",
> +.get = pnv_phb3_get_phb_id,
> +.set = pnv_phb3_set_phb_id,
> +};

 Can't you use a static DeviceProps style property for this, which is a
 bit simpler?
>>>
>>> OK. We will address user creatable PHBs in some other way. Most
>>> certainly in the realize routine like you suggested.
>>>
>>> Thanks,
>>>
>>> C. 
>>>
>>
> 




Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Benjamin Herrenschmidt
On Fri, 2018-07-27 at 09:16 +0200, Cédric Le Goater wrote:
> > I'd have to remember how PQ works on P8 ... my gut feeling is that we
> > should resend if P=1 but I'm no 100% certain.
> 
> This is not exactly what the code does. To force a resend, it ignores 
> P but if Q=1, it bails out without doing anything, like a normal trigger 
> would do. So I think that in the resend case we should ignore Q as well.

Not too sure. We might need to swallow the resend if PQ became 01.

I don't know for sure what the HW does. Does it unconditionally resend
or does it re-evaluate the priority etc... it's unclear to me.

But yes, we should still resend if PQ=11

Cheers,
Ben.





Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Cédric Le Goater
>>> That said... shouldn't you filter our invalid or read-only regs before
>>> updating the cache?
>>
>> hmm, the current model relies on the fact that some registers have
>> their value implicitly updated. But I guess, we can dump which are
>> accessed and implement a default behavior for these. I will take a
>> look.
> 
> I don't really follow what you're saying here, but I guess I'll see
> what it looks like in te next spin.

My idea was to printf the registers being used and populate the switch 
statement, but there are too many of them and it would only pollute 
the code for not much benefit I think 

I will keep it as it is and may be fix a few things if possible.

C.



Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-27 Thread Cédric Le Goater
On 07/27/2018 12:36 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2018-07-26 at 11:03 +0200, Cédric Le Goater wrote:
>> Ben,
>>
>> I have found out recently that the QEMU PowerNV could hang while accessing
>> the disk. 
>>
>> The issue seems to be the phb3_msi_try_send() routine when called from 
>> the resend() handler. The 'P' is ignored in that case but not the 'Q' 
>> bit which means that no interrupt will be resent if P|Q are set. 
> 
> I'd have to remember how PQ works on P8 ... my gut feeling is that we
> should resend if P=1 but I'm no 100% certain.

This is not exactly what the code does. To force a resend, it ignores 
P but if Q=1, it bails out without doing anything, like a normal trigger 
would do. So I think that in the resend case we should ignore Q as well.

Thanks,

C.


> 
>> See the log extract below  :
>>
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200500fd eff pq=0 
>> prio=5 server=8 ignore_p=0
>>   PHB3(phb3_msi_set_p): MSI 0: setting P
>>   PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
>>   PHB3(phb3_msi_reject): MSI 0 rejected
>>   PHB3(phb3_msi_resend): MSI resend...
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=0 
>> prio=5 server=8 ignore_p=1
>>   PHB3(phb3_msi_set_p): MSI 0: setting P
>>   PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
>>   PHB3(phb3_msi_reject): MSI 0 rejected
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=2 
>> prio=5 server=8 ignore_p=0
>>   PHB3(phb3_msi_set_q): MSI 0: setting Q
>>   PHB3(phb3_msi_set_q):  IVE readback: 0x2005010100fd
>>   PHB3(phb3_msi_resend): MSI resend...
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=1 
>> prio=5 server=8 ignore_p=1
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
>> prio=5 server=8 ignore_p=0
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
>> prio=5 server=8 ignore_p=0
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
>> prio=5 server=8 ignore_p=0
>>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
>> prio=5 server=8 ignore_p=0
>>   ... goes on and on ...
>>   hangs
>>
>> I have added the relevant code at the bottom of the email. 
>>
>>
>> If the 'Q' bit is ignored also, the results are good with a SATA drive 
>> or a SCSI drive using the megasas model. Do you think this is correct ?
>> I would say so but I am still discovering that part.
>>
>> I have no idea why it didn't show up before. May be because we mostly 
>> used virtio-blk.
>>
>> Thanks,
>>
>> C. 
>>
>>> +static void phb3_msi_try_send(Phb3MsiState *msi, int srcno, bool ignore_p)
>>> +{
>>> +ICSState *ics = ICS_BASE(msi);
>>> +uint64_t ive;
>>> +uint64_t server, prio, pq, gen;
>>> +
>>> +if (!phb3_msi_read_ive(msi->phb, srcno, )) {
>>> +return;
>>> +}
>>> +
>>> +server = GETFIELD(IODA2_IVT_SERVER, ive);
>>> +prio = GETFIELD(IODA2_IVT_PRIORITY, ive);
>>> +pq = GETFIELD(IODA2_IVT_Q, ive);
>>> +if (!ignore_p) {
>>> +pq |= GETFIELD(IODA2_IVT_P, ive) << 1;
>>> +}
>>> +gen = GETFIELD(IODA2_IVT_GEN, ive);
>>> +
>>> +/*
>>> + * The low order 2 bits are the link pointer (Type II interrupts).
>>> + * Shift back to get a valid IRQ server.
>>> + */
>>> +server >>= 2;
>>> +
>>> +switch (pq) {
>>> +case 0: /* 00 */
>>> +if (prio == 0xff) {
>>> +/* Masked, set Q */
>>> +phb3_msi_set_q(msi, srcno);
>>> +} else {
>>> +/* Enabled, set P and send */
>>> +phb3_msi_set_p(msi, srcno, gen);
>>> +icp_irq(ics, server, srcno + ics->offset, prio);
>>> +}
>>> +break;
>>> +case 2: /* 10 */
>>> +/* Already pending, set Q */
>>> +phb3_msi_set_q(msi, srcno);
>>> +break;
>>> +case 1: /* 01 */
>>> +case 3: /* 11 */
>>> +default:
>>> +/* Just drop stuff if Q already set */
>>> +break;
>>> +}
>>> +}
>>> +
>>> +static void phb3_msi_set_irq(void *opaque, int srcno, int val)
>>> +{
>>> +Phb3MsiState *msi = PHB3_MSI(opaque);
>>> +
>>> +if (val) {
>>> +phb3_msi_try_send(msi, srcno, false);
>>> +}
>>> +}
>>
>> [ ... ]
>>  
>>> +static void phb3_msi_resend(ICSState *ics)
>>> +{
>>> +Phb3MsiState *msi = PHB3_MSI(ics);
>>> +unsigned int i, j;
>>> +
>>> +if (msi->rba_sum == 0) {
>>> +return;
>>> +}
>>> +
>>> +for (i = 0; i < 32; i++) {
>>> +if ((msi->rba_sum & (1u << i)) == 0) {
>>> +continue;
>>> +}
>>> +msi->rba_sum &= ~(1u << i);
>>> +for (j = 0; j < 64; j++) {
>>> +if ((msi->rba[i] & (1ull << j)) == 0) {
>>> +continue;
>>> +}
>>> +msi->rba[i] &= ~(1u << j);
>>> +phb3_msi_try_send(msi, i * 64 + j, true);
>>> +}
>>> +}
>>> +}
>>
>>
> 




Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-26 Thread David Gibson
On Tue, Jul 24, 2018 at 02:18:30PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> >From the discussion on the XICS MSI object, I gather that exporting
> icp_irq is fine. 
> 
> Some more comments below, I have tried to answer the parts that were 
> not addressed yet.

[snip]
> > Could you use memory_region_set_enabled() rather than having to add
> > and delete the subregion and keep track of it in this separate
> > variable?
> 
> pnv_phb3_check_m32/m64 remap the regions.

Ah, ok.

[snip]
> > That said... shouldn't you filter our invalid or read-only regs before
> > updating the cache?
> 
> hmm, the current model relies on the fact that some registers have
> their value implicitly updated. But I guess, we can dump which are
> accessed and implement a default behavior for these. I will take a
> look.

I don't really follow what you're saying here, but I guess I'll see
what it looks like in te next spin.

> 
> [ ... ]
> 
> >> +static const MemoryRegionOps pnv_phb3_msi_ops = {
> >> +/* There is no .read as the read result is undefined by PCI spec */
> > 
> > What will qemu do if it hits a NULL read here?  The behaviour may be
> > undefind, but we'd prefer it didn't crash qemu..
> 
> I will add one with a standard error message. It is better to have
> some output anyway.
> 
> [ ... ]
> 
> >> +static void pnv_phb3_pci_create(PnvPHB3 *phb, Error **errp)
> >> +{
> >> +PCIHostState *pcih = PCI_HOST_BRIDGE(phb);
> >> +PCIDevice *brdev;
> >> +PCIDevice *pdev;
> >> +PCIBus *parent;
> >> +uint8_t chassis = phb->chip_id * 4 + phb->phb_id;
> >> +uint8_t chassis_nr = 128;
> >> +
> >> +/* Add root complex */
> >> +pdev = pci_create(pcih->bus, 0, TYPE_PNV_PHB3_RC);
> >> +object_property_add_child(OBJECT(phb), "phb3-rc", OBJECT(pdev), NULL);
> >> +qdev_prop_set_uint8(DEVICE(pdev), "chassis", chassis);
> >> +qdev_prop_set_uint16(DEVICE(pdev), "slot", 1);
> >> +qdev_init_nofail(DEVICE(pdev));
> >> +
> >> +/* Setup bus for that chip */
> >> +parent = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
> >> +
> >> +brdev = pci_create(parent, 0, "pci-bridge");
> > 
> > What is this pci bridge representing?  I know PCI-e PHBs typically
> > have a pseudo P2P bridge right under them, but isn't that represnted
> > by the root complex above?
> 
> This is the legacy pci bridge under the pcie bus.

Ah, ok.  Didn't realise there was a vanilla PCI bridge built in.

> Here is is the qdev hierarchy :
> 
>   dev: pnv-phb3, id ""
> index = 0 (0x0)
> chip-id = 0 (0x0)
> bus: phb3-root-bus
>   type pnv-phb3-root-bus
>   dev: pnv-phb3-rc, id ""
> power_controller_present = true
> chassis = 0 (0x0)
> slot = 1 (0x1)
> port = 0 (0x0)
> aer_log_max = 8 (0x8)
> addr = 00.0
> romfile = ""
> rombar = 1 (0x1)
> multifunction = false
> command_serr_enable = true
> x-pcie-lnksta-dllla = true
> x-pcie-extcap-init = true
> class PCI bridge, addr 00:00.0, pci id 1014:03dc (sub :)
> bus: pcie.0
>   type PCIE
>   dev: pci-bridge, id ""
> chassis_nr = 128 (0x80)
> msi = "off"
> shpc = false
> addr = 00.0
> romfile = ""
> rombar = 1 (0x1)
> multifunction = false
> command_serr_enable = true
> x-pcie-lnksta-dllla = true
> x-pcie-extcap-init = true
> class PCI bridge, addr 00:00.0, pci id 1b36:0001 (sub :)
> bus: pci.0
>   type PCI
> 
> [ ... ]
> 
> >> +static const PropertyInfo pnv_phb3_phb_id_propinfo = {
> >> +.name = "irq",
> >> +.get = pnv_phb3_get_phb_id,
> >> +.set = pnv_phb3_set_phb_id,
> >> +};
> > 
> > Can't you use a static DeviceProps style property for this, which is a
> > bit simpler?
> 
> OK. We will address user creatable PHBs in some other way. Most
> certainly in the realize routine like you suggested.
> 
> Thanks,
> 
> C. 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-26 Thread Benjamin Herrenschmidt
On Thu, 2018-07-26 at 11:03 +0200, Cédric Le Goater wrote:
> Ben,
> 
> I have found out recently that the QEMU PowerNV could hang while accessing
> the disk. 
> 
> The issue seems to be the phb3_msi_try_send() routine when called from 
> the resend() handler. The 'P' is ignored in that case but not the 'Q' 
> bit which means that no interrupt will be resent if P|Q are set. 

I'd have to remember how PQ works on P8 ... my gut feeling is that we
should resend if P=1 but I'm no 100% certain.

> See the log extract below  :
> 
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200500fd eff pq=0 
> prio=5 server=8 ignore_p=0
>   PHB3(phb3_msi_set_p): MSI 0: setting P
>   PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
>   PHB3(phb3_msi_reject): MSI 0 rejected
>   PHB3(phb3_msi_resend): MSI resend...
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=0 
> prio=5 server=8 ignore_p=1
>   PHB3(phb3_msi_set_p): MSI 0: setting P
>   PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
>   PHB3(phb3_msi_reject): MSI 0 rejected
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=2 
> prio=5 server=8 ignore_p=0
>   PHB3(phb3_msi_set_q): MSI 0: setting Q
>   PHB3(phb3_msi_set_q):  IVE readback: 0x2005010100fd
>   PHB3(phb3_msi_resend): MSI resend...
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=1 
> prio=5 server=8 ignore_p=1
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
> prio=5 server=8 ignore_p=0
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
> prio=5 server=8 ignore_p=0
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
> prio=5 server=8 ignore_p=0
>   PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 
> prio=5 server=8 ignore_p=0
>   ... goes on and on ...
>   hangs
> 
> I have added the relevant code at the bottom of the email. 
> 
> 
> If the 'Q' bit is ignored also, the results are good with a SATA drive 
> or a SCSI drive using the megasas model. Do you think this is correct ?
> I would say so but I am still discovering that part.
> 
> I have no idea why it didn't show up before. May be because we mostly 
> used virtio-blk.
> 
> Thanks,
> 
> C. 
> 
> > +static void phb3_msi_try_send(Phb3MsiState *msi, int srcno, bool ignore_p)
> > +{
> > +ICSState *ics = ICS_BASE(msi);
> > +uint64_t ive;
> > +uint64_t server, prio, pq, gen;
> > +
> > +if (!phb3_msi_read_ive(msi->phb, srcno, )) {
> > +return;
> > +}
> > +
> > +server = GETFIELD(IODA2_IVT_SERVER, ive);
> > +prio = GETFIELD(IODA2_IVT_PRIORITY, ive);
> > +pq = GETFIELD(IODA2_IVT_Q, ive);
> > +if (!ignore_p) {
> > +pq |= GETFIELD(IODA2_IVT_P, ive) << 1;
> > +}
> > +gen = GETFIELD(IODA2_IVT_GEN, ive);
> > +
> > +/*
> > + * The low order 2 bits are the link pointer (Type II interrupts).
> > + * Shift back to get a valid IRQ server.
> > + */
> > +server >>= 2;
> > +
> > +switch (pq) {
> > +case 0: /* 00 */
> > +if (prio == 0xff) {
> > +/* Masked, set Q */
> > +phb3_msi_set_q(msi, srcno);
> > +} else {
> > +/* Enabled, set P and send */
> > +phb3_msi_set_p(msi, srcno, gen);
> > +icp_irq(ics, server, srcno + ics->offset, prio);
> > +}
> > +break;
> > +case 2: /* 10 */
> > +/* Already pending, set Q */
> > +phb3_msi_set_q(msi, srcno);
> > +break;
> > +case 1: /* 01 */
> > +case 3: /* 11 */
> > +default:
> > +/* Just drop stuff if Q already set */
> > +break;
> > +}
> > +}
> > +
> > +static void phb3_msi_set_irq(void *opaque, int srcno, int val)
> > +{
> > +Phb3MsiState *msi = PHB3_MSI(opaque);
> > +
> > +if (val) {
> > +phb3_msi_try_send(msi, srcno, false);
> > +}
> > +}
> 
> [ ... ]
>  
> > +static void phb3_msi_resend(ICSState *ics)
> > +{
> > +Phb3MsiState *msi = PHB3_MSI(ics);
> > +unsigned int i, j;
> > +
> > +if (msi->rba_sum == 0) {
> > +return;
> > +}
> > +
> > +for (i = 0; i < 32; i++) {
> > +if ((msi->rba_sum & (1u << i)) == 0) {
> > +continue;
> > +}
> > +msi->rba_sum &= ~(1u << i);
> > +for (j = 0; j < 64; j++) {
> > +if ((msi->rba[i] & (1ull << j)) == 0) {
> > +continue;
> > +}
> > +msi->rba[i] &= ~(1u << j);
> > +phb3_msi_try_send(msi, i * 64 + j, true);
> > +}
> > +}
> > +}
> 
> 




Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-26 Thread Cédric Le Goater
Ben,

I have found out recently that the QEMU PowerNV could hang while accessing
the disk. 

The issue seems to be the phb3_msi_try_send() routine when called from 
the resend() handler. The 'P' is ignored in that case but not the 'Q' 
bit which means that no interrupt will be resent if P|Q are set. 

See the log extract below  :

  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200500fd eff pq=0 prio=5 
server=8 ignore_p=0
  PHB3(phb3_msi_set_p): MSI 0: setting P
  PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
  PHB3(phb3_msi_reject): MSI 0 rejected
  PHB3(phb3_msi_resend): MSI resend...
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=0 prio=5 
server=8 ignore_p=1
  PHB3(phb3_msi_set_p): MSI 0: setting P
  PHB3(phb3_msi_set_p):  IVE readback: 0x200501fd
  PHB3(phb3_msi_reject): MSI 0 rejected
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x200501fd eff pq=2 prio=5 
server=8 ignore_p=0
  PHB3(phb3_msi_set_q): MSI 0: setting Q
  PHB3(phb3_msi_set_q):  IVE readback: 0x2005010100fd
  PHB3(phb3_msi_resend): MSI resend...
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=1 prio=5 
server=8 ignore_p=1
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 prio=5 
server=8 ignore_p=0
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 prio=5 
server=8 ignore_p=0
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 prio=5 
server=8 ignore_p=0
  PHB3(phb3_msi_try_send): MSI 0: try_send, ive=0x2005010100fd eff pq=3 prio=5 
server=8 ignore_p=0
  ... goes on and on ...
  hangs

I have added the relevant code at the bottom of the email. 


If the 'Q' bit is ignored also, the results are good with a SATA drive 
or a SCSI drive using the megasas model. Do you think this is correct ?
I would say so but I am still discovering that part.

I have no idea why it didn't show up before. May be because we mostly 
used virtio-blk.

Thanks,

C. 

> +static void phb3_msi_try_send(Phb3MsiState *msi, int srcno, bool ignore_p)
> +{
> +ICSState *ics = ICS_BASE(msi);
> +uint64_t ive;
> +uint64_t server, prio, pq, gen;
> +
> +if (!phb3_msi_read_ive(msi->phb, srcno, )) {
> +return;
> +}
> +
> +server = GETFIELD(IODA2_IVT_SERVER, ive);
> +prio = GETFIELD(IODA2_IVT_PRIORITY, ive);
> +pq = GETFIELD(IODA2_IVT_Q, ive);
> +if (!ignore_p) {
> +pq |= GETFIELD(IODA2_IVT_P, ive) << 1;
> +}
> +gen = GETFIELD(IODA2_IVT_GEN, ive);
> +
> +/*
> + * The low order 2 bits are the link pointer (Type II interrupts).
> + * Shift back to get a valid IRQ server.
> + */
> +server >>= 2;
> +
> +switch (pq) {
> +case 0: /* 00 */
> +if (prio == 0xff) {
> +/* Masked, set Q */
> +phb3_msi_set_q(msi, srcno);
> +} else {
> +/* Enabled, set P and send */
> +phb3_msi_set_p(msi, srcno, gen);
> +icp_irq(ics, server, srcno + ics->offset, prio);
> +}
> +break;
> +case 2: /* 10 */
> +/* Already pending, set Q */
> +phb3_msi_set_q(msi, srcno);
> +break;
> +case 1: /* 01 */
> +case 3: /* 11 */
> +default:
> +/* Just drop stuff if Q already set */
> +break;
> +}
> +}
> +
> +static void phb3_msi_set_irq(void *opaque, int srcno, int val)
> +{
> +Phb3MsiState *msi = PHB3_MSI(opaque);
> +
> +if (val) {
> +phb3_msi_try_send(msi, srcno, false);
> +}
> +}

[ ... ]
 
> +static void phb3_msi_resend(ICSState *ics)
> +{
> +Phb3MsiState *msi = PHB3_MSI(ics);
> +unsigned int i, j;
> +
> +if (msi->rba_sum == 0) {
> +return;
> +}
> +
> +for (i = 0; i < 32; i++) {
> +if ((msi->rba_sum & (1u << i)) == 0) {
> +continue;
> +}
> +msi->rba_sum &= ~(1u << i);
> +for (j = 0; j < 64; j++) {
> +if ((msi->rba[i] & (1ull << j)) == 0) {
> +continue;
> +}
> +msi->rba[i] &= ~(1u << j);
> +phb3_msi_try_send(msi, i * 64 + j, true);
> +}
> +}
> +}






Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-24 Thread Cédric Le Goater
Hello,

>From the discussion on the XICS MSI object, I gather that exporting
icp_irq is fine. 

Some more comments below, I have tried to answer the parts that were 
not addressed yet. 

[ ... ]

>> +/*
>> + * The CONFIG_DATA register expects little endian accesses, but as the
>> + * region is big endian, we have to swap the value.
>> + */
>> +static void pnv_phb3_config_write(PnvPHB3 *phb, unsigned off,
>> +  unsigned size, uint64_t val)
>> +{
>> +uint32_t cfg_addr, limit;
>> +PCIDevice *pdev;
>> +
>> +pdev = pnv_phb3_find_cfg_dev(phb);
>> +if (!pdev) {
>> +return;
>> +}
>> +cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xfff;
>> +cfg_addr |= off;
> 
> This looks weird - there are callers of this that appear to have low
> bits in 'off', then you're ORing it with overlapping low bits.

Here is an extract of the specs of the PHB_CONFIG_ADDRESS register :

  20:29 Register Number(00:09) RW 0

  Register Number to use for the configuration cycle access. 

  30:31 Register Number(10:11) RO 0

  Don’t Cares. The value written will be dropped, and zeros are
  always returned for a read.  

So, yes, we shoud be using 0xffc, just like in read.

>> +limit = pci_config_size(pdev);
>> +if (limit <= cfg_addr) {
>> +/* conventional pci device can be behind pcie-to-pci bridge.
>> +   256 <= addr < 4K has no effects. */
>> +return;
>> +}
>> +switch (size) {
>> +case 1:
>> +break;
>> +case 2:
>> +val = bswap16(val);
>> +break;
>> +case 4:
>> +val = bswap32(val);
>> +break;
>> +default:
>> +g_assert_not_reached();
>> +}
>> +pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
>> +}
>> +
>> +static uint64_t pnv_phb3_config_read(PnvPHB3 *phb, unsigned off,
>> + unsigned size)
>> +{
>> +uint32_t cfg_addr, limit;
>> +PCIDevice *pdev;
>> +uint64_t val;
>> +
>> +pdev = pnv_phb3_find_cfg_dev(phb);
>> +if (!pdev) {
>> +return ~0ull;
>> +}
>> +cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
>> +cfg_addr |= off;
> 
> This looks better, should it be 0xffc in the write path as well?

yes.

>> +limit = pci_config_size(pdev);
>> +if (limit <= cfg_addr) {
>> +/* conventional pci device can be behind pcie-to-pci bridge.
>> +   256 <= addr < 4K has no effects. */
>> +return ~0ull;
>> +}
>> +val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
>> +switch (size) {
>> +case 1:
>> +return val;
>> +case 2:
>> +return bswap16(val);
>> +case 4:
>> +return bswap32(val);
>> +default:
>> +g_assert_not_reached();
>> +}
>> +}
>> +
>> +static void pnv_phb3_check_m32(PnvPHB3 *phb)
>> +{
>> +uint64_t base, start, size;
>> +MemoryRegion *parent;
>> +PnvPBCQState *pbcq = >pbcq;
>> +
>> +if (phb->m32_mapped) {
>> +memory_region_del_subregion(phb->mr_m32.container, >mr_m32);
>> +phb->m32_mapped = false;
> 
> Could you use memory_region_set_enabled() rather than having to add
> and delete the subregion and keep track of it in this separate
> variable?

pnv_phb3_check_m32/m64 remap the regions. 

[ ... ]

>> +case IODA2_TBL_PEEV:
>> +tptr = phb->ioda_PEEV;
>> +mask = 3;
>> +break;
>> +default:
>> +return NULL;
>> +}
>> +index &= mask;
> 
> Do we want to silently mask, rather than logging an error if there's
> an access out of bounds for a particular table?

I will add an error.

[ ... ]

>> +/* Handle masking */
>> +switch (off) {
>> +case PHB_M64_UPPER_BITS:
> 
> Couldn't you handle this in the switch below - it should be ok to
> momentarily have the invalid bits set in your reg case, as long as you
> mask them before applying the side-effects, yes?

yes.

> That said... shouldn't you filter our invalid or read-only regs before
> updating the cache?

hmm, the current model relies on the fact that some registers have
their value implicitly updated. But I guess, we can dump which are
accessed and implement a default behavior for these. I will take a
look.

[ ... ]

>> +static const MemoryRegionOps pnv_phb3_msi_ops = {
>> +/* There is no .read as the read result is undefined by PCI spec */
> 
> What will qemu do if it hits a NULL read here?  The behaviour may be
> undefind, but we'd prefer it didn't crash qemu..

I will add one with a standard error message. It is better to have
some output anyway.

[ ... ]

>> +static void pnv_phb3_pci_create(PnvPHB3 *phb, Error **errp)
>> +{
>> +PCIHostState *pcih = PCI_HOST_BRIDGE(phb);
>> +PCIDevice *brdev;
>> +PCIDevice *pdev;
>> +PCIBus *parent;
>> +uint8_t chassis = phb->chip_id * 4 + phb->phb_id;
>> +uint8_t chassis_nr = 128;
>> +
>> +/* Add root complex */
>> +pdev = pci_create(pcih->bus, 0, 

Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread David Gibson
On Tue, Jul 24, 2018 at 01:49:32PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-24 at 12:14 +1000, David Gibson wrote:
> > > I don't know, is there much shared logic ? And the shared bits are the
> > > subclassing, that's handled that way...
> > > 
> > > This is really a different piece of HW, a separate ICS implementation,
> > > that has its own quirks, is configured via different registers, does
> > > EOI differently etc...
> > > 
> > > Even the resend stuff is done differently, the resend bitmap is
> > > actually SW visible via an IODA table.
> > > 
> > > I mean, we could (maybe we do these days not sure) have an ICS
> > > superclass wrapper on the actual icp_irq() but that's really all there
> > > is to it I think.
> > 
> > Hm, yeah, fair enough.  I realized what I was suggesting would
> > actually need another layer of qirqs than we have, so it's not a good
> > idea after all.  Ok, I'm happy proceeding with exposing icp_irq().
> 
> The only idea I have if you want to keep icp_* private is to put a
> 'helper' in the ICS superclass that the subclass uses to encapsulate
> the ICP call, but at this point it's just churn.

I concur.

> 
> But yeah you really don't want a layer of qirqs, it wouldn't work any
> way, it's really an ICS, it will send messages to the ICP with a XIRR
> value in them etc... just like an ICS, it's not somethign you want
> qirq's for .
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread Benjamin Herrenschmidt
On Tue, 2018-07-24 at 12:14 +1000, David Gibson wrote:
> > I don't know, is there much shared logic ? And the shared bits are the
> > subclassing, that's handled that way...
> > 
> > This is really a different piece of HW, a separate ICS implementation,
> > that has its own quirks, is configured via different registers, does
> > EOI differently etc...
> > 
> > Even the resend stuff is done differently, the resend bitmap is
> > actually SW visible via an IODA table.
> > 
> > I mean, we could (maybe we do these days not sure) have an ICS
> > superclass wrapper on the actual icp_irq() but that's really all there
> > is to it I think.
> 
> Hm, yeah, fair enough.  I realized what I was suggesting would
> actually need another layer of qirqs than we have, so it's not a good
> idea after all.  Ok, I'm happy proceeding with exposing icp_irq().

The only idea I have if you want to keep icp_* private is to put a
'helper' in the ICS superclass that the subclass uses to encapsulate
the ICP call, but at this point it's just churn.

But yeah you really don't want a layer of qirqs, it wouldn't work any
way, it's really an ICS, it will send messages to the ICP with a XIRR
value in them etc... just like an ICS, it's not somethign you want
qirq's for .

Cheers,
Ben.





Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread David Gibson
On Tue, Jul 24, 2018 at 09:55:53AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-23 at 14:16 +1000, David Gibson wrote:
> > > 
> > > Now, this is an ICS subclass, so why shouldn't it directly poke at the
> > > target ICP ?
> > 
> > That's ok in theory, but causing it to expose the icp interface to a
> > new module isn't great.
> > 
> > > It's an alternate to the normal ICS since it behaves a bit
> > > differently (PQ bits & all).
> > 
> > AFAICT the PQ bits are effectively another filtering layer on top of
> > the same ICS logic as elsewhere.  I think it's better if we can share
> > that shared logic, rather than replicating it.
> 
> I don't know, is there much shared logic ? And the shared bits are the
> subclassing, that's handled that way...
> 
> This is really a different piece of HW, a separate ICS implementation,
> that has its own quirks, is configured via different registers, does
> EOI differently etc...
> 
> Even the resend stuff is done differently, the resend bitmap is
> actually SW visible via an IODA table.
> 
> I mean, we could (maybe we do these days not sure) have an ICS
> superclass wrapper on the actual icp_irq() but that's really all there
> is to it I think.

Hm, yeah, fair enough.  I realized what I was suggesting would
actually need another layer of qirqs than we have, so it's not a good
idea after all.  Ok, I'm happy proceeding with exposing icp_irq().

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread David Gibson
On Mon, Jul 23, 2018 at 11:37:06PM +0200, Cédric Le Goater wrote:
> On 07/18/2018 08:12 AM, David Gibson wrote:
> >> +static void pnv_phb3_get_phb_id(Object *obj, Visitor *v, const char *name,
> >> +  void *opaque, Error **errp)
> >> +{
> >> +Property *prop = opaque;
> >> +uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
> >> +
> >> +visit_type_uint32(v, name, ptr, errp);
> >> +}
> >> +
> >> +static void pnv_phb3_set_phb_id(Object *obj, Visitor *v, const char *name,
> >> +  void *opaque, Error **errp)
> >> +{
> >> +PnvPHB3 *phb = PNV_PHB3(obj);
> >> +uint32_t phb_id;
> >> +Error *local_err = NULL;
> >> +
> >> +visit_type_uint32(v, name, _id, _err);
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +return;
> >> +}
> >> +
> >> +/*
> >> + * Limit to a maximum of 6 PHBs per chip
> >> + */
> >> +if (phb_id >= PNV8_CHIP_PHB3_MAX) {
> >> +error_setg(errp, "invalid PHB index: '%d'", phb_id);
> >> +return;
> >> +}
> >> +
> >> +phb->phb_id = phb_id;
> >> +}
> >> +
> >> +static const PropertyInfo pnv_phb3_phb_id_propinfo = {
> >> +.name = "irq",
> >> +.get = pnv_phb3_get_phb_id,
> >> +.set = pnv_phb3_set_phb_id,
> >> +};
> > Can't you use a static DeviceProps style property for this, which is a
> > bit simpler?
> 
> yes but we will need these ops for user creatable devices. This is where
> we will check the chip id.

Why?  Can't we check that during realize()?  That's a common pattern
for static properties.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread Benjamin Herrenschmidt
On Mon, 2018-07-23 at 14:16 +1000, David Gibson wrote:
> > 
> > Now, this is an ICS subclass, so why shouldn't it directly poke at the
> > target ICP ?
> 
> That's ok in theory, but causing it to expose the icp interface to a
> new module isn't great.
> 
> > It's an alternate to the normal ICS since it behaves a bit
> > differently (PQ bits & all).
> 
> AFAICT the PQ bits are effectively another filtering layer on top of
> the same ICS logic as elsewhere.  I think it's better if we can share
> that shared logic, rather than replicating it.

I don't know, is there much shared logic ? And the shared bits are the
subclassing, that's handled that way...

This is really a different piece of HW, a separate ICS implementation,
that has its own quirks, is configured via different registers, does
EOI differently etc...

Even the resend stuff is done differently, the resend bitmap is
actually SW visible via an IODA table.

I mean, we could (maybe we do these days not sure) have an ICS
superclass wrapper on the actual icp_irq() but that's really all there
is to it I think.

Cheers,
Ben.





Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread Cédric Le Goater
On 07/18/2018 08:12 AM, David Gibson wrote:
>> +/* Setup LSI offset */
>> +ics->offset = comp + global;
> Oh.. changing ICS offset at runtime.  I hadn't considered that case..

Yes. The PowerNV FW defines the IRQ layout in the overall number space.

C. 




Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread Cédric Le Goater
On 07/18/2018 08:12 AM, David Gibson wrote:
>> +static void pnv_phb3_get_phb_id(Object *obj, Visitor *v, const char *name,
>> +  void *opaque, Error **errp)
>> +{
>> +Property *prop = opaque;
>> +uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
>> +
>> +visit_type_uint32(v, name, ptr, errp);
>> +}
>> +
>> +static void pnv_phb3_set_phb_id(Object *obj, Visitor *v, const char *name,
>> +  void *opaque, Error **errp)
>> +{
>> +PnvPHB3 *phb = PNV_PHB3(obj);
>> +uint32_t phb_id;
>> +Error *local_err = NULL;
>> +
>> +visit_type_uint32(v, name, _id, _err);
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +return;
>> +}
>> +
>> +/*
>> + * Limit to a maximum of 6 PHBs per chip
>> + */
>> +if (phb_id >= PNV8_CHIP_PHB3_MAX) {
>> +error_setg(errp, "invalid PHB index: '%d'", phb_id);
>> +return;
>> +}
>> +
>> +phb->phb_id = phb_id;
>> +}
>> +
>> +static const PropertyInfo pnv_phb3_phb_id_propinfo = {
>> +.name = "irq",
>> +.get = pnv_phb3_get_phb_id,
>> +.set = pnv_phb3_set_phb_id,
>> +};
> Can't you use a static DeviceProps style property for this, which is a
> bit simpler?

yes but we will need these ops for user creatable devices. This is where
we will check the chip id.

C. 






Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-23 Thread Cédric Le Goater
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 6ac8a9392da6..966a996c2eac 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -194,6 +194,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
>>  uint32_t icp_accept(ICPState *ss);
>>  uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
>>  void icp_eoi(ICPState *icp, uint32_t xirr);
>> +void icp_irq(ICSState *ics, int server, int nr, uint8_t priority);
> 
> Hrm... are you sure you need to expose this?
> 
>>  void ics_simple_write_xive(ICSState *ics, int nr, int server,
>> uint8_t priority, uint8_t saved_priority);
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index b9f1a3c97214..59e2a5217dcc 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -230,7 +230,7 @@ void icp_eoi(ICPState *icp, uint32_t xirr)
>>  }
>>  }
>>  
>> -static void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
>> +void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
>>  {
>>  ICPState *icp = xics_icp_get(ics->xics, server);
>>  

[ ... ]

>> +
>> +static void phb3_msi_try_send(Phb3MsiState *msi, int srcno, bool ignore_p)
>> +{
>> +ICSState *ics = ICS_BASE(msi);
>> +uint64_t ive;
>> +uint64_t server, prio, pq, gen;
>> +
>> +if (!phb3_msi_read_ive(msi->phb, srcno, )) {
>> +return;
>> +}
>> +
>> +server = GETFIELD(IODA2_IVT_SERVER, ive);
>> +prio = GETFIELD(IODA2_IVT_PRIORITY, ive);
>> +pq = GETFIELD(IODA2_IVT_Q, ive);
>> +if (!ignore_p) {
>> +pq |= GETFIELD(IODA2_IVT_P, ive) << 1;
>> +}
>> +gen = GETFIELD(IODA2_IVT_GEN, ive);
>> +
>> +/*
>> + * The low order 2 bits are the link pointer (Type II interrupts).
>> + * Shift back to get a valid IRQ server.
>> + */
>> +server >>= 2;
>> +
>> +switch (pq) {
>> +case 0: /* 00 */
>> +if (prio == 0xff) {
>> +/* Masked, set Q */
>> +phb3_msi_set_q(msi, srcno);
>> +} else {
>> +/* Enabled, set P and send */
>> +phb3_msi_set_p(msi, srcno, gen);
>> +icp_irq(ics, server, srcno + ics->offset, prio);
> 
> Can't you just pulse the right qirq here, which will use the core ICS
> logic to propagate to the ICP?

The interrupt vector entries are maintained in the guest memory. We don't 
have an entry in the device model (hcall, mmio on a register) to update 
an IRQ state. So we need to find the target when the IRQ is triggered.  

The interrupt source unit logic of the phb3 device is closer to the Type 3 
architecture (XIVE) in fact.

C.





Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-22 Thread David Gibson
On Wed, Jul 18, 2018 at 06:03:29PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-07-18 at 16:12 +1000, David Gibson wrote:
> > On Thu, Jun 28, 2018 at 10:36:32AM +0200, Cédric Le Goater wrote:
> > > From: Benjamin Herrenschmidt 
> 
> Can you trim your replies ? It's really hard to find your comments in
> such a long patch...
> 
> > > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> > > index 6ac8a9392da6..966a996c2eac 100644
> > > --- a/include/hw/ppc/xics.h
> > > +++ b/include/hw/ppc/xics.h
> > > @@ -194,6 +194,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
> > >  uint32_t icp_accept(ICPState *ss);
> > >  uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
> > >  void icp_eoi(ICPState *icp, uint32_t xirr);
> > > +void icp_irq(ICSState *ics, int server, int nr, uint8_t priority);
> > 
> > Hrm... are you sure you need to expose this?

[snip]
> > > +limit = pci_config_size(pdev);
> > > +if (limit <= cfg_addr) {
> > > +/* conventional pci device can be behind pcie-to-pci bridge.
> > > +   256 <= addr < 4K has no effects. */
> > > +return ~0ull;
> > > +}
> > > +val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
> > > +switch (size) {
> > > +case 1:
> > > +return val;
> > > +case 2:
> > > +return bswap16(val);
> > > +case 4:
> > > +return bswap32(val);
> > > +default:
> > > +g_assert_not_reached();
> > > +}
> > > +}
> > > +
> > > +static void pnv_phb3_check_m32(PnvPHB3 *phb)
> > > +{
> > > +uint64_t base, start, size;
> > > +MemoryRegion *parent;
> > > +PnvPBCQState *pbcq = >pbcq;
> > > +
> > > +if (phb->m32_mapped) {
> > > +memory_region_del_subregion(phb->mr_m32.container, >mr_m32);
> > > +phb->m32_mapped = false;
> > 
> > Could you use memory_region_set_enabled() rather than having to add
> > and delete the subregion and keep track of it in this separate
> > variable?
> 
> There was a reason for that but it's years old and I forgot... I think
> when re-enabled it moves around, among other things. So it's not more
> efficient.

Hm, ok.  It'd be good to know what this is.

> > > +}
> > > +
> > > +/* Disabled ? move on with life ... */
> > 
> > Trivial: "nothing to do" seems to be the standard way to express this.
> > Even trivialler: usual English rules don't put a space before '?' or
> > '!' punctuation.
> 
> No, that's the tasteless English rule :-) It shall be overridden by
> anybody interested in making things actually readable :-)
> 
> > > +
> > > +static void pnv_phb3_lxivt_write(PnvPHB3 *phb, unsigned idx, uint64_t 
> > > val)
> > > +{
> > > +ICSState *ics = >lsis;
> > > +uint8_t server, prio;
> > > +
> > > +phb->ioda_LXIVT[idx] = val & (IODA2_LXIVT_SERVER |
> > > +  IODA2_LXIVT_PRIORITY |
> > > +  IODA2_LXIVT_NODE_ID);
> > > +server = GETFIELD(IODA2_LXIVT_SERVER, val);
> > > +prio = GETFIELD(IODA2_LXIVT_PRIORITY, val);
> > > +
> > > +/*
> > > + * The low order 2 bits are the link pointer (Type II interrupts).
> > 
> > I don't think we've currently implemented link pointers in our xics
> > code.  Do we need to do that?
> 
> Not until somebody uses them (other than pHyp).
> 
> > > + * Shift back to get a valid IRQ server.
> > > + */
> > > +server >>= 2;
> > > +
> > > +ics_simple_write_xive(ics, idx, server, prio, prio);
> > > +}
> > > +
> > > +static uint64_t *pnv_phb3_ioda_access(PnvPHB3 *phb,
> > > +  unsigned *out_table, unsigned 
> > > *out_idx)
> > > +{
> > > +uint64_t adreg = phb->regs[PHB_IODA_ADDR >> 3];
> > > +unsigned int index = GETFIELD(PHB_IODA_AD_TADR, adreg);
> > > +unsigned int table = GETFIELD(PHB_IODA_AD_TSEL, adreg);
> > > +unsigned int mask;
> > > +uint64_t *tptr = NULL;
> > > +
> > > +switch (table) {
> > > +case IODA2_TBL_LIST:
> > > +tptr = phb->ioda_LIST;
> > > +mask = 7;
> > 
> > I'd suggest hex for the masks.
> 
> This is more readable when matched with the documentation but not a big
> deal.

Matching the docs is a good enough reason to keep decimal.

> > > +break;
> > > +case IODA2_TBL_LXIVT:
> > > +tptr = phb->ioda_LXIVT;
> > > +mask = 7;
> > > +break;
> > > +case IODA2_TBL_IVC_CAM:
> > > +case IODA2_TBL_RBA:
> > > +mask = 31;
> > > +break;
> > > +case IODA2_TBL_RCAM:
> > > +mask = 63;
> > > +break;
> > > +case IODA2_TBL_MRT:
> > > +mask = 7;
> > > +break;
> > > +case IODA2_TBL_PESTA:
> > > +case IODA2_TBL_PESTB:
> > > +mask = 255;
> > > +break;
> > > +case IODA2_TBL_TVT:
> > > +tptr = phb->ioda_TVT;
> > > +mask = 511;
> > > +break;
> > > +case IODA2_TBL_TCAM:
> > > +case IODA2_TBL_TDR:
> > > +mask = 63;
> > > +break;
> > > +case IODA2_TBL_M64BT:

Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-18 Thread Benjamin Herrenschmidt
On Wed, 2018-07-18 at 16:12 +1000, David Gibson wrote:
> On Thu, Jun 28, 2018 at 10:36:32AM +0200, Cédric Le Goater wrote:
> > From: Benjamin Herrenschmidt 

Can you trim your replies ? It's really hard to find your comments in
such a long patch...

> > diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> > index 6ac8a9392da6..966a996c2eac 100644
> > --- a/include/hw/ppc/xics.h
> > +++ b/include/hw/ppc/xics.h
> > @@ -194,6 +194,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
> >  uint32_t icp_accept(ICPState *ss);
> >  uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
> >  void icp_eoi(ICPState *icp, uint32_t xirr);
> > +void icp_irq(ICSState *ics, int server, int nr, uint8_t priority);
> 
> Hrm... are you sure you need to expose this?

See further down...

> > +/*
> > + * The CONFIG_DATA register expects little endian accesses, but as the
> > + * region is big endian, we have to swap the value.
> > + */
> > +static void pnv_phb3_config_write(PnvPHB3 *phb, unsigned off,
> > +  unsigned size, uint64_t val)
> > +{
> > +uint32_t cfg_addr, limit;
> > +PCIDevice *pdev;
> > +
> > +pdev = pnv_phb3_find_cfg_dev(phb);
> > +if (!pdev) {
> > +return;
> > +}
> > +cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xfff;
> > +cfg_addr |= off;
> 
> This looks weird - there are callers of this that appear to have low
> bits in 'off', then you're ORing it with overlapping low bits.

Should be ffc like the read case.

> 
> > +limit = pci_config_size(pdev);
> > +if (limit <= cfg_addr) {
> > +/* conventional pci device can be behind pcie-to-pci bridge.
> > +   256 <= addr < 4K has no effects. */
> > +return;
> > +}
> > +switch (size) {
> > +case 1:
> > +break;
> > +case 2:
> > +val = bswap16(val);
> > +break;
> > +case 4:
> > +val = bswap32(val);
> > +break;
> > +default:
> > +g_assert_not_reached();
> > +}
> > +pci_host_config_write_common(pdev, cfg_addr, limit, val, size);
> > +}
> > +
> > +static uint64_t pnv_phb3_config_read(PnvPHB3 *phb, unsigned off,
> > + unsigned size)
> > +{
> > +uint32_t cfg_addr, limit;
> > +PCIDevice *pdev;
> > +uint64_t val;
> > +
> > +pdev = pnv_phb3_find_cfg_dev(phb);
> > +if (!pdev) {
> > +return ~0ull;
> > +}
> > +cfg_addr = (phb->regs[PHB_CONFIG_ADDRESS >> 3] >> 32) & 0xffc;
> > +cfg_addr |= off;
> 
> This looks better, should it be 0xffc in the write path as well?
> 
> > +limit = pci_config_size(pdev);
> > +if (limit <= cfg_addr) {
> > +/* conventional pci device can be behind pcie-to-pci bridge.
> > +   256 <= addr < 4K has no effects. */
> > +return ~0ull;
> > +}
> > +val = pci_host_config_read_common(pdev, cfg_addr, limit, size);
> > +switch (size) {
> > +case 1:
> > +return val;
> > +case 2:
> > +return bswap16(val);
> > +case 4:
> > +return bswap32(val);
> > +default:
> > +g_assert_not_reached();
> > +}
> > +}
> > +
> > +static void pnv_phb3_check_m32(PnvPHB3 *phb)
> > +{
> > +uint64_t base, start, size;
> > +MemoryRegion *parent;
> > +PnvPBCQState *pbcq = >pbcq;
> > +
> > +if (phb->m32_mapped) {
> > +memory_region_del_subregion(phb->mr_m32.container, >mr_m32);
> > +phb->m32_mapped = false;
> 
> Could you use memory_region_set_enabled() rather than having to add
> and delete the subregion and keep track of it in this separate
> variable?

There was a reason for that but it's years old and I forgot... I think
when re-enabled it moves around, among other things. So it's not more
efficient.

> > +}
> > +
> > +/* Disabled ? move on with life ... */
> 
> Trivial: "nothing to do" seems to be the standard way to express this.
> Even trivialler: usual English rules don't put a space before '?' or
> '!' punctuation.

No, that's the tasteless English rule :-) It shall be overridden by
anybody interested in making things actually readable :-)

> > +
> > +static void pnv_phb3_lxivt_write(PnvPHB3 *phb, unsigned idx, uint64_t val)
> > +{
> > +ICSState *ics = >lsis;
> > +uint8_t server, prio;
> > +
> > +phb->ioda_LXIVT[idx] = val & (IODA2_LXIVT_SERVER |
> > +  IODA2_LXIVT_PRIORITY |
> > +  IODA2_LXIVT_NODE_ID);
> > +server = GETFIELD(IODA2_LXIVT_SERVER, val);
> > +prio = GETFIELD(IODA2_LXIVT_PRIORITY, val);
> > +
> > +/*
> > + * The low order 2 bits are the link pointer (Type II interrupts).
> 
> I don't think we've currently implemented link pointers in our xics
> code.  Do we need to do that?

Not until somebody uses them (other than pHyp).

> > + * Shift back to get a valid IRQ server.
> > + */
> > +server >>= 2;
> > +
> > +ics_simple_write_xive(ics, idx, server, prio, prio);
> > +}
> > 

Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-18 Thread David Gibson
On Thu, Jun 28, 2018 at 10:36:32AM +0200, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This is a model of the PCIe Host Bridge (PHB3) found on a Power8
> processor. It includes the PowerBus logic interface (PBCQ), IOMMU
> support, PCIe root complex, the XICS MSI and LSI interrupt sources.
> 
> The Power8 processor comes in different flavors: Venice, Murano,
> Naple, each having a different number of PHBs. All are initialized but
> the machine only needs one PCI bus to plug a couple of devices (net,
> storage, usbs). The other busses are provided to test complex
> configurations. Some platforms, like the Firestone, can also couple
> PHBs on the first chip to provide more bandwidth but this is too
> specific to model in QEMU.
> 
> No default device layout is provided and PCI devices can be added on
> any of the available PCI busses (pci.0..2 on a Power8 chip) using
> command line options such as :
> 
>   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.0,addr=0x2
>   -netdev 
> bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> 
>   -device megasas,id=scsi0,bus=pci.0,addr=0x1
>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>   -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> 
> Multi chip is supported, each chip adding its set of PHB3 controllers
> and its PCI busses. The model doesn't emulate the EEH error handling
> (and may never do).
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: rewrote the QOM models
>   misc fixes
>   lots of love and care]
> Signed-off-by: Cédric Le Goater 
> ---
>  default-configs/ppc64-softmmu.mak   |1 +
>  include/hw/pci-host/pnv_phb3.h  |  161 +
>  include/hw/pci-host/pnv_phb3_regs.h |  467 ++
>  include/hw/ppc/pnv.h|   22 +
>  include/hw/ppc/pnv_xscom.h  |9 +
>  include/hw/ppc/xics.h   |1 +
>  hw/intc/xics.c  |2 +-
>  hw/pci-host/pnv_phb3.c  | 1219 
> +++
>  hw/pci-host/pnv_phb3_msi.c  |  316 +
>  hw/pci-host/pnv_phb3_pbcq.c |  347 ++
>  hw/ppc/pnv.c|   75 ++-
>  hw/ppc/pnv_xscom.c  |6 +-
>  hw/pci-host/Makefile.objs   |1 +
>  13 files changed, 2622 insertions(+), 5 deletions(-)
>  create mode 100644 include/hw/pci-host/pnv_phb3.h
>  create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
>  create mode 100644 hw/pci-host/pnv_phb3.c
>  create mode 100644 hw/pci-host/pnv_phb3_msi.c
>  create mode 100644 hw/pci-host/pnv_phb3_pbcq.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak 
> b/default-configs/ppc64-softmmu.mak
> index b94af6c7c62a..deebba2b044a 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -9,6 +9,7 @@ CONFIG_IPMI=y
>  CONFIG_IPMI_LOCAL=y
>  CONFIG_IPMI_EXTERN=y
>  CONFIG_ISA_IPMI_BT=y
> +CONFIG_PCIE_PORT=y
>  
>  # For pSeries
>  CONFIG_PSERIES=y
> diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
> new file mode 100644
> index ..459994b751ca
> --- /dev/null
> +++ b/include/hw/pci-host/pnv_phb3.h
> @@ -0,0 +1,161 @@
> +/*
> + * QEMU PowerPC PowerNV PHB3 model
> + *
> + * Copyright (c) 2014-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PCI_HOST_PNV_PHB3_H
> +#define PCI_HOST_PNV_PHB3_H
> +
> +#include "hw/pci/pci_host.h"
> +#include "hw/ppc/xics.h"
> +
> +typedef struct PnvPHB3 PnvPHB3;
> +typedef struct PnvChip PnvChip;
> +
> +/*
> + * PHB3 XICS Source for MSIs
> + */
> +#define TYPE_PHB3_MSI "phb3-msi"
> +#define PHB3_MSI(obj) OBJECT_CHECK(Phb3MsiState, (obj), TYPE_PHB3_MSI)
> +
> +#define PHB3_MAX_MSI 2048
> +
> +typedef struct Phb3MsiState {
> +ICSState ics;
> +
> +PnvPHB3 *phb;
> +uint64_t rba[PHB3_MAX_MSI / 64];
> +uint32_t rba_sum;
> +} Phb3MsiState;
> +
> +void pnv_phb3_msi_update_config(Phb3MsiState *msis, uint32_t base,
> +uint32_t count);
> +void pnv_phb3_msi_send(Phb3MsiState *msis, uint64_t addr, uint16_t data,
> +   int32_t dev_pe);
> +void pnv_phb3_msi_ffi(Phb3MsiState *msis, uint64_t val);
> +
> +
> +/* We have one such address space wrapper per possible device
> + * under the PHB since they need to be assigned statically at
> + * qemu device creation time. The relationship to a PE is done
> + * later dynamically. This means we can potentially create a lot
> + * of these guys. Q35 stores them as some kind of radix tree but
> + * we never really need to do fast lookups so instead we simply
> + * keep a QLIST of them for now, we can add the radix if needed
> + * later on.
> + *
> + * We do cache the PE number to speed things up a bit though.
> + */
> +typedef struct PnvPhb3DMASpace {
> +PCIBus *bus;
> +

Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-18 Thread David Gibson
On Mon, Jul 09, 2018 at 09:22:59AM +0200, Cédric Le Goater wrote:
> On 06/28/2018 10:36 AM, Cédric Le Goater wrote:
> > From: Benjamin Herrenschmidt 
> > 
> > This is a model of the PCIe Host Bridge (PHB3) found on a Power8
> > processor. It includes the PowerBus logic interface (PBCQ), IOMMU
> > support, PCIe root complex, the XICS MSI and LSI interrupt sources.
> > 
> > The Power8 processor comes in different flavors: Venice, Murano,
> > Naple, each having a different number of PHBs. All are initialized but
> > the machine only needs one PCI bus to plug a couple of devices (net,
> > storage, usbs). The other busses are provided to test complex
> > configurations. Some platforms, like the Firestone, can also couple
> > PHBs on the first chip to provide more bandwidth but this is too
> > specific to model in QEMU.
> > 
> > No default device layout is provided and PCI devices can be added on
> > any of the available PCI busses (pci.0..2 on a Power8 chip) using
> > command line options such as :
> > 
> >   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.0,addr=0x2
> >   -netdev 
> > bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> > 
> >   -device megasas,id=scsi0,bus=pci.0,addr=0x1
> >   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
> >   -device 
> > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> > 
> > Multi chip is supported, each chip adding its set of PHB3 controllers
> > and its PCI busses. The model doesn't emulate the EEH error handling
> > (and may never do).
> Some comments,
> 
> The model should initialize *two* default PHBs per chip for all 
> processor flavors. Extra ones would come with user creatable PHBs,
> which can be added later.
> 
> Can someone take a closer look at pnv_phb3_pci_create() ? This is 
> where the PCI busses, bridges, etc are created.

I had a bit of a look, and made a comment, though I think it's mostly
sane.

> PBCQ is a logic unit acting as a bridge on the PowerBus but the model 
> might be adding some extra unnecessary complexity. may be the PHB3 and 
> the PBCQ models could be merged. This is minor I think.
> 
> In which sub-tree should this patchset be merged ? PCI or PPC ?

Through the ppc tree, I think.  Although review by the PCI folks would
be great if we can get it.

> 
> Thanks,
> 
> C.  
>  
> 
> > Signed-off-by: Benjamin Herrenschmidt 
> > [clg: rewrote the QOM models
> >   misc fixes
> >   lots of love and care]
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  default-configs/ppc64-softmmu.mak   |1 +
> >  include/hw/pci-host/pnv_phb3.h  |  161 +
> >  include/hw/pci-host/pnv_phb3_regs.h |  467 ++
> >  include/hw/ppc/pnv.h|   22 +
> >  include/hw/ppc/pnv_xscom.h  |9 +
> >  include/hw/ppc/xics.h   |1 +
> >  hw/intc/xics.c  |2 +-
> >  hw/pci-host/pnv_phb3.c  | 1219 
> > +++
> >  hw/pci-host/pnv_phb3_msi.c  |  316 +
> >  hw/pci-host/pnv_phb3_pbcq.c |  347 ++
> >  hw/ppc/pnv.c|   75 ++-
> >  hw/ppc/pnv_xscom.c  |6 +-
> >  hw/pci-host/Makefile.objs   |1 +
> >  13 files changed, 2622 insertions(+), 5 deletions(-)
> >  create mode 100644 include/hw/pci-host/pnv_phb3.h
> >  create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
> >  create mode 100644 hw/pci-host/pnv_phb3.c
> >  create mode 100644 hw/pci-host/pnv_phb3_msi.c
> >  create mode 100644 hw/pci-host/pnv_phb3_pbcq.c
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak 
> > b/default-configs/ppc64-softmmu.mak
> > index b94af6c7c62a..deebba2b044a 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -9,6 +9,7 @@ CONFIG_IPMI=y
> >  CONFIG_IPMI_LOCAL=y
> >  CONFIG_IPMI_EXTERN=y
> >  CONFIG_ISA_IPMI_BT=y
> > +CONFIG_PCIE_PORT=y
> >  
> >  # For pSeries
> >  CONFIG_PSERIES=y
> > diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
> > new file mode 100644
> > index ..459994b751ca
> > --- /dev/null
> > +++ b/include/hw/pci-host/pnv_phb3.h
> > @@ -0,0 +1,161 @@
> > +/*
> > + * QEMU PowerPC PowerNV PHB3 model
> > + *
> > + * Copyright (c) 2014-2018, IBM Corporation.
> > + *
> > + * This code is licensed under the GPL version 2 or later. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef PCI_HOST_PNV_PHB3_H
> > +#define PCI_HOST_PNV_PHB3_H
> > +
> > +#include "hw/pci/pci_host.h"
> > +#include "hw/ppc/xics.h"
> > +
> > +typedef struct PnvPHB3 PnvPHB3;
> > +typedef struct PnvChip PnvChip;
> > +
> > +/*
> > + * PHB3 XICS Source for MSIs
> > + */
> > +#define TYPE_PHB3_MSI "phb3-msi"
> > +#define PHB3_MSI(obj) OBJECT_CHECK(Phb3MsiState, (obj), TYPE_PHB3_MSI)
> > +
> > +#define PHB3_MAX_MSI 2048
> > +
> > +typedef struct Phb3MsiState {
> > +ICSState ics;
> 

Re: [Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-07-09 Thread Cédric Le Goater
On 06/28/2018 10:36 AM, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This is a model of the PCIe Host Bridge (PHB3) found on a Power8
> processor. It includes the PowerBus logic interface (PBCQ), IOMMU
> support, PCIe root complex, the XICS MSI and LSI interrupt sources.
> 
> The Power8 processor comes in different flavors: Venice, Murano,
> Naple, each having a different number of PHBs. All are initialized but
> the machine only needs one PCI bus to plug a couple of devices (net,
> storage, usbs). The other busses are provided to test complex
> configurations. Some platforms, like the Firestone, can also couple
> PHBs on the first chip to provide more bandwidth but this is too
> specific to model in QEMU.
> 
> No default device layout is provided and PCI devices can be added on
> any of the available PCI busses (pci.0..2 on a Power8 chip) using
> command line options such as :
> 
>   -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.0,addr=0x2
>   -netdev 
> bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0
> 
>   -device megasas,id=scsi0,bus=pci.0,addr=0x1
>   -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
>   -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> 
> Multi chip is supported, each chip adding its set of PHB3 controllers
> and its PCI busses. The model doesn't emulate the EEH error handling
> (and may never do).
Some comments,

The model should initialize *two* default PHBs per chip for all 
processor flavors. Extra ones would come with user creatable PHBs,
which can be added later.

Can someone take a closer look at pnv_phb3_pci_create() ? This is 
where the PCI busses, bridges, etc are created.

PBCQ is a logic unit acting as a bridge on the PowerBus but the model 
might be adding some extra unnecessary complexity. may be the PHB3 and 
the PBCQ models could be merged. This is minor I think.

In which sub-tree should this patchset be merged ? PCI or PPC ?

Thanks,

C.  
 

> Signed-off-by: Benjamin Herrenschmidt 
> [clg: rewrote the QOM models
>   misc fixes
>   lots of love and care]
> Signed-off-by: Cédric Le Goater 
> ---
>  default-configs/ppc64-softmmu.mak   |1 +
>  include/hw/pci-host/pnv_phb3.h  |  161 +
>  include/hw/pci-host/pnv_phb3_regs.h |  467 ++
>  include/hw/ppc/pnv.h|   22 +
>  include/hw/ppc/pnv_xscom.h  |9 +
>  include/hw/ppc/xics.h   |1 +
>  hw/intc/xics.c  |2 +-
>  hw/pci-host/pnv_phb3.c  | 1219 
> +++
>  hw/pci-host/pnv_phb3_msi.c  |  316 +
>  hw/pci-host/pnv_phb3_pbcq.c |  347 ++
>  hw/ppc/pnv.c|   75 ++-
>  hw/ppc/pnv_xscom.c  |6 +-
>  hw/pci-host/Makefile.objs   |1 +
>  13 files changed, 2622 insertions(+), 5 deletions(-)
>  create mode 100644 include/hw/pci-host/pnv_phb3.h
>  create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
>  create mode 100644 hw/pci-host/pnv_phb3.c
>  create mode 100644 hw/pci-host/pnv_phb3_msi.c
>  create mode 100644 hw/pci-host/pnv_phb3_pbcq.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak 
> b/default-configs/ppc64-softmmu.mak
> index b94af6c7c62a..deebba2b044a 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -9,6 +9,7 @@ CONFIG_IPMI=y
>  CONFIG_IPMI_LOCAL=y
>  CONFIG_IPMI_EXTERN=y
>  CONFIG_ISA_IPMI_BT=y
> +CONFIG_PCIE_PORT=y
>  
>  # For pSeries
>  CONFIG_PSERIES=y
> diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
> new file mode 100644
> index ..459994b751ca
> --- /dev/null
> +++ b/include/hw/pci-host/pnv_phb3.h
> @@ -0,0 +1,161 @@
> +/*
> + * QEMU PowerPC PowerNV PHB3 model
> + *
> + * Copyright (c) 2014-2018, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PCI_HOST_PNV_PHB3_H
> +#define PCI_HOST_PNV_PHB3_H
> +
> +#include "hw/pci/pci_host.h"
> +#include "hw/ppc/xics.h"
> +
> +typedef struct PnvPHB3 PnvPHB3;
> +typedef struct PnvChip PnvChip;
> +
> +/*
> + * PHB3 XICS Source for MSIs
> + */
> +#define TYPE_PHB3_MSI "phb3-msi"
> +#define PHB3_MSI(obj) OBJECT_CHECK(Phb3MsiState, (obj), TYPE_PHB3_MSI)
> +
> +#define PHB3_MAX_MSI 2048
> +
> +typedef struct Phb3MsiState {
> +ICSState ics;
> +
> +PnvPHB3 *phb;
> +uint64_t rba[PHB3_MAX_MSI / 64];
> +uint32_t rba_sum;
> +} Phb3MsiState;
> +
> +void pnv_phb3_msi_update_config(Phb3MsiState *msis, uint32_t base,
> +uint32_t count);
> +void pnv_phb3_msi_send(Phb3MsiState *msis, uint64_t addr, uint16_t data,
> +   int32_t dev_pe);
> +void pnv_phb3_msi_ffi(Phb3MsiState *msis, uint64_t val);
> +
> +
> +/* We have one such address space wrapper per possible device
> + * 

[Qemu-devel] [PATCH v2 1/2] ppc/pnv: Add model for Power8 PHB3 PCIe Host bridge

2018-06-28 Thread Cédric Le Goater
From: Benjamin Herrenschmidt 

This is a model of the PCIe Host Bridge (PHB3) found on a Power8
processor. It includes the PowerBus logic interface (PBCQ), IOMMU
support, PCIe root complex, the XICS MSI and LSI interrupt sources.

The Power8 processor comes in different flavors: Venice, Murano,
Naple, each having a different number of PHBs. All are initialized but
the machine only needs one PCI bus to plug a couple of devices (net,
storage, usbs). The other busses are provided to test complex
configurations. Some platforms, like the Firestone, can also couple
PHBs on the first chip to provide more bandwidth but this is too
specific to model in QEMU.

No default device layout is provided and PCI devices can be added on
any of the available PCI busses (pci.0..2 on a Power8 chip) using
command line options such as :

  -device e1000,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pci.0,addr=0x2
  -netdev 
bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0

  -device megasas,id=scsi0,bus=pci.0,addr=0x1
  -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
  -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2

Multi chip is supported, each chip adding its set of PHB3 controllers
and its PCI busses. The model doesn't emulate the EEH error handling
(and may never do).

Signed-off-by: Benjamin Herrenschmidt 
[clg: rewrote the QOM models
  misc fixes
  lots of love and care]
Signed-off-by: Cédric Le Goater 
---
 default-configs/ppc64-softmmu.mak   |1 +
 include/hw/pci-host/pnv_phb3.h  |  161 +
 include/hw/pci-host/pnv_phb3_regs.h |  467 ++
 include/hw/ppc/pnv.h|   22 +
 include/hw/ppc/pnv_xscom.h  |9 +
 include/hw/ppc/xics.h   |1 +
 hw/intc/xics.c  |2 +-
 hw/pci-host/pnv_phb3.c  | 1219 +++
 hw/pci-host/pnv_phb3_msi.c  |  316 +
 hw/pci-host/pnv_phb3_pbcq.c |  347 ++
 hw/ppc/pnv.c|   75 ++-
 hw/ppc/pnv_xscom.c  |6 +-
 hw/pci-host/Makefile.objs   |1 +
 13 files changed, 2622 insertions(+), 5 deletions(-)
 create mode 100644 include/hw/pci-host/pnv_phb3.h
 create mode 100644 include/hw/pci-host/pnv_phb3_regs.h
 create mode 100644 hw/pci-host/pnv_phb3.c
 create mode 100644 hw/pci-host/pnv_phb3_msi.c
 create mode 100644 hw/pci-host/pnv_phb3_pbcq.c

diff --git a/default-configs/ppc64-softmmu.mak 
b/default-configs/ppc64-softmmu.mak
index b94af6c7c62a..deebba2b044a 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -9,6 +9,7 @@ CONFIG_IPMI=y
 CONFIG_IPMI_LOCAL=y
 CONFIG_IPMI_EXTERN=y
 CONFIG_ISA_IPMI_BT=y
+CONFIG_PCIE_PORT=y
 
 # For pSeries
 CONFIG_PSERIES=y
diff --git a/include/hw/pci-host/pnv_phb3.h b/include/hw/pci-host/pnv_phb3.h
new file mode 100644
index ..459994b751ca
--- /dev/null
+++ b/include/hw/pci-host/pnv_phb3.h
@@ -0,0 +1,161 @@
+/*
+ * QEMU PowerPC PowerNV PHB3 model
+ *
+ * Copyright (c) 2014-2018, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PCI_HOST_PNV_PHB3_H
+#define PCI_HOST_PNV_PHB3_H
+
+#include "hw/pci/pci_host.h"
+#include "hw/ppc/xics.h"
+
+typedef struct PnvPHB3 PnvPHB3;
+typedef struct PnvChip PnvChip;
+
+/*
+ * PHB3 XICS Source for MSIs
+ */
+#define TYPE_PHB3_MSI "phb3-msi"
+#define PHB3_MSI(obj) OBJECT_CHECK(Phb3MsiState, (obj), TYPE_PHB3_MSI)
+
+#define PHB3_MAX_MSI 2048
+
+typedef struct Phb3MsiState {
+ICSState ics;
+
+PnvPHB3 *phb;
+uint64_t rba[PHB3_MAX_MSI / 64];
+uint32_t rba_sum;
+} Phb3MsiState;
+
+void pnv_phb3_msi_update_config(Phb3MsiState *msis, uint32_t base,
+uint32_t count);
+void pnv_phb3_msi_send(Phb3MsiState *msis, uint64_t addr, uint16_t data,
+   int32_t dev_pe);
+void pnv_phb3_msi_ffi(Phb3MsiState *msis, uint64_t val);
+
+
+/* We have one such address space wrapper per possible device
+ * under the PHB since they need to be assigned statically at
+ * qemu device creation time. The relationship to a PE is done
+ * later dynamically. This means we can potentially create a lot
+ * of these guys. Q35 stores them as some kind of radix tree but
+ * we never really need to do fast lookups so instead we simply
+ * keep a QLIST of them for now, we can add the radix if needed
+ * later on.
+ *
+ * We do cache the PE number to speed things up a bit though.
+ */
+typedef struct PnvPhb3DMASpace {
+PCIBus *bus;
+uint8_t devfn;
+int pe_num; /* Cached PE number */
+#define PHB_INVALID_PE (-1)
+PnvPHB3 *phb;
+AddressSpace dma_as;
+IOMMUMemoryRegion dma_mr;
+MemoryRegion msi32_mr;
+MemoryRegion msi64_mr;
+bool msi32_mapped;
+bool msi64_mapped;
+QLIST_ENTRY(PnvPhb3DMASpace) list;
+} PnvPhb3DMASpace;