On 12/11/18 1:38 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
>> On 12/10/18 7:39 AM, David Gibson wrote:
>>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>>>> The XIVE interface for the guest is described in the device tree under
>>>> the "interrupt-controller" node. A few new properties are specific
>>>> to XIVE:
>>>>
>>>> - "reg"
>>>>
>>>>   contains the base address and size of the thread interrupt
>>>>   management areas (TIMA), for the User level and for the Guest OS
>>>>   level. Only the Guest OS level is taken into account today.
>>>>
>>>> - "ibm,xive-eq-sizes"
>>>>
>>>>   the sizes of the event queues. One cell per supported size, each
>>>>   containing log2 of the size, in ascending order.
>>>>
>>>> - "ibm,xive-lisn-ranges"
>>>>
>>>>   the interrupt number ranges assigned to the guest for the IPIs.
>>>>
>>>> and also under the root node:
>>>>
>>>> - "ibm,plat-res-int-priorities"
>>>>
>>>>   contains a list of priorities that the hypervisor has reserved for
>>>>   its own use. OPAL uses the priority 7 queue to automatically
>>>>   escalate interrupts for all other queues (DD2.X POWER9). So only
>>>>   priorities [0..6] are allowed for the guest.
>>>>
>>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>>>> with the appropriate "interrupt-controller" node.
>>>>
>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
>>>> ---
>>>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>>>  include/hw/ppc/spapr_xive.h |  2 ++
>>>>  include/hw/ppc/xics.h       |  4 +--
>>>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xics_spapr.c        |  3 +-
>>>>  hw/ppc/spapr.c              |  3 +-
>>>>  hw/ppc/spapr_irq.c          |  3 ++
>>>>  7 files changed, 77 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>>>> index 23cdb51b879e..e51e9f052f63 100644
>>>> --- a/include/hw/ppc/spapr_irq.h
>>>> +++ b/include/hw/ppc/spapr_irq.h
>>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>>>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>>>> +                        void *fdt, uint32_t phandle);
>>>>  } sPAPRIrq;
>>>>
>>>>  extern sPAPRIrq spapr_irq_xics;
>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>> index 9506a8f4d10a..728a5e8dc163 100644
>>>> --- a/include/hw/ppc/spapr_xive.h
>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>
>>>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>
>>>>  #endif /* PPC_SPAPR_XIVE_H */
>>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>>> index 9958443d1984..14afda198cdb 100644
>>>> --- a/include/hw/ppc/xics.h
>>>> +++ b/include/hw/ppc/xics.h
>>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>>>>  } XICSFabricClass;
>>>>
>>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>>> -
>>>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>>>
>>>>  /* Internal XICS interfaces */
>>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>>>
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>
>>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>>>  void xics_spapr_init(sPAPRMachineState *spapr);
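
The hw/ppc/spapr.c and hw/ppc/spapr_irq.c hunks are not quoted here, but
the idea per the commit message is that the machine's FDT build path no
longer calls spapr_dt_xics() directly and instead dispatches through the
new dt_populate hook. Presumably it ends up looking roughly like the
sketch below; the PHANDLE_XICP constant and the spapr_max_server_number()
helper are assumptions, not taken from the posted hunks:

    /* Sketch only: dispatch DT population to the active sPAPR IRQ
     * backend. spapr_dt_xics() serves the XICS backend and
     * spapr_dt_xive() the XIVE one; PHANDLE_XICP and
     * spapr_max_server_number() are assumed names. */
    static void spapr_dt_interrupt_controller(sPAPRMachineState *spapr,
                                              void *fdt)
    {
        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);

        smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
                              PHANDLE_XICP);
    }
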
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 982ac6e17051..a6d854b07690 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -14,6 +14,7 @@
>>>>  #include "target/ppc/cpu.h"
>>>>  #include "sysemu/cpus.h"
>>>>  #include "monitor/monitor.h"
>>>> +#include "hw/ppc/fdt.h"
>>>>  #include "hw/ppc/spapr.h"
>>>>  #include "hw/ppc/spapr_xive.h"
>>>>  #include "hw/ppc/xive.h"
>>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>  }
>>>> +
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle)
>>>> +{
>>>> +    sPAPRXive *xive = spapr->xive;
>>>> +    int node;
>>>> +    uint64_t timas[2 * 2];
>>>> +    /* Interrupt number ranges for the IPIs */
>>>> +    uint32_t lisn_ranges[] = {
>>>> +        cpu_to_be32(0),
>>>> +        cpu_to_be32(nr_servers),
>>>> +    };
>>>> +    uint32_t eq_sizes[] = {
>>>> +        cpu_to_be32(12), /* 4K */
>>>> +        cpu_to_be32(16), /* 64K */
>>>> +        cpu_to_be32(21), /* 2M */
>>>> +        cpu_to_be32(24), /* 16M */
>>>
>>> For KVM, are we going to need to clamp this list based on the
>>> pagesizes the guest can use?
>>
>> I would say so. Is there a KVM service for that?
>
> I'm not sure what you mean by a KVM service here.

I meant a routine giving a list of the possible backing pagesizes.

>> Today, the OS scans the list and picks the size fitting its current
>> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise
>> only the page sizes that QEMU/KVM supports. Or should we play safe
>> and only export: 4K, 64K?
>
> So, I'm guessing the limitation here is that when we use the KVM XIVE
> acceleration, the EQ will need to be host-contiguous?

The EQ should be a single page (from the specs). So we don't have
a contiguity problem.
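
(As an aside, the guest-side selection described above, "the OS scans
the list and picks the size fitting its current PAGE_SIZE/SHIFT", would
look roughly like the sketch below. This is illustrative only, not the
actual Linux code, and the function name is invented:

    #include <linux/of.h>   /* of_get_property() */
    #include <asm/page.h>   /* PAGE_SHIFT */

    /* Sketch: pick an EQ size from "ibm,xive-eq-sizes". Each cell is
     * log2 of a supported queue size, in ascending order; the guest
     * keeps the one matching its own page shift. */
    static u32 xive_pick_queue_shift(struct device_node *np)
    {
        const __be32 *reg;
        u32 shift = 0;
        int i, len;

        reg = of_get_property(np, "ibm,xive-eq-sizes", &len);
        if (!reg)
            return 0;

        for (i = 0; i < len / 4; i++) {
            if (be32_to_cpu(reg[i]) == PAGE_SHIFT)
                shift = PAGE_SHIFT;
        }
        return shift;
    }
)
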
> So, we can't have EQs larger than the backing pagesize for guest
> memory.
>
> We try to avoid having guest-visible differences based on whether
> we're using KVM or not, so we don't want to change this list depending
> on whether KVM is active etc. - or on whether we're backing the guest
> with hugepages.
>
> We could reuse the cap-hpt-max-page-size machine parameter, which also
> relates to backing page size, but that might be confusing since it's
> explicitly about HPT only, whereas this limitation would apply to RPT
> guests too, IIUC.
>
> I think our best bet is probably to limit to 64kiB queues
> unconditionally.

OK. That's the default. We can refine this list of page sizes later on,
when we introduce KVM support, if needed.

> AIUI, that still gives us 8192 slots in the queue,

The entries are 32 bits, so a 64K queue holds 16384 entries.

> which is enough to store one of every possible irq, which should be
> enough.

yes. I don't think Linux measures how many entries are dequeued at
once. That would give us a feel for how much pressure these EQs can
sustain.

C.

> There's still an issue if we have a 4kiB pagesize host kernel, but we
> could just disable kernel_irqchip in that situation, since we don't
> expect people to be using that much in practice.

>>>> +    };
>>>> +    /* The following array is in sync with the reserved priorities
>>>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
>>>> +     */
>>>> +    uint32_t plat_res_int_priorities[] = {
>>>> +        cpu_to_be32(7),    /* start */
>>>> +        cpu_to_be32(0xf8), /* count */
>>>> +    };
>>>> +    gchar *nodename;
>>>> +
>>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>>>> +    timas[0] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>>>> +    timas[2] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>>>
>>> So this gives the MMIO address of the TIMA.
>>
>> Yes. It is considered to be a fixed "address".
>>
>>> Where does the guest get the MMIO address for the ESB pages from?
>>
>> with the hcall H_INT_GET_SOURCE_INFO.
>
> Ah, right.
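
(To close the loop on that last exchange: with H_INT_GET_SOURCE_INFO the
guest asks the hypervisor for the ESB page addresses of a given LISN. A
sketch of the guest-side call follows; plpar_hcall() and the hcall number
exist in the Linux pseries code, but the return-buffer layout used here
(flags, EOI page, trigger page, ESB shift) is my reading of the sPAPR
XIVE interface and should be treated as an assumption, not something
stated in the thread:

    /* Sketch: query the ESB MMIO pages of a LISN via
     * H_INT_GET_SOURCE_INFO. The exact retbuf layout below is an
     * assumption. */
    static long xive_get_source_info(u32 lisn, u64 *eoi_page,
                                     u64 *trig_page, u32 *esb_shift)
    {
        unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
        long rc;

        rc = plpar_hcall(H_INT_GET_SOURCE_INFO, retbuf, 0, lisn);
        if (rc != H_SUCCESS)
            return rc;

        /* retbuf[0] carries the source flags (LSI, store EOI, ...) */
        *eoi_page  = retbuf[1];
        *trig_page = retbuf[2];
        *esb_shift = retbuf[3];
        return 0;
    }
)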