On 12/11/18 1:38 AM, David Gibson wrote:
> On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
>> On 12/10/18 7:39 AM, David Gibson wrote:
>>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>>>> The XIVE interface for the guest is described in the device tree under
>>>> the "interrupt-controller" node. A few new properties are specific
>>>> to XIVE:
>>>>
>>>> - "reg"
>>>>
>>>>   contains the base address and size of the thread interrupt
>>>>   management areas (TIMA), for the User level and for the Guest OS
>>>>   level. Only the Guest OS level is taken into account today.
>>>>
>>>> - "ibm,xive-eq-sizes"
>>>>
>>>>   the sizes of the event queues. One cell per supported size, each
>>>>   containing log2 of the size, in ascending order.
>>>>
>>>> - "ibm,xive-lisn-ranges"
>>>>
>>>>   the interrupt number ranges assigned to the guest for the IPIs.
>>>>
>>>> and also under the root node:
>>>>
>>>> - "ibm,plat-res-int-priorities"
>>>>
>>>>   contains a list of priorities that the hypervisor has reserved for
>>>>   its own use. OPAL uses the priority 7 queue to automatically
>>>>   escalate interrupts for all other queues (DD2.X POWER9). So only
>>>>   priorities [0..6] are allowed for the guest.
>>>>
>>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>>>> with the appropriate "interrupt-controller" node.
>>>>
>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
>>>> ---
>>>>  include/hw/ppc/spapr_irq.h  |  2 ++
>>>>  include/hw/ppc/spapr_xive.h |  2 ++
>>>>  include/hw/ppc/xics.h       |  4 +--
>>>>  hw/intc/spapr_xive.c        | 64 +++++++++++++++++++++++++++++++++++++
>>>>  hw/intc/xics_spapr.c        |  3 +-
>>>>  hw/ppc/spapr.c              |  3 +-
>>>>  hw/ppc/spapr_irq.c          |  3 ++
>>>>  7 files changed, 77 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>>>> index 23cdb51b879e..e51e9f052f63 100644
>>>> --- a/include/hw/ppc/spapr_irq.h
>>>> +++ b/include/hw/ppc/spapr_irq.h
>>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>>>      void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>>>      qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>>>      void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>>>> +    void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>>>> +                        void *fdt, uint32_t phandle);
>>>>  } sPAPRIrq;
>>>>
>>>>  extern sPAPRIrq spapr_irq_xics;
>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>> index 9506a8f4d10a..728a5e8dc163 100644
>>>> --- a/include/hw/ppc/spapr_xive.h
>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>
>>>>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>
>>>>  #endif /* PPC_SPAPR_XIVE_H */
>>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>>> index 9958443d1984..14afda198cdb 100644
>>>> --- a/include/hw/ppc/xics.h
>>>> +++ b/include/hw/ppc/xics.h
>>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>>>      ICPState *(*icp_get)(XICSFabric *xi, int server);
>>>>  } XICSFabricClass;
>>>>
>>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>>> -
>>>>  ICPState *xics_icp_get(XICSFabric *xi, int server);
>>>>
>>>>  /* Internal XICS interfaces */
>>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>>>
>>>>  typedef struct sPAPRMachineState sPAPRMachineState;
>>>>
>>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle);
>>>>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>>>  void xics_spapr_init(sPAPRMachineState *spapr);
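
The hw/ppc/spapr.c and hw/ppc/spapr_irq.c hunks are not quoted here, but
the idea per the commit message is that the machine's FDT build path no
longer calls spapr_dt_xics() directly and instead dispatches through the
new dt_populate hook. Presumably it ends up looking roughly like the
sketch below; the PHANDLE_XICP constant and the spapr_max_server_number()
helper are assumptions, not taken from the posted hunks:

    /* Sketch only: dispatch DT population to the active sPAPR IRQ
     * backend. spapr_dt_xics() serves the XICS backend and
     * spapr_dt_xive() the XIVE one; PHANDLE_XICP and
     * spapr_max_server_number() are assumed names. */
    static void spapr_dt_interrupt_controller(sPAPRMachineState *spapr,
                                              void *fdt)
    {
        sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);

        smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
                              PHANDLE_XICP);
    }
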
>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>> index 982ac6e17051..a6d854b07690 100644
>>>> --- a/hw/intc/spapr_xive.c
>>>> +++ b/hw/intc/spapr_xive.c
>>>> @@ -14,6 +14,7 @@
>>>>  #include "target/ppc/cpu.h"
>>>>  #include "sysemu/cpus.h"
>>>>  #include "monitor/monitor.h"
>>>> +#include "hw/ppc/fdt.h"
>>>>  #include "hw/ppc/spapr.h"
>>>>  #include "hw/ppc/spapr_xive.h"
>>>>  #include "hw/ppc/xive.h"
>>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>      spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>      spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>  }
>>>> +
>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>> +                   uint32_t phandle)
>>>> +{
>>>> +    sPAPRXive *xive = spapr->xive;
>>>> +    int node;
>>>> +    uint64_t timas[2 * 2];
>>>> +    /* Interrupt number ranges for the IPIs */
>>>> +    uint32_t lisn_ranges[] = {
>>>> +        cpu_to_be32(0),
>>>> +        cpu_to_be32(nr_servers),
>>>> +    };
>>>> +    uint32_t eq_sizes[] = {
>>>> +        cpu_to_be32(12), /* 4K */
>>>> +        cpu_to_be32(16), /* 64K */
>>>> +        cpu_to_be32(21), /* 2M */
>>>> +        cpu_to_be32(24), /* 16M */
>>>
>>> For KVM, are we going to need to clamp this list based on the
>>> pagesizes the guest can use?
>>
>> I would say so. Is there a KVM service for that?
>
> I'm not sure what you mean by a KVM service here.

I meant a routine giving a list of the possible backing pagesizes.

>> Today, the OS scans the list and picks the size fitting its current
>> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise
>> only the page sizes that QEMU/KVM supports. Or should we play safe
>> and only export: 4K, 64K?
>
> So, I'm guessing the limitation here is that when we use the KVM XIVE
> acceleration, the EQ will need to be host-contiguous?

The EQ should be a single page (from the specs). So we don't have
a contiguity problem.
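
(As an aside, the guest-side selection described above, "the OS scans
the list and picks the size fitting its current PAGE_SIZE/SHIFT", would
look roughly like the sketch below. This is illustrative only, not the
actual Linux code, and the function name is invented:

    #include <linux/of.h>   /* of_get_property() */
    #include <asm/page.h>   /* PAGE_SHIFT */

    /* Sketch: pick an EQ size from "ibm,xive-eq-sizes". Each cell is
     * log2 of a supported queue size, in ascending order; the guest
     * keeps the one matching its own page shift. */
    static u32 xive_pick_queue_shift(struct device_node *np)
    {
        const __be32 *reg;
        u32 shift = 0;
        int i, len;

        reg = of_get_property(np, "ibm,xive-eq-sizes", &len);
        if (!reg)
            return 0;

        for (i = 0; i < len / 4; i++) {
            if (be32_to_cpu(reg[i]) == PAGE_SHIFT)
                shift = PAGE_SHIFT;
        }
        return shift;
    }
)
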
> So, we can't have EQs larger than the backing pagesize for guest
> memory.
>
> We try to avoid having guest-visible differences based on whether
> we're using KVM or not, so we don't want to change this list depending
> on whether KVM is active etc. - or on whether we're backing the guest
> with hugepages.
>
> We could reuse the cap-hpt-max-page-size machine parameter, which also
> relates to backing page size, but that might be confusing since it's
> explicitly about HPT only, whereas this limitation would apply to RPT
> guests too, IIUC.
>
> I think our best bet is probably to limit to 64kiB queues
> unconditionally.

OK. That's the default. We can refine this list of page sizes later on,
when we introduce KVM support, if needed.

> AIUI, that still gives us 8192 slots in the queue,

The entries are 32 bits, so a 64K queue holds 16384 entries.

> which is enough to store one of every possible irq, which should be
> enough.

yes. I don't think Linux measures how many entries are dequeued at
once. That would give us a feel for how much pressure these EQs can
sustain.

C.

> There's still an issue if we have a 4kiB pagesize host kernel, but we
> could just disable kernel_irqchip in that situation, since we don't
> expect people to be using that much in practice.

>>>> +    };
>>>> +    /* The following array is in sync with the reserved priorities
>>>> +     * defined by the 'spapr_xive_priority_is_reserved' routine.
>>>> +     */
>>>> +    uint32_t plat_res_int_priorities[] = {
>>>> +        cpu_to_be32(7),    /* start */
>>>> +        cpu_to_be32(0xf8), /* count */
>>>> +    };
>>>> +    gchar *nodename;
>>>> +
>>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>>>> +    timas[0] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>>>> +    timas[2] = cpu_to_be64(xive->tm_base +
>>>> +                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>>>> +    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>>>
>>> So this gives the MMIO address of the TIMA.
>>
>> Yes. It is considered to be a fixed "address".
>>
>>> Where does the guest get the MMIO address for the ESB pages from?
>>
>> with the hcall H_INT_GET_SOURCE_INFO.
>
> Ah, right.
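
(To close the loop on that last exchange: with H_INT_GET_SOURCE_INFO the
guest asks the hypervisor for the ESB page addresses of a given LISN. A
sketch of the guest-side call follows; plpar_hcall() and the hcall number
exist in the Linux pseries code, but the return-buffer layout used here
(flags, EOI page, trigger page, ESB shift) is my reading of the sPAPR
XIVE interface and should be treated as an assumption, not something
stated in the thread:

    /* Sketch: query the ESB MMIO pages of a LISN via
     * H_INT_GET_SOURCE_INFO. The exact retbuf layout below is an
     * assumption. */
    static long xive_get_source_info(u32 lisn, u64 *eoi_page,
                                     u64 *trig_page, u32 *esb_shift)
    {
        unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
        long rc;

        rc = plpar_hcall(H_INT_GET_SOURCE_INFO, retbuf, 0, lisn);
        if (rc != H_SUCCESS)
            return rc;

        /* retbuf[0] carries the source flags (LSI, store EOI, ...) */
        *eoi_page  = retbuf[1];
        *trig_page = retbuf[2];
        *esb_shift = retbuf[3];
        return 0;
    }
)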