Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-28 Thread Benjamin Herrenschmidt
On Wed, 2017-09-20 at 15:05 +0200, Cédric Le Goater wrote:
> > > +/*
> > > + * XIVE Interrupt Source MMIOs
> > > + */
> > > +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned 
> > > size)
> > > +{
> > > +sPAPRXive *xive = SPAPR_XIVE(opaque);
> > > +uint32_t offset = addr & 0xF00;
> > > +uint32_t srcno = addr >> xive->esb_shift;
> > > +XiveIVE *ive;
> > > +uint64_t ret = -1;
> > > +
> > > +ive = spapr_xive_get_ive(xive, srcno);
> > > +if (!ive || !(ive->w & IVE_VALID))  {
> > > +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
> > > +goto out;
> > 
> > Since there's a whole (4k) page for each source, I wonder if we should
> > actually map each one as a separate MMIO region to allow us to tweak
> > the mappings more flexibly
> 
> yes we could have a subregion for each source. In that case, 
> we should also handle IVE_VALID properly. That will require 
> a specific XIVE allocator which was difficult to do while
> keeping the compatibility with XICS for migration and CAS.

That will be a serious bloat with lots of interrupts. We also cannot
possibly have a KVM mm region per interrupt or even a vma.

I'm thinking of some kind of /dev/xive (or some other KVM or irqfd
orignated fd) that allows you to mmap a single big region whose content
is demand-faulted and invalidated by the kernel to map the various
interrupts.

So that it looks like a single VMA (and KVM memory block).

Ben.

> C.
> 
> 



Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-28 Thread Benjamin Herrenschmidt
On Wed, 2017-09-20 at 14:54 +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:57 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> > > Each interrupt source is associated with a two bit state machine
> > > called an Event State Buffer (ESB) which is controlled by MMIO to
> > > trigger events. See code for more details on the states and
> > > transitions.
> > > 
> > > The MMIO space for the ESB translation is 512GB large on baremetal
> > > (powernv) systems and the BAR depends on the chip id. In our model for
> > > the sPAPR machine, we choose to only map a sub memory region for the
> > > provisionned IRQ numbers and to use the mapping address of chip 0 on a
> > > real system. The OS will get the address of the MMIO page of the ESB
> > > entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> > 
> > On bare metal, are the MMIOs for each irq source mapped contiguously?
> 
> yes. 

Sort-of...

There are several source "controllers" in the system. Each PHB gets a
range of numbers, and the XIVE itself has about half a million of
generic sources (aka 'IPIs') which we use for IPIs and virtual device
interrupts.

Each of those "controller" has its own MMIO area. So the MMIOs are only
mapped contiguously within a given controller.

With pass-through, things are a bit more complex because a given guest
visible source can become either an IPI (when not attached to the host
interrupt) or a real HW source. So we'll have to invalidate the GPA-
>HVA mapping and remap. Tricky (but doable). I have some ideas about
how to plumb all that but haven't really fully thought it out.

> > > For KVM support, we should think of a way to map this QEMU memory
> > > region in the host to trigger events directly.
> > 
> > This would rely on being able to map them without mapping those for
> > any other VM or the host.  Does that mean allocating a contiguous (and
> > aligned) hunk of irqs for a guest?
> 
> I think so yes, the IRQ and the memory regions are tied, and also being 
> able to pass the MMIO region from the host to the guest, a bit like VFIO 
> for the IOMMU regions I suppose. But I haven't dig the problem too much. 
> 
> This is an important part in the overall design. 

There are also MMIO regions associated with queues.

> > We're going to need to be careful about irq allocation here.
> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> > MMIO addresses, 

> GET_SOURCE_INFO only retrieves the address of the MMIO region for 
> a 'lisn'.

An interrupt number as coming from the device-tree.

>  it is not dynamically mapped. In the KVM case, the initial
> information on the address would come from OPAL and then the host 
> kernel would translate this information for the guest.
> 
> > we need the MMIO addresses to be stable and consistent, because 
> > we can't have them change across migration.  
> 
> yes. I will catch my XIVE guru next week in Paris to clarify that
> part. 
> 
> > We need to have this consistent between in-qemu and in-KVM XIVE
> > implementations as well.
> 
> yes.
> 
> C.
> 
> > > 
> > > Signed-off-by: Cédric Le Goater 
> > > ---
> > >  hw/intc/spapr_xive.c| 255 
> > > 
> > >  include/hw/ppc/spapr_xive.h |   6 ++
> > >  2 files changed, 261 insertions(+)
> > > 
> > > diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> > > index 1ed7b6a286e9..8a85d64efc4c 100644
> > > --- a/hw/intc/spapr_xive.c
> > > +++ b/hw/intc/spapr_xive.c
> > > @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> > >  }
> > >  
> > >  /*
> > > + * "magic" Event State Buffer (ESB) MMIO offsets.
> > > + *
> > > + * Each interrupt source has a 2-bit state machine called ESB
> > > + * which can be controlled by MMIO. It's made of 2 bits, P and
> > > + * Q. P indicates that an interrupt is pending (has been sent
> > > + * to a queue and is waiting for an EOI). Q indicates that the
> > > + * interrupt has been triggered while pending.
> > > + *
> > > + * This acts as a coalescing mechanism in order to guarantee
> > > + * that a given interrupt only occurs at most once in a queue.
> > > + *
> > > + * When doing an EOI, the Q bit will indicate if the interrupt
> > > + * needs to be re-triggered.
> > > + *
> > > + * The following offsets into the ESB MMIO allow to read or
> > > + * manipulate the PQ bits. They must be used with an 8-bytes
> > > + * load instruction. They all return the previous state of the
> > > + * interrupt (atomically).
> > > + *
> > > + * Additionally, some ESB pages support doing an EOI via a
> > > + * store at 0 and some ESBs support doing a trigger via a
> > > + * separate trigger page.
> > > + */
> > > +#define XIVE_ESB_GET0x800
> > > +#define XIVE_ESB_SET_PQ_00  0xc00
> > > +#define XIVE_ESB_SET_PQ_01  0xd00
> > > +#define XIVE_ESB_SET_PQ_10  0xe00
> > > +#define XIVE_ESB_SET_PQ_11  0xf00
> > > +
> > > +#define 

Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-22 Thread Cédric Le Goater
On 09/22/2017 12:58 PM, David Gibson wrote:
> On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:
>> On 09/19/2017 04:57 AM, David Gibson wrote:
>>> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
 Each interrupt source is associated with a two bit state machine
 called an Event State Buffer (ESB) which is controlled by MMIO to
 trigger events. See code for more details on the states and
 transitions.

 The MMIO space for the ESB translation is 512GB large on baremetal
 (powernv) systems and the BAR depends on the chip id. In our model for
 the sPAPR machine, we choose to only map a sub memory region for the
 provisionned IRQ numbers and to use the mapping address of chip 0 on a
 real system. The OS will get the address of the MMIO page of the ESB
 entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
>>>
>>> On bare metal, are the MMIOs for each irq source mapped contiguously?
>>
>> yes. 
>>  
 For KVM support, we should think of a way to map this QEMU memory
 region in the host to trigger events directly.
>>>
>>> This would rely on being able to map them without mapping those for
>>> any other VM or the host.  Does that mean allocating a contiguous (and
>>> aligned) hunk of irqs for a guest?
>>
>> I think so yes, the IRQ and the memory regions are tied, and also being 
>> able to pass the MMIO region from the host to the guest, a bit like VFIO 
>> for the IOMMU regions I suppose. But I haven't dig the problem too much. 
>>
>> This is an important part in the overall design. 
>>
>>> We're going to need to be careful about irq allocation here.
>>> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
>>> MMIO addresses, 
>>
>> GET_SOURCE_INFO only retrieves the address of the MMIO region for 
>> a 'lisn'. it is not dynamically mapped.
> 
> Ok... what's a "lisn"?

Logical Interrupt Source Number. This is the source number the OS guest 
manipulates in the hcalls. The OS registers also an EISN, Effective Interrupt 
Source Number, associated with the LISN, which will be stored in the event 
queue.

C. 

> 
> 
>> In the KVM case, the initial
>> information on the address would come from OPAL and then the host 
>> kernel would translate this information for the guest.
>>
>>> we need the MMIO addresses to be stable and consistent, because 
>>> we can't have them change across migration.  
>>
>> yes. I will catch my XIVE guru next week in Paris to clarify that
>> part. 
>>
>>> We need to have this consistent between in-qemu and in-KVM XIVE
>>> implementations as well.
>>
>> yes.
>>
>> C.
>>

 Signed-off-by: Cédric Le Goater 
 ---
  hw/intc/spapr_xive.c| 255 
 
  include/hw/ppc/spapr_xive.h |   6 ++
  2 files changed, 261 insertions(+)

 diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
 index 1ed7b6a286e9..8a85d64efc4c 100644
 --- a/hw/intc/spapr_xive.c
 +++ b/hw/intc/spapr_xive.c
 @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
  }
  
  /*
 + * "magic" Event State Buffer (ESB) MMIO offsets.
 + *
 + * Each interrupt source has a 2-bit state machine called ESB
 + * which can be controlled by MMIO. It's made of 2 bits, P and
 + * Q. P indicates that an interrupt is pending (has been sent
 + * to a queue and is waiting for an EOI). Q indicates that the
 + * interrupt has been triggered while pending.
 + *
 + * This acts as a coalescing mechanism in order to guarantee
 + * that a given interrupt only occurs at most once in a queue.
 + *
 + * When doing an EOI, the Q bit will indicate if the interrupt
 + * needs to be re-triggered.
 + *
 + * The following offsets into the ESB MMIO allow to read or
 + * manipulate the PQ bits. They must be used with an 8-bytes
 + * load instruction. They all return the previous state of the
 + * interrupt (atomically).
 + *
 + * Additionally, some ESB pages support doing an EOI via a
 + * store at 0 and some ESBs support doing a trigger via a
 + * separate trigger page.
 + */
 +#define XIVE_ESB_GET0x800
 +#define XIVE_ESB_SET_PQ_00  0xc00
 +#define XIVE_ESB_SET_PQ_01  0xd00
 +#define XIVE_ESB_SET_PQ_10  0xe00
 +#define XIVE_ESB_SET_PQ_11  0xf00
 +
 +#define XIVE_ESB_VAL_P  0x2
 +#define XIVE_ESB_VAL_Q  0x1
 +
 +#define XIVE_ESB_RESET  0x0
 +#define XIVE_ESB_PENDINGXIVE_ESB_VAL_P
 +#define XIVE_ESB_QUEUED (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
 +#define XIVE_ESB_OFFXIVE_ESB_VAL_Q
 +
 +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
 +{
 +uint32_t byte = idx / 4;
 +uint32_t bit  = (idx % 4) * 2;
 +
 +assert(byte < xive->sbe_size);
 +
 +  

Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-22 Thread David Gibson
On Wed, Sep 20, 2017 at 02:54:31PM +0200, Cédric Le Goater wrote:
> On 09/19/2017 04:57 AM, David Gibson wrote:
> > On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> >> Each interrupt source is associated with a two bit state machine
> >> called an Event State Buffer (ESB) which is controlled by MMIO to
> >> trigger events. See code for more details on the states and
> >> transitions.
> >>
> >> The MMIO space for the ESB translation is 512GB large on baremetal
> >> (powernv) systems and the BAR depends on the chip id. In our model for
> >> the sPAPR machine, we choose to only map a sub memory region for the
> >> provisionned IRQ numbers and to use the mapping address of chip 0 on a
> >> real system. The OS will get the address of the MMIO page of the ESB
> >> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> > 
> > On bare metal, are the MMIOs for each irq source mapped contiguously?
> 
> yes. 
>  
> >> For KVM support, we should think of a way to map this QEMU memory
> >> region in the host to trigger events directly.
> > 
> > This would rely on being able to map them without mapping those for
> > any other VM or the host.  Does that mean allocating a contiguous (and
> > aligned) hunk of irqs for a guest?
> 
> I think so yes, the IRQ and the memory regions are tied, and also being 
> able to pass the MMIO region from the host to the guest, a bit like VFIO 
> for the IOMMU regions I suppose. But I haven't dig the problem too much. 
> 
> This is an important part in the overall design. 
> 
> > We're going to need to be careful about irq allocation here.
> > Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> > MMIO addresses, 
> 
> GET_SOURCE_INFO only retrieves the address of the MMIO region for 
> a 'lisn'. it is not dynamically mapped.

Ok... what's a "lisn"?


> In the KVM case, the initial
> information on the address would come from OPAL and then the host 
> kernel would translate this information for the guest.
> 
> > we need the MMIO addresses to be stable and consistent, because 
> > we can't have them change across migration.  
> 
> yes. I will catch my XIVE guru next week in Paris to clarify that
> part. 
> 
> > We need to have this consistent between in-qemu and in-KVM XIVE
> > implementations as well.
> 
> yes.
> 
> C.
> 
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/spapr_xive.c| 255 
> >> 
> >>  include/hw/ppc/spapr_xive.h |   6 ++
> >>  2 files changed, 261 insertions(+)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 1ed7b6a286e9..8a85d64efc4c 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
> >>  }
> >>  
> >>  /*
> >> + * "magic" Event State Buffer (ESB) MMIO offsets.
> >> + *
> >> + * Each interrupt source has a 2-bit state machine called ESB
> >> + * which can be controlled by MMIO. It's made of 2 bits, P and
> >> + * Q. P indicates that an interrupt is pending (has been sent
> >> + * to a queue and is waiting for an EOI). Q indicates that the
> >> + * interrupt has been triggered while pending.
> >> + *
> >> + * This acts as a coalescing mechanism in order to guarantee
> >> + * that a given interrupt only occurs at most once in a queue.
> >> + *
> >> + * When doing an EOI, the Q bit will indicate if the interrupt
> >> + * needs to be re-triggered.
> >> + *
> >> + * The following offsets into the ESB MMIO allow to read or
> >> + * manipulate the PQ bits. They must be used with an 8-bytes
> >> + * load instruction. They all return the previous state of the
> >> + * interrupt (atomically).
> >> + *
> >> + * Additionally, some ESB pages support doing an EOI via a
> >> + * store at 0 and some ESBs support doing a trigger via a
> >> + * separate trigger page.
> >> + */
> >> +#define XIVE_ESB_GET0x800
> >> +#define XIVE_ESB_SET_PQ_00  0xc00
> >> +#define XIVE_ESB_SET_PQ_01  0xd00
> >> +#define XIVE_ESB_SET_PQ_10  0xe00
> >> +#define XIVE_ESB_SET_PQ_11  0xf00
> >> +
> >> +#define XIVE_ESB_VAL_P  0x2
> >> +#define XIVE_ESB_VAL_Q  0x1
> >> +
> >> +#define XIVE_ESB_RESET  0x0
> >> +#define XIVE_ESB_PENDINGXIVE_ESB_VAL_P
> >> +#define XIVE_ESB_QUEUED (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> >> +#define XIVE_ESB_OFFXIVE_ESB_VAL_Q
> >> +
> >> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
> >> +{
> >> +uint32_t byte = idx / 4;
> >> +uint32_t bit  = (idx % 4) * 2;
> >> +
> >> +assert(byte < xive->sbe_size);
> >> +
> >> +return (xive->sbe[byte] >> bit) & 0x3;
> >> +}
> >> +
> >> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t 
> >> pq)
> >> +{
> >> +uint32_t byte = idx / 4;
> >> +uint32_t bit  = (idx % 4) * 2;
> >> +uint8_t old, new;
> >> +
> >> +assert(byte < xive->sbe_size);
> >> 

Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-20 Thread Cédric Le Goater
On 09/19/2017 04:57 AM, David Gibson wrote:
> On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
>> Each interrupt source is associated with a two bit state machine
>> called an Event State Buffer (ESB) which is controlled by MMIO to
>> trigger events. See code for more details on the states and
>> transitions.
>>
>> The MMIO space for the ESB translation is 512GB large on baremetal
>> (powernv) systems and the BAR depends on the chip id. In our model for
>> the sPAPR machine, we choose to only map a sub memory region for the
>> provisionned IRQ numbers and to use the mapping address of chip 0 on a
>> real system. The OS will get the address of the MMIO page of the ESB
>> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.
> 
> On bare metal, are the MMIOs for each irq source mapped contiguously?

yes. 
 
>> For KVM support, we should think of a way to map this QEMU memory
>> region in the host to trigger events directly.
> 
> This would rely on being able to map them without mapping those for
> any other VM or the host.  Does that mean allocating a contiguous (and
> aligned) hunk of irqs for a guest?

I think so yes, the IRQ and the memory regions are tied, and also being 
able to pass the MMIO region from the host to the guest, a bit like VFIO 
for the IOMMU regions I suppose. But I haven't dig the problem too much. 

This is an important part in the overall design. 

> We're going to need to be careful about irq allocation here.
> Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
> MMIO addresses, 

GET_SOURCE_INFO only retrieves the address of the MMIO region for 
a 'lisn'. it is not dynamically mapped. In the KVM case, the initial
information on the address would come from OPAL and then the host 
kernel would translate this information for the guest.

> we need the MMIO addresses to be stable and consistent, because 
> we can't have them change across migration.  

yes. I will catch my XIVE guru next week in Paris to clarify that
part. 

> We need to have this consistent between in-qemu and in-KVM XIVE
> implementations as well.

yes.

C.

>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/intc/spapr_xive.c| 255 
>> 
>>  include/hw/ppc/spapr_xive.h |   6 ++
>>  2 files changed, 261 insertions(+)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 1ed7b6a286e9..8a85d64efc4c 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>>  }
>>  
>>  /*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_GET0x800
>> +#define XIVE_ESB_SET_PQ_00  0xc00
>> +#define XIVE_ESB_SET_PQ_01  0xd00
>> +#define XIVE_ESB_SET_PQ_10  0xe00
>> +#define XIVE_ESB_SET_PQ_11  0xf00
>> +
>> +#define XIVE_ESB_VAL_P  0x2
>> +#define XIVE_ESB_VAL_Q  0x1
>> +
>> +#define XIVE_ESB_RESET  0x0
>> +#define XIVE_ESB_PENDINGXIVE_ESB_VAL_P
>> +#define XIVE_ESB_QUEUED (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
>> +#define XIVE_ESB_OFFXIVE_ESB_VAL_Q
>> +
>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
>> +{
>> +uint32_t byte = idx / 4;
>> +uint32_t bit  = (idx % 4) * 2;
>> +
>> +assert(byte < xive->sbe_size);
>> +
>> +return (xive->sbe[byte] >> bit) & 0x3;
>> +}
>> +
>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
>> +{
>> +uint32_t byte = idx / 4;
>> +uint32_t bit  = (idx % 4) * 2;
>> +uint8_t old, new;
>> +
>> +assert(byte < xive->sbe_size);
>> +
>> +old = xive->sbe[byte];
>> +
>> +new = xive->sbe[byte] & ~(0x3 << bit);
>> +new |= (pq & 0x3) << bit;
>> +
>> +xive->sbe[byte] = new;
>> +
>> +return (old >> bit) & 0x3;
>> +}
>> +
>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
>> +{
>> +uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
>> +
>> +

Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-20 Thread Cédric Le Goater

>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned 
>> size)
>> +{
>> +sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +uint32_t offset = addr & 0xF00;
>> +uint32_t srcno = addr >> xive->esb_shift;
>> +XiveIVE *ive;
>> +uint64_t ret = -1;
>> +
>> +ive = spapr_xive_get_ive(xive, srcno);
>> +if (!ive || !(ive->w & IVE_VALID))  {
>> +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %d\n", srcno);
>> +goto out;
> 
> Since there's a whole (4k) page for each source, I wonder if we should
> actually map each one as a separate MMIO region to allow us to tweak
> the mappings more flexibly
yes we could have a subregion for each source. In that case, 
we should also handle IVE_VALID properly. That will require 
a specific XIVE allocator which was difficult to do while
keeping the compatibility with XICS for migration and CAS.


C.






Re: [Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-19 Thread David Gibson
On Mon, Sep 11, 2017 at 07:12:21PM +0200, Cédric Le Goater wrote:
> Each interrupt source is associated with a two bit state machine
> called an Event State Buffer (ESB) which is controlled by MMIO to
> trigger events. See code for more details on the states and
> transitions.
> 
> The MMIO space for the ESB translation is 512GB large on baremetal
> (powernv) systems and the BAR depends on the chip id. In our model for
> the sPAPR machine, we choose to only map a sub memory region for the
> provisionned IRQ numbers and to use the mapping address of chip 0 on a
> real system. The OS will get the address of the MMIO page of the ESB
> entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.

On bare metal, are the MMIOs for each irq source mapped contiguously?

> For KVM support, we should think of a way to map this QEMU memory
> region in the host to trigger events directly.

This would rely on being able to map them without mapping those for
any other VM or the host.  Does that mean allocating a contiguous (and
aligned) hunk of irqs for a guest?

We're going to need to be careful about irq allocation here.
Even though GET_SOURCE_INFO allows dynamic mapping of irq numbers to
MMIO addresses, we need the MMIO addresses to be stable and
consistent, because we can't have them change across migration.  We
need to have this consistent between in-qemu and in-KVM XIVE
implementations as well.

> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/intc/spapr_xive.c| 255 
> 
>  include/hw/ppc/spapr_xive.h |   6 ++
>  2 files changed, 261 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 1ed7b6a286e9..8a85d64efc4c 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
>  }
>  
>  /*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow to read or
> + * manipulate the PQ bits. They must be used with an 8-bytes
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_GET0x800
> +#define XIVE_ESB_SET_PQ_00  0xc00
> +#define XIVE_ESB_SET_PQ_01  0xd00
> +#define XIVE_ESB_SET_PQ_10  0xe00
> +#define XIVE_ESB_SET_PQ_11  0xf00
> +
> +#define XIVE_ESB_VAL_P  0x2
> +#define XIVE_ESB_VAL_Q  0x1
> +
> +#define XIVE_ESB_RESET  0x0
> +#define XIVE_ESB_PENDINGXIVE_ESB_VAL_P
> +#define XIVE_ESB_QUEUED (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> +#define XIVE_ESB_OFFXIVE_ESB_VAL_Q
> +
> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
> +{
> +uint32_t byte = idx / 4;
> +uint32_t bit  = (idx % 4) * 2;
> +
> +assert(byte < xive->sbe_size);
> +
> +return (xive->sbe[byte] >> bit) & 0x3;
> +}
> +
> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
> +{
> +uint32_t byte = idx / 4;
> +uint32_t bit  = (idx % 4) * 2;
> +uint8_t old, new;
> +
> +assert(byte < xive->sbe_size);
> +
> +old = xive->sbe[byte];
> +
> +new = xive->sbe[byte] & ~(0x3 << bit);
> +new |= (pq & 0x3) << bit;
> +
> +xive->sbe[byte] = new;
> +
> +return (old >> bit) & 0x3;
> +}
> +
> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
> +{
> +uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> +
> +switch (old_pq) {
> +case XIVE_ESB_RESET:
> +spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> +return false;
> +case XIVE_ESB_PENDING:
> +spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
> +return false;
> +case XIVE_ESB_QUEUED:
> +spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> +return true;
> +case XIVE_ESB_OFF:
> +spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
> +return false;
> +default:
> + g_assert_not_reached();
> +}
> +}
> +
> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
> +{
> +uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
> +
> +switch (old_pq) {
> +case XIVE_ESB_RESET:
> +spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
> +

[Qemu-devel] [RFC PATCH v2 07/21] ppc/xive: add MMIO handlers for the XIVE interrupt sources

2017-09-11 Thread Cédric Le Goater
Each interrupt source is associated with a two bit state machine
called an Event State Buffer (ESB) which is controlled by MMIO to
trigger events. See code for more details on the states and
transitions.

The MMIO space for the ESB translation is 512GB large on baremetal
(powernv) systems and the BAR depends on the chip id. In our model for
the sPAPR machine, we choose to only map a sub memory region for the
provisionned IRQ numbers and to use the mapping address of chip 0 on a
real system. The OS will get the address of the MMIO page of the ESB
entry associated with an IRQ using the H_INT_GET_SOURCE_INFO hcall.

For KVM support, we should think of a way to map this QEMU memory
region in the host to trigger events directly.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/spapr_xive.c| 255 
 include/hw/ppc/spapr_xive.h |   6 ++
 2 files changed, 261 insertions(+)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 1ed7b6a286e9..8a85d64efc4c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -33,6 +33,218 @@ static void spapr_xive_irq(sPAPRXive *xive, int srcno)
 }
 
 /*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * Each interrupt source has a 2-bit state machine called ESB
+ * which can be controlled by MMIO. It's made of 2 bits, P and
+ * Q. P indicates that an interrupt is pending (has been sent
+ * to a queue and is waiting for an EOI). Q indicates that the
+ * interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee
+ * that a given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ *
+ * The following offsets into the ESB MMIO allow to read or
+ * manipulate the PQ bits. They must be used with an 8-bytes
+ * load instruction. They all return the previous state of the
+ * interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a
+ * store at 0 and some ESBs support doing a trigger via a
+ * separate trigger page.
+ */
+#define XIVE_ESB_GET0x800
+#define XIVE_ESB_SET_PQ_00  0xc00
+#define XIVE_ESB_SET_PQ_01  0xd00
+#define XIVE_ESB_SET_PQ_10  0xe00
+#define XIVE_ESB_SET_PQ_11  0xf00
+
+#define XIVE_ESB_VAL_P  0x2
+#define XIVE_ESB_VAL_Q  0x1
+
+#define XIVE_ESB_RESET  0x0
+#define XIVE_ESB_PENDINGXIVE_ESB_VAL_P
+#define XIVE_ESB_QUEUED (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
+#define XIVE_ESB_OFFXIVE_ESB_VAL_Q
+
+static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t idx)
+{
+uint32_t byte = idx / 4;
+uint32_t bit  = (idx % 4) * 2;
+
+assert(byte < xive->sbe_size);
+
+return (xive->sbe[byte] >> bit) & 0x3;
+}
+
+static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t idx, uint8_t pq)
+{
+uint32_t byte = idx / 4;
+uint32_t bit  = (idx % 4) * 2;
+uint8_t old, new;
+
+assert(byte < xive->sbe_size);
+
+old = xive->sbe[byte];
+
+new = xive->sbe[byte] & ~(0x3 << bit);
+new |= (pq & 0x3) << bit;
+
+xive->sbe[byte] = new;
+
+return (old >> bit) & 0x3;
+}
+
+static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t srcno)
+{
+uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
+
+switch (old_pq) {
+case XIVE_ESB_RESET:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
+return false;
+case XIVE_ESB_PENDING:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_RESET);
+return false;
+case XIVE_ESB_QUEUED:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
+return true;
+case XIVE_ESB_OFF:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
+return false;
+default:
+ g_assert_not_reached();
+}
+}
+
+static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t srcno)
+{
+uint8_t old_pq = spapr_xive_pq_get(xive, srcno);
+
+switch (old_pq) {
+case XIVE_ESB_RESET:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_PENDING);
+return true;
+case XIVE_ESB_PENDING:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
+return true;
+case XIVE_ESB_QUEUED:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_QUEUED);
+return true;
+case XIVE_ESB_OFF:
+spapr_xive_pq_set(xive, srcno, XIVE_ESB_OFF);
+return false;
+default:
+ g_assert_not_reached();
+}
+}
+
+/*
+ * XIVE Interrupt Source MMIOs
+ */
+static void spapr_xive_source_eoi(sPAPRXive *xive, uint32_t srcno)
+{
+ICSIRQState *irq = >ics->irqs[srcno];
+
+if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+irq->status &= ~XICS_STATUS_SENT;
+}
+}
+
+/* TODO: handle second page
+ *
+ * Some HW use a separate page for trigger. We only support the case
+ * in which the trigger can be done in the same page as the EOI.
+ */
+static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+