On Wed, Apr 27, 2016 at 07:14:15PM +1000, Alexey Kardashevskiy wrote:
> On 04/27/2016 04:39 PM, David Gibson wrote:
> >On Thu, Apr 21, 2016 at 02:22:01PM +1000, Alexey Kardashevskiy wrote:
> >>On 04/21/2016 01:59 PM, David Gibson wrote:
> >>>On Wed, Apr 20, 2016 at 07:15:15PM +1000, Alexey Kardashevskiy wrote:
> >>>>On 04/07/2016 10:40 AM, David Gibson wrote:
> >>>>>On Mon, Apr 04, 2016 at 07:33:43PM +1000, Alexey Kardashevskiy wrote:
> >>>>>>The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU -
> >>>>>>a guest view of the table and a hardware TCE table. If there is no VFIO
> >>>>>>presense in the address space, then just the guest view is used, if
> >>>>>>this is the case, it is allocated in the KVM. However since there is no
> >>>>>>support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
> >>>>>>we need to move the guest view from KVM to the userspace; and we need
> >>>>>>to do this for every IOMMU on a bus with VFIO devices.
> >>>>>>
> >>>>>>This adds vfio_start/vfio_stop callbacks in MemoryRegionIOMMUOps to
> >>>>>>notifiy IOMMU about changing environment so it can reallocate the table
> >>>>>>to/from KVM or (when available) hook the IOMMU groups with the logical
> >>>>>>bus (LIOBN) in the KVM.
> >>>>>>
> >>>>>>This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
> >>>>>>path as the new callbacks do this better - they notify IOMMU at
> >>>>>>the exact moment when the configuration is changed, and this also
> >>>>>>includes the case of PCI hot unplug.
> >>>>>>
> >>>>>>As there can be multiple containers attached to the same PHB/LIOBN,
> >>>>>>this replaces the @need_vfio flag in sPAPRTCETable with the counter
> >>>>>>of VFIO users.
> >>>>>>
> >>>>>>Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> >>>>>
> >>>>>This looks correct, but there's one remaining ugly.
> >>>>>
> >>>>>>---
> >>>>>>Changes:
> >>>>>>v15:
> >>>>>>* s/need_vfio/vfio-Users/g
> >>>>>>---
> >>>>>> hw/ppc/spapr_iommu.c   | 30 ++++++++++++++++++++----------
> >>>>>> hw/ppc/spapr_pci.c     |  6 ------
> >>>>>> hw/vfio/common.c       |  9 +++++++++
> >>>>>> include/exec/memory.h  |  4 ++++
> >>>>>> include/hw/ppc/spapr.h |  2 +-
> >>>>>> 5 files changed, 34 insertions(+), 17 deletions(-)
> >>>>>>
> >>>>>>diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >>>>>>index c945dba..ea09414 100644
> >>>>>>--- a/hw/ppc/spapr_iommu.c
> >>>>>>+++ b/hw/ppc/spapr_iommu.c
> >>>>>>@@ -155,6 +155,16 @@ static uint64_t 
> >>>>>>spapr_tce_get_page_sizes(MemoryRegion *iommu)
> >>>>>>     return 1ULL << tcet->page_shift;
> >>>>>> }
> >>>>>>
> >>>>>>+static void spapr_tce_vfio_start(MemoryRegion *iommu)
> >>>>>>+{
> >>>>>>+    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), 
> >>>>>>true);
> >>>>>>+}
> >>>>>>+
> >>>>>>+static void spapr_tce_vfio_stop(MemoryRegion *iommu)
> >>>>>>+{
> >>>>>>+    spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), 
> >>>>>>false);
> >>>>>>+}
> >>>>>>+
> >>>>>> static void spapr_tce_table_do_enable(sPAPRTCETable *tcet);
> >>>>>> static void spapr_tce_table_do_disable(sPAPRTCETable *tcet);
> >>>>>>
> >>>>>>@@ -239,6 +249,8 @@ static const VMStateDescription 
> >>>>>>vmstate_spapr_tce_table = {
> >>>>>> static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >>>>>>     .translate = spapr_tce_translate_iommu,
> >>>>>>     .get_page_sizes = spapr_tce_get_page_sizes,
> >>>>>>+    .vfio_start = spapr_tce_vfio_start,
> >>>>>>+    .vfio_stop = spapr_tce_vfio_stop,
> >>>>>
> >>>>>Ok, so AFAICT these callbacks are called whenever a VFIO context is
> >>>>>added / removed from the gIOMMU's address space, and it's up to the
> >>>>>gIOMMU code to ref count that to see if there are any current vfio
> >>>>>users.  That makes "vfio_start" and "vfio_stop" not great names.
> >>>>>
> >>>>>But.. better than changing the names would be to move the refcounting
> >>>>>to the generic code if you can manage it, so the individual gIOMMU
> >>>>>backends don't need to - they just told when they need to start / stop
> >>>>>providing VFIO support.
> >>>>
> >>>>Everything is manageable...
> >>>>
> >>>>This referencing is needed for the case of >=2 containers so
> >>>>2xvfio_listener_region_add will create 2xVFIOGuestIOMMU as they are per
> >>>>VFIOContainer so VFIOGuestIOMMU is not the right place for the reference
> >>>>counting, VFIOAddressSpace seems to be that place (=> add list of IOMMU 
> >>>>MRs
> >>>>with refcounter). Or even IOMMU MR. Or move VFIOGuestIOMMU list from
> >>>>VFIOContainer to VFIOAddressSpace and then gIOMMU can handle
> >>>>refcounting?
> >>>
> >>>I'm having a lot of trouble parsing that.  I think the ref parsing has
> >>>to be per-giommu (because individual giommus could, in theory, be
> >>>mapped or unmapped from an address space).
> >>
> >>
> >>Example 1.
> >>POWER8, no DDW, one QEMU PHB, 2 IOMMU groups, table sharing so just 1
> >>container, one TCE table (aka gIOMMU), one TCE table in KVM, no reference
> >>counting needed at all, simple.
> >>
> >>Example 2.
> >>POWER7, no DDW, one QEMU PHB, 2 IOMMU groups, no table sharing so there are
> >>2 containers but still one IOMMU MR which is added to each container so
> >>there are 2 gIOMMU objects. And there is still one TCE table in KVM (which
> >>is a guest view). Where do I put the reference counter which will count that
> >>there are 2 gIOMMUs per KVM TCE table in this example?
> >
> >Ah.. I'd forgotten that the gIOMMU object is per guest IOMMU window
> >*and* per container, not just per guest IOMMU window.
> >
> >Ultimately it's the code implementing the guest side IOMMU which needs
> >to know if it is supporting VFIO or not, so in generic terms that
> >means per IOMMU-type MemoryRegion.
> >
> >Essentially you need to count the number of VFIOGuestIOMMU objects
> >associated with each (gIOMMU) MemoryRegion, and notify the
> >MemoryRegion if that changes from zero to non-zero or vice versa.
> >
> >I'd prefer if we can maintain that count from just the VFIO code and
> >just notify the gIOMMU code on zero / non-zero changes.  But I guess
> >we'd need approval from Paolo to add that count to the MemoryRegion.
> 
> 
> Why MR? I could wrap MR to "VFIOIOMMUMR", add a counter and keep a list of
> these VFIOIOMMUMRs in VFIOAddressSpace.

Ah, yes I guess we could.  It's just kinda ugly to have to keep
another object with the same lifetime around for one extra counter.

> I am adding Paolo, just for the case :)
> 
> 
> >The fallback would be similar to what you have - instead the
> >MemoryRegion gets notified whenever a VFIOGuestIOMMU is attached or
> >removed, and the MR (i.e. the guest side IOMMU code) has to maintain
> >the count itself.
> 
> 
> 
> 
> 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature

Reply via email to