> -----Original Message-----
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Friday, March 31, 2017 3:17 PM
> To: Liu, Yi L <yi.l....@intel.com>; 'Peter Xu' <pet...@redhat.com>
> Cc: Lan, Tianyu <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> 'm...@redhat.com' <m...@redhat.com>; 'jan.kis...@siemens.com'
> <jan.kis...@siemens.com>; 'bd.a...@gmail.com' <bd.a...@gmail.com>; 'qemu-
> de...@nongnu.org' <qemu-devel@nongnu.org>; 'alex.william...@redhat.com'
> <alex.william...@redhat.com>; 'David Gibson' <da...@gibson.dropbear.id.au>
> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> MemoryRegionIOMMUOps.replay() callback
> 
> 
> 
> On 2017年03月31日 13:34, Liu, Yi L wrote:
> >> -----Original Message-----
> >> From: Jason Wang [mailto:jasow...@redhat.com]
> >> Sent: Thursday, March 30, 2017 7:58 PM
> >> To: Liu, Yi L <yi.l....@intel.com>; 'Peter Xu' <pet...@redhat.com>
> >> Cc: 'alex.william...@redhat.com' <alex.william...@redhat.com>; Lan,
> >> Tianyu <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> 'm...@redhat.com'
> >> <m...@redhat.com>; 'jan.kis...@siemens.com' <jan.kis...@siemens.com>;
> >> 'bd.a...@gmail.com' <bd.a...@gmail.com>; 'David Gibson'
> >> <da...@gibson.dropbear.id.au>; 'qemu-devel@nongnu.org' <qemu-
> >> de...@nongnu.org>
> >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> >> MemoryRegionIOMMUOps.replay() callback
> >>
> >>
> >>
> >> On 2017年03月30日 19:06, Liu, Yi L wrote:
> >>>> -----Original Message-----
> >>>> From: Liu, Yi L
> >>>> Sent: Monday, March 27, 2017 5:22 PM
> >>>> To: Peter Xu <pet...@redhat.com>
> >>>> Cc: alex.william...@redhat.com; Lan, Tianyu <tianyu....@intel.com>;
> >>>> Tian, Kevin <kevin.t...@intel.com>; m...@redhat.com;
> >>>> jan.kis...@siemens.com; jasow...@redhat.com; bd.a...@gmail.com;
> >>>> David Gibson <da...@gibson.dropbear.id.au>; qemu-devel@nongnu.org
> >>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>> MemoryRegionIOMMUOps.replay() callback
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Peter Xu [mailto:pet...@redhat.com]
> >>>>> Sent: Monday, March 27, 2017 5:12 PM
> >>>>> To: Liu, Yi L <yi.l....@intel.com>
> >>>>> Cc: alex.william...@redhat.com; Lan, Tianyu
> >>>>> <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> >>>>> m...@redhat.com; jan.kis...@siemens.com; jasow...@redhat.com;
> >>>>> bd.a...@gmail.com; David Gibson <da...@gibson.dropbear.id.au>;
> >>>>> qemu-devel@nongnu.org
> >>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>>> MemoryRegionIOMMUOps.replay() callback
> >>>>>
> >>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Qemu-devel
> >>>>>>> [mailto:qemu-devel-bounces+yi.l.liu=intel....@nongnu.org] On
> >>>>>>> Behalf Of Peter Xu
> >>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM
> >>>>>>> To: qemu-devel@nongnu.org
> >>>>>>> Cc: Lan, Tianyu <tianyu....@intel.com>; Tian, Kevin
> >>>>>>> <kevin.t...@intel.com>; m...@redhat.com; jan.kis...@siemens.com;
> >>>>>>> jasow...@redhat.com; pet...@redhat.com;
> >>>>>>> alex.william...@redhat.com; bd.a...@gmail.com; David Gibson
> >>>>>>> <da...@gibson.dropbear.id.au>
> >>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add
> >>>>>>> MemoryRegionIOMMUOps.replay() callback
> >>>>>>>
> >>>>>>> Originally we have one memory_region_iommu_replay() function,
> >>>>>>> which is the default behavior to replay the translations of the
> >>>>>>> whole IOMMU region. However, on some platform like x86, we may
> >>>>>>> want our own
> >>>>> replay logic for IOMMU regions.
> >>>>>>> This patch add one more hook for IOMMUOps for the callback, and
> >>>>>>> it'll override the default if set.
> >>>>>>>
> >>>>>>> Signed-off-by: Peter Xu <pet...@redhat.com>
> >>>>>>> ---
> >>>>>>>    include/exec/memory.h | 2 ++
> >>>>>>>    memory.c              | 6 ++++++
> >>>>>>>    2 files changed, 8 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h index
> >>>>>>> 0767888..30b2a74 100644
> >>>>>>> --- a/include/exec/memory.h
> >>>>>>> +++ b/include/exec/memory.h
> >>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps {
> >>>>>>>        void (*notify_flag_changed)(MemoryRegion *iommu,
> >>>>>>>                                    IOMMUNotifierFlag old_flags,
> >>>>>>>                                    IOMMUNotifierFlag new_flags);
> >>>>>>> +    /* Set this up to provide customized IOMMU replay function */
> >>>>>>> +    void (*replay)(MemoryRegion *iommu, IOMMUNotifier
> >>>>>>> + *notifier);
> >>>>>>>    };
> >>>>>>>
> >>>>>>>    typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >>>>>>> diff --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644
> >>>>>>> --- a/memory.c
> >>>>>>> +++ b/memory.c
> >>>>>>> @@ -1630,6 +1630,12 @@ void
> >>>>>>> memory_region_iommu_replay(MemoryRegion
> >>>>>>> *mr, IOMMUNotifier *n,
> >>>>>>>        hwaddr addr, granularity;
> >>>>>>>        IOMMUTLBEntry iotlb;
> >>>>>>> +    /* If the IOMMU has its own replay callback, override */
> >>>>>>> +    if (mr->iommu_ops->replay) {
> >>>>>>> +        mr->iommu_ops->replay(mr, n);
> >>>>>>> +        return;
> >>>>>>> +    }
> >>>>>> Hi Alex, Peter,
> >>>>>>
> >>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own
> >>>>>> replay callback as well? I guess it depends on whether the
> >>>>>> original replay algorithm work well for them? Do you have such 
> >>>>>> knowledge?
> >>>>> I guess so. At least for VT-d we had this callback since the
> >>>>> default replay mechanism did not work well on x86 due to its
> >>>>> extremely large memory region size. Thanks,
> >>>> thx. that would make sense.
> >>> Peter,
> >>>
> >>> Just come to mind that there may be a corner case here.
> >>>
> >>> Intel VT-d actually has a "pt" mode which allows device use physical
> >>> address even when VT-d is enabled. In kernel, there is a
> iommu_identity_mapping.
> >>> If a device is in this map, then it would use "pt" mode. So that
> >>> IOMMU driver would not build second-level page table for it.
> >> Yes, but qemu does not support ECAP_PT now, so guest will still have
> >> a page table in this case.
> > That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. So
> > this solution can work well even a device is in identify_map.
> >
> >>> Back to the virtual IOVA implementation, if an assigned device is in
> >>> the iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to 
> >>> do
> DMA.
> >>> So it demands a GPA->HPA mapping in host. However, the
> >>> iommu->ops.replay is not able to build it when guest SL page table is 
> >>> empty.
> >>>
> >>> So I think building an entire guest PA->HPA mapping before guest
> >>> kernel boot would be recommended. Any thoughts?
> >> We plan to add PT in 2.10, a possible rough idea is disabled iommu
> >> dmar region and use another region without iommu_ops. Then
> >> vfio_listener_region_add() will just do the correct mappings.
> > Good to know it. Actually, I also need to expose ECAP_PT for vSVM. So
> > just comes to realize that the current replay solution may not work well 
> > when I
> expose ECAP_PT to guest.
> > I also have a rough idea here. The current listener in container
> > listens to address space named with devfn if virtual VTd is added. How
> > about adding one more listener to listen memory address space. So that the
> listener can build entire guest PA->HPA mapping.
> 
> This is only needed for PT. So looks like current code is sufficient to do 
> this I think.
> See the else part of if (memory_region_is_iommu()) of 
> vfio_listener_region_add().

Jason, when the listener listen to device address space, the "else part" may 
not work
even we set the mr->iommu_ops = NULL. The mr would be a non-ram region when the
time region_add is called since it is actually listen to changes from device 
address space.

Regards,
Yi L

Reply via email to