> -----Original Message----- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Friday, March 31, 2017 3:17 PM > To: Liu, Yi L <yi.l....@intel.com>; 'Peter Xu' <pet...@redhat.com> > Cc: Lan, Tianyu <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>; > 'm...@redhat.com' <m...@redhat.com>; 'jan.kis...@siemens.com' > <jan.kis...@siemens.com>; 'bd.a...@gmail.com' <bd.a...@gmail.com>; 'qemu- > de...@nongnu.org' <qemu-devel@nongnu.org>; 'alex.william...@redhat.com' > <alex.william...@redhat.com>; 'David Gibson' <da...@gibson.dropbear.id.au> > Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > MemoryRegionIOMMUOps.replay() callback > > > > On 2017年03月31日 13:34, Liu, Yi L wrote: > >> -----Original Message----- > >> From: Jason Wang [mailto:jasow...@redhat.com] > >> Sent: Thursday, March 30, 2017 7:58 PM > >> To: Liu, Yi L <yi.l....@intel.com>; 'Peter Xu' <pet...@redhat.com> > >> Cc: 'alex.william...@redhat.com' <alex.william...@redhat.com>; Lan, > >> Tianyu <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>; > 'm...@redhat.com' > >> <m...@redhat.com>; 'jan.kis...@siemens.com' <jan.kis...@siemens.com>; > >> 'bd.a...@gmail.com' <bd.a...@gmail.com>; 'David Gibson' > >> <da...@gibson.dropbear.id.au>; 'qemu-devel@nongnu.org' <qemu- > >> de...@nongnu.org> > >> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >> MemoryRegionIOMMUOps.replay() callback > >> > >> > >> > >> On 2017年03月30日 19:06, Liu, Yi L wrote: > >>>> -----Original Message----- > >>>> From: Liu, Yi L > >>>> Sent: Monday, March 27, 2017 5:22 PM > >>>> To: Peter Xu <pet...@redhat.com> > >>>> Cc: alex.william...@redhat.com; Lan, Tianyu <tianyu....@intel.com>; > >>>> Tian, Kevin <kevin.t...@intel.com>; m...@redhat.com; > >>>> jan.kis...@siemens.com; jasow...@redhat.com; bd.a...@gmail.com; > >>>> David Gibson <da...@gibson.dropbear.id.au>; qemu-devel@nongnu.org > >>>> Subject: RE: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>> MemoryRegionIOMMUOps.replay() callback > >>>> > >>>>> -----Original Message----- > >>>>> From: Peter Xu [mailto:pet...@redhat.com] > >>>>> Sent: Monday, March 27, 2017 5:12 PM > >>>>> To: Liu, Yi L <yi.l....@intel.com> > >>>>> Cc: alex.william...@redhat.com; Lan, Tianyu > >>>>> <tianyu....@intel.com>; Tian, Kevin <kevin.t...@intel.com>; > >>>>> m...@redhat.com; jan.kis...@siemens.com; jasow...@redhat.com; > >>>>> bd.a...@gmail.com; David Gibson <da...@gibson.dropbear.id.au>; > >>>>> qemu-devel@nongnu.org > >>>>> Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>> MemoryRegionIOMMUOps.replay() callback > >>>>> > >>>>> On Mon, Mar 27, 2017 at 08:35:05AM +0000, Liu, Yi L wrote: > >>>>>>> -----Original Message----- > >>>>>>> From: Qemu-devel > >>>>>>> [mailto:qemu-devel-bounces+yi.l.liu=intel....@nongnu.org] On > >>>>>>> Behalf Of Peter Xu > >>>>>>> Sent: Tuesday, February 7, 2017 4:28 PM > >>>>>>> To: qemu-devel@nongnu.org > >>>>>>> Cc: Lan, Tianyu <tianyu....@intel.com>; Tian, Kevin > >>>>>>> <kevin.t...@intel.com>; m...@redhat.com; jan.kis...@siemens.com; > >>>>>>> jasow...@redhat.com; pet...@redhat.com; > >>>>>>> alex.william...@redhat.com; bd.a...@gmail.com; David Gibson > >>>>>>> <da...@gibson.dropbear.id.au> > >>>>>>> Subject: [Qemu-devel] [PATCH v7 14/17] memory: add > >>>>>>> MemoryRegionIOMMUOps.replay() callback > >>>>>>> > >>>>>>> Originally we have one memory_region_iommu_replay() function, > >>>>>>> which is the default behavior to replay the translations of the > >>>>>>> whole IOMMU region. However, on some platform like x86, we may > >>>>>>> want our own > >>>>> replay logic for IOMMU regions. > >>>>>>> This patch add one more hook for IOMMUOps for the callback, and > >>>>>>> it'll override the default if set. > >>>>>>> > >>>>>>> Signed-off-by: Peter Xu <pet...@redhat.com> > >>>>>>> --- > >>>>>>> include/exec/memory.h | 2 ++ > >>>>>>> memory.c | 6 ++++++ > >>>>>>> 2 files changed, 8 insertions(+) > >>>>>>> > >>>>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h index > >>>>>>> 0767888..30b2a74 100644 > >>>>>>> --- a/include/exec/memory.h > >>>>>>> +++ b/include/exec/memory.h > >>>>>>> @@ -191,6 +191,8 @@ struct MemoryRegionIOMMUOps { > >>>>>>> void (*notify_flag_changed)(MemoryRegion *iommu, > >>>>>>> IOMMUNotifierFlag old_flags, > >>>>>>> IOMMUNotifierFlag new_flags); > >>>>>>> + /* Set this up to provide customized IOMMU replay function */ > >>>>>>> + void (*replay)(MemoryRegion *iommu, IOMMUNotifier > >>>>>>> + *notifier); > >>>>>>> }; > >>>>>>> > >>>>>>> typedef struct CoalescedMemoryRange CoalescedMemoryRange; > >>>>>>> diff --git a/memory.c b/memory.c index 7a4f2f9..9c253cc 100644 > >>>>>>> --- a/memory.c > >>>>>>> +++ b/memory.c > >>>>>>> @@ -1630,6 +1630,12 @@ void > >>>>>>> memory_region_iommu_replay(MemoryRegion > >>>>>>> *mr, IOMMUNotifier *n, > >>>>>>> hwaddr addr, granularity; > >>>>>>> IOMMUTLBEntry iotlb; > >>>>>>> + /* If the IOMMU has its own replay callback, override */ > >>>>>>> + if (mr->iommu_ops->replay) { > >>>>>>> + mr->iommu_ops->replay(mr, n); > >>>>>>> + return; > >>>>>>> + } > >>>>>> Hi Alex, Peter, > >>>>>> > >>>>>> Will all the other vendors(e.g. PPC, s390, ARM) add their own > >>>>>> replay callback as well? I guess it depends on whether the > >>>>>> original replay algorithm work well for them? Do you have such > >>>>>> knowledge? > >>>>> I guess so. At least for VT-d we had this callback since the > >>>>> default replay mechanism did not work well on x86 due to its > >>>>> extremely large memory region size. Thanks, > >>>> thx. that would make sense. > >>> Peter, > >>> > >>> Just come to mind that there may be a corner case here. > >>> > >>> Intel VT-d actually has a "pt" mode which allows device use physical > >>> address even when VT-d is enabled. In kernel, there is a > iommu_identity_mapping. > >>> If a device is in this map, then it would use "pt" mode. So that > >>> IOMMU driver would not build second-level page table for it. > >> Yes, but qemu does not support ECAP_PT now, so guest will still have > >> a page table in this case. > > That's true. Without ECAP_PT, IOMMU driver would create a 1:1 map. So > > this solution can work well even a device is in identify_map. > > > >>> Back to the virtual IOVA implementation, if an assigned device is in > >>> the iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to > >>> do > DMA. > >>> So it demands a GPA->HPA mapping in host. However, the > >>> iommu->ops.replay is not able to build it when guest SL page table is > >>> empty. > >>> > >>> So I think building an entire guest PA->HPA mapping before guest > >>> kernel boot would be recommended. Any thoughts? > >> We plan to add PT in 2.10, a possible rough idea is disabled iommu > >> dmar region and use another region without iommu_ops. Then > >> vfio_listener_region_add() will just do the correct mappings. > > Good to know it. Actually, I also need to expose ECAP_PT for vSVM. So > > just comes to realize that the current replay solution may not work well > > when I > expose ECAP_PT to guest. > > I also have a rough idea here. The current listener in container > > listens to address space named with devfn if virtual VTd is added. How > > about adding one more listener to listen memory address space. So that the > listener can build entire guest PA->HPA mapping. > > This is only needed for PT. So looks like current code is sufficient to do > this I think. > See the else part of if (memory_region_is_iommu()) of > vfio_listener_region_add().
Jason, when the listener listen to device address space, the "else part" may not work even we set the mr->iommu_ops = NULL. The mr would be a non-ram region when the time region_add is called since it is actually listen to changes from device address space. Regards, Yi L