On Wed, Sep 07, 2016 at 08:20:35PM +1000, David Gibson wrote: > On Wed, Sep 07, 2016 at 03:09:16PM +0800, Peter Xu wrote: > > On Wed, Sep 07, 2016 at 04:02:39PM +1000, David Gibson wrote: > > > On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote: > > > > IOMMU Notifier list is used for notifying IO address mapping changes. > > > > Currently VFIO is the only user. > > > > > > > > However it is possible that future consumer like vhost would like to > > > > only listen to part of its notifications (e.g., cache invalidations). > > > > > > > > This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer > > > > grained control of it. > > > > > > > > IOMMUNotifier contains a bitfield for the notify consumer describing > > > > what kind of notification it is interested in. Currently two kinds of > > > > notifications are defined: > > > > > > > > - IOMMU_NOTIFIER_CHANGE: for entry changes (additions) > > > > - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates) > > > > > > As noted on the other thread, I think the correct options for your > > > bitmap here are "map" and "unmap". Which are triggered depends on the > > > permissions / existence of the *previous* mapping, as well as the new > > > one. > > > > As mentioned in previous reply, both work for me. :) > > > > If you insist on changing it (without anyone that strongly > > disagree...), I can do it in next spin. > > This is not just a name change I'm proposing, but a semantic change > (or at least a clarification).
I see that kernel IOMMU driver is using map_page() and unmap_page() for its interface. Now I prefer MAP/UNMAP. > > > > You could in fact have "map-read", "map-write", "unmap-read", > > > "unmap-write" as separate bitmap options (e.g. changing a mapping from > > > RO to WO would be both a read-unmap and write-map event). I can't see > > > any real use for that though, so just "map" and "unmap" are probably > > > sufficient. > > > > Agreed. We can enhance it in the future if there is any real > > requirement. Before that, it would be over-engineering. > > > > (Btw, we should not need {read|write}_unmap in all cases. IIUC unmap > > should not need any permission check.) > > In practice probably not, but they are distinct operations. > read_unmap means a readable mapping has been removed, write_unmap > means a writable mapping has been removed. Again - the permissions on > the *old* mapping are what matters here. Why they are distinct operations? Or could you help explain in what case would we need this flag (read/write) for an unmap() operation? > > > > > [...] > > > > > > -static void vfio_iommu_map_notify(Notifier *n, void *data) > > > > +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data) > > > > { > > > > VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); > > > > VFIOContainer *container = giommu->container; > > > > @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener > > > > *listener, > > > > section->offset_within_region; > > > > giommu->container = container; > > > > giommu->n.notify = vfio_iommu_map_notify; > > > > + giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL; > > > > > > "caps" isn't really right. It's a *requirement* that VFIO get all the > > > notifications, not a capability. "caps" would only make sense on the > > > other side (the vIOMMU implementation). > > > > Sounds reasonable. How about "flags"? Or any suggestion? > > "flags" would do. I feel like there should be a better name, but I > can't think of it. Sure. I can switch. > > > [...] > > > > > > /** > > > > @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr, > > > > * IOMMU translation entries. > > > > * > > > > * @mr: the memory region to observe > > > > - * @n: the notifier to be added; the notifier receives a pointer to an > > > > - * #IOMMUTLBEntry as the opaque value; the pointer ceases to be > > > > - * valid on exit from the notifier. > > > > + * @n: the IOMMUNotifier to be added; the notify callback receives a > > > > + * pointer to an #IOMMUTLBEntry as the opaque value; the pointer > > > > + * ceases to be valid on exit from the notifier. > > > > */ > > > > -void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier > > > > *n); > > > > +void memory_region_register_iommu_notifier(MemoryRegion *mr, > > > > + IOMMUNotifier *n); > > > > > > It seems to me that this should be allowed to fail, if the notifier > > > you're trying to register requires notifications that the MR > > > implementation can't supply. That seems cleaner than delaying the > > > checking until the notification actually happens. > > > > Do we have real use case for this one? For example, do we have case > > that VM "registering required notifications that MR cannot support" > > can still work? > > > > Currently there is only one use case AFAIK, which is VFIO+IntelIOMMU. > > In that case, I take it a configuration error (we should never allow > > that configuration happen). IMHO All configuration errors should be > > reported ASAP, and we should never let VM start. > > Yes... I'm not proposing changing that. I just think it would be > cleaner to report the error through the error channels, instead of > just aborting. Agree. But since I have no obvious reason to change the return code, I'd prefer to keep it as it is for this series if you don't mind. [...] > > > > @@ -1564,8 +1569,22 @@ void > > > > memory_region_unregister_iommu_notifier(MemoryRegion *mr, Notifier *n) > > > > void memory_region_notify_iommu(MemoryRegion *mr, > > > > IOMMUTLBEntry entry) > > > > { > > > > + IOMMUNotifier *iommu_notifier; > > > > + IOMMUNotifierCap request_cap; > > > > + > > > > assert(memory_region_is_iommu(mr)); > > > > - notifier_list_notify(&mr->iommu_notify, &entry); > > > > + > > > > + if (entry.perm & IOMMU_RW) { > > > > + request_cap = IOMMU_NOTIFIER_CHANGE; > > > > + } else { > > > > + request_cap = IOMMU_NOTIFIER_INVALIDATION; > > > > + } > > > > > > As noted right at the top, I don't think this logic is really right. > > > An in-place change should be treated as both a map and unmap. > > > > IIUC, guest needs to send two invalidations for an in-place change, > > right? So a "change" will actually trigger this twice. > > No, or at least not necessarily. How the invalidate is reported > really depends on the guest side vIOMMU interface. Maybe it requires > an explicit invalidate followed by a set, maybe a direct in place > replacement is possible (the PAPR hypercall interface allows this at > least in theory). > > What I had in mind here is that assuming the vIOMMU can detect an in > place change, it would then ping all notifiers that have *either* the > MAP or UNMAP flag bits set. Yes, so I think we don't need a "change" interface. And maybe we should start using MAP/UNMAP for the flags naming to avoid unecessary misunderstandings (when we talk about CHANGE flag, it's actually map(), but not unmap() + map()). > > > (At least this should be true for Power, and I am guessing the same > > for Intel, otherwise PowerIOMMU+VFIO should not work before this > > series...) > > I don't really follow what you're saying here. I meant at least current Power IOMMU should be sending two notifies if a "change" happened, otherwise current code won't work. Thanks, -- peterx