On 2020/6/29 下午9:34, Peter Xu wrote:
On Mon, Jun 29, 2020 at 01:51:47PM +0800, Jason Wang wrote:
On 2020/6/28 下午10:47, Peter Xu wrote:
On Sun, Jun 28, 2020 at 03:03:41PM +0800, Jason Wang wrote:
On 2020/6/27 上午5:29, Peter Xu wrote:
Hi, Eugenio,

(CCing Eric, Yan and Michael too)

On Fri, Jun 26, 2020 at 08:41:22AM +0200, Eugenio Pérez wrote:
diff --git a/memory.c b/memory.c
index 2f15a4b250..7f789710d2 100644
--- a/memory.c
+++ b/memory.c
@@ -1915,8 +1915,6 @@ void memory_region_notify_one(IOMMUNotifier *notifier,
-    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
I can understand removing the assertion should solve the issue, however imho
the major issue is not about this single assertion but the whole addr_mask
issue behind with virtio...
I don't get here, it looks to the the range was from guest IOMMU drivers.
Yes.  Note that I didn't mean that it's a problem in virtio, it's just the fact
that virtio is the only one I know that would like to support arbitrary address
range for the translated region.  I don't know about tcg, but vfio should still
need some kind of page alignment in both the address and the addr_mask.  We
have that assumption too across the memory core when we do translations.

Right but it looks to me the issue is not the alignment.

A further cause of the issue is the MSI region when vIOMMU enabled - currently
we implemented the interrupt region using another memory region so it split the
whole DMA region into two parts.  That's really a clean approach to IR
implementation, however that's also a burden to the invalidation part because
then we'll need to handle things like this when the listened range is not page
alighed at all (neither 0-0xfedffff, nor 0xfef0000-MAX).  If without the IR
region (so the whole iommu address range will be a single FlatRange),

Is this a bug? I remember that at least for vtd, it won't do any DMAR on the
intrrupt address range
I don't think it's a bug, at least it's working as how I understand...  that
interrupt range is using an IR region, that's why I said the IR region splits
the DMAR region into two pieces, so we have two FlatRange for the same

I don't check the qemu code but if "a single FlatRange" means 0xFEEx_xxxx is subject to DMA remapping, OS need to setup passthrough mapping for that range in order to get MSI to work. This is not what vtd spec said:


3.14 Handling Requests to Interrupt Address Range

Requests without PASID to address range 0xFEEx_xxxx are treated as
potential interrupt requests and are not subjected to DMA remapping
(even if translation structures specify a mapping for this
range). Instead, remapping hardware can be enabled to subject such
interrupt requests to interrupt remapping.


My understanding is vtd won't do any DMA translation on 0xFEEx_xxxx even if IR is not enabled.

   I think
we probably don't need most of the logic in vtd_address_space_unmap() at all,
then we can directly deliver all the IOTLB invalidations without splitting into
small page aligned ranges to all the iommu notifiers.  Sadly, so far I still
don't have ideal solution for it, because we definitely need IR.

Another possible (theoretical) issue (for vhost) is that it can't trigger
interrupt through the interrupt range.
Hmm.. Could you explain?  When IR is enabled, all devices including virtio
who send interrupt to 0xfeeXXXXX should be trapped by IR.

I meant vhost not virtio, if you teach vhost to DMA to 0xFEEx_xxxx, it can't generate any interrupts as expected.

For normal IOTLB invalidations, we were trying our best to always make
IOMMUTLBEntry contain a valid addr_mask to be 2**N-1.  E.g., that's what we're
doing with the loop in vtd_address_space_unmap().
I'm sure such such assumption can work for any type of IOMMU.

But this is not the first time that we may want to break this assumption for
virtio so that we make the IOTLB a tuple of (start, len), then that len can be
not a address mask any more.  That seems to be more efficient for things like
vhost because iotlbs there are not page based, so it'll be inefficient if we
always guarantee the addr_mask because it'll be quite a lot more roundtrips of
the same range of invalidation.  Here we've encountered another issue of
triggering the assertion with virtio-net, but only with the old RHEL7 guest.

I'm thinking whether we can make the IOTLB invalidation configurable by
specifying whether the backend of the notifier can handle arbitary address
range in some way.  So we still have the guaranteed addr_masks by default
(since I still don't think totally break the addr_mask restriction is wise...),
however we can allow the special backends to take adavantage of using arbitary
(start, len) ranges for reasons like performance.

To do that, a quick idea is to introduce a flag IOMMU_NOTIFIER_ARBITRARY_MASK
to IOMMUNotifierFlag, to declare that the iommu notifier (and its backend) can
take arbitrary address mask, then it can be any value and finally becomes a
length rather than an addr_mask.  Then for every iommu notify() we can directly
deliver whatever we've got from the upper layer to this notifier.  With the new
flag, vhost can do iommu_notifier_init() with UNMAP|ARBITRARY_MASK so it
declares this capability.  Then no matter for device iotlb or normal iotlb, we
skip the complicated procedure to split a big range into small ranges that are
with strict addr_mask, but directly deliver the message to the iommu notifier.
E.g., we can skip the loop in vtd_address_space_unmap() if the notifier is with
ARBITRARY flag set.
I'm not sure coupling IOMMU capability to notifier is the best choice.
IMHO it's not an IOMMU capability.  The flag I wanted to introduce is a
capability of the one who listens to the IOMMU TLB updates.  For our case, it's
virtio/vhost's capability to allow arbitrary length. The IOMMU itself
definitely has some limitation on the address range to be bound to an IOTLB
invalidation, e.g., the device-iotlb we're talking here only accept both the
iova address and addr_mask to be aligned to 2**N-1.

I think this go back to one of our previous discussion of whether to
introduce a dedicated notifiers for device IOTLB.

For IOMMU, it might have limitation like GAW, but for device IOTLB it
probably doesn't. That's the reason we hit the assert here.
I feel like even for hardware it shouldn't be arbitrary either,

Yes, but from the view of IOMMU, it's hard to know about that. Allowing [0, ~0ULL] looks sane.

  because the
device iotlb sent from at least vt-d driver is very restricted too (borrowing
the comment you wrote :):

     /* According to ATS spec table 2.4:
      * S = 0, bits 15:12 = xxxx     range size: 4K
      * S = 1, bits 15:12 = xxx0     range size: 8K
      * S = 1, bits 15:12 = xx01     range size: 16K
      * S = 1, bits 15:12 = x011     range size: 32K
      * S = 1, bits 15:12 = 0111     range size: 64K
      * ...

Right, but the comment is probably misleading here, since it's for the PCI-E transaction between IOMMU and device not for the device IOTLB invalidation descriptor.

For device IOTLB invalidation descriptor, spec allows a [0, ~0ULL] invalidation:

" Device-TLB Invalidate Descriptor


Size (S): The size field indicates the number of consecutive pages targeted by this invalidation request. If S field is zero, a single page at page address specified by Address [63:12] is requested to be invalidated. If S field is Set, the least significant bit in the Address field with value 0b indicates the invalidation address range. For example, if S field is Set and Address[12] is Clear, it indicates an 8KB invalidation address range with base address in Address [63:13]. If S field and Address[12] is Set and bit 13 is Clear, it indicates a 16KB invalidation address range with base
address in Address [63:14], etc.


So if we receive an address whose [63] is 0 and the rest is all 1, it's then a [0, ~0ULL] invalidation.

How about just convert to use a range [start, end] for any notifier and move
the checks (e.g the assert) into the actual notifier implemented (vhost or
IOMMUTLBEntry itself is the abstraction layer of TLB entry.  Hardware TLB entry
is definitely not arbitrary range either (because AFAICT the hardware should
only cache PFN rather than address, so at least PAGE_SIZE aligned).
Introducing this flag will already make this trickier just to avoid introducing
another similar struct to IOMMUTLBEntry, but I really don't want to make it a
default option...  Not to mention I probably have no reason to urge the rest
iommu notifier users (tcg, vfio) to change their existing good code to suite
any of the backend who can cooperate with arbitrary address ranges...

Ok, so it looks like we need a dedicated notifiers to device IOTLB.
Or we can also make a new flag for device iotlb just like current UNMAP? Then
we replace the vhost type from UNMAP to DEVICE_IOTLB.  But IMHO using the
ARBITRARY_LENGTH flag would work in a similar way.  DEVICE_IOTLB flag could
also allow virtio/vhost to only receive one invalidation (now IIUC it'll
receive both iotlb and device-iotlb for unmapping a page when ats=on), but then
ats=on will be a must and it could break some old (misconfiged) qemu because
afaict previously virtio/vhost could even work with vIOMMU (accidentally) even
without ats=on.

That's a bug and I don't think we need to workaround mis-configurated qemu :)



Reply via email to