On 30/06/2025 07:37, Jahan Murudi wrote:
Hi Julien,
On 25/06/2025 16:53, Julien Grall wrote:
Hi Jahan,
+ dsb(sy);
Any clue why Linux (mainline) does not do that?
One process remark: we typically comment inline rather than pasting a quote and
replying at the top of the e-mail.
Thanks for the style note - I'll follow the inline commenting convention
moving forward.
The implementation uses writel(), which contains an implicit dsb(st) that is
likely sufficient for Linux in its Stage-1 IOMMU usage, where CPU and IOMMU
interactions are coherent.
However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA operations (such
as PCIe passthrough), requiring the stronger dsb(sy) to ensure writes fully propagate
to the IPMMU hardware before continuing.
I don't follow. Are you saying the IPMMU driver in Linux doesn't handle
non-coherent DMA operations?
Let me clarify my understanding: In native Linux, the IOMMU works at stage-1
(VA -> PA) and typically assumes coherency between CPU and IOMMU. The implicit
dsb(st) in writel() is enough there. But in Xen, we use this as stage-2
(GPA -> HPA) for cases like PCI passthrough where devices might be
non-coherent.
I understand that for PCI passthrough, Xen will be using stage-2, so in
theory stage-1 could be used by the guest OS. But ultimately, it is the
same PCI device behind both. So if it is not coherent, that should hold
for both stages. Do you have a pointer to documentation that would
state otherwise?
> We might need the stronger barrier dsb(sy) in Xen because: 1) We can't
> assume the TLB walker is coherent for stage-2
Why would the TLB walker be coherent for stage-2 but not stage-1? Any
pointer to the documentation?
Note, I just noticed that IOMMU_FEAT_COHERENT_WALK is not set for the
IPMMU. So the "dsb sy" is consistent with that. However, I find it
doubtful that an IOMMU would differ in coherency between the two stages.
So maybe we should set the flag either unconditionally or based on a
register.
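
To illustrate the two options for the flag, here is a rough, untested
sketch. It is a standalone model, not Xen or IPMMU code: struct demo_domain,
the demo_*() helpers and DEMO_CAP_COHERENT_WALK are hypothetical names made
up for this example, and the capability bit is an assumption rather than a
documented IPMMU register field. Only the feature name
IOMMU_FEAT_COHERENT_WALK is taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the feature enumeration; only the feature name comes from
 * the thread, the rest of this file is a hypothetical model. */
enum iommu_feature {
    IOMMU_FEAT_COHERENT_WALK,
    IOMMU_FEAT_count
};

/* Hypothetical per-domain state holding a bitmap of IOMMU features. */
struct demo_domain {
    uint32_t features;
};

/* Hypothetical capability bit: assumed to report a coherent table walker. */
#define DEMO_CAP_COHERENT_WALK (1u << 0)

static void demo_set_feature(struct demo_domain *d, enum iommu_feature f)
{
    d->features |= 1u << f;
}

static bool demo_has_feature(const struct demo_domain *d, enum iommu_feature f)
{
    return (d->features & (1u << f)) != 0;
}

/*
 * Set the coherent-walk feature either unconditionally or only when the
 * (assumed) capability register value reports a coherent walker.
 */
static void demo_init_features(struct demo_domain *d, uint32_t cap_reg,
                               bool unconditional)
{
    if (unconditional || (cap_reg & DEMO_CAP_COHERENT_WALK))
        demo_set_feature(d, IOMMU_FEAT_COHERENT_WALK);
}
```

Whether the IPMMU actually exposes such a bit is exactly the question for
the documentation. If it does not, the unconditional variant is the only
one that applies.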
> and we must also prevent (minimise) any DMA operations during TLB
> invalidation (observed some IPMMU hardware limitations in the
> documentation).
I don't understand what you wrote in parentheses. But isn't everything
you wrote also true for stage-1?
Cheers,
--
Julien Grall