On Wed, Jan 17, 2024 at 11:29:08AM +0100, Eric Auger wrote:
> Hi Peter,

Hi, Eric,

Thanks for the reviews!

> 
> On 1/17/24 10:15, pet...@redhat.com wrote:
> > From: Peter Xu <pet...@redhat.com>
> >
> > There're issue reported that when syetem_reset the VM with an intel iommu
> system_reset
> > device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.
> >
> >   https://issues.redhat.com/browse/RHEL-7188
> >
> > Alex quickly spotted a possible issue with the ordering of device resets.
> >
> > It's verified by our QE team then that it is indeed the root cause of the
> > problem.  Consider when vIOMMU is reset before a VFIO device in a system
> > reset: the device can be doing DMAs even if the vIOMMU is gone; in this
> > specific context it means the shadow mapping can already be completely
> > destroyed.  Host will see these DMAs as malicious and report.
> That's curious we did not get this earlier?

I honestly don't know.  It could be that we just didn't test system resets
much.  Or we could have overlooked the host dmesg output; after all, the
error messages can be benign from a functional point of view.

> >
> > To fix it, we'll need to make sure all devices under the vIOMMU device
> > hierarchy will be reset before the vIOMMU itself.  There's plenty of trick
> > inside, one can get those by reading the last patch.
> Not sure what you meant here ;-)

I meant that "how to make sure all the vIOMMU-managed devices are reset
before the vIOMMU" is tricky to implement.  I didn't describe any of that
in the cover letter because most of it is explained in patch 4, and I would
rather point to that patch for the details.  Since I think it's very
tricky, I left the main explanation as a comment in the code so that it
persists.
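
To illustrate only the ordering requirement (not how patch 4 implements
it), here is a toy model in plain C.  Every name in it is hypothetical and
none of them is a QEMU API; it simply assumes a flat device list in which
one entry is marked as the vIOMMU, and resets everything else first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of reset ordering; not QEMU code. */
#define MAX_LOG 8

typedef struct ToyDev {
    const char *name;
    int is_viommu;   /* 1 for the vIOMMU, 0 for a device behind it */
} ToyDev;

/* Records the order in which devices were reset, for inspection. */
static const ToyDev *reset_log[MAX_LOG];
static size_t reset_count;

static void toy_reset(const ToyDev *dev)
{
    /* A real device would quiesce its DMA here; the toy only logs. */
    reset_log[reset_count++] = dev;
}

/*
 * Two passes over the list: pass 0 resets the devices behind the
 * vIOMMU, pass 1 resets the vIOMMU.  No device can then issue DMA
 * after its mappings have been torn down.
 */
static void toy_system_reset(const ToyDev *devs, size_t n)
{
    assert(n <= MAX_LOG);
    reset_count = 0;
    for (int pass = 0; pass <= 1; pass++) {
        for (size_t i = 0; i < n; i++) {
            if (devs[i].is_viommu == pass) {
                toy_reset(&devs[i]);
            }
        }
    }
}
```

Calling toy_system_reset() on a list that has the vIOMMU as its first
entry still leaves the vIOMMU as the last entry in reset_log, which is the
property the series needs.  The real difficulty is getting that order out
of the existing reset machinery, which this sketch does not attempt.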

Thanks,

-- 
Peter Xu

