On 21/11/2023 08:43, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Besides suggested changes in v6, I'd like to highlight two changes > for final review: > 1. Instantiate can_be_deleted callback to fix race where iommufd object > can be deleted before vfio device > 2. After careful re-thinking, I'd like to follow Nicolin's suggestion in v5, > remove is_ioas check which indeed looks heavy just for tracepoint. > In fact we can get corresponding info by looking over trace context. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > PATCH 22-26: vfio device init code cleanup > PATCH 27: add iommufd doc > > > We have done wide test with different combinations, e.g: > - PCI device were tested > - FD passing and hot reset with some trick. > - device hotplug test with legacy and iommufd backends > - with or without vIOMMU for legacy and iommufd backends > - divices linked to different iommufds > - VFIO migration with a E800 net card(no dirty sync support) passthrough > - platform, ccw and ap were only compile-tested due to environment limit > - test mdev pass through with mtty and mix with real device and different BE > - test iommufd object hotplug/unplug and mix with vfio device plug/unplug > > Given some iommufd kernel limitations, the iommufd backend is > not yet fully on par with the legacy backend w.r.t. features like: > - p2p mappings (you will see related error traces) > - dirty page sync
Just as a quick update: I intend to follow-up with this (alongside my other debt) when I am back from vacation ~2weeks from now. The one thing needed before dirty tracking enabling is a userspace autodomains that's equivalent to the kernel. I don't know what to do about mdevs tbh, considering there's no hw domain going on it's just telling iommufd that 'hey I am accessing these pages' (IIUC I could be missing something). Right now it just falls back to IOAS attach instead of hwpt-id attach if it all fails; Here's a partial snip below that is trying to match kernel iommufd_device_auto_get_domain(). It's not that far from Zhenzhong v5 implementation, except that we handle the -EINVAL from the attach to allocate a new hwpt for said device, and if that fails then just bail out like the kernel. static int iommufd_cdev_attach_container(VFIODevice *vbasedev, VFIOIOMMUFDContainer *container, Error **errp) { - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp); + uint32_t pt_id; + + if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) { + return 0; + } + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err); + if (ret) { + /* -EINVAL means the domain is incompatible with the device. */ + if (ret == -EINVAL) { + continue; + } + return ret; + } else { + vbasedev->hwpt = hwpt; + *hwpt_id = hwpt->hwpt_id; + return 0; + } + } + + ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, + container->ioas_id, 0, 0, 0, + NULL, hwpt_id); + if (ret) { + error_setg_errno(errp, errno, "error alloc shadow hwpt"); + return ret; + } + + hwpt = g_malloc0(sizeof(*hwpt)); + hwpt->hwpt_id = *hwpt_id; + QLIST_INIT(&hwpt->device_list); + + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, &err); + if (ret) { + iommufd_backend_free_id(vbasedev->iommufd, hwpt->hwpt_id); + g_free(hwpt); + return ret; + } + + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); + return 0; +} + static int iommufd_cdev_attach_container(VFIODevice *vbasedev, VFIOIOMMUFDContainer *container, Error **errp) { - return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp); + uint32_t pt_id; + + if (!iommufd_cdev_autodomains_get(vbasedev, container, &pt_id, errp)) { + return 0; + } + + pt_id = container->ioas_id; + return iommufd_cdev_attach_ioas_hwpt(vbasedev, pt_id, errp); }