I'm very new to all this IOMMU stuff, but as I understand it, devices in the same IOMMU group are supposed to be treated as a single unit: if any of them is assigned to a VM, then they all must be assigned to that same VM. This is because those devices cannot be isolated from each other. For example, they can communicate directly without going through the IOMMU at all.
Splitting a group not only creates obvious security holes, but can also cause compatibility problems. For example, devices in different VMs will have different perspectives of system memory. [3] Or something like that. Like I said, I'm still trying to wrap my head around it. Point is, the entire group is supposed to be treated as a single unit. Linux/KVM enforces this, Xen does not, and I'm not sure about any other platforms. [1]

This caused a very sneaky problem on my machine. My USB controllers are in the same group as my GPU, sound card, and SATA controller. So when sys-usb (or rd.qubes.hide_all_usb) takes over those two USB controllers, everything stops working. [4] It was quite difficult to trace. It would have been much easier to diagnose if grouping were enforced somewhere. I would much rather have an error in my logs about being unable to assign USB controllers than have my whole screen freeze up with no indication why. (I got lucky that it just crashed; if something interferes with your SATA controller's address space, it can cause disk corruption. [5])

I don't really know who's at fault here. Qubes? Xen? AMD? Dell?

Unfortunately, Qubes has no way of knowing anything about IOMMU grouping because Xen takes over the IOMMU (and therefore grouping is not visible in dom0). [2] So probably the only way Qubes could enforce grouping is by some kind of heuristic. For example, assume all functions of a device are grouped. Or, assume all devices on a hub are grouped. Or just disable the USB Qube option on AMD systems entirely, or warn the user that it may cause serious problems that are hard to diagnose.

As for fixing the actual problem, that is, grouping the devices in a more sensible way so that the GPU and USB controllers can be isolated from each other: that can only be done in a firmware (or microcode?) update by the vendor, if at all. There are some hacks for KVM to spoof the grouping restrictions (which Xen doesn't enforce in the first place), but they don't solve the underlying problem.
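To make the first heuristic above concrete (assume all functions of one device are grouped), here is a rough Python sketch. It is only an illustration, not anything Qubes actually does: the function names are made up for this example, and the device list is copied by hand from the lspci output further down.

```python
from collections import defaultdict


def guess_groups_by_device(addresses):
    """Heuristic guess: functions sharing the same bus:device form one group."""
    groups = defaultdict(list)
    for addr in addresses:
        # "03:00.3" -> slot "03:00", function "3"
        slot, _, _function = addr.rpartition(".")
        groups[slot].append(addr)
    return {slot: sorted(funcs) for slot, funcs in groups.items()}


def siblings_left_behind(addresses, wanted):
    """Return functions that share a slot with a wanted device but are not wanted."""
    wanted = set(wanted)
    left = []
    for funcs in guess_groups_by_device(addresses).values():
        if wanted & set(funcs):
            left.extend(f for f in funcs if f not in wanted)
    return sorted(left)


# Hypothetical input: some of the addresses from my machine.
devices = ["03:00.0", "03:00.1", "03:00.2", "03:00.3",
           "03:00.4", "03:00.6", "04:00.0"]

# Taking only the two USB controllers would leave these siblings behind:
print(siblings_left_behind(devices, ["03:00.3", "03:00.4"]))
# -> ['03:00.0', '03:00.1', '03:00.2', '03:00.6']
```

Note that on my machine this guess would flag the GPU and both audio devices but would miss the SATA controller at 04:00.0, which sits on a different bus yet is in the same real IOMMU group. So a per-device heuristic alone would not have been enough here.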
VFIO seems like it could work (by emulating some IOMMU functionality in software), but I don't know if it's supported by Xen.

I'm guessing part of the reason this problem doesn't usually come up on Intel systems is the Xen option iommu=no-igfx. It exempts the integrated GPU from IOMMU control altogether, but it is Intel-specific and has no AMD equivalent. It also does nothing about other devices such as sound cards or SATA controllers. Intel systems usually just seem to have better grouping (or are less likely to crash when grouping rules are violated). [6]

At least that's my understanding so far. Thoughts?

Is there anything Qubes can do to avoid splitting up IOMMU groups?
Is there anything Qubes *should* do?
Should Qubes attempt to guess the IOMMU groups before taking over devices?
Should the USB Qube option be disabled on AMD systems (you can still manually set up sys-usb, of course)?
Should we just blame Xen for not enforcing IOMMU groups in the first place?

[1] https://lists.gt.net/xen/devel/345279#345279
[2] http://xen.1045712.n5.nabble.com/IOMMU-group-dissapear-in-XEN-td5737357.html
[3] https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
[4] https://www.mail-archive.com/[email protected]/msg31494.html
[5] http://xen.1045712.n5.nabble.com/VGA-passthrough-with-USB-passthrough-td5738340.html
[6] https://hardforum.com/threads/ryzen-and-iommu-groups-is-this-ever-going-to-get-fixed.1944064

---

Dell Inspiron 5575, AMD Ryzen 5 2500U, Qubes R4.1 booted without Xen:

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 07)
02:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-01.0
           +-01.6-[01]----00.0
           +-01.7-[02]----00.0
           +-08.0
           +-08.1-[03]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            \-00.6
           +-08.2-[04]----00.0
           +-14.0
           +-14.3
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           +-18.4
           +-18.5
           +-18.6
           \-18.7

# tree /sys/kernel/iommu_groups/
├── 0
│   ├── devices
│   │   └── 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
│   ├── reserved_regions
│   └── type
├── 1
│   ├── devices
│   │   └── 0000:00:01.6 -> ../../../../devices/pci0000:00/0000:00:01.6
│   ├── reserved_regions
│   └── type
├── 2
│   ├── devices
│   │   └── 0000:00:01.7 -> ../../../../devices/pci0000:00/0000:00:01.7
│   ├── reserved_regions
│   └── type
├── 3
│   ├── devices
│   │   ├── 0000:00:08.0 -> ../../../../devices/pci0000:00/0000:00:08.0
│   │   ├── 0000:00:08.1 -> ../../../../devices/pci0000:00/0000:00:08.1
│   │   ├── 0000:00:08.2 -> ../../../../devices/pci0000:00/0000:00:08.2
│   │   ├── 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.0
│   │   ├── 0000:03:00.1 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.1
│   │   ├── 0000:03:00.2 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.2
│   │   ├── 0000:03:00.3 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.3
│   │   ├── 0000:03:00.4 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.4
│   │   ├── 0000:03:00.6 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.6
│   │   └── 0000:04:00.0 -> ../../../../devices/pci0000:00/0000:00:08.2/0000:04:00.0
│   ├── reserved_regions
│   └── type
├── 4
│   ├── devices
│   │   ├── 0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
│   │   └── 0000:00:14.3 -> ../../../../devices/pci0000:00/0000:00:14.3
│   ├── reserved_regions
│   └── type
├── 5
│   ├── devices
│   │   ├── 0000:00:18.0 -> ../../../../devices/pci0000:00/0000:00:18.0
│   │   ├── 0000:00:18.1 -> ../../../../devices/pci0000:00/0000:00:18.1
│   │   ├── 0000:00:18.2 -> ../../../../devices/pci0000:00/0000:00:18.2
│   │   ├── 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
│   │   ├── 0000:00:18.4 -> ../../../../devices/pci0000:00/0000:00:18.4
│   │   ├── 0000:00:18.5 -> ../../../../devices/pci0000:00/0000:00:18.5
│   │   ├── 0000:00:18.6 -> ../../../../devices/pci0000:00/0000:00:18.6
│   │   └── 0000:00:18.7 -> ../../../../devices/pci0000:00/0000:00:18.7
│   ├── reserved_regions
│   └── type
├── 6
│   ├── devices
│   │   └── 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.6/0000:01:00.0
│   ├── reserved_regions
│   └── type
└── 7
    ├── devices
    │   └── 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.7/0000:02:00.0
    ├── reserved_regions
    └── type
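For anyone who wants to check their own machine, here is a small Python sketch that reads the same /sys/kernel/iommu_groups layout shown in the tree output and reports which groups a planned passthrough would split. The function names are my own invention for this example. It only works where the kernel exposes the groups, so as far as I can tell it has to be run booted without Xen.

```python
import os
import sys
from pathlib import Path


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group id: sorted PCI addresses} from <root>/<id>/devices/."""
    groups = {}
    for group_dir in Path(root).iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def split_groups(groups, wanted):
    """Return {group id: devices left behind} for each partly covered group."""
    wanted = set(wanted)
    left_behind = {}
    for group_id, devices in groups.items():
        members = set(devices)
        # A group is split if we take some of its devices but not all.
        if members & wanted and not members <= wanted:
            left_behind[group_id] = sorted(members - wanted)
    return left_behind


if __name__ == "__main__":
    root = "/sys/kernel/iommu_groups"
    if os.path.isdir(root):
        # Pass the devices you plan to assign as arguments,
        # e.g. 0000:03:00.3 0000:03:00.4
        for group_id, rest in split_groups(read_iommu_groups(root), sys.argv[1:]).items():
            print(f"group {group_id} would be split; left behind: {', '.join(rest)}")
    else:
        print(f"{root} not found (IOMMU groups are not exposed here)")
```

On my machine, asking for just the two USB controllers (0000:03:00.3 and 0000:03:00.4) should report group 3 as split, with the GPU, both audio devices, the PSP, the SATA controller, and the three 00:08.x bridges left behind.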
