I'm very new to all this IOMMU stuff, but as I understand it, devices in the
same IOMMU group are supposed to be treated as a single unit: if any of them
is assigned to a VM, then they all must be assigned to the same VM. This is
because those devices cannot be isolated from each other. For example, they
can communicate directly without going through the IOMMU at all.
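
To make the grouping concrete: on Linux booted without Xen, the groups are
exposed under /sys/kernel/iommu_groups, as in the tree listing at the end of
this message. Here is a minimal Python sketch that reads that layout. The
function name read_iommu_groups is my own, and it assumes the usual
<group>/devices/<PCI address> directory structure.

```python
import os

def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: sorted list of PCI addresses} read from sysfs.

    Returns an empty dict if the directory does not exist, which is what
    you would expect in dom0 under Xen, where the groups are not visible.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Each group is a numbered directory containing a "devices" subdirectory.
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups

if __name__ == "__main__":
    for group, devices in sorted(read_iommu_groups().items()):
        print(group, " ".join(devices))
```

Every address printed on the same line belongs to one group and, as I
understand it, would have to go to the same VM.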

This not only creates obvious security holes, but can also cause compatibility
problems. For example, devices in different VMs will have different
perspectives of system memory. [3] Or something like that. Like I said, I'm
still trying to wrap my head around it.

Point is, the entire group is supposed to be treated as a single unit.
Linux/KVM enforces this; Xen does not. I'm not sure about other platforms. [1]

This caused a very sneaky problem on my machine. My USB controllers are in the
same group as my GPU, sound card, and SATA controller. So when sys-usb (or
rd.qubes.hide_all_usb) takes over those two USB controllers, everything stops
working. [4] It was quite difficult to trace. It would have been much easier
to diagnose if grouping were enforced somewhere. I would much rather have an
error in my logs about being unable to assign the USB controllers than have my
whole screen freeze up with no indication why. (I got lucky that it just
crashed; if something interferes with your SATA controller's address space, it
can cause disk corruption. [5])
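
As an illustration of the kind of check I would have liked to see, here is a
small Python sketch that refuses an assignment when it would split a group.
The function name find_split_groups and the error message are hypothetical;
the group data is copied from the sysfs listing at the end of this message.
It treats bridges as ordinary group members, which is a simplification.

```python
def find_split_groups(groups, assigned):
    """Return {group: devices left behind} for groups only partly assigned."""
    assigned = set(assigned)
    split = {}
    for group, devices in groups.items():
        members = set(devices)
        taken = members & assigned
        # A group is split if some, but not all, of its members are assigned.
        if taken and taken != members:
            split[group] = sorted(members - assigned)
    return split

# Groups 3 and 6 as reported by Linux on my machine.
groups = {
    3: ["0000:00:08.0", "0000:00:08.1", "0000:00:08.2",
        "0000:03:00.0", "0000:03:00.1", "0000:03:00.2",
        "0000:03:00.3", "0000:03:00.4", "0000:03:00.6",
        "0000:04:00.0"],
    6: ["0000:01:00.0"],
}

# The two USB controllers that sys-usb takes over.
usb_controllers = ["0000:03:00.3", "0000:03:00.4"]

for group, left in find_split_groups(groups, usb_controllers).items():
    print(f"refusing to assign: IOMMU group {group} would be split; "
          f"left behind: {', '.join(left)}")
```

With my hardware this flags group 3, because the GPU, both audio devices, and
the SATA controller would stay behind while the USB controllers move.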

I don't really know who's at fault here. Qubes? Xen? AMD? Dell?

Unfortunately, Qubes has no way of knowing anything about IOMMU grouping
because Xen takes over the IOMMU (and therefore grouping is not visible in
dom0). [2] So probably the only way Qubes could enforce grouping is by some
kind of heuristic. For example, assume all functions of a device are grouped.
Or assume all devices on a hub are grouped. Or just disable the USB Qube
option on AMD systems entirely, or warn the user that it may cause serious
problems that are hard to diagnose.
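
The first heuristic (all functions of a device are grouped) could look
something like this Python sketch. It only inspects PCI addresses of the form
dddd:bb:dd.f, so it would work in dom0 without any IOMMU information. The
function name guess_groups_by_slot is made up, and this is a guess at the
grouping, not the real thing.

```python
from collections import defaultdict

def guess_groups_by_slot(addresses):
    """Group PCI addresses by slot, i.e. everything except the function number."""
    guessed = defaultdict(list)
    for addr in addresses:
        # "0000:03:00.3" -> slot "0000:03:00", function "3"
        slot, _, _function = addr.rpartition(".")
        guessed[slot].append(addr)
    return {slot: sorted(funcs) for slot, funcs in guessed.items()}

# Endpoints from buses 03 and 04 on my machine.
endpoints = ["0000:03:00.0", "0000:03:00.1", "0000:03:00.2",
             "0000:03:00.3", "0000:03:00.4", "0000:03:00.6",
             "0000:04:00.0"]

for slot, funcs in sorted(guess_groups_by_slot(endpoints).items()):
    print(slot, " ".join(funcs))
```

On my machine this would correctly tie the USB controllers to the GPU and
audio devices, but it would miss the SATA controller, which sits on a
different bus (04:00.0) and is still in the same real group. So the heuristic
is better than nothing but not a substitute for the actual grouping.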

As for the actual problem: regrouping the devices more sensibly, so that for
example the GPU and USB controllers can be isolated, can only be done in a
firmware (or microcode?) update by the vendor, if at all. There are some hacks
for KVM to spoof the grouping restrictions (which Xen doesn't enforce in the
first place), but they don't solve the underlying problem. VFIO seems like it
could work (by emulating some IOMMU functionality in software), but I don't
know if it's supported by Xen.

I'm guessing part of the reason this problem doesn't usually come up on Intel
systems is the Xen option iommu=no-igfx. This means that the integrated GPU is
always exempt from IOMMU control altogether, but this option is Intel-specific
and has no AMD equivalent. However, that doesn't do anything about other
devices such as sound cards or SATA controllers. Intel systems just seem to
have better grouping usually (or are less likely to crash when grouping rules
are violated). [6]

At least that's my understanding so far. 

Thoughts? Is there anything Qubes can do to avoid splitting up IOMMU groups?
Is there anything Qubes *should* do? Should Qubes attempt to guess the IOMMU
groups before taking over devices? Should the USB Qube option be disabled on
AMD systems (you can still manually set up sys-usb of course)? Should we just
blame Xen for not enforcing IOMMU groups in the first place?

[1] https://lists.gt.net/xen/devel/345279#345279
[2] http://xen.1045712.n5.nabble.com/IOMMU-group-dissapear-in-XEN-td5737357.html
[3] https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
[4] https://www.mail-archive.com/[email protected]/msg31494.html
[5] http://xen.1045712.n5.nabble.com/VGA-passthrough-with-USB-passthrough-td5738340.html
[6] https://hardforum.com/threads/ryzen-and-iommu-groups-is-this-ever-going-to-get-fixed.1944064

---

Dell Inspiron 5575, AMD Ryzen 5 2500U, Qubes R4.1 booted without Xen: 

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 07)
02:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-01.0
           +-01.6-[01]----00.0
           +-01.7-[02]----00.0
           +-08.0
           +-08.1-[03]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            \-00.6
           +-08.2-[04]----00.0
           +-14.0
           +-14.3
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           +-18.4
           +-18.5
           +-18.6
           \-18.7

# tree /sys/kernel/iommu_groups/
├── 0
│ ├── devices
│ │ └── 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
│ ├── reserved_regions
│ └── type
├── 1
│ ├── devices
│ │ └── 0000:00:01.6 -> ../../../../devices/pci0000:00/0000:00:01.6
│ ├── reserved_regions
│ └── type
├── 2
│ ├── devices
│ │ └── 0000:00:01.7 -> ../../../../devices/pci0000:00/0000:00:01.7
│ ├── reserved_regions
│ └── type
├── 3
│ ├── devices
│ │ ├── 0000:00:08.0 -> ../../../../devices/pci0000:00/0000:00:08.0
│ │ ├── 0000:00:08.1 -> ../../../../devices/pci0000:00/0000:00:08.1
│ │ ├── 0000:00:08.2 -> ../../../../devices/pci0000:00/0000:00:08.2
│ │ ├── 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.0
│ │ ├── 0000:03:00.1 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.1
│ │ ├── 0000:03:00.2 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.2
│ │ ├── 0000:03:00.3 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.3
│ │ ├── 0000:03:00.4 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.4
│ │ ├── 0000:03:00.6 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.6
│ │ └── 0000:04:00.0 -> ../../../../devices/pci0000:00/0000:00:08.2/0000:04:00.0
│ ├── reserved_regions
│ └── type
├── 4
│ ├── devices
│ │ ├── 0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
│ │ └── 0000:00:14.3 -> ../../../../devices/pci0000:00/0000:00:14.3
│ ├── reserved_regions
│ └── type
├── 5
│ ├── devices
│ │ ├── 0000:00:18.0 -> ../../../../devices/pci0000:00/0000:00:18.0
│ │ ├── 0000:00:18.1 -> ../../../../devices/pci0000:00/0000:00:18.1
│ │ ├── 0000:00:18.2 -> ../../../../devices/pci0000:00/0000:00:18.2
│ │ ├── 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
│ │ ├── 0000:00:18.4 -> ../../../../devices/pci0000:00/0000:00:18.4
│ │ ├── 0000:00:18.5 -> ../../../../devices/pci0000:00/0000:00:18.5
│ │ ├── 0000:00:18.6 -> ../../../../devices/pci0000:00/0000:00:18.6
│ │ └── 0000:00:18.7 -> ../../../../devices/pci0000:00/0000:00:18.7
│ ├── reserved_regions
│ └── type
├── 6
│ ├── devices
│ │ └── 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.6/0000:01:00.0
│ ├── reserved_regions
│ └── type
└── 7
  ├── devices
  │ └── 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.7/0000:02:00.0
  ├── reserved_regions
  └── type
