December 26, 2019 12:59 PM, "'awokd' via qubes-users" 
<[email protected]> wrote:

> Claudia:
> 
> TL;DR: check the bottom of https://community.amd.com/thread/241650; it looks
> like a related update was released recently. Not sure if it's
> applicable to your situation.

Thanks for the link! I'm not sure whether it affects me. I did install a Dell 
BIOS update dated March 2019, so it sounds like that could have contained this 
AGESA update. Downgrading might therefore fix the grouping issue, but the same 
BIOS update also contained an "urgent" security fix, which I'd have to look 
into before downgrading.

>> This caused a very sneaky problem on my machine. My USB controllers are in
>> the same group as my GPU, sound card, and SATA controller. So when sys-usb
>> (or rd.qubes.hide_all_usb) takes over those two USB controllers, everything
>> stops working. [4] It was quite difficult to trace. It would have been much
>> easier to diagnose if grouping was enforced somewhere. I would much rather
>> have an error in my logs about being unable to assign USB controllers than
>> have my whole screen freeze up with no indication why. (I got lucky that it
>> just crashed; if something interferes with your SATA controller's address
>> space it can cause disk corruption. [5])
>> 
>> I don't really know who's at fault here. Qubes? Xen? AMD? Dell?
> 
> The improper grouping is probably somewhere in AGESA, which is provided
> to the manufacturers by AMD. It could be because of hardware related
> limitations, which again are supplied by AMD. Sometimes vendors take
> liberties (cost cutting measures) with both and break functionality, as
> their primary/sole concern is that Windows boots. This can especially be
> the case with consumer class machines such as Ryzen. Agree it would be
> nice if Xen handled this failure mode more gracefully. Not sure there is
> much Qubes can do here, though. On the other hand, my older AMD
> (pre-Ryzen) consumer laptop running Coreboot has correct groupings.

Yeah, my impression is that the firmware can influence IOMMU grouping to an 
extent, within the bounds of the physical hardware. If this problem was indeed 
caused by an update, then I assume it's (at least partly) firmware-related. 
According to that thread, a fix ("ComboPI") has been released for some 
boards/CPUs, but the only feedback I can find on it is for the Ryzen 
3000-series, which doesn't help me. I also don't know if or when my machine 
will receive a BIOS update with this AGESA fix.

I sort of blame Xen for not enforcing IOMMU grouping, especially considering 
that it hides that info from the OS. KVM does enforce IOMMU grouping rules, so 
I don't see why Xen wouldn't. Xen leaves it up to the user software to be 
careful about what it passes where, but that's kind of hard when you don't have 
/sys/kernel/iommu_groups for a hint.
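
To illustrate the kind of check I mean, here is a minimal sketch in Python. It 
assumes an environment where the kernel does expose /sys/kernel/iommu_groups 
(for example, the same machine booted into a bare-metal Linux live image), 
which is exactly what is missing under Xen. The function names 
read_iommu_groups and split_groups are made up for this example; they are not 
part of Qubes or Xen.

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses in it.

    Returns an empty dict when the directory is absent, as it is when
    the hypervisor hides the grouping from the OS.
    """
    if not os.path.isdir(root):
        return {}
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def split_groups(groups, passthrough):
    """Return {group: devices left behind} for partially assigned groups.

    A group is "split" when at least one of its devices is in the
    passthrough set and at least one is not.
    """
    wanted = set(passthrough)
    left_behind = {}
    for group, devices in groups.items():
        members = set(devices)
        if members & wanted and not members <= wanted:
            left_behind[group] = sorted(members - wanted)
    return left_behind
```

Calling split_groups(read_iommu_groups(), [...]) with the PCI addresses of the 
USB controllers you intend to give to sys-usb would return a non-empty dict if 
those controllers share a group with anything else, such as a GPU or SATA 
controller. That is the warning I would have liked to see in my logs.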

>> Intel systems
>> seem to just have better grouping usually (or are less likely to crash
>> when grouping rules are violated). [6]
> 
> I think that is overbroad. There are plenty of Intel systems with broken
> passthrough. iommu=no-igfx itself is a workaround for broken passthrough
> of Intel graphics. There are also plenty of AMD systems with properly
> implemented passthrough.

Very possible. I don't have experience with a lot of other hardware, so I'm 
just going by what I've heard. It definitely seems to be a Ryzen problem at 
least, maybe not AMD in general. I just seem to have come across a lot more 
complaints about AMD than Intel, though. It would be nice if the HCL contained 
more detailed information about the IOMMU, such as grouping, so we could get a 
better idea. At any rate, that's the least of my worries.

TBH I don't really understand what no-igfx does, so I don't know if an 
AMD-equivalent option would help in this case or not. It's just worth noting 
that it's an Intel-specific fix which could improve Intel compatibility 
compared to AMD generally.

>> Thoughts? Is there anything Qubes can do to avoid splitting up IOMMU
>> groups? Is there anything Qubes *should* do? Should Qubes attempt to guess
>> the IOMMU groups before taking over devices? Should the USB Qube option be
>> disabled on AMD systems (you can still manually set up sys-usb, of course)?
>> Should we just blame Xen for not enforcing IOMMU groups in the first place?
> 
> Ultimately, it's a hardware/firmware issue. Threadripper and Epyc based
> AMD systems ought to be more thoroughly vetted to support passthrough.
> My suggestions are to disable automatic IOMMU grouping in your UEFI
> configuration, if possible. Otherwise, try a newer firmware version with
> updated AGESA code and see if it helps, or possibly add a card with
> additional USB controllers as that should appear in its own group.

There is no way to enable or disable automatic IOMMU grouping in my BIOS. The 
only options are IOMMU enabled or disabled, as far as I can tell. There is no 
newer firmware for this machine at this time. Not sure about microcode, though. 
This is a laptop, so I can't add any cards.
