Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"
On Mon, Oct 10, 2022 at 04:16:33PM -0700, David Woodhouse wrote: > On Mon, 2022-10-10 at 15:08 -0400, Peter Xu wrote: > > On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote: > > > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote: > > > > From: Peter Xu < > > > > pet...@redhat.com > > > > > > > > > > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC > > > > IDs. However here since we already have EIM=ON it means the hypervisor > > > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will > > > > have > > > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width > > > > even if vcpus<=255. In short, commit 77250171bdc breaks any simple > > > > cmdline > > > > that wants to boot a VM with >=9 but <=255 vcpus with: > > > > > > I find that paragraph really hard to parse. What does it even mean that > > > "guest should assume the APIC IDs are 32bits"? > > > > Quotting EIM definition: > > > > 0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC > > Mode). > > > > 1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC > > mode). Hardware implementation reporting Interrupt Remapping support > > (IR) field as Clear also report this field as Clear. > > > > I hope the statement was matching the spec. Please let me know if you have > > better way to reword it. > > It needs to mention logical mode addressing. Because that, I presume, > is why it broke only when you had more than 8 vCPUs. Because that's > when the *logical* destination ID grew past 0xFF. Agree. > > > > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the > > > tables. Which is perfectly fine if there are only 254 CPUs anyway, and > > > we never need to use a higher value. > > > > > > I *think* the actual problem here is when logical addressing is used, > > > which puts the APIC cluster ID into higher bits? But it's kind of weird > > > that the message doesn't mention that at all? > > > > The commit message actually doesn't even need to contain a lot of > > information in this case, IMO. > > Well, it would be kind of useful if it said what the actual problem > was, no? Yes it'll be nice to have. > > > Literally it can be seen as a revert of a commit which breaks guest with > > > 8vcpu from boot. I kept the other lines because that still make sense, or > > > > it can be a full revert with "something broke with commit xxx, revert it to > > fix" and anything else could be reworked. AFAICT that's how it normally > > works with QEMU or Linux. > > > > I am not 100% familiar with the original purpose of the patch, would > > eim=off work for you even after patch applied? Anything severely wrong > > with this patch? > > I think the patch itself is fine; I'd just like the commit message to > be clearer about what the problem was. Thanks for confirming. > > > > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI > > > FADT, isn't it? Then the only values that a guest may put into those > > > fields — 32-bit fields or not — are lower than 0xff anyway. > > > > It's still not clear to me why we need to make it inconsistent between the > > EIM we declare to the guest and the KVM behavior on understanding EIM bit. > > Even if enforced physical mode will work we loose the possibility of > > cluster mode, and I also don't see what's the major benefit since EIM=off > > will just work, afaiu, meanwhile make everything aligned. > > Yeah, I think turning EIM off is absolutely fine. > > > Are you fine if we proceed with this pull request first and revisit later? > > Follow up patches will always be fine, and we're unbreaking something. I > > have copied you since the 1st patch I posted and the small patch was there > > for weeks, it'll be appreciated if either you could comment earlier next > > time, or even propose a better fix then we can discuss what's the best way > > to fix. Thanks. > > Yeah, sorry for the delay. But that was partly because the commit > message was confusing me and it took me a while to work out what was > actually going on... which is really all I'm heckling now. I see, that was totally fine, and it'll be definitely also fine to comment anything even on the pull req. It's just that as I tried to argue for this specific case IMHO we should move on and revisit later so we shrink the regression window, rather than redo a pull and let this fix wait for another one. It seems we reached a consensus on this, thanks for that. In all cases (irrelevant of the pull req), feel free to post any patch either based on this one or as replacement. I'll be happy to read and rethink. So far it still doesn't make sense to me to not enable kvm x2apic with eim=on, but maybe I'm wrong, and I'd be happy to be corrected in that case. Thanks, -- Peter Xu
Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"
On Mon, 2022-10-10 at 15:08 -0400, Peter Xu wrote: > On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote: > > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote: > > > From: Peter Xu < > > > pet...@redhat.com > > > > > > > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC > > > IDs. However here since we already have EIM=ON it means the hypervisor > > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have > > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width > > > even if vcpus<=255. In short, commit 77250171bdc breaks any simple > > > cmdline > > > that wants to boot a VM with >=9 but <=255 vcpus with: > > > > I find that paragraph really hard to parse. What does it even mean that > > "guest should assume the APIC IDs are 32bits"? > > Quotting EIM definition: > > 0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC > Mode). > > 1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC > mode). Hardware implementation reporting Interrupt Remapping support > (IR) field as Clear also report this field as Clear. > > I hope the statement was matching the spec. Please let me know if you have > better way to reword it. It needs to mention logical mode addressing. Because that, I presume, is why it broke only when you had more than 8 vCPUs. Because that's when the *logical* destination ID grew past 0xFF. > > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the > > tables. Which is perfectly fine if there are only 254 CPUs anyway, and > > we never need to use a higher value. > > > > I *think* the actual problem here is when logical addressing is used, > > which puts the APIC cluster ID into higher bits? But it's kind of weird > > that the message doesn't mention that at all? > > The commit message actually doesn't even need to contain a lot of > information in this case, IMO. Well, it would be kind of useful if it said what the actual problem was, no? > Literally it can be seen as a revert of a commit which breaks guest with > > 8vcpu from boot. I kept the other lines because that still make sense, or > > it can be a full revert with "something broke with commit xxx, revert it to > fix" and anything else could be reworked. AFAICT that's how it normally > works with QEMU or Linux. > > I am not 100% familiar with the original purpose of the patch, would > eim=off work for you even after patch applied? Anything severely wrong > with this patch? I think the patch itself is fine; I'd just like the commit message to be clearer about what the problem was. > > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI > > FADT, isn't it? Then the only values that a guest may put into those > > fields — 32-bit fields or not — are lower than 0xff anyway. > > It's still not clear to me why we need to make it inconsistent between the > EIM we declare to the guest and the KVM behavior on understanding EIM bit. > Even if enforced physical mode will work we loose the possibility of > cluster mode, and I also don't see what's the major benefit since EIM=off > will just work, afaiu, meanwhile make everything aligned. Yeah, I think turning EIM off is absolutely fine. > Are you fine if we proceed with this pull request first and revisit later? > Follow up patches will always be fine, and we're unbreaking something. I > have copied you since the 1st patch I posted and the small patch was there > for weeks, it'll be appreciated if either you could comment earlier next > time, or even propose a better fix then we can discuss what's the best way > to fix. Thanks. Yeah, sorry for the delay. But that was partly because the commit message was confusing me and it took me a while to work out what was actually going on... which is really all I'm heckling now. smime.p7s Description: S/MIME cryptographic signature
Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"
On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote: > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote: > > From: Peter Xu < > > pet...@redhat.com > > > > > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC > > IDs. However here since we already have EIM=ON it means the hypervisor > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width > > even if vcpus<=255. In short, commit 77250171bdc breaks any simple cmdline > > that wants to boot a VM with >=9 but <=255 vcpus with: > > I find that paragraph really hard to parse. What does it even mean that > "guest should assume the APIC IDs are 32bits"? Quotting EIM definition: 0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC Mode). 1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC mode). Hardware implementation reporting Interrupt Remapping support (IR) field as Clear also report this field as Clear. I hope the statement was matching the spec. Please let me know if you have better way to reword it. > > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the > tables. Which is perfectly fine if there are only 254 CPUs anyway, and > we never need to use a higher value. > > I *think* the actual problem here is when logical addressing is used, > which puts the APIC cluster ID into higher bits? But it's kind of weird > that the message doesn't mention that at all? The commit message actually doesn't even need to contain a lot of information in this case, IMO. Literally it can be seen as a revert of a commit which breaks guest with >8vcpu from boot. I kept the other lines because that still make sense, or it can be a full revert with "something broke with commit xxx, revert it to fix" and anything else could be reworked. AFAICT that's how it normally works with QEMU or Linux. I am not 100% familiar with the original purpose of the patch, would eim=off work for you even after patch applied? Anything severely wrong with this patch? > > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI > FADT, isn't it? Then the only values that a guest may put into those > fields — 32-bit fields or not — are lower than 0xff anyway. It's still not clear to me why we need to make it inconsistent between the EIM we declare to the guest and the KVM behavior on understanding EIM bit. Even if enforced physical mode will work we loose the possibility of cluster mode, and I also don't see what's the major benefit since EIM=off will just work, afaiu, meanwhile make everything aligned. Are you fine if we proceed with this pull request first and revisit later? Follow up patches will always be fine, and we're unbreaking something. I have copied you since the 1st patch I posted and the small patch was there for weeks, it'll be appreciated if either you could comment earlier next time, or even propose a better fix then we can discuss what's the best way to fix. Thanks. -- Peter Xu
Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"
On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote: > From: Peter Xu < > pet...@redhat.com > > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC > IDs. However here since we already have EIM=ON it means the hypervisor > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width > even if vcpus<=255. In short, commit 77250171bdc breaks any simple cmdline > that wants to boot a VM with >=9 but <=255 vcpus with: I find that paragraph really hard to parse. What does it even mean that "guest should assume the APIC IDs are 32bits"? In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the tables. Which is perfectly fine if there are only 254 CPUs anyway, and we never need to use a higher value. I *think* the actual problem here is when logical addressing is used, which puts the APIC cluster ID into higher bits? But it's kind of weird that the message doesn't mention that at all? That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI FADT, isn't it? Then the only values that a guest may put into those fields — 32-bit fields or not — are lower than 0xff anyway. smime.p7s Description: S/MIME cryptographic signature
[PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"
From: Peter Xu It's true that when vcpus<=255 we don't require the length of 32bit APIC IDs. However here since we already have EIM=ON it means the hypervisor will declare the VM as x2apic supported (e.g. VT-d ECAP register will have EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width even if vcpus<=255. In short, commit 77250171bdc breaks any simple cmdline that wants to boot a VM with >=9 but <=255 vcpus with: -device intel-iommu,intremap=on For anyone who does not want to enable x2apic, we can use eim=off in the intel-iommu parameters to skip enabling KVM x2apic. This partly reverts commit 77250171bdc02aee106083fd2a068147befa1a38, while keeping the valid bit on checking split irqchip, but revert the other change. One thing to mention is that this patch may break migration compatibility of such VM, however that's probably the best thing we can do, because the old behavior was simply wrong and not working for >8 vcpus. For <=8 vcpus, there could be a light guest ABI change (by enabling KVM x2apic after this patch), but logically it shouldn't affect the migration from working. Also, this is not the 1st commit to change x2apic behavior. Igor provided a full history of how this evolved for the past few years: https://lore.kernel.org/qemu-devel/20220922154617.57d1a...@redhat.com/ Relevant commits for reference: fb506e701e ("intel_iommu: reject broken EIM", 2016-10-17) c1bb5418e3 ("target/i386: Support up to 32768 CPUs without IRQ remapping", 2020-12-10) 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks", 2022-05-16) dc89f32d92 ("target/i386: Fix sanity check on max APIC ID / X2APIC enablement", 2022-05-16) We may want to have this for stable too (mostly for 7.1.0 only). Adding a fixes tag. Cc: David Woodhouse Cc: Claudio Fontana Cc: Igor Mammedov Fixes: 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks") Signed-off-by: Peter Xu Message-Id: <20220926153206.10881-1-pet...@redhat.com> Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin Reviewed-by: Igor Mammedov --- hw/i386/intel_iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 05d53a1aa9..6524c2ee32 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3818,6 +3818,11 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split"); return false; } +if (!kvm_enable_x2apic()) { +error_setg(errp, "eim=on requires support on the KVM side" + "(X2APIC_API, first shipped in v4.7)"); +return false; +} } /* Currently only address widths supported are 39 and 48 bits */ -- MST