Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"

2022-10-10 Thread Peter Xu
On Mon, Oct 10, 2022 at 04:16:33PM -0700, David Woodhouse wrote:
> On Mon, 2022-10-10 at 15:08 -0400, Peter Xu wrote:
> > On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote:
> > > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote:
> > > > From: Peter Xu <
> > > > pet...@redhat.com
> > > > 
> > > > 
> > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC
> > > > IDs.  However here since we already have EIM=ON it means the hypervisor
> > > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will 
> > > > have
> > > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
> > > > even if vcpus<=255.  In short, commit 77250171bdc breaks any simple 
> > > > cmdline
> > > > that wants to boot a VM with >=9 but <=255 vcpus with:
> > > 
> > > I find that paragraph really hard to parse. What does it even mean that
> > > "guest should assume the APIC IDs are 32bits"? 
> > 
> > Quotting EIM definition:
> > 
> >  0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC
> > Mode).
> > 
> >  1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC
> > mode).  Hardware implementation reporting Interrupt Remapping support
> > (IR) field as Clear also report this field as Clear.
> > 
> > I hope the statement was matching the spec.  Please let me know if you have
> > better way to reword it.
> 
> It needs to mention logical mode addressing. Because that, I presume,
> is why it broke only when you had more than 8 vCPUs. Because that's
> when the *logical* destination ID grew past 0xFF.

Agree.

> 
> > > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the
> > > tables. Which is perfectly fine if there are only 254 CPUs anyway, and
> > > we never need to use a higher value.
> > > 
> > > I *think* the actual problem here is when logical addressing is used,
> > > which puts the APIC cluster ID into higher bits? But it's kind of weird
> > > that the message doesn't mention that at all?
> > 
> > The commit message actually doesn't even need to contain a lot of
> > information in this case, IMO.
> 
> Well, it would be kind of useful if it said what the actual problem
> was, no?

Yes it'll be nice to have.

> 
> > Literally it can be seen as a revert of a commit which breaks guest with
> > > 8vcpu from boot.  I kept the other lines because that still make sense, or
> > 
> > it can be a full revert with "something broke with commit xxx, revert it to
> > fix" and anything else could be reworked.  AFAICT that's how it normally
> > works with QEMU or Linux.
> > 
> > I am not 100% familiar with the original purpose of the patch, would
> > eim=off work for you even after patch applied?  Anything severely wrong
> > with this patch?
> 
> I think the patch itself is fine; I'd just like the commit message to
> be clearer about what the problem was.

Thanks for confirming.

> 
> > > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI
> > > FADT, isn't it? Then the only values that a guest may put into those
> > > fields — 32-bit fields or not — are lower than 0xff anyway.
> > 
> > It's still not clear to me why we need to make it inconsistent between the
> > EIM we declare to the guest and the KVM behavior on understanding EIM bit.
> > Even if enforced physical mode will work we loose the possibility of
> > cluster mode, and I also don't see what's the major benefit since EIM=off
> > will just work, afaiu, meanwhile make everything aligned.
> 
> Yeah, I think turning EIM off is absolutely fine.
> 
> > Are you fine if we proceed with this pull request first and revisit later?
> > Follow up patches will always be fine, and we're unbreaking something.  I
> > have copied you since the 1st patch I posted and the small patch was there
> > for weeks, it'll be appreciated if either you could comment earlier next
> > time, or even propose a better fix then we can discuss what's the best way
> > to fix.  Thanks.
> 
> Yeah, sorry for the delay. But that was partly because the commit
> message was confusing me and it took me a while to work out what was
> actually going on... which is really all I'm heckling now.

I see, that was totally fine, and it'll be definitely also fine to comment
anything even on the pull req.  It's just that as I tried to argue for this
specific case IMHO we should move on and revisit later so we shrink the
regression window, rather than redo a pull and let this fix wait for
another one.  It seems we reached a consensus on this, thanks for that.

In all cases (irrelevant of the pull req), feel free to post any patch
either based on this one or as replacement.  I'll be happy to read and
rethink.  So far it still doesn't make sense to me to not enable kvm x2apic
with eim=on, but maybe I'm wrong, and I'd be happy to be corrected in that
case.

Thanks,

-- 
Peter Xu




Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"

2022-10-10 Thread David Woodhouse
On Mon, 2022-10-10 at 15:08 -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote:
> > On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote:
> > > From: Peter Xu <
> > > pet...@redhat.com
> > > 
> > > 
> > > It's true that when vcpus<=255 we don't require the length of 32bit APIC
> > > IDs.  However here since we already have EIM=ON it means the hypervisor
> > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
> > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
> > > even if vcpus<=255.  In short, commit 77250171bdc breaks any simple 
> > > cmdline
> > > that wants to boot a VM with >=9 but <=255 vcpus with:
> > 
> > I find that paragraph really hard to parse. What does it even mean that
> > "guest should assume the APIC IDs are 32bits"? 
> 
> Quotting EIM definition:
> 
>  0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC
> Mode).
> 
>  1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC
> mode).  Hardware implementation reporting Interrupt Remapping support
> (IR) field as Clear also report this field as Clear.
> 
> I hope the statement was matching the spec.  Please let me know if you have
> better way to reword it.

It needs to mention logical mode addressing. Because that, I presume,
is why it broke only when you had more than 8 vCPUs. Because that's
when the *logical* destination ID grew past 0xFF.

> > In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the
> > tables. Which is perfectly fine if there are only 254 CPUs anyway, and
> > we never need to use a higher value.
> > 
> > I *think* the actual problem here is when logical addressing is used,
> > which puts the APIC cluster ID into higher bits? But it's kind of weird
> > that the message doesn't mention that at all?
> 
> The commit message actually doesn't even need to contain a lot of
> information in this case, IMO.

Well, it would be kind of useful if it said what the actual problem
was, no?

> Literally it can be seen as a revert of a commit which breaks guest with
> > 8vcpu from boot.  I kept the other lines because that still make sense, or
> 
> it can be a full revert with "something broke with commit xxx, revert it to
> fix" and anything else could be reworked.  AFAICT that's how it normally
> works with QEMU or Linux.
> 
> I am not 100% familiar with the original purpose of the patch, would
> eim=off work for you even after patch applied?  Anything severely wrong
> with this patch?

I think the patch itself is fine; I'd just like the commit message to
be clearer about what the problem was.

> > That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI
> > FADT, isn't it? Then the only values that a guest may put into those
> > fields — 32-bit fields or not — are lower than 0xff anyway.
> 
> It's still not clear to me why we need to make it inconsistent between the
> EIM we declare to the guest and the KVM behavior on understanding EIM bit.
> Even if enforced physical mode will work we loose the possibility of
> cluster mode, and I also don't see what's the major benefit since EIM=off
> will just work, afaiu, meanwhile make everything aligned.

Yeah, I think turning EIM off is absolutely fine.

> Are you fine if we proceed with this pull request first and revisit later?
> Follow up patches will always be fine, and we're unbreaking something.  I
> have copied you since the 1st patch I posted and the small patch was there
> for weeks, it'll be appreciated if either you could comment earlier next
> time, or even propose a better fix then we can discuss what's the best way
> to fix.  Thanks.

Yeah, sorry for the delay. But that was partly because the commit
message was confusing me and it took me a while to work out what was
actually going on... which is really all I'm heckling now.



smime.p7s
Description: S/MIME cryptographic signature


Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"

2022-10-10 Thread Peter Xu
On Mon, Oct 10, 2022 at 10:39:52AM -0700, David Woodhouse wrote:
> On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote:
> > From: Peter Xu <
> > pet...@redhat.com
> > >
> > 
> > It's true that when vcpus<=255 we don't require the length of 32bit APIC
> > IDs.  However here since we already have EIM=ON it means the hypervisor
> > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
> > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
> > even if vcpus<=255.  In short, commit 77250171bdc breaks any simple cmdline
> > that wants to boot a VM with >=9 but <=255 vcpus with:
> 
> I find that paragraph really hard to parse. What does it even mean that
> "guest should assume the APIC IDs are 32bits"? 

Quotting EIM definition:

 0: On Intel® 64 platforms, hardware supports only 8-bit APIC-IDs (xAPIC
Mode).

 1: On Intel® 64 platforms, hardware supports 32-bit APIC- IDs (x2APIC
mode).  Hardware implementation reporting Interrupt Remapping support
(IR) field as Clear also report this field as Clear.

I hope the statement was matching the spec.  Please let me know if you have
better way to reword it.

> 
> In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the
> tables. Which is perfectly fine if there are only 254 CPUs anyway, and
> we never need to use a higher value.
> 
> I *think* the actual problem here is when logical addressing is used,
> which puts the APIC cluster ID into higher bits? But it's kind of weird
> that the message doesn't mention that at all?

The commit message actually doesn't even need to contain a lot of
information in this case, IMO.

Literally it can be seen as a revert of a commit which breaks guest with
>8vcpu from boot.  I kept the other lines because that still make sense, or
it can be a full revert with "something broke with commit xxx, revert it to
fix" and anything else could be reworked.  AFAICT that's how it normally
works with QEMU or Linux.

I am not 100% familiar with the original purpose of the patch, would
eim=off work for you even after patch applied?  Anything severely wrong
with this patch?

> 
> That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI
> FADT, isn't it? Then the only values that a guest may put into those
> fields — 32-bit fields or not — are lower than 0xff anyway.

It's still not clear to me why we need to make it inconsistent between the
EIM we declare to the guest and the KVM behavior on understanding EIM bit.
Even if enforced physical mode will work we loose the possibility of
cluster mode, and I also don't see what's the major benefit since EIM=off
will just work, afaiu, meanwhile make everything aligned.

Are you fine if we proceed with this pull request first and revisit later?
Follow up patches will always be fine, and we're unbreaking something.  I
have copied you since the 1st patch I posted and the small patch was there
for weeks, it'll be appreciated if either you could comment earlier next
time, or even propose a better fix then we can discuss what's the best way
to fix.  Thanks.

-- 
Peter Xu




Re: [PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"

2022-10-10 Thread David Woodhouse
On Mon, 2022-10-10 at 13:30 -0400, Michael S. Tsirkin wrote:
> From: Peter Xu <
> pet...@redhat.com
> >
> 
> It's true that when vcpus<=255 we don't require the length of 32bit APIC
> IDs.  However here since we already have EIM=ON it means the hypervisor
> will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
> EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
> even if vcpus<=255.  In short, commit 77250171bdc breaks any simple cmdline
> that wants to boot a VM with >=9 but <=255 vcpus with:

I find that paragraph really hard to parse. What does it even mean that
"guest should assume the APIC IDs are 32bits"? 

In practice, all the EIM bit does is *allow* 32 bits of APIC ID in the
tables. Which is perfectly fine if there are only 254 CPUs anyway, and
we never need to use a higher value.

I *think* the actual problem here is when logical addressing is used,
which puts the APIC cluster ID into higher bits? But it's kind of weird
that the message doesn't mention that at all?

That's fixable by just setting the X2APIC_PHYSICAL bit in the ACPI
FADT, isn't it? Then the only values that a guest may put into those
fields — 32-bit fields or not — are lower than 0xff anyway.



smime.p7s
Description: S/MIME cryptographic signature


[PULL 29/55] Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"

2022-10-10 Thread Michael S. Tsirkin
From: Peter Xu 

It's true that when vcpus<=255 we don't require the length of 32bit APIC
IDs.  However here since we already have EIM=ON it means the hypervisor
will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
even if vcpus<=255.  In short, commit 77250171bdc breaks any simple cmdline
that wants to boot a VM with >=9 but <=255 vcpus with:

  -device intel-iommu,intremap=on

For anyone who does not want to enable x2apic, we can use eim=off in the
intel-iommu parameters to skip enabling KVM x2apic.

This partly reverts commit 77250171bdc02aee106083fd2a068147befa1a38, while
keeping the valid bit on checking split irqchip, but revert the other change.

One thing to mention is that this patch may break migration compatibility
of such VM, however that's probably the best thing we can do, because the
old behavior was simply wrong and not working for >8 vcpus.  For <=8 vcpus,
there could be a light guest ABI change (by enabling KVM x2apic after this
patch), but logically it shouldn't affect the migration from working.

Also, this is not the 1st commit to change x2apic behavior.  Igor provided
a full history of how this evolved for the past few years:

https://lore.kernel.org/qemu-devel/20220922154617.57d1a...@redhat.com/

Relevant commits for reference:

  fb506e701e ("intel_iommu: reject broken EIM", 2016-10-17)
  c1bb5418e3 ("target/i386: Support up to 32768 CPUs without IRQ remapping", 
2020-12-10)
  77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks", 
2022-05-16)
  dc89f32d92 ("target/i386: Fix sanity check on max APIC ID / X2APIC 
enablement", 2022-05-16)

We may want to have this for stable too (mostly for 7.1.0 only).  Adding a
fixes tag.

Cc: David Woodhouse 
Cc: Claudio Fontana 
Cc: Igor Mammedov 
Fixes: 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks")
Signed-off-by: Peter Xu 
Message-Id: <20220926153206.10881-1-pet...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 hw/i386/intel_iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 05d53a1aa9..6524c2ee32 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3818,6 +3818,11 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
 return false;
 }
+if (!kvm_enable_x2apic()) {
+error_setg(errp, "eim=on requires support on the KVM side"
+ "(X2APIC_API, first shipped in v4.7)");
+return false;
+}
 }
 
 /* Currently only address widths supported are 39 and 48 bits */
-- 
MST