Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 02:31:43PM +0100, Radim Krčmář wrote:
> 2013-12-16 13:55+0100, Radim Krčmář:
> > 2013-12-16 14:16+0200, Gleb Natapov:
> > > On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > > > >  - Where does the 'only one supported cluster' come from?
> > > > > > 
> > > > > "only one supported cluster" comes from 8 bit cpuid limitation of 
> > > > > KVM's x2apic
> > > > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > > > logical mode.
> > > > 
> > > > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > > > so 8 bit cpuid can address first 16 clusters as well.
> > > > 
> > > >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > > > 
> > > Interrupt from a device cannot generate such ldr, only IPI can. Only
> > > 4 cpus in cluster zero are addressable in clustering mode by a
> > > device. Without irq remapping x2apic is a PV interface between host
> > > and guest where guest needs to know KVM implementation's limitation to
> > > use it.
> > 
> > Thanks, I'll read more about devices ... still no idea how could they
> > address cluster > 15.
> > 
With irq remapping device they will be able to provide 32bit apic addresses.

> > > I do not see a point in fixing problems in x2apic logical mode
> > > emulation right now since it will not make it usable, as long as
> > > there is not security problems there.
> > 
> > Agreed; I wanted to know why this patch was correct, if we cared.
> > 
> > > > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > > > >everything to logical_map[0][0:15], we would not work correctly 
> > > > > > in
> > > > > >the cluster x2apic, with > 16 vcpus.
> > > > > > 
> > > > > Such config cannot work today because of 8 bit cpuid limitation. When 
> > > > > the limitation
> > > > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
> > > > > bits we want to support.
> > > > 
> > > > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > > > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > > > potentially plugged cpus.
> > > Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> > > unfortunately, not sure what you mean by the second part of the sentence.
> > 
> > Sorry, I meant that with this change, we map all clusters to cluster 0,
> > which has two flaws:
> >  - in kvm_lapic_set_base(), the order of vcpu creation determines those
> 
> Gah, should have been 'recalculate_apic_map()' ...
> 
> The patch would be especially surprising with a dynamic adding of vcpus.
> 
> >assigned to cluster 0, and the rest is unaddressable (overwritten)
> >  - we can send IPI to an unplugged high cpuid and it arives in cluster 0
It does not matter since well behaved guest is not supposed to configure apic 
like
that.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 13:55+0100, Radim Krčmář:
> 2013-12-16 14:16+0200, Gleb Natapov:
> > On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > > >  - Where does the 'only one supported cluster' come from?
> > > > > 
> > > > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > > > x2apic
> > > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > > logical mode.
> > > 
> > > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > > so 8 bit cpuid can address first 16 clusters as well.
> > > 
> > >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > > 
> > Interrupt from a device cannot generate such ldr, only IPI can. Only
> > 4 cpus in cluster zero are addressable in clustering mode by a
> > device. Without irq remapping x2apic is a PV interface between host
> > and guest where guest needs to know KVM implementation's limitation to
> > use it.
> 
> Thanks, I'll read more about devices ... still no idea how could they
> address cluster > 15.
> 
> > I do not see a point in fixing problems in x2apic logical mode
> > emulation right now since it will not make it usable, as long as
> > there is not security problems there.
> 
> Agreed; I wanted to know why this patch was correct, if we cared.
> 
> > > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > > >everything to logical_map[0][0:15], we would not work correctly in
> > > > >the cluster x2apic, with > 16 vcpus.
> > > > > 
> > > > Such config cannot work today because of 8 bit cpuid limitation. When 
> > > > the limitation
> > > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
> > > > bits we want to support.
> > > 
> > > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > > potentially plugged cpus.
> > Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> > unfortunately, not sure what you mean by the second part of the sentence.
> 
> Sorry, I meant that with this change, we map all clusters to cluster 0,
> which has two flaws:
>  - in kvm_lapic_set_base(), the order of vcpu creation determines those

Gah, should have been 'recalculate_apic_map()' ...

The patch would be especially surprising with a dynamic adding of vcpus.

>assigned to cluster 0, and the rest is unaddressable (overwritten)
>  - we can send IPI to an unplugged high cpuid and it arives in cluster 0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 14:16+0200, Gleb Natapov:
> On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > > >  - Where does the 'only one supported cluster' come from?
> > > > 
> > > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > > x2apic
> > > implementation. With 8 bit cpuid you can only address cluster 0 in 
> > > logical mode.
> > 
> > One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> > so 8 bit cpuid can address first 16 clusters as well.
> > 
> >   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> > 
> Interrupt from a device cannot generate such ldr, only IPI can. Only
> 4 cpus in cluster zero are addressable in clustering mode by a
> device. Without irq remapping x2apic is a PV interface between host
> and guest where guest needs to know KVM implementation's limitation to
> use it.

Thanks, I'll read more about devices ... still no idea how could they
address cluster > 15.

> I do not see a point in fixing problems in x2apic logical mode
> emulation right now since it will not make it usable, as long as
> there is not security problems there.

Agreed; I wanted to know why this patch was correct, if we cared.

> > > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > > >everything to logical_map[0][0:15], we would not work correctly in
> > > >the cluster x2apic, with > 16 vcpus.
> > > > 
> > > Such config cannot work today because of 8 bit cpuid limitation. When the 
> > > limitation
> > > will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits 
> > > we want to support.
> > 
> > Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> > cpuid, we would still deliver interrupts destined for cpuid > 256 to
> > potentially plugged cpus.
> Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
> unfortunately, not sure what you mean by the second part of the sentence.

Sorry, I meant that with this change, we map all clusters to cluster 0,
which has two flaws:
 - in kvm_lapic_set_base(), the order of vcpu creation determines those
   assigned to cluster 0, and the rest is unaddressable (overwritten)
 - we can send IPI to an unplugged high cpuid and it arives in cluster 0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
> > >  - Where does the 'only one supported cluster' come from?
> > > 
> > "only one supported cluster" comes from 8 bit cpuid limitation of KVM's 
> > x2apic
> > implementation. With 8 bit cpuid you can only address cluster 0 in logical 
> > mode.
> 
> One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
> so 8 bit cpuid can address first 16 clusters as well.
> 
>   u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> 
Interrupt from a device cannot generate such ldr, only IPI can. Only
4 cpus in cluster zero are addressable in clustering mode by a
device. Without irq remapping x2apic is a PV interface between host
and guest where guest needs to know KVM implementation's limitation to
use it. I do not see a point in fixing problems in x2apic logical mode
emulation right now since it will not make it usable, as long as
there is not security problems there.

> > >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> > >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> > >everything to logical_map[0][0:15], we would not work correctly in
> > >the cluster x2apic, with > 16 vcpus.
> > > 
> > Such config cannot work today because of 8 bit cpuid limitation. When the 
> > limitation
> > will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
> > want to support.
> 
> Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
> cpuid, we would still deliver interrupts destined for cpuid > 256 to
> potentially plugged cpus.
Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
unfortunately, not sure what you mean by the second part of the sentence.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-14 11:46+0200, Gleb Natapov:
> On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
> > 2013-12-12 21:36+0100, Paolo Bonzini:
> > > From: Gleb Natapov 
> > > 
> > > A guest can cause a BUG_ON() leading to a host kernel crash.
> > > When the guest writes to the ICR to request an IPI, while in x2apic
> > > mode the following things happen, the destination is read from
> > > ICR2, which is a register that the guest can control.
> > > 
> > > kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> > > cluster id.  A BUG_ON is triggered, which is a protection against
> > > accessing map->logical_map with an out-of-bounds access and manages
> > > to avoid that anything really unsafe occurs.
> > > 
> > > The logic in the code is correct from real HW point of view. The problem
> > > is that KVM supports only one cluster with ID 0 in clustered mode, but
> > > the code that has the bug does not take this into account.
> > 
> > The more I read about x2apic, the more confused I am ...
> > 
> >  - How was the cluster x2apic enabled?
> > 
> >Linux won't enable cluster x2apic without interrupt remapping and I
> >had no idea we were allowed to do it.
> > 
> Malicious code can do it.
> 
> >  - A hardware test-suite found this?
> > 
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> > 
> That's the nature of malicious code: it does what your code does not expects
> it to do :)

I was wondering if there wasn't malicious linux on the other side too :)

> >  - Where does the 'only one supported cluster' come from?
> > 
> "only one supported cluster" comes from 8 bit cpuid limitation of KVM's x2apic
> implementation. With 8 bit cpuid you can only address cluster 0 in logical 
> mode.

One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
so 8 bit cpuid can address first 16 clusters as well.

  u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));

> >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> >everything to logical_map[0][0:15], we would not work correctly in
> >the cluster x2apic, with > 16 vcpus.
> > 
> Such config cannot work today because of 8 bit cpuid limitation. When the 
> limitation
> will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
> want to support.

Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
cpuid, we would still deliver interrupts destined for cpuid > 256 to
potentially plugged cpus.

> It will likely be smaller then 16 bit though since full 18 bit support will 
> require huge tables.

Yeah, we'll have to do dynamic allocation if we are ever going to need
the full potential of x2apic.
(2^20-16 cpus in cluster and 2^32-1 in flat mode)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-14 11:46+0200, Gleb Natapov:
 On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
  2013-12-12 21:36+0100, Paolo Bonzini:
   From: Gleb Natapov g...@redhat.com
   
   A guest can cause a BUG_ON() leading to a host kernel crash.
   When the guest writes to the ICR to request an IPI, while in x2apic
   mode the following things happen, the destination is read from
   ICR2, which is a register that the guest can control.
   
   kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
   cluster id.  A BUG_ON is triggered, which is a protection against
   accessing map-logical_map with an out-of-bounds access and manages
   to avoid that anything really unsafe occurs.
   
   The logic in the code is correct from real HW point of view. The problem
   is that KVM supports only one cluster with ID 0 in clustered mode, but
   the code that has the bug does not take this into account.
  
  The more I read about x2apic, the more confused I am ...
  
   - How was the cluster x2apic enabled?
  
 Linux won't enable cluster x2apic without interrupt remapping and I
 had no idea we were allowed to do it.
  
 Malicious code can do it.
 
   - A hardware test-suite found this?
  
 This bug can only be hit when the destination cpu is  256, so the
 request itself is buggy -- we don't support that many in kvm and it
 would crash when initializing the vcpus if we did.
 = It looks like we should just ignore the ipi, because we have no
vcpus in that cluster.
  
 That's the nature of malicious code: it does what your code does not expects
 it to do :)

I was wondering if there wasn't malicious linux on the other side too :)

   - Where does the 'only one supported cluster' come from?
  
 only one supported cluster comes from 8 bit cpuid limitation of KVM's x2apic
 implementation. With 8 bit cpuid you can only address cluster 0 in logical 
 mode.

One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
so 8 bit cpuid can address first 16 clusters as well.

  u32 ldr = ((id  4)  16) | (1  (id  0xf));

 I only see we use 'struct kvm_lapic *logical_map[16][16];', which
 supports 16 clusters of 16 apics = first 256 vcpus, so if we map
 everything to logical_map[0][0:15], we would not work correctly in
 the cluster x2apic, with  16 vcpus.
  
 Such config cannot work today because of 8 bit cpuid limitation. When the 
 limitation
 will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
 want to support.

Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
cpuid, we would still deliver interrupts destined for cpuid  256 to
potentially plugged cpus.

 It will likely be smaller then 16 bit though since full 18 bit support will 
 require huge tables.

Yeah, we'll have to do dynamic allocation if we are ever going to need
the full potential of x2apic.
(2^20-16 cpus in cluster and 2^32-1 in flat mode)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
- Where does the 'only one supported cluster' come from?
   
  only one supported cluster comes from 8 bit cpuid limitation of KVM's 
  x2apic
  implementation. With 8 bit cpuid you can only address cluster 0 in logical 
  mode.
 
 One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
 so 8 bit cpuid can address first 16 clusters as well.
 
   u32 ldr = ((id  4)  16) | (1  (id  0xf));
 
Interrupt from a device cannot generate such ldr, only IPI can. Only
4 cpus in cluster zero are addressable in clustering mode by a
device. Without irq remapping x2apic is a PV interface between host
and guest where guest needs to know KVM implementation's limitation to
use it. I do not see a point in fixing problems in x2apic logical mode
emulation right now since it will not make it usable, as long as
there is not security problems there.

  I only see we use 'struct kvm_lapic *logical_map[16][16];', which
  supports 16 clusters of 16 apics = first 256 vcpus, so if we map
  everything to logical_map[0][0:15], we would not work correctly in
  the cluster x2apic, with  16 vcpus.
   
  Such config cannot work today because of 8 bit cpuid limitation. When the 
  limitation
  will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
  want to support.
 
 Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
 cpuid, we would still deliver interrupts destined for cpuid  256 to
 potentially plugged cpus.
Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
unfortunately, not sure what you mean by the second part of the sentence.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 14:16+0200, Gleb Natapov:
 On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
 - Where does the 'only one supported cluster' come from?

   only one supported cluster comes from 8 bit cpuid limitation of KVM's 
   x2apic
   implementation. With 8 bit cpuid you can only address cluster 0 in 
   logical mode.
  
  One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
  so 8 bit cpuid can address first 16 clusters as well.
  
u32 ldr = ((id  4)  16) | (1  (id  0xf));
  
 Interrupt from a device cannot generate such ldr, only IPI can. Only
 4 cpus in cluster zero are addressable in clustering mode by a
 device. Without irq remapping x2apic is a PV interface between host
 and guest where guest needs to know KVM implementation's limitation to
 use it.

Thanks, I'll read more about devices ... still no idea how could they
address cluster  15.

 I do not see a point in fixing problems in x2apic logical mode
 emulation right now since it will not make it usable, as long as
 there is not security problems there.

Agreed; I wanted to know why this patch was correct, if we cared.

   I only see we use 'struct kvm_lapic *logical_map[16][16];', which
   supports 16 clusters of 16 apics = first 256 vcpus, so if we map
   everything to logical_map[0][0:15], we would not work correctly in
   the cluster x2apic, with  16 vcpus.

   Such config cannot work today because of 8 bit cpuid limitation. When the 
   limitation
   will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits 
   we want to support.
  
  Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
  cpuid, we would still deliver interrupts destined for cpuid  256 to
  potentially plugged cpus.
 Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
 unfortunately, not sure what you mean by the second part of the sentence.

Sorry, I meant that with this change, we map all clusters to cluster 0,
which has two flaws:
 - in kvm_lapic_set_base(), the order of vcpu creation determines those
   assigned to cluster 0, and the rest is unaddressable (overwritten)
 - we can send IPI to an unplugged high cpuid and it arives in cluster 0
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 13:55+0100, Radim Krčmář:
 2013-12-16 14:16+0200, Gleb Natapov:
  On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
  - Where does the 'only one supported cluster' come from?
 
only one supported cluster comes from 8 bit cpuid limitation of KVM's 
x2apic
implementation. With 8 bit cpuid you can only address cluster 0 in 
logical mode.
   
   One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
   so 8 bit cpuid can address first 16 clusters as well.
   
 u32 ldr = ((id  4)  16) | (1  (id  0xf));
   
  Interrupt from a device cannot generate such ldr, only IPI can. Only
  4 cpus in cluster zero are addressable in clustering mode by a
  device. Without irq remapping x2apic is a PV interface between host
  and guest where guest needs to know KVM implementation's limitation to
  use it.
 
 Thanks, I'll read more about devices ... still no idea how could they
 address cluster  15.
 
  I do not see a point in fixing problems in x2apic logical mode
  emulation right now since it will not make it usable, as long as
  there is not security problems there.
 
 Agreed; I wanted to know why this patch was correct, if we cared.
 
I only see we use 'struct kvm_lapic *logical_map[16][16];', which
supports 16 clusters of 16 apics = first 256 vcpus, so if we map
everything to logical_map[0][0:15], we would not work correctly in
the cluster x2apic, with  16 vcpus.
 
Such config cannot work today because of 8 bit cpuid limitation. When 
the limitation
will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
bits we want to support.
   
   Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
   cpuid, we would still deliver interrupts destined for cpuid  256 to
   potentially plugged cpus.
  Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
  unfortunately, not sure what you mean by the second part of the sentence.
 
 Sorry, I meant that with this change, we map all clusters to cluster 0,
 which has two flaws:
  - in kvm_lapic_set_base(), the order of vcpu creation determines those

Gah, should have been 'recalculate_apic_map()' ...

The patch would be especially surprising with a dynamic adding of vcpus.

assigned to cluster 0, and the rest is unaddressable (overwritten)
  - we can send IPI to an unplugged high cpuid and it arives in cluster 0
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Gleb Natapov
On Mon, Dec 16, 2013 at 02:31:43PM +0100, Radim Krčmář wrote:
 2013-12-16 13:55+0100, Radim Krčmář:
  2013-12-16 14:16+0200, Gleb Natapov:
   On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote:
   - Where does the 'only one supported cluster' come from?
  
 only one supported cluster comes from 8 bit cpuid limitation of 
 KVM's x2apic
 implementation. With 8 bit cpuid you can only address cluster 0 in 
 logical mode.

One x2apic cluster has 16 cpus and we generate the x2apic LDR correctly,
so 8 bit cpuid can address first 16 clusters as well.

  u32 ldr = ((id  4)  16) | (1  (id  0xf));

   Interrupt from a device cannot generate such ldr, only IPI can. Only
   4 cpus in cluster zero are addressable in clustering mode by a
   device. Without irq remapping x2apic is a PV interface between host
   and guest where guest needs to know KVM implementation's limitation to
   use it.
  
  Thanks, I'll read more about devices ... still no idea how could they
  address cluster  15.
  
With irq remapping device they will be able to provide 32bit apic addresses.

   I do not see a point in fixing problems in x2apic logical mode
   emulation right now since it will not make it usable, as long as
   there is not security problems there.
  
  Agreed; I wanted to know why this patch was correct, if we cared.
  
 I only see we use 'struct kvm_lapic *logical_map[16][16];', which
 supports 16 clusters of 16 apics = first 256 vcpus, so if we map
 everything to logical_map[0][0:15], we would not work correctly 
  in
 the cluster x2apic, with  16 vcpus.
  
 Such config cannot work today because of 8 bit cpuid limitation. When 
 the limitation
 will be removed KMV_X2APIC_CID_BITS will be set to actual number of 
 bits we want to support.

Even with KMV_X2APIC_CID_BITS = 4, which would allow us to support 8 bit
cpuid, we would still deliver interrupts destined for cpuid  256 to
potentially plugged cpus.
   Again, KMV_X2APIC_CID_BITS = 4 will not allow us to support 8 bit cpuids
   unfortunately, not sure what you mean by the second part of the sentence.
  
  Sorry, I meant that with this change, we map all clusters to cluster 0,
  which has two flaws:
   - in kvm_lapic_set_base(), the order of vcpu creation determines those
 
 Gah, should have been 'recalculate_apic_map()' ...
 
 The patch would be especially surprising with a dynamic adding of vcpus.
 
 assigned to cluster 0, and the rest is unaddressable (overwritten)
   - we can send IPI to an unplugged high cpuid and it arives in cluster 0
It does not matter since well behaved guest is not supposed to configure apic 
like
that.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 06:25:20PM +0100, Paolo Bonzini wrote:
> Il 13/12/2013 17:07, Radim Krčmář ha scritto:
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> 
> That's what should happen in physical mode.  Something like this patch:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5439117d5c4c..1f8e9e1abd3b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>  
>   if (irq->dest_mode == 0) { /* physical mode */
>   if (irq->delivery_mode == APIC_DM_LOWEST ||
> - irq->dest_id == 0xff)
> + irq->dest_id == 0xff ||
> + (apic_x2apic_mode(src) && irq->dest_id > 0xff))
Here you fall back to slow path wich still uses irq->dest_id == 0xff as 
broadcast test.

>   goto out;
> - dst = >phys_map[irq->dest_id & 0xff];
> + dst = >phys_map[irq->dest_id];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> 
> On top of this, the x2apic spec documents a "broadcast" destination ID that
> could be implemented as follows.  But I have no idea if this is correct and
> how it differs from the usual broadcast shorthand:
> 
> @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
I'd rather add kvm_apic_is_broadcast() function like that:
bool kvm_apic_is_broadcast(struct kvm_lapic *l, u32 dest)
{
 return dest == (apic_x2apic_mode(l) ? 0x : 0xff);
}

and use everywhere instead of  irq->dest_id == 0xff test.

>   irq.level = icr_low & APIC_INT_ASSERT;
>   irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
>   irq.shorthand = icr_low & APIC_SHORT_MASK;
> - if (apic_x2apic_mode(apic))
> + if (apic_x2apic_mode(apic)) {
>   irq.dest_id = icr_high;
> - else
> + if (icr_high == 0x)
> + irq.shorthand = APIC_DEST_ALLINC;
> + } else
>   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
>  
>   trace_kvm_apic_ipi(icr_low, irq.dest_id);
> 
> 
> >  - Where does the 'only one supported cluster' come from?
> > 
> >I only see we use 'struct kvm_lapic *logical_map[16][16];', which
> >supports 16 clusters of 16 apics = first 256 vcpus, so if we map
> >everything to logical_map[0][0:15], we would not work correctly in
> >the cluster x2apic, with > 16 vcpus.
> 
> I think you're right.  Something like this would be a better fix:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
See my answer to Radim. There is not point in maintaining cid mapping that 
cannot be
addressed.

> index dec48bfaddb8..58745cbbb7e6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>   cid = apic_cluster_id(new, ldr);
>   lid = apic_logical_id(new, ldr);
>  
> - if (lid)
> + if (cid < ARRAY_SIZE(new->logical_map) && lid)
>   new->logical_map[cid][ffs(lid) - 1] = apic;
>   }
>  out:
> @@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>   dst = >phys_map[irq->dest_id & 0xff];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
> + u32 cid = apic_cluster_id(map, mda);
>  
> - dst = map->logical_map[apic_cluster_id(map, mda)];
> + if (cid >= ARRAY_SIZE(map->logical_map))
> + goto out;
>  
> + dst = map->logical_map[cid];
>   bitmap = apic_logical_id(map, mda);
>  
>   if (irq->delivery_mode == APIC_DM_LOWEST) {
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index c8b0d0d2da5c..5ccb71658894 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map 
> *map, u32 ldr)
>   ldr >>= 32 - map->ldr_bits;
>   cid = (ldr >> map->cid_shift) & map->cid_mask;
>  
> - BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
> -
>   return cid;
>  }
>  
> playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
> playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
> git diff kvm/lapic.c
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index dec48bfaddb8..1005185972a9 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>  
> -#define KVM_X2APIC_CID_BITS 0
> -
>  static void 

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
> 2013-12-12 21:36+0100, Paolo Bonzini:
> > From: Gleb Natapov 
> > 
> > A guest can cause a BUG_ON() leading to a host kernel crash.
> > When the guest writes to the ICR to request an IPI, while in x2apic
> > mode the following things happen, the destination is read from
> > ICR2, which is a register that the guest can control.
> > 
> > kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> > cluster id.  A BUG_ON is triggered, which is a protection against
> > accessing map->logical_map with an out-of-bounds access and manages
> > to avoid that anything really unsafe occurs.
> > 
> > The logic in the code is correct from real HW point of view. The problem
> > is that KVM supports only one cluster with ID 0 in clustered mode, but
> > the code that has the bug does not take this into account.
> 
> The more I read about x2apic, the more confused I am ...
> 
>  - How was the cluster x2apic enabled?
> 
>Linux won't enable cluster x2apic without interrupt remapping and I
>had no idea we were allowed to do it.
> 
Malicious code can do it.

>  - A hardware test-suite found this?
> 
>This bug can only be hit when the destination cpu is > 256, so the
>request itself is buggy -- we don't support that many in kvm and it
>would crash when initializing the vcpus if we did.
>=> It looks like we should just ignore the ipi, because we have no
>   vcpus in that cluster.
> 
That's the nature of malicious code: it does what your code does not expects
it to do :)


>  - Where does the 'only one supported cluster' come from?
> 
"only one supported cluster" comes from 8 bit cpuid limitation of KVM's x2apic
implementation. With 8 bit cpuid you can only address cluster 0 in logical mode.

>I only see we use 'struct kvm_lapic *logical_map[16][16];', which
>supports 16 clusters of 16 apics = first 256 vcpus, so if we map
>everything to logical_map[0][0:15], we would not work correctly in
>the cluster x2apic, with > 16 vcpus.
> 
Such config cannot work today because of 8 bit cpuid limitation. When the 
limitation
will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
want to support.
It will likely be smaller then 16 bit though since full 18 bit support will 
require huge tables.

> Thanks.
> 
> > Reported-by: Lars Bull 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Gleb Natapov 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  arch/x86/kvm/lapic.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index b8bec45c1610..801dc3fd66e1 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
> > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
> >  }
> >  
> > +#define KMV_X2APIC_CID_BITS 0
> > +
> >  static void recalculate_apic_map(struct kvm *kvm)
> >  {
> > struct kvm_apic_map *new, *old = NULL;
> > @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
> > if (apic_x2apic_mode(apic)) {
> > new->ldr_bits = 32;
> > new->cid_shift = 16;
> > -   new->cid_mask = new->lid_mask = 0x;
> > +   new->cid_mask = (1 << KMV_X2APIC_CID_BITS) - 1;
> > +   new->lid_mask = 0x;
> > } else if (kvm_apic_sw_enabled(apic) &&
> > !new->cid_mask /* flat mode */ &&
> > kvm_apic_get_reg(apic, APIC_DFR) == 
> > APIC_DFR_CLUSTER) {
> > -- 
> > 1.8.3.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote:
 2013-12-12 21:36+0100, Paolo Bonzini:
  From: Gleb Natapov g...@redhat.com
  
  A guest can cause a BUG_ON() leading to a host kernel crash.
  When the guest writes to the ICR to request an IPI, while in x2apic
  mode the following things happen, the destination is read from
  ICR2, which is a register that the guest can control.
  
  kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
  cluster id.  A BUG_ON is triggered, which is a protection against
  accessing map-logical_map with an out-of-bounds access and manages
  to avoid that anything really unsafe occurs.
  
  The logic in the code is correct from real HW point of view. The problem
  is that KVM supports only one cluster with ID 0 in clustered mode, but
  the code that has the bug does not take this into account.
 
 The more I read about x2apic, the more confused I am ...
 
  - How was the cluster x2apic enabled?
 
Linux won't enable cluster x2apic without interrupt remapping and I
had no idea we were allowed to do it.
 
Malicious code can do it.

  - A hardware test-suite found this?
 
This bug can only be hit when the destination cpu is  256, so the
request itself is buggy -- we don't support that many in kvm and it
would crash when initializing the vcpus if we did.
= It looks like we should just ignore the ipi, because we have no
   vcpus in that cluster.
 
That's the nature of malicious code: it does what your code does not expects
it to do :)


  - Where does the 'only one supported cluster' come from?
 
only one supported cluster comes from 8 bit cpuid limitation of KVM's x2apic
implementation. With 8 bit cpuid you can only address cluster 0 in logical mode.

I only see we use 'struct kvm_lapic *logical_map[16][16];', which
supports 16 clusters of 16 apics = first 256 vcpus, so if we map
everything to logical_map[0][0:15], we would not work correctly in
the cluster x2apic, with  16 vcpus.
 
Such config cannot work today because of 8 bit cpuid limitation. When the 
limitation
will be removed KMV_X2APIC_CID_BITS will be set to actual number of bits we 
want to support.
It will likely be smaller then 16 bit though since full 18 bit support will 
require huge tables.

 Thanks.
 
  Reported-by: Lars Bull larsb...@google.com
  Cc: sta...@vger.kernel.org
  Signed-off-by: Gleb Natapov g...@redhat.com
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   arch/x86/kvm/lapic.c | 5 -
   1 file changed, 4 insertions(+), 1 deletion(-)
  
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index b8bec45c1610..801dc3fd66e1 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
  return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
   }
   
  +#define KMV_X2APIC_CID_BITS 0
  +
   static void recalculate_apic_map(struct kvm *kvm)
   {
  struct kvm_apic_map *new, *old = NULL;
  @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
  if (apic_x2apic_mode(apic)) {
  new-ldr_bits = 32;
  new-cid_shift = 16;
  -   new-cid_mask = new-lid_mask = 0x;
  +   new-cid_mask = (1  KMV_X2APIC_CID_BITS) - 1;
  +   new-lid_mask = 0x;
  } else if (kvm_apic_sw_enabled(apic) 
  !new-cid_mask /* flat mode */ 
  kvm_apic_get_reg(apic, APIC_DFR) == 
  APIC_DFR_CLUSTER) {
  -- 
  1.8.3.1
  
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-14 Thread Gleb Natapov
On Fri, Dec 13, 2013 at 06:25:20PM +0100, Paolo Bonzini wrote:
 Il 13/12/2013 17:07, Radim Krčmář ha scritto:
 This bug can only be hit when the destination cpu is  256, so the
 request itself is buggy -- we don't support that many in kvm and it
 would crash when initializing the vcpus if we did.
 = It looks like we should just ignore the ipi, because we have no
vcpus in that cluster.
 
 That's what should happen in physical mode.  Something like this patch:
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 5439117d5c4c..1f8e9e1abd3b 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
 struct kvm_lapic *src,
  
   if (irq-dest_mode == 0) { /* physical mode */
   if (irq-delivery_mode == APIC_DM_LOWEST ||
 - irq-dest_id == 0xff)
 + irq-dest_id == 0xff ||
 + (apic_x2apic_mode(src)  irq-dest_id  0xff))
Here you fall back to slow path wich still uses irq-dest_id == 0xff as 
broadcast test.

   goto out;
 - dst = map-phys_map[irq-dest_id  0xff];
 + dst = map-phys_map[irq-dest_id];
   } else {
   u32 mda = irq-dest_id  (32 - map-ldr_bits);
  
 
 On top of this, the x2apic spec documents a broadcast destination ID that
 could be implemented as follows.  But I have no idea if this is correct and
 how it differs from the usual broadcast shorthand:
 
 @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
I'd rather add kvm_apic_is_broadcast() function like that:
bool kvm_apic_is_broadcast(struct kvm_lapic *l, u32 dest)
{
 return dest == (apic_x2apic_mode(l) ? 0x : 0xff);
}

and use everywhere instead of  irq-dest_id == 0xff test.

   irq.level = icr_low  APIC_INT_ASSERT;
   irq.trig_mode = icr_low  APIC_INT_LEVELTRIG;
   irq.shorthand = icr_low  APIC_SHORT_MASK;
 - if (apic_x2apic_mode(apic))
 + if (apic_x2apic_mode(apic)) {
   irq.dest_id = icr_high;
 - else
 + if (icr_high == 0x)
 + irq.shorthand = APIC_DEST_ALLINC;
 + } else
   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
  
   trace_kvm_apic_ipi(icr_low, irq.dest_id);
 
 
   - Where does the 'only one supported cluster' come from?
  
 I only see we use 'struct kvm_lapic *logical_map[16][16];', which
 supports 16 clusters of 16 apics = first 256 vcpus, so if we map
 everything to logical_map[0][0:15], we would not work correctly in
 the cluster x2apic, with  16 vcpus.
 
 I think you're right.  Something like this would be a better fix:
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
See my answer to Radim. There is not point in maintaining cid mapping that 
cannot be
addressed.

 index dec48bfaddb8..58745cbbb7e6 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
   cid = apic_cluster_id(new, ldr);
   lid = apic_logical_id(new, ldr);
  
 - if (lid)
 + if (cid  ARRAY_SIZE(new-logical_map)  lid)
   new-logical_map[cid][ffs(lid) - 1] = apic;
   }
  out:
 @@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
 struct kvm_lapic *src,
   dst = map-phys_map[irq-dest_id  0xff];
   } else {
   u32 mda = irq-dest_id  (32 - map-ldr_bits);
 + u32 cid = apic_cluster_id(map, mda);
  
 - dst = map-logical_map[apic_cluster_id(map, mda)];
 + if (cid = ARRAY_SIZE(map-logical_map))
 + goto out;
  
 + dst = map-logical_map[cid];
   bitmap = apic_logical_id(map, mda);
  
   if (irq-delivery_mode == APIC_DM_LOWEST) {
 diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
 index c8b0d0d2da5c..5ccb71658894 100644
 --- a/arch/x86/kvm/lapic.h
 +++ b/arch/x86/kvm/lapic.h
 @@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map 
 *map, u32 ldr)
   ldr = 32 - map-ldr_bits;
   cid = (ldr  map-cid_shift)  map-cid_mask;
  
 - BUG_ON(cid = ARRAY_SIZE(map-logical_map));
 -
   return cid;
  }
  
 playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
 playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
 git diff kvm/lapic.c
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index dec48bfaddb8..1005185972a9 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
   return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
  }
  
 -#define KVM_X2APIC_CID_BITS 0
 -
  static void recalculate_apic_map(struct kvm *kvm)
  {
   struct kvm_apic_map *new, *old = NULL;
 @@ -182,8 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
   if 

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-13 18:25+0100, Paolo Bonzini:
> Il 13/12/2013 17:07, Radim Krčmář ha scritto:
> >This bug can only be hit when the destination cpu is > 256, so the
> >request itself is buggy -- we don't support that many in kvm and it
> >would crash when initializing the vcpus if we did.
> >=> It looks like we should just ignore the ipi, because we have no
> >   vcpus in that cluster.
> 
> That's what should happen in physical mode.  Something like this patch:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5439117d5c4c..1f8e9e1abd3b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
>  
>   if (irq->dest_mode == 0) { /* physical mode */
>   if (irq->delivery_mode == APIC_DM_LOWEST ||
> - irq->dest_id == 0xff)
> + irq->dest_id == 0xff ||

The 0xff is xapic broadcast, we did not care about the x2apic one and
no-one noticed that it does not work :)

0xff isn't special in x2apic mode.

> + (apic_x2apic_mode(src) && irq->dest_id > 0xff))
>   goto out;
> - dst = >phys_map[irq->dest_id & 0xff];
> + dst = >phys_map[irq->dest_id];
>   } else {
>   u32 mda = irq->dest_id << (32 - map->ldr_bits);
>  
> 
> On top of this, the x2apic spec documents a "broadcast" destination ID that
> could be implemented as follows.  But I have no idea if this is correct and
> how it differs from the usual broadcast shorthand:
> 
> @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
>   irq.level = icr_low & APIC_INT_ASSERT;
>   irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
>   irq.shorthand = icr_low & APIC_SHORT_MASK;
> - if (apic_x2apic_mode(apic))
> + if (apic_x2apic_mode(apic)) {
>   irq.dest_id = icr_high;
> - else
> + if (icr_high == 0x)
> + irq.shorthand = APIC_DEST_ALLINC;

The shorthand takes priority, so we shouldn't overwrite it.
This is better solved after we 'goto out' from the _fast().
(could be nicer still ...)

> + } else
>   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
>   trace_kvm_apic_ipi(icr_low, irq.dest_id);
> 
> 

And another possibility occured to me now: Google was the first one to
encounter a broadcast; we don't handle it at all in the fast path and
x2apic has 0x as the target in both modes ...
(I think that xapic has 0xff in both of them too, but I'll check)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117..e4618c4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -512,19 +512,24 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
apic_update_ppr(apic);
 }
 
-int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
+int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
 {
-   return dest == 0xff || kvm_apic_id(apic) == dest;
+   return dest == (apic_x2apic_mode(apic) ? 0x : 0xff)
+  || kvm_apic_id(apic) == dest;
 }
 
-int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
+int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 {
int result = 0;
u32 logical_id;
 
+   if (mda == (apic_x2apic_mode(apic) ? 0x : 0xff))
+   return 1;
+
if (apic_x2apic_mode(apic)) {
logical_id = kvm_apic_get_reg(apic, APIC_LDR);
-   return logical_id & mda;
+   return mda >> 16 == logical_id >> 16
+  && logical_id & mda & 0x;
}
 
logical_id = GET_APIC_LOGICAL_ID(kvm_apic_get_reg(apic, APIC_LDR));
@@ -605,6 +610,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
if (irq->shorthand)
return false;
 
+   /* broadcast */
+   if (irq->dest_id == (apic_x2apic_mode(src) ? 0x : 0xff))
+   return false;
+
rcu_read_lock();
map = rcu_dereference(kvm->arch.apic_map);
 
@@ -612,10 +621,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
struct kvm_lapic *src,
goto out;
 
if (irq->dest_mode == 0) { /* physical mode */
-   if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff)
+   if (irq->dest_id > 0xff) {
+   ret = true;
+   goto out;
+   }
+   if (irq->delivery_mode == APIC_DM_LOWEST)
goto out;
-   dst = >phys_map[irq->dest_id & 0xff];
+   dst = >phys_map[irq->dest_id];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
 

The logical x2apic slowpath was completely broken, maybe still is,
I have no way to test it ...

The broadcast checking should be 

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Paolo Bonzini
Il 13/12/2013 17:07, Radim Krčmář ha scritto:
>This bug can only be hit when the destination cpu is > 256, so the
>request itself is buggy -- we don't support that many in kvm and it
>would crash when initializing the vcpus if we did.
>=> It looks like we should just ignore the ipi, because we have no
>   vcpus in that cluster.

That's what should happen in physical mode.  Something like this patch:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117d5c4c..1f8e9e1abd3b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
 
if (irq->dest_mode == 0) { /* physical mode */
if (irq->delivery_mode == APIC_DM_LOWEST ||
-   irq->dest_id == 0xff)
+   irq->dest_id == 0xff ||
+   (apic_x2apic_mode(src) && irq->dest_id > 0xff))
goto out;
-   dst = >phys_map[irq->dest_id & 0xff];
+   dst = >phys_map[irq->dest_id];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
 

On top of this, the x2apic spec documents a "broadcast" destination ID that
could be implemented as follows.  But I have no idea if this is correct and
how it differs from the usual broadcast shorthand:

@@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.level = icr_low & APIC_INT_ASSERT;
irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
irq.shorthand = icr_low & APIC_SHORT_MASK;
-   if (apic_x2apic_mode(apic))
+   if (apic_x2apic_mode(apic)) {
irq.dest_id = icr_high;
-   else
+   if (icr_high == 0x)
+   irq.shorthand = APIC_DEST_ALLINC;
+   } else
irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
 
trace_kvm_apic_ipi(icr_low, irq.dest_id);


>  - Where does the 'only one supported cluster' come from?
> 
>I only see we use 'struct kvm_lapic *logical_map[16][16];', which
>supports 16 clusters of 16 apics = first 256 vcpus, so if we map
>everything to logical_map[0][0:15], we would not work correctly in
>the cluster x2apic, with > 16 vcpus.

I think you're right.  Something like this would be a better fix:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..58745cbbb7e6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
cid = apic_cluster_id(new, ldr);
lid = apic_logical_id(new, ldr);
 
-   if (lid)
+   if (cid < ARRAY_SIZE(new->logical_map) && lid)
new->logical_map[cid][ffs(lid) - 1] = apic;
}
 out:
@@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
dst = >phys_map[irq->dest_id & 0xff];
} else {
u32 mda = irq->dest_id << (32 - map->ldr_bits);
+   u32 cid = apic_cluster_id(map, mda);
 
-   dst = map->logical_map[apic_cluster_id(map, mda)];
+   if (cid >= ARRAY_SIZE(map->logical_map))
+   goto out;
 
+   dst = map->logical_map[cid];
bitmap = apic_logical_id(map, mda);
 
if (irq->delivery_mode == APIC_DM_LOWEST) {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c8b0d0d2da5c..5ccb71658894 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map *map, 
u32 ldr)
ldr >>= 32 - map->ldr_bits;
cid = (ldr >> map->cid_shift) & map->cid_mask;
 
-   BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
-
return cid;
 }
 
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
git diff kvm/lapic.c
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..1005185972a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
-#define KVM_X2APIC_CID_BITS 0
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
@@ -182,8 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
if (apic_x2apic_mode(apic)) {
new->ldr_bits = 32;
new->cid_shift = 16;
-   new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
-   new->lid_mask = 0x;
+   new->cid_mask = new->lid_mask = 0x;
} else if (kvm_apic_sw_enabled(apic) &&
!new->cid_mask /* flat mode */ &&

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-12 21:36+0100, Paolo Bonzini:
> From: Gleb Natapov 
> 
> A guest can cause a BUG_ON() leading to a host kernel crash.
> When the guest writes to the ICR to request an IPI, while in x2apic
> mode the following things happen, the destination is read from
> ICR2, which is a register that the guest can control.
> 
> kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
> cluster id.  A BUG_ON is triggered, which is a protection against
> accessing map->logical_map with an out-of-bounds access and manages
> to avoid that anything really unsafe occurs.
> 
> The logic in the code is correct from real HW point of view. The problem
> is that KVM supports only one cluster with ID 0 in clustered mode, but
> the code that has the bug does not take this into account.

The more I read about x2apic, the more confused I am ...

 - How was the cluster x2apic enabled?

   Linux won't enable cluster x2apic without interrupt remapping and I
   had no idea we were allowed to do it.

 - A hardware test-suite found this?

   This bug can only be hit when the destination cpu is > 256, so the
   request itself is buggy -- we don't support that many in kvm and it
   would crash when initializing the vcpus if we did.
   => It looks like we should just ignore the ipi, because we have no
  vcpus in that cluster.

 - Where does the 'only one supported cluster' come from?

   I only see we use 'struct kvm_lapic *logical_map[16][16];', which
   supports 16 clusters of 16 apics = first 256 vcpus, so if we map
   everything to logical_map[0][0:15], we would not work correctly in
   the cluster x2apic, with > 16 vcpus.

Thanks.

> Reported-by: Lars Bull 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gleb Natapov 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/lapic.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8bec45c1610..801dc3fd66e1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>  
> +#define KMV_X2APIC_CID_BITS 0
> +
>  static void recalculate_apic_map(struct kvm *kvm)
>  {
>   struct kvm_apic_map *new, *old = NULL;
> @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>   if (apic_x2apic_mode(apic)) {
>   new->ldr_bits = 32;
>   new->cid_shift = 16;
> - new->cid_mask = new->lid_mask = 0x;
> + new->cid_mask = (1 << KMV_X2APIC_CID_BITS) - 1;
> + new->lid_mask = 0x;
>   } else if (kvm_apic_sw_enabled(apic) &&
>   !new->cid_mask /* flat mode */ &&
>   kvm_apic_get_reg(apic, APIC_DFR) == 
> APIC_DFR_CLUSTER) {
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-12 21:36+0100, Paolo Bonzini:
 From: Gleb Natapov g...@redhat.com
 
 A guest can cause a BUG_ON() leading to a host kernel crash.
 When the guest writes to the ICR to request an IPI, while in x2apic
 mode the following things happen, the destination is read from
 ICR2, which is a register that the guest can control.
 
 kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
 cluster id.  A BUG_ON is triggered, which is a protection against
 accessing map-logical_map with an out-of-bounds access and manages
 to avoid that anything really unsafe occurs.
 
 The logic in the code is correct from real HW point of view. The problem
 is that KVM supports only one cluster with ID 0 in clustered mode, but
 the code that has the bug does not take this into account.

The more I read about x2apic, the more confused I am ...

 - How was the cluster x2apic enabled?

   Linux won't enable cluster x2apic without interrupt remapping and I
   had no idea we were allowed to do it.

 - A hardware test-suite found this?

   This bug can only be hit when the destination cpu is  256, so the
   request itself is buggy -- we don't support that many in kvm and it
   would crash when initializing the vcpus if we did.
   = It looks like we should just ignore the ipi, because we have no
  vcpus in that cluster.

 - Where does the 'only one supported cluster' come from?

   I only see we use 'struct kvm_lapic *logical_map[16][16];', which
   supports 16 clusters of 16 apics = first 256 vcpus, so if we map
   everything to logical_map[0][0:15], we would not work correctly in
   the cluster x2apic, with  16 vcpus.

Thanks.

 Reported-by: Lars Bull larsb...@google.com
 Cc: sta...@vger.kernel.org
 Signed-off-by: Gleb Natapov g...@redhat.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/lapic.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index b8bec45c1610..801dc3fd66e1 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -143,6 +143,8 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
   return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
  }
  
 +#define KMV_X2APIC_CID_BITS 0
 +
  static void recalculate_apic_map(struct kvm *kvm)
  {
   struct kvm_apic_map *new, *old = NULL;
 @@ -180,7 +182,8 @@ static void recalculate_apic_map(struct kvm *kvm)
   if (apic_x2apic_mode(apic)) {
   new-ldr_bits = 32;
   new-cid_shift = 16;
 - new-cid_mask = new-lid_mask = 0x;
 + new-cid_mask = (1  KMV_X2APIC_CID_BITS) - 1;
 + new-lid_mask = 0x;
   } else if (kvm_apic_sw_enabled(apic) 
   !new-cid_mask /* flat mode */ 
   kvm_apic_get_reg(apic, APIC_DFR) == 
 APIC_DFR_CLUSTER) {
 -- 
 1.8.3.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Paolo Bonzini
Il 13/12/2013 17:07, Radim Krčmář ha scritto:
This bug can only be hit when the destination cpu is  256, so the
request itself is buggy -- we don't support that many in kvm and it
would crash when initializing the vcpus if we did.
= It looks like we should just ignore the ipi, because we have no
   vcpus in that cluster.

That's what should happen in physical mode.  Something like this patch:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117d5c4c..1f8e9e1abd3b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
 
if (irq-dest_mode == 0) { /* physical mode */
if (irq-delivery_mode == APIC_DM_LOWEST ||
-   irq-dest_id == 0xff)
+   irq-dest_id == 0xff ||
+   (apic_x2apic_mode(src)  irq-dest_id  0xff))
goto out;
-   dst = map-phys_map[irq-dest_id  0xff];
+   dst = map-phys_map[irq-dest_id];
} else {
u32 mda = irq-dest_id  (32 - map-ldr_bits);
 

On top of this, the x2apic spec documents a broadcast destination ID that
could be implemented as follows.  But I have no idea if this is correct and
how it differs from the usual broadcast shorthand:

@@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.level = icr_low  APIC_INT_ASSERT;
irq.trig_mode = icr_low  APIC_INT_LEVELTRIG;
irq.shorthand = icr_low  APIC_SHORT_MASK;
-   if (apic_x2apic_mode(apic))
+   if (apic_x2apic_mode(apic)) {
irq.dest_id = icr_high;
-   else
+   if (icr_high == 0x)
+   irq.shorthand = APIC_DEST_ALLINC;
+   } else
irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
 
trace_kvm_apic_ipi(icr_low, irq.dest_id);


  - Where does the 'only one supported cluster' come from?
 
I only see we use 'struct kvm_lapic *logical_map[16][16];', which
supports 16 clusters of 16 apics = first 256 vcpus, so if we map
everything to logical_map[0][0:15], we would not work correctly in
the cluster x2apic, with  16 vcpus.

I think you're right.  Something like this would be a better fix:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..58745cbbb7e6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -198,7 +198,7 @@ static void recalculate_apic_map(struct kvm *kvm)
cid = apic_cluster_id(new, ldr);
lid = apic_logical_id(new, ldr);
 
-   if (lid)
+   if (cid  ARRAY_SIZE(new-logical_map)  lid)
new-logical_map[cid][ffs(lid) - 1] = apic;
}
 out:
@@ -621,9 +621,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
dst = map-phys_map[irq-dest_id  0xff];
} else {
u32 mda = irq-dest_id  (32 - map-ldr_bits);
+   u32 cid = apic_cluster_id(map, mda);
 
-   dst = map-logical_map[apic_cluster_id(map, mda)];
+   if (cid = ARRAY_SIZE(map-logical_map))
+   goto out;
 
+   dst = map-logical_map[cid];
bitmap = apic_logical_id(map, mda);
 
if (irq-delivery_mode == APIC_DM_LOWEST) {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c8b0d0d2da5c..5ccb71658894 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -152,8 +152,6 @@ static inline u16 apic_cluster_id(struct kvm_apic_map *map, 
u32 ldr)
ldr = 32 - map-ldr_bits;
cid = (ldr  map-cid_shift)  map-cid_mask;
 
-   BUG_ON(cid = ARRAY_SIZE(map-logical_map));
-
return cid;
 }
 
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ vi kvm/lapic.c
playground:~/work/upstream/linux-2.6/arch/x86 pbonzini$ git diff !$
git diff kvm/lapic.c
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dec48bfaddb8..1005185972a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -143,8 +143,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
return (kvm_apic_get_reg(apic, APIC_ID)  24)  0xff;
 }
 
-#define KVM_X2APIC_CID_BITS 0
-
 static void recalculate_apic_map(struct kvm *kvm)
 {
struct kvm_apic_map *new, *old = NULL;
@@ -182,8 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
if (apic_x2apic_mode(apic)) {
new-ldr_bits = 32;
new-cid_shift = 16;
-   new-cid_mask = (1  KVM_X2APIC_CID_BITS) - 1;
-   new-lid_mask = 0x;
+   new-cid_mask = new-lid_mask = 0x;
} else if (kvm_apic_sw_enabled(apic) 
!new-cid_mask /* flat mode */ 
kvm_apic_get_reg(apic, APIC_DFR) == 
APIC_DFR_CLUSTER) {
@@ -198,7 +195,7 @@ 

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-13 18:25+0100, Paolo Bonzini:
 Il 13/12/2013 17:07, Radim Krčmář ha scritto:
 This bug can only be hit when the destination cpu is  256, so the
 request itself is buggy -- we don't support that many in kvm and it
 would crash when initializing the vcpus if we did.
 = It looks like we should just ignore the ipi, because we have no
vcpus in that cluster.
 
 That's what should happen in physical mode.  Something like this patch:
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 5439117d5c4c..1f8e9e1abd3b 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -613,9 +615,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
 struct kvm_lapic *src,
  
   if (irq-dest_mode == 0) { /* physical mode */
   if (irq-delivery_mode == APIC_DM_LOWEST ||
 - irq-dest_id == 0xff)
 + irq-dest_id == 0xff ||

The 0xff is xapic broadcast, we did not care about the x2apic one and
no-one noticed that it does not work :)

0xff isn't special in x2apic mode.

 + (apic_x2apic_mode(src)  irq-dest_id  0xff))
   goto out;
 - dst = map-phys_map[irq-dest_id  0xff];
 + dst = map-phys_map[irq-dest_id];
   } else {
   u32 mda = irq-dest_id  (32 - map-ldr_bits);
  
 
 On top of this, the x2apic spec documents a broadcast destination ID that
 could be implemented as follows.  But I have no idea if this is correct and
 how it differs from the usual broadcast shorthand:
 
 @@ -815,9 +818,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.level = icr_low  APIC_INT_ASSERT;
   irq.trig_mode = icr_low  APIC_INT_LEVELTRIG;
   irq.shorthand = icr_low  APIC_SHORT_MASK;
 - if (apic_x2apic_mode(apic))
 + if (apic_x2apic_mode(apic)) {
   irq.dest_id = icr_high;
 - else
 + if (icr_high == 0x)
 + irq.shorthand = APIC_DEST_ALLINC;

The shorthand takes priority, so we shouldn't overwrite it.
This is better solved after we 'goto out' from the _fast().
(could be nicer still ...)

 + } else
   irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
   trace_kvm_apic_ipi(icr_low, irq.dest_id);
 
 

And another possibility occured to me now: Google was the first one to
encounter a broadcast; we don't handle it at all in the fast path and
x2apic has 0x as the target in both modes ...
(I think that xapic has 0xff in both of them too, but I'll check)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5439117..e4618c4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -512,19 +512,24 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
apic_update_ppr(apic);
 }
 
-int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
+int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
 {
-   return dest == 0xff || kvm_apic_id(apic) == dest;
+   return dest == (apic_x2apic_mode(apic) ? 0x : 0xff)
+  || kvm_apic_id(apic) == dest;
 }
 
-int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
+int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 {
int result = 0;
u32 logical_id;
 
+   if (mda == (apic_x2apic_mode(apic) ? 0x : 0xff))
+   return 1;
+
if (apic_x2apic_mode(apic)) {
logical_id = kvm_apic_get_reg(apic, APIC_LDR);
-   return logical_id  mda;
+   return mda  16 == logical_id  16
+   logical_id  mda  0x;
}
 
logical_id = GET_APIC_LOGICAL_ID(kvm_apic_get_reg(apic, APIC_LDR));
@@ -605,6 +610,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
if (irq-shorthand)
return false;
 
+   /* broadcast */
+   if (irq-dest_id == (apic_x2apic_mode(src) ? 0x : 0xff))
+   return false;
+
rcu_read_lock();
map = rcu_dereference(kvm-arch.apic_map);
 
@@ -612,10 +621,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
struct kvm_lapic *src,
goto out;
 
if (irq-dest_mode == 0) { /* physical mode */
-   if (irq-delivery_mode == APIC_DM_LOWEST ||
-   irq-dest_id == 0xff)
+   if (irq-dest_id  0xff) {
+   ret = true;
+   goto out;
+   }
+   if (irq-delivery_mode == APIC_DM_LOWEST)
goto out;
-   dst = map-phys_map[irq-dest_id  0xff];
+   dst = map-phys_map[irq-dest_id];
} else {
u32 mda = irq-dest_id  (32 - map-ldr_bits);
 

The logical x2apic slowpath was completely broken, maybe still is,
I have no way to test it ...

The broadcast checking should be abstracted, we call it a lot and giving
it a name would be much better than commenting.

Also,