Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-11 Thread Igor Mammedov
On Sun, 10 Nov 2013 12:36:29 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Nov 08, 2013 at 06:33:12PM +0100, Igor Mammedov wrote:
  On Fri, 8 Nov 2013 12:22:12 +0200
  Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com wrote:
  
   Hi,
   
   On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
 This patch adds a _PXM method to ACPI CPU objects for the pc machine. 
 The _PXM
 value is derived from the passed in guest info, same way as CPU SRAT 
 entries.
 
 The motivation for this patch is a CPU hot-unplug/hot-plug bug 
 observed when
 using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The 
 linux
 guest kernel parses the SRAT CPU entries at boot time and stores them 
 in the
 array __apicid_to_node. When a CPU is hot-removed, the linux guest 
 kernel
 resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE 
 (kernel commit
 c4c60524). When the removed cpu is hot-added again, the linux kernel 
 looks up
 the hot-added cpu object's _PXM method instead of somehow 
 re-discovering the
 SRAT entry info. With current qemu/seabios, the _PXM method is not 
 found, and
 the CPU is thus hot-plugged in the default NUMA node 0. (The problem 
 does not
 show up on initial hotplug of a cpu; the PXM method is still not 
 found in this
 case, but the kernel still has the correct proximity value from the 
 CPU's SRAT
 entry stored in __apicid_to_node)
 
 ACPI spec mentions that the _PXM method is the correct way to 
 determine
 proximity information at hot-add time.

Where does it say this?
I found this:
If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
added processor is not present in the System Resource Affinity Table
(SRAT), a _PXM object must exist for the processor’s device or one of
its ancestors in the ACPI Namespace.

Does this mean that linux is buggy, and should be fixed up to look up
the apic ID in SRAT?
   
   The quote above suggests that if SRAT is absent, _PXM should be present.
   Seabios/qemu provide SRAT entries, and  no _PXM. The fact that the kernel
   resets the parse SRAT info on hot-remove time looks like a kernel problem.
   
   But As Toshi Kani mentioned in the original thread, here is a quote from 
   ACPI
   5.0, stating _PXM and only _PXM should be used at hot-plug time:
   
   ===
   17.2.1 System Resource Affinity Table Definition
   
   This optional System Resource Affinity Table (SRAT) provides the boot
   time description of the processor and memory ranges belonging to a
   system locality. OSPM will consume the SRAT only at boot time. OSPM
   should use _PXM for any devices that are hot-added into the system after
   boot up.
   
   
   So in this sense, the kernel is correct (kernel only uses _PXM at 
   hot-plug time)
   , and qemu/Seabios should have _PXM methods for hot operations.
  
  in terms of RFC SHOULD doesn't mean MUST, and in my interpretation of above 
  is
  that SRAT parsed once but it doesn't mean that OS should forget data from 
  it.
 
 Well it says OSPM will consume the SRAT only at boot time.
 How do you interpret that?
Exactly as I've wrote before. In the same chapter spec makes provision that 
hotplug memory entries could be in SRAT as well, which rules out using table
only at boot time.

But looking at APIC entry it doesn't have flag that marks it as hotplugable
opposed to memory entry, so maybe we are doing useless work putting not present
CPUs into it.

 
  Anyway we surely can have both in QEMU.
   

 So far, qemu/seabios do not provide this
 method for CPUs. So regardless of kernel behaviour, it is a good idea 
 to add
 this _PXM method. Since ACPI table generation has recently been moved 
 from
 seabios to qemu, we do this in qemu.
 
 Note that the above hot-remove/hot-add scenario has been tested on an 
 older
 qemu + non-upstreamed patches for cpu hot-removal support, and not on 
 qemu
 master (since cpu-del support is still not on master). The only 
 testing done
 with qemu/seabios master and this patch, are successful boots of 
 multi-node
 linux and windows8 guests.
 
 For the initial discussion on seabios and linux-acpi lists see
 http://www.spinics.net/lists/linux-acpi/msg47058.html
 
 Signed-off-by: Vasilis Liaskovitis 
 vasilis.liaskovi...@profitbricks.com
 Reviewed-by: Thilo Fromm t...@thilo-fromm.de

Even if this is a linux bug, I have no issue with working around
it in qemu.

But I think proper testing needs to be done with rebased upport for 
cpu-del.
   
   Ok, I can try to rebase cpu-del support for testing. If there are cpu-del 
   bits
   already somewhere (Igor?) and not merged yet, please point me to 

Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-10 Thread Michael S. Tsirkin
On Fri, Nov 08, 2013 at 06:33:12PM +0100, Igor Mammedov wrote:
 On Fri, 8 Nov 2013 12:22:12 +0200
 Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com wrote:
 
  Hi,
  
  On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
   On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
This patch adds a _PXM method to ACPI CPU objects for the pc machine. 
The _PXM
value is derived from the passed in guest info, same way as CPU SRAT 
entries.

The motivation for this patch is a CPU hot-unplug/hot-plug bug observed 
when
using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The 
linux
guest kernel parses the SRAT CPU entries at boot time and stores them 
in the
array __apicid_to_node. When a CPU is hot-removed, the linux guest 
kernel
resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel 
commit
c4c60524). When the removed cpu is hot-added again, the linux kernel 
looks up
the hot-added cpu object's _PXM method instead of somehow 
re-discovering the
SRAT entry info. With current qemu/seabios, the _PXM method is not 
found, and
the CPU is thus hot-plugged in the default NUMA node 0. (The problem 
does not
show up on initial hotplug of a cpu; the PXM method is still not found 
in this
case, but the kernel still has the correct proximity value from the 
CPU's SRAT
entry stored in __apicid_to_node)

ACPI spec mentions that the _PXM method is the correct way to determine
proximity information at hot-add time.
   
   Where does it say this?
   I found this:
   If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
   added processor is not present in the System Resource Affinity Table
   (SRAT), a _PXM object must exist for the processor’s device or one of
   its ancestors in the ACPI Namespace.
   
   Does this mean that linux is buggy, and should be fixed up to look up
   the apic ID in SRAT?
  
  The quote above suggests that if SRAT is absent, _PXM should be present.
  Seabios/qemu provide SRAT entries, and  no _PXM. The fact that the kernel
  resets the parse SRAT info on hot-remove time looks like a kernel problem.
  
  But As Toshi Kani mentioned in the original thread, here is a quote from 
  ACPI
  5.0, stating _PXM and only _PXM should be used at hot-plug time:
  
  ===
  17.2.1 System Resource Affinity Table Definition
  
  This optional System Resource Affinity Table (SRAT) provides the boot
  time description of the processor and memory ranges belonging to a
  system locality. OSPM will consume the SRAT only at boot time. OSPM
  should use _PXM for any devices that are hot-added into the system after
  boot up.
  
  
  So in this sense, the kernel is correct (kernel only uses _PXM at hot-plug 
  time)
  , and qemu/Seabios should have _PXM methods for hot operations.
 
 in terms of RFC SHOULD doesn't mean MUST, and in my interpretation of above is
 that SRAT parsed once but it doesn't mean that OS should forget data from it.

Well it says OSPM will consume the SRAT only at boot time.
How do you interpret that?

 Anyway we surely can have both in QEMU.
  
   
So far, qemu/seabios do not provide this
method for CPUs. So regardless of kernel behaviour, it is a good idea 
to add
this _PXM method. Since ACPI table generation has recently been moved 
from
seabios to qemu, we do this in qemu.

Note that the above hot-remove/hot-add scenario has been tested on an 
older
qemu + non-upstreamed patches for cpu hot-removal support, and not on 
qemu
master (since cpu-del support is still not on master). The only testing 
done
with qemu/seabios master and this patch, are successful boots of 
multi-node
linux and windows8 guests.

For the initial discussion on seabios and linux-acpi lists see
http://www.spinics.net/lists/linux-acpi/msg47058.html

Signed-off-by: Vasilis Liaskovitis 
vasilis.liaskovi...@profitbricks.com
Reviewed-by: Thilo Fromm t...@thilo-fromm.de
   
   Even if this is a linux bug, I have no issue with working around
   it in qemu.
   
   But I think proper testing needs to be done with rebased upport for 
   cpu-del.
  
  Ok, I can try to rebase cpu-del support for testing. If there are cpu-del 
  bits
  already somewhere (Igor?) and not merged yet, please point me to them.
  
   
---
 hw/i386/acpi-build.c  |2 ++
 hw/i386/ssdt-proc.dsl |2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6cfa044..9373f5e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
 #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 
2)
 #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 
4)
 #define 

Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-08 Thread Vasilis Liaskovitis
Hi,

On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
  This patch adds a _PXM method to ACPI CPU objects for the pc machine. The 
  _PXM
  value is derived from the passed in guest info, same way as CPU SRAT 
  entries.
  
  The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when
  using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux
  guest kernel parses the SRAT CPU entries at boot time and stores them in the
  array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
  resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel 
  commit
  c4c60524). When the removed cpu is hot-added again, the linux kernel looks 
  up
  the hot-added cpu object's _PXM method instead of somehow re-discovering the
  SRAT entry info. With current qemu/seabios, the _PXM method is not found, 
  and
  the CPU is thus hot-plugged in the default NUMA node 0. (The problem does 
  not
  show up on initial hotplug of a cpu; the PXM method is still not found in 
  this
  case, but the kernel still has the correct proximity value from the CPU's 
  SRAT
  entry stored in __apicid_to_node)
  
  ACPI spec mentions that the _PXM method is the correct way to determine
  proximity information at hot-add time.
 
 Where does it say this?
 I found this:
 If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
 added processor is not present in the System Resource Affinity Table
 (SRAT), a _PXM object must exist for the processor’s device or one of
 its ancestors in the ACPI Namespace.
 
 Does this mean that linux is buggy, and should be fixed up to look up
 the apic ID in SRAT?

The quote above suggests that if SRAT is absent, _PXM should be present.
Seabios/qemu provide SRAT entries, and  no _PXM. The fact that the kernel
resets the parse SRAT info on hot-remove time looks like a kernel problem.

But As Toshi Kani mentioned in the original thread, here is a quote from ACPI
5.0, stating _PXM and only _PXM should be used at hot-plug time:

===
17.2.1 System Resource Affinity Table Definition

This optional System Resource Affinity Table (SRAT) provides the boot
time description of the processor and memory ranges belonging to a
system locality. OSPM will consume the SRAT only at boot time. OSPM
should use _PXM for any devices that are hot-added into the system after
boot up.


So in this sense, the kernel is correct (kernel only uses _PXM at hot-plug time)
, and qemu/Seabios should have _PXM methods for hot operations.

 
  So far, qemu/seabios do not provide this
  method for CPUs. So regardless of kernel behaviour, it is a good idea to add
  this _PXM method. Since ACPI table generation has recently been moved from
  seabios to qemu, we do this in qemu.
  
  Note that the above hot-remove/hot-add scenario has been tested on an older
  qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
  master (since cpu-del support is still not on master). The only testing done
  with qemu/seabios master and this patch, are successful boots of multi-node
  linux and windows8 guests.
  
  For the initial discussion on seabios and linux-acpi lists see
  http://www.spinics.net/lists/linux-acpi/msg47058.html
  
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
  Reviewed-by: Thilo Fromm t...@thilo-fromm.de
 
 Even if this is a linux bug, I have no issue with working around
 it in qemu.
 
 But I think proper testing needs to be done with rebased upport for cpu-del.

Ok, I can try to rebase cpu-del support for testing. If there are cpu-del bits
already somewhere (Igor?) and not merged yet, please point me to them.

 
  ---
   hw/i386/acpi-build.c  |2 ++
   hw/i386/ssdt-proc.dsl |2 ++
   2 files changed, 4 insertions(+)
  
  diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
  index 6cfa044..9373f5e 100644
  --- a/hw/i386/acpi-build.c
  +++ b/hw/i386/acpi-build.c
  @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
   #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
   #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
   #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
  +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
   #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
   #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
   
  @@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker,
   proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i);
   proc[ACPI_PROC_OFFSET_CPUID1] = i;
   proc[ACPI_PROC_OFFSET_CPUID2] = i;
  +proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info-node_cpu[i];
   }
   
   /* build this code:
  diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
  index 8229bfd..7eef8b2 100644
  --- a/hw/i386/ssdt-proc.dsl
  +++ 

Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-08 Thread Igor Mammedov
On Fri, 8 Nov 2013 12:22:12 +0200
Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com wrote:

 Hi,
 
 On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
  On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
   This patch adds a _PXM method to ACPI CPU objects for the pc machine. The 
   _PXM
   value is derived from the passed in guest info, same way as CPU SRAT 
   entries.
   
   The motivation for this patch is a CPU hot-unplug/hot-plug bug observed 
   when
   using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The 
   linux
   guest kernel parses the SRAT CPU entries at boot time and stores them in 
   the
   array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
   resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel 
   commit
   c4c60524). When the removed cpu is hot-added again, the linux kernel 
   looks up
   the hot-added cpu object's _PXM method instead of somehow re-discovering 
   the
   SRAT entry info. With current qemu/seabios, the _PXM method is not found, 
   and
   the CPU is thus hot-plugged in the default NUMA node 0. (The problem does 
   not
   show up on initial hotplug of a cpu; the PXM method is still not found in 
   this
   case, but the kernel still has the correct proximity value from the CPU's 
   SRAT
   entry stored in __apicid_to_node)
   
   ACPI spec mentions that the _PXM method is the correct way to determine
   proximity information at hot-add time.
  
  Where does it say this?
  I found this:
  If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
  added processor is not present in the System Resource Affinity Table
  (SRAT), a _PXM object must exist for the processor’s device or one of
  its ancestors in the ACPI Namespace.
  
  Does this mean that linux is buggy, and should be fixed up to look up
  the apic ID in SRAT?
 
 The quote above suggests that if SRAT is absent, _PXM should be present.
 Seabios/qemu provide SRAT entries, and  no _PXM. The fact that the kernel
 resets the parse SRAT info on hot-remove time looks like a kernel problem.
 
 But As Toshi Kani mentioned in the original thread, here is a quote from ACPI
 5.0, stating _PXM and only _PXM should be used at hot-plug time:
 
 ===
 17.2.1 System Resource Affinity Table Definition
 
 This optional System Resource Affinity Table (SRAT) provides the boot
 time description of the processor and memory ranges belonging to a
 system locality. OSPM will consume the SRAT only at boot time. OSPM
 should use _PXM for any devices that are hot-added into the system after
 boot up.
 
 
 So in this sense, the kernel is correct (kernel only uses _PXM at hot-plug 
 time)
 , and qemu/Seabios should have _PXM methods for hot operations.

in terms of RFC SHOULD doesn't mean MUST, and in my interpretation of above is
that SRAT parsed once but it doesn't mean that OS should forget data from it.
Anyway we surely can have both in QEMU.

 
  
   So far, qemu/seabios do not provide this
   method for CPUs. So regardless of kernel behaviour, it is a good idea to 
   add
   this _PXM method. Since ACPI table generation has recently been moved from
   seabios to qemu, we do this in qemu.
   
   Note that the above hot-remove/hot-add scenario has been tested on an 
   older
   qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
   master (since cpu-del support is still not on master). The only testing 
   done
   with qemu/seabios master and this patch, are successful boots of 
   multi-node
   linux and windows8 guests.
   
   For the initial discussion on seabios and linux-acpi lists see
   http://www.spinics.net/lists/linux-acpi/msg47058.html
   
   Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
   Reviewed-by: Thilo Fromm t...@thilo-fromm.de
  
  Even if this is a linux bug, I have no issue with working around
  it in qemu.
  
  But I think proper testing needs to be done with rebased upport for cpu-del.
 
 Ok, I can try to rebase cpu-del support for testing. If there are cpu-del bits
 already somewhere (Igor?) and not merged yet, please point me to them.
 
  
   ---
hw/i386/acpi-build.c  |2 ++
hw/i386/ssdt-proc.dsl |2 ++
2 files changed, 4 insertions(+)
   
   diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
   index 6cfa044..9373f5e 100644
   --- a/hw/i386/acpi-build.c
   +++ b/hw/i386/acpi-build.c
   @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
#define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
#define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
#define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
   +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
#define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
#define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)

   @@ -724,6 +725,7 @@ build_ssdt(GArray 

[Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-07 Thread Vasilis Liaskovitis
This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM
value is derived from the passed in guest info, same way as CPU SRAT entries.

The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when
using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux
guest kernel parses the SRAT CPU entries at boot time and stores them in the
array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel commit
c4c60524). When the removed cpu is hot-added again, the linux kernel looks up
the hot-added cpu object's _PXM method instead of somehow re-discovering the
SRAT entry info. With current qemu/seabios, the _PXM method is not found, and
the CPU is thus hot-plugged in the default NUMA node 0. (The problem does not
show up on initial hotplug of a cpu; the PXM method is still not found in this
case, but the kernel still has the correct proximity value from the CPU's SRAT
entry stored in __apicid_to_node)

ACPI spec mentions that the _PXM method is the correct way to determine
proximity information at hot-add time. So far, qemu/seabios do not provide this
method for CPUs. So regardless of kernel behaviour, it is a good idea to add
this _PXM method. Since ACPI table generation has recently been moved from
seabios to qemu, we do this in qemu.

Note that the above hot-remove/hot-add scenario has been tested on an older
qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
master (since cpu-del support is still not on master). The only testing done
with qemu/seabios master and this patch, are successful boots of multi-node
linux and windows8 guests.

For the initial discussion on seabios and linux-acpi lists see
http://www.spinics.net/lists/linux-acpi/msg47058.html

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
Reviewed-by: Thilo Fromm t...@thilo-fromm.de
---
 hw/i386/acpi-build.c  |2 ++
 hw/i386/ssdt-proc.dsl |2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6cfa044..9373f5e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
 #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
 #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
 #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
+#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
 #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
 #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
 
@@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i);
 proc[ACPI_PROC_OFFSET_CPUID1] = i;
 proc[ACPI_PROC_OFFSET_CPUID2] = i;
+proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info-node_cpu[i];
 }
 
 /* build this code:
diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
index 8229bfd..7eef8b2 100644
--- a/hw/i386/ssdt-proc.dsl
+++ b/hw/i386/ssdt-proc.dsl
@@ -47,6 +47,8 @@ DefinitionBlock (ssdt-proc.aml, SSDT, 0x01, BXPC, 
BXSSDT, 0x1)
  * also updating the C code.
  */
 Name(_HID, ACPI0007)
+ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_pxm
+Name(_PXM, 0xAA)
 External(CPMA, MethodObj)
 External(CPST, MethodObj)
 External(CPEJ, MethodObj)
-- 
1.7.10.4




Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects

2013-11-07 Thread Michael S. Tsirkin
On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
 This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM
 value is derived from the passed in guest info, same way as CPU SRAT entries.
 
 The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when
 using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux
 guest kernel parses the SRAT CPU entries at boot time and stores them in the
 array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
 resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel commit
 c4c60524). When the removed cpu is hot-added again, the linux kernel looks up
 the hot-added cpu object's _PXM method instead of somehow re-discovering the
 SRAT entry info. With current qemu/seabios, the _PXM method is not found, and
 the CPU is thus hot-plugged in the default NUMA node 0. (The problem does not
 show up on initial hotplug of a cpu; the PXM method is still not found in this
 case, but the kernel still has the correct proximity value from the CPU's SRAT
 entry stored in __apicid_to_node)
 
 ACPI spec mentions that the _PXM method is the correct way to determine
 proximity information at hot-add time.

Where does it say this?
I found this:
If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
added processor is not present in the System Resource Affinity Table
(SRAT), a _PXM object must exist for the processor’s device or one of
its ancestors in the ACPI Namespace.

Does this mean that linux is buggy, and should be fixed up to look up
the apic ID in SRAT?

 So far, qemu/seabios do not provide this
 method for CPUs. So regardless of kernel behaviour, it is a good idea to add
 this _PXM method. Since ACPI table generation has recently been moved from
 seabios to qemu, we do this in qemu.
 
 Note that the above hot-remove/hot-add scenario has been tested on an older
 qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
 master (since cpu-del support is still not on master). The only testing done
 with qemu/seabios master and this patch, are successful boots of multi-node
 linux and windows8 guests.
 
 For the initial discussion on seabios and linux-acpi lists see
 http://www.spinics.net/lists/linux-acpi/msg47058.html
 
 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
 Reviewed-by: Thilo Fromm t...@thilo-fromm.de

Even if this is a linux bug, I have no issue with working around
it in qemu.

But I think proper testing needs to be done with rebased upport for cpu-del.

 ---
  hw/i386/acpi-build.c  |2 ++
  hw/i386/ssdt-proc.dsl |2 ++
  2 files changed, 4 insertions(+)
 
 diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
 index 6cfa044..9373f5e 100644
 --- a/hw/i386/acpi-build.c
 +++ b/hw/i386/acpi-build.c
 @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
  #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
  #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
  #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
 +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
  #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
  #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
  
 @@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker,
  proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i);
  proc[ACPI_PROC_OFFSET_CPUID1] = i;
  proc[ACPI_PROC_OFFSET_CPUID2] = i;
 +proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info-node_cpu[i];
  }
  
  /* build this code:
 diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
 index 8229bfd..7eef8b2 100644
 --- a/hw/i386/ssdt-proc.dsl
 +++ b/hw/i386/ssdt-proc.dsl
 @@ -47,6 +47,8 @@ DefinitionBlock (ssdt-proc.aml, SSDT, 0x01, BXPC, 
 BXSSDT, 0x1)
   * also updating the C code.
   */
  Name(_HID, ACPI0007)
 +ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_pxm
 +Name(_PXM, 0xAA)

The ACPI spec says this should be a DWORD value:

Return Value:
An Integer (DWORD) containing a proximity domain identifier.


  External(CPMA, MethodObj)
  External(CPST, MethodObj)
  External(CPEJ, MethodObj)
 -- 
 1.7.10.4