[PATCH v10 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-07-26 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm      (persistent)
2. apicid (physical cpu id)   <->   nodeid   (persistent)
3. cpuid (logical cpu id)     <->   apicid   (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid   (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from the MADT.

All processors' apicids can be obtained by the _MAT method or from the MADT in
ACPI. The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter is added to the MADT APIs so that the caller
can control whether disabled processors are ignored.
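
The effect of the new parameter can be tried outside the kernel. The following
is only a minimal user-space sketch: struct madt_entry, ENTRY_ENABLED and
map_entry_id() are hypothetical stand-ins, not the kernel's MADT structures or
helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a MADT local APIC entry (not the kernel struct). */
#define ENTRY_ENABLED 0x1u

struct madt_entry {
	uint32_t processor_id; /* ACPI processor ID */
	uint32_t apic_id;      /* physical CPU ID */
	uint32_t flags;        /* ENTRY_ENABLED is set if the CPU is enabled */
};

/*
 * Look up the APIC ID for acpi_id in a single entry.
 * Returns 0 and fills *apic_id on a match, -ENODEV if the entry is disabled
 * and the caller asked to ignore disabled CPUs, and -EINVAL if the processor
 * ID does not match.
 */
static int map_entry_id(const struct madt_entry *e, uint32_t acpi_id,
			uint32_t *apic_id, bool ignore_disabled)
{
	assert(e && apic_id);

	if (ignore_disabled && !(e->flags & ENTRY_ENABLED))
		return -ENODEV;

	if (e->processor_id != acpi_id)
		return -EINVAL;

	*apic_id = e->apic_id;
	return 0;
}
```

With ignore_disabled set to true the sketch behaves like the old code for a
disabled entry; with false, the apicid of the disabled entry is returned.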

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENO

[PATCH v10 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-07-26 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional ACPI namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 
 drivers/acpi/bus.c|  1 +
 drivers/acpi/processor_core.c | 67 +++
 include/linux/acpi.h  |  3 ++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..0fe5f54 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,7 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..e814cd4 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,73 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+	int type;
+	u32 acpi_id;
+	acpi_status status;
+	acpi_object_type acpi_type;
+	unsigned long long tmp;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_get_type(handle, &acpi_type);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	switch (acpi_type) {
+	case ACPI_TYPE_PROCESSOR:
+		status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = object.processor.proc_id;
+		break;
+	case ACPI_TYPE_DEVICE:
+		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = tmp;
+		break;
+	default:
+		return false;
+	}
+
+   

[PATCH v10 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables

2016-07-26 Thread Dou Liyang
[Problem]

When we make the cpuid <-> nodeid mapping persistent, we rely on the DSDT.
The ACPI tables are just like user input in that respect, and we must not
crash if that input is unreasonable.

For example, the mapping between proc_id and pxm in some machines' ACPI tables
looks like this:

proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
89  <-> 0
89  <-> 0
89  <-> 0
89  <-> 1
89  <-> 1
89  <-> 2
89  <-> 3
.

We can't be sure which entry is correct for proc_id 89. We may map a cpu to
the wrong node. When pages are allocated, this may cause a kernel panic.

So, we should provide a mechanism to validate the ACPI tables, just like we
validate user input in a web project.

The mechanism is that processor objects which have duplicate IDs are treated
as not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
	proc_id = get_ACPI_Processor_number(Processor)
	if (the proc_id already exists)
		mark both of them as being unreasonable;

The function records the unique and the duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable IDs, which
means that the processor objects in question are not valid.
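
The bookkeeping described above can be tried in user space. This is only a
sketch under assumptions: struct id_tracker, tracker_add(),
tracker_is_duplicate() and the MAX_IDS capacity are hypothetical names chosen
for illustration, not the kernel code in the diff below. As in the
description, an ID that repeats stays in the list of seen IDs and is
additionally recorded once as a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 256 /* assumed capacity for this sketch */

struct id_tracker {
	int seen[MAX_IDS];      /* every distinct ID encountered so far */
	size_t nr_seen;
	int duplicate[MAX_IDS]; /* IDs encountered more than once */
	size_t nr_duplicate;
};

static bool contains(const int *ids, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Record one processor ID found during the walk. */
static void tracker_add(struct id_tracker *t, int id)
{
	assert(t);

	/* Already known to be duplicated: nothing to do. */
	if (contains(t->duplicate, t->nr_duplicate, id))
		return;

	/* Seen exactly once before: it is now a duplicate. */
	if (contains(t->seen, t->nr_seen, id)) {
		if (t->nr_duplicate < MAX_IDS)
			t->duplicate[t->nr_duplicate++] = id;
		return;
	}

	/* First occurrence. */
	if (t->nr_seen < MAX_IDS)
		t->seen[t->nr_seen++] = id;
}

/* True if the ID was reported by more than one processor object. */
static bool tracker_is_duplicate(const struct id_tracker *t, int id)
{
	return contains(t->duplicate, t->nr_duplicate, id);
}
```

Feeding the proc_ids from the table above (0, 1, 2, 3 and then 89 several
times) into tracker_add() leaves 89 as the only entry for which
tracker_is_duplicate() returns true.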

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+   return;
+
+   /*
+* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+* already in the IDs, do nothing.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return;
+   }
+
+   /*
+* Secondly, compare the proc_id with unique IDs, if the proc_id is in
+* the IDs, put it in the duplicate IDs.
+*/
+   for (i = 0; i < nr_unique_ids; i++) {
+   if (unique_processor_ids[i] == proc_id) {
+   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+   nr_duplicate_ids++;
+   return;
+   }
+   }
+
+   /*
+* Lastly, the proc_id is a unique ID, put it in the unique IDs.
+*/
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+					   u32 lvl,
+					   void *context,
+					   void **rv)
+{
+	acpi_status status;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+	if (ACPI_FAILURE(status))
+		acpi_handle_info(handle, "Not get the processor object\n");
+	else
+		processor_validated_ids_update(object.processor.proc_id);
+
+	return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+	/* Search all processor nodes in ACPI namespace */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+			    ACPI_UINT32_MAX,
+			    acpi_processor_ids_walk,
+			    NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+	acpi_processor_duplication_valiate();
 	acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
 	acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5





[PATCH v10 7/7] acpi: Provide the interface to validate the proc_id

2016-07-26 Thread Dou Liyang
When we want to check whether a proc_id is unreasonable, we can call the
"acpi_processor_validate_proc_id" function. It searches the duplicate IDs.
If the proc_id is found there, it returns true to the caller. Otherwise it
returns false, which means the proc_id is usable.

When we establish all possible cpuid <-> nodeid mappings to handle cpu
hotplug, we use the proc_id from the ACPI table.

We validate the proc_id when we get it. If the result is true, we stop the
mapping.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 
 drivers/acpi/processor_core.c |  4 
 include/linux/acpi.h  |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index e814cd4..830c7ac 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
 		if (ACPI_FAILURE(status))
 			return false;
 		acpi_id = object.processor.proc_id;
+
+		/* validate the acpi_id */
+		if(acpi_processor_validate_proc_id(acpi_id))
+			return false;
 		break;
 	case ACPI_TYPE_DEVICE:
 		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30df63c..11bc794 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5





[PATCH v10 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-07-26 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, the cpuid <-> nodeid mapping is
established/destroyed, which means the cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
	...
	/* if cpumask is contained inside a NUMA node, we belong to that node */
	if (wq_numa_enabled) {
		for_each_node(node) {
			if (cpumask_subset(pool->attrs->cpumask,
					   wq_numa_possible_cpumask[node])) {
				pool->node = node;
				break;
			}
		}
	}
	...
}

Since wq_numa_possible_cpumask is not updated, pool->node could be mapped to an
offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
	struct worker *worker;

	worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); /* <-- here, using the wrong node */

	...

	return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is
   set up at boot time and CPU hotadd time, and cleared at CPU hotremove time.
   This mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   cpuid is allocated, lower ids first, and released at CPU hotremove time,
   then reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.
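
Why item 3 makes the mapping non-persistent can be seen in a small user-space
sketch. It only assumes the "lowest free id first, released on hotremove"
rule stated above. NR_IDS, lowest_free_map[], hotadd_lowest_free() and
hotremove() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>

#define NR_IDS 8       /* assumed small ID space for the sketch */
#define NO_APICID (-1) /* marks a free logical CPU ID */

/* cpuid -> apicid table; NO_APICID means the cpuid is currently free. */
static int lowest_free_map[NR_IDS] = {
	NO_APICID, NO_APICID, NO_APICID, NO_APICID,
	NO_APICID, NO_APICID, NO_APICID, NO_APICID,
};

/* Hot-add: hand out the lowest free cpuid, as the old scheme does. */
static int hotadd_lowest_free(int apicid)
{
	for (int cpu = 0; cpu < NR_IDS; cpu++) {
		if (lowest_free_map[cpu] == NO_APICID) {
			lowest_free_map[cpu] = apicid;
			return cpu;
		}
	}
	return -1; /* no free cpuid left */
}

/* Hot-remove: release the cpuid so that it can be reused. */
static void hotremove(int cpu)
{
	assert(cpu >= 0 && cpu < NR_IDS);
	lowest_free_map[cpu] = NO_APICID;
}
```

If apicid 0x20 gets cpuid 1, is hot-removed, and apicid 0x30 is hot-added
afterwards, cpuid 1 is handed to 0x30. Any cache keyed by cpuid now refers to
a different physical CPU, and possibly to a different node.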

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(), the
cpuid <-> nodeid mapping is derived from the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
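
The dependency between the mappings can be written as a chained lookup. The
sketch below is an illustration only: cpuid_to_node(), NO_ID and the two
caller-supplied tables are hypothetical and do not reproduce
init_cpu_to_node().

```c
#include <assert.h>

#define NO_ID (-1) /* marks an unknown apicid or nodeid in this sketch */

/*
 * Derive the node of a logical CPU by chaining two tables:
 *   cpuid -> apicid (cpuid_to_apicid[]), then apicid -> nodeid
 *   (apicid_to_node[]).
 * Both tables and their sizes are supplied by the caller.
 */
static int cpuid_to_node(const int *cpuid_to_apicid, int nr_cpus,
			 const int *apicid_to_node, int nr_apicids, int cpu)
{
	int apicid;

	assert(cpuid_to_apicid && apicid_to_node);

	if (cpu < 0 || cpu >= nr_cpus)
		return NO_ID;

	apicid = cpuid_to_apicid[cpu];
	if (apicid < 0 || apicid >= nr_apicids)
		return NO_ID; /* no APIC ID recorded for this cpuid */

	return apicid_to_node[apicid];
}
```

The result is only as stable as both inputs. Since apicid <-> nodeid is
already persistent, making cpuid <-> apicid persistent for every possible CPU
is what makes the chained result persistent.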

The apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in the
following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control whether disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   also modify the way cpuid is calculated. Establish all possible
   cpuid <-> apicid mappings when registering the local apic, and store them
   in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid. This is also done by introducing an extra parameter to these APIs
   to let the caller control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional ACPI namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed

[PATCH v10 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-26 Thread Dou Liyang
5/19/212
https://lkml.org/lkml/2016/7/19/181
https://lkml.org/lkml/2016/7/25/99

Change log v9 -> v10:
1. Provide an empty definition of acpi_set_processor_mapping() for
   CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
2. Fix auto build test ERROR on ia64/next. In patch 5.
3. Fix some comments.

Change log v8 -> v9:
1. Provide an empty definition of acpi_set_processor_mapping() for
   CONFIG_ACPI_HOTPLUG_CPU unset.

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build
   zonelists before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In
   patch 2.
3. Fix some comments.

Dou Liyang (2):
  acpi: Provide the mechanism to validate processors in the ACPI tables
  acpi: Provide the interface to validate the proc_id

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c   |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  10 ++--
 arch/x86/kernel/apic/apic.c   |  85 +---
 arch/x86/mm/numa.c|  27 +
 drivers/acpi/acpi_processor.c | 105 +-
 drivers/acpi/bus.c|   1 +
 drivers/acpi/processor_core.c | 128 +++---
 include/linux/acpi.h  |   6 ++
 9 files changed, 315 insertions(+), 51 deletions(-)

-- 
2.5.5





[PATCH v10 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-07-26 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

We also modify the cpuid calculation. Currently, generic_processor_info()
simply finds the next unused cpuid, which is why the cpuid <-> nodeid mapping
changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.
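
The allocation rule described above can be tried in user space. This is a
simplified sketch with hypothetical names (persistent_map[], nr_assigned,
persistent_cpuid(), MAX_CPUS_SKETCH). It leaves out the reservation of cpuid 0
for the BSP and is not the kernel function from the diff below.

```c
#include <assert.h>

#define MAX_CPUS_SKETCH 4 /* assumed limit, standing in for nr_cpu_ids */
#define UNUSED_SLOT (-1)

/* cpuid -> apicid table; entries are never cleared once assigned. */
static int persistent_map[MAX_CPUS_SKETCH] = {
	UNUSED_SLOT, UNUSED_SLOT, UNUSED_SLOT, UNUSED_SLOT,
};
static int nr_assigned; /* number of cpuids handed out so far */

/*
 * Return the cpuid for apicid, reusing the earlier assignment if one exists.
 * Returns -1 when all cpuids are taken.
 */
static int persistent_cpuid(int apicid)
{
	assert(apicid >= 0);

	for (int cpu = 0; cpu < nr_assigned; cpu++)
		if (persistent_map[cpu] == apicid)
			return cpu; /* seen before: same cpuid as last time */

	if (nr_assigned >= MAX_CPUS_SKETCH)
		return -1;

	persistent_map[nr_assigned] = apicid;
	return nr_assigned++;
}
```

Because entries are never released, a CPU that is removed and added again gets
the cpuid it had before, and no other apicid can take that cpuid in between.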

Finally, we will use this array to make the cpuid <-> nodeid mapping persistent.

Currently, the cpuid <-> apicid mapping is established at local apic
registration time, but non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mappings when
registering the local apic.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..37248c3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
+   if (!enabled)
++disabled_cpus;
-   return -EINVAL;
-   }
 
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e3c377..366fbbc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,

[PATCH v10 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-07-26 Thread Dou Liyang
From: Tang Chen <tangc...@cn.fujitsu.com>

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to other online
nodes with memory in init_cpu_to_node(). The reason for doing this is to
ensure each cpu is mapped to a node with memory, so that local memory can
be allocated for that cpu.

But we don't have to do it this way.

In this series of patches, we are going to construct the cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that a cpu will be mapped to the node it belongs to, and the mapping will
never change. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation will automatically fall back to the proper zones in the zonelists.
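
The fallback idea can be illustrated with a simplified user-space sketch. It
assumes that fallback prefers the node with memory at the smallest distance.
nearest_node_with_memory(), NR_NODES_SKETCH and the distance table are
hypothetical, and this is not how the kernel builds zonelists.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES_SKETCH 4 /* assumed node count */

/*
 * Pick the node with memory that is closest to 'from'.
 * A node that has memory is its own best choice as long as its self-distance
 * is the smallest value in its row, which is the usual convention.
 * On equal distances the lower node number wins.
 * Returns -1 if no node has memory.
 */
static int nearest_node_with_memory(int from,
				    int distance[NR_NODES_SKETCH][NR_NODES_SKETCH],
				    const bool *has_memory)
{
	int best = -1, best_distance = INT_MAX;

	assert(from >= 0 && from < NR_NODES_SKETCH);

	for (int n = 0; n < NR_NODES_SKETCH; n++) {
		if (!has_memory[n])
			continue;
		if (distance[from][n] < best_distance) {
			best_distance = distance[from][n];
			best = n;
		}
	}
	return best;
}
```

In this model a cpu stays mapped to its own memory-less node, and only the
allocation is redirected to a nearby node that has memory.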

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
2.5.5





Re: [PATCH v9 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-25 Thread Dou Liyang



On 2016-07-26 07:20, Andrew Morton wrote:

On Mon, 25 Jul 2016 16:35:42 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, the cpuid <-> nodeid mapping is
established/destroyed, which means the cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.


Plan B is to hunt down and fix up all the workqueue structures at
hotplug-time.  Has that option been evaluated?



Yes, the option has been evaluated in this patch:
http://www.gossamer-threads.com/lists/linux/kernel/2116748



Your fix is x86-only and this bug presumably affects other
architectures, yes?  I think a "Plan B" would fix all architectures?



Yes, the bug presumably affects a few architectures that support both CPU
hotplug and NUMA.


We sent "Plan B" to the community and got a lot of advice and ideas.
Based on these suggestions, we carefully weighed the two plans and
chose the first.




Thirdly, what is the merge path for these patches?  Is an x86
or ACPI maintainer working with you on them?


Yes, we have received a lot of guidance and help from RJ, who is an ACPI maintainer.


Thanks,

Dou




Re: [PATCH v9 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-26 Thread Dou Liyang

Hi, RJ

On 2016-07-26 19:53, Rafael J. Wysocki wrote:

On Tuesday, July 26, 2016 11:59:38 AM Dou Liyang wrote:


On 2016-07-26 07:20, Andrew Morton wrote:

Thirdly, what is the merge path for these patches?  Is an x86
or ACPI maintainer working with you on them?


Yes, we have received a lot of guidance and help from RJ, who is an ACPI maintainer.


FWIW, the patches are fine by me from the ACPI perspective.

If you want me to apply them, though, ACKs from the x86 and mm maintainers
will be necessary.



I will continue to investigate this bug and wait for the maintainers' advice.


Thanks,
Rafael





Thanks.
Dou




[PATCH v9 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-07-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims to make the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, do not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicid of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional ACPI namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 +++
 drivers/acpi/bus.c|  1 +
 drivers/acpi/processor_core.c | 72 +++
 include/linux/acpi.h  |  2 ++
 6 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..0fe5f54 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,7 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..b44675b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,78 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+   return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = object.processor.proc_id;
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = tmp;
+   break;
+   default:
+   return false;
+   }
+
+   

[PATCH v9 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-25 Thread Dou Liyang
/2016/5/19/212
https://lkml.org/lkml/2016/7/19/181

Change log v8 -> v9:
1. Providing an empty definition of acpi_set_processor_mapping() for 
CONFIG_ACPI_HOTPLUG_CPU unset.

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build
   zonelists before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In
   patch 2.
3. Fix some comments.

Dou Liyang (2):
  acpi: Provide the mechanism to validate processors in the ACPI tables
  acpi: Provide the interface to validate the proc_id

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c   |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  10 ++--
 arch/x86/kernel/apic/apic.c   |  85 ---
 arch/x86/mm/numa.c|  27 +
 drivers/acpi/acpi_processor.c | 105 -
 drivers/acpi/bus.c|   1 +
 drivers/acpi/processor_core.c | 133 +++---
 include/linux/acpi.h  |   5 ++
 9 files changed, 319 insertions(+), 51 deletions(-)

-- 
2.5.5





[PATCH v9 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables

2016-07-25 Thread Dou Liyang
[Problem]

When we make the cpuid <-> nodeid mapping persistent, the kernel relies on the
DSDT. In that respect the ACPI tables are just like user input, and we must
not crash if the input is unreasonable.

For example, the mapping of proc_id to pxm in some machines' ACPI tables looks
like this:

proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
89  <-> 0
89  <-> 0
89  <-> 0
89  <-> 1
89  <-> 1
89  <-> 2
89  <-> 3
.

We can't be sure which entry is correct for proc_id 89, so we may map a cpu to
the wrong node. When pages are allocated, this may cause a kernel panic.

So we should provide a mechanism to validate the ACPI tables, just as a web
application validates user input.

The rule is that processor objects which share a duplicate ID are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
	proc_id = get_ACPI_Processor_number(Processor)
	if (the proc_id already exists)
		mark both of them as being unreasonable;

The function will record the unique or duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable, which means
that the processor objects in question are not valid.
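
The validation pass described above can be tried out in user space. The
following is only a minimal sketch of the same idea, not the kernel code from
this patch: the names (struct demo_id_tracker, demo_tracker_update(),
DEMO_MAX_IDS) are hypothetical, and DEMO_MAX_IDS merely stands in for NR_CPUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_IDS 16	/* hypothetical stand-in for NR_CPUS */

struct demo_id_tracker {
	int unique[DEMO_MAX_IDS];	/* every distinct ID seen so far */
	int duplicate[DEMO_MAX_IDS];	/* IDs that were seen more than once */
	size_t nr_unique;
	size_t nr_duplicate;
};

/* Record one processor ID; an ID seen a second time is also added to duplicate[]. */
void demo_tracker_update(struct demo_id_tracker *t, int proc_id)
{
	size_t i;

	if (t->nr_unique == DEMO_MAX_IDS || t->nr_duplicate == DEMO_MAX_IDS)
		return;

	/* Already known to be a duplicate: nothing more to record. */
	for (i = 0; i < t->nr_duplicate; i++)
		if (t->duplicate[i] == proc_id)
			return;

	/* Seen once before: this occurrence makes it a duplicate. */
	for (i = 0; i < t->nr_unique; i++) {
		if (t->unique[i] == proc_id) {
			t->duplicate[t->nr_duplicate++] = proc_id;
			return;
		}
	}

	/* First time we see this ID. */
	t->unique[t->nr_unique++] = proc_id;
}

/* True if proc_id was reported by more than one processor object. */
bool demo_tracker_is_duplicate(const struct demo_id_tracker *t, int proc_id)
{
	size_t i;

	for (i = 0; i < t->nr_duplicate; i++)
		if (t->duplicate[i] == proc_id)
			return true;
	return false;
}

/* Feed the proc_id column of the example table and return the duplicate count. */
size_t demo_example_duplicates(struct demo_id_tracker *t)
{
	static const int ids[] = { 0, 1, 2, 3, 89, 89, 89, 89, 89, 89, 89 };
	size_t i;

	for (i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
		demo_tracker_update(t, ids[i]);
	return t->nr_duplicate;
}
```

With a zero-initialized tracker, feeding the example table leaves 89 as the
only duplicate, while 0 to 3 stay usable. Each update is a linear scan, which
is acceptable for a one-time boot pass over at most NR_CPUS entries.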

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+   return;
+
+   /*
+* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+* already in the IDs, do nothing.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return;
+   }
+
+   /*
+* Secondly, compare the proc_id with unique IDs, if the proc_id is in
+* the IDs, put it in the duplicate IDs.
+*/
+   for (i = 0; i < nr_unique_ids; i++) {
+   if (unique_processor_ids[i] == proc_id) {
+   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+   nr_duplicate_ids++;
+   return;
+   }
+   }
+
+   /*
+* Lastly, the proc_id is a unique ID, put it in the unique IDs.
+*/
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+   u32 lvl,
+   void *context,
+   void **rv)
+{
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(object.processor.proc_id);
+
+   return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+   /* Search all processor nodes in ACPI namespace */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+   ACPI_UINT32_MAX,
+   acpi_processor_ids_walk,
+   NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+   acpi_processor_duplication_valiate();
acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5





[PATCH v9 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-07-25 Thread Dou Liyang
From: Tang Chen <tangc...@cn.fujitsu.com>

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to other online
nodes with memory in init_cpu_to_node(). The reason for doing this is to
ensure that each cpu is mapped to a node with memory, so that local
memory can be allocated for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that the cpu will be mapped to the node which it belongs to, and will never
be changed. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation automatically falls back to the proper zones in the zonelists.
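
To make the fallback idea concrete, here is a rough user-space sketch. It is
an approximation under stated assumptions, not the mm code: real zonelists are
ordered lists of zones built by the memory-management core, whereas this
sketch only picks the single nearest node that has memory. The distance table,
the demo_has_memory[] array and demo_alloc_node() are all made up for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_NODES 4

/* Hypothetical SLIT-style distances: 10 is local, larger is farther away. */
static const int demo_distance[DEMO_NR_NODES][DEMO_NR_NODES] = {
	{ 10, 20, 30, 30 },
	{ 20, 10, 30, 30 },
	{ 30, 30, 10, 20 },
	{ 30, 30, 20, 10 },
};

/* In this made-up topology node 2 has CPUs but no memory. */
static const bool demo_has_memory[DEMO_NR_NODES] = { true, true, false, true };

/*
 * Return the node that would serve an allocation requested on @nid:
 * @nid itself if it has memory, otherwise the closest node that does.
 * Returns -1 for an invalid @nid or if no node has memory.
 */
int demo_alloc_node(int nid)
{
	int n, best = -1, best_dist = INT_MAX;

	if (nid < 0 || nid >= DEMO_NR_NODES)
		return -1;
	if (demo_has_memory[nid])
		return nid;

	for (n = 0; n < DEMO_NR_NODES; n++) {
		if (demo_has_memory[n] && demo_distance[nid][n] < best_dist) {
			best_dist = demo_distance[nid][n];
			best = n;
		}
	}
	return best;
}
```

In this sketch a cpu stays mapped to memory-less node 2, and only the
allocation is redirected (here to node 3). That is the difference from the
old find_near_online_node() approach, which changed the cpu's node itself.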

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
2.5.5





[PATCH v9 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-07-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches it in wq_numa_possible_cpumask in wq_numa_init(), also at boot time.

When a node goes online or offline, the cpuid <-> nodeid mapping is
established or destroyed, so the mapping changes whenever node hotplug
happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
	...
	/* if cpumask is contained inside a NUMA node, we belong to that node */
	if (wq_numa_enabled) {
		for_each_node(node) {
			if (cpumask_subset(pool->attrs->cpumask,
					   wq_numa_possible_cpumask[node])) {
				pool->node = node;
				break;
			}
		}
	}

Since wq_numa_possible_cpumask is not updated, pool->node could be mapped to
an offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
	struct worker *worker;

	worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

	..

	return worker;
}
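
The stale-cache effect in the example above can be reproduced with a small
user-space simulation. This is only an illustrative sketch: the arrays and
helpers (demo_live_node[], demo_cached_node[], demo_boot(), demo_hotplug())
are hypothetical stand-ins for the kernel's live cpu-to-node mapping and for
wq_numa_possible_cpumask, and the node layout is copied from the tables in
this description.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_CPUS 120
#define DEMO_NO_NODE (-1)

static int demo_live_node[DEMO_NR_CPUS];	/* current cpuid -> nodeid */
static int demo_cached_node[DEMO_NR_CPUS];	/* snapshot taken once at boot */

static void demo_map_range(int first, int last, int node)
{
	int cpu;

	for (cpu = first; cpu <= last; cpu++)
		demo_live_node[cpu] = node;
}

/* Initial layout from the description, then cache it like wq_numa_init() would. */
void demo_boot(void)
{
	demo_map_range(0, 14, 0);
	demo_map_range(60, 74, 0);
	demo_map_range(15, 29, 1);
	demo_map_range(75, 89, 1);
	demo_map_range(30, 44, 2);
	demo_map_range(90, 104, 2);
	demo_map_range(45, 59, 3);
	demo_map_range(105, 119, 3);
	memcpy(demo_cached_node, demo_live_node, sizeof(demo_cached_node));
}

/* Hot-remove node2 and node3, then hot-add node4 and node5. */
void demo_hotplug(void)
{
	demo_map_range(30, 59, DEMO_NO_NODE);
	demo_map_range(90, 119, DEMO_NO_NODE);
	demo_map_range(30, 59, 4);
	demo_map_range(90, 119, 5);
}

int demo_cached_node_of(int cpu) { return demo_cached_node[cpu]; }
int demo_live_node_of(int cpu) { return demo_live_node[cpu]; }

/* Number of cpus whose cached node no longer matches the live mapping. */
int demo_count_stale(void)
{
	int cpu, stale = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (demo_cached_node[cpu] != demo_live_node[cpu])
			stale++;
	return stale;
}
```

After demo_boot() followed by demo_hotplug(), the cache still reports node 2
for cpu30 while the live mapping says node 4, and every cpu that moved is
stale in the same way. An allocation keyed on the cached node would then
target a node that no longer exists.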

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up
   at boot time and CPU hotadd time, and cleared at CPU hotremove time. This
   mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   cpuids are allocated lowest first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all
possible cpus at boot time and make it persistent. According to
init_cpu_to_node(), the cpuid <-> nodeid mapping is based on the
apicid <-> nodeid mapping and the cpuid <-> apicid mapping. So the key point
is obtaining the apicid of every cpu.
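
The dependency between the mappings can be shown as a simple composition of
two lookups. The sketch below is a user-space illustration with made-up
tables; sketch_cpuid_to_apicid[], sketch_apicid_to_node[] and
sketch_cpu_to_node() are hypothetical names, and the apicid values are
arbitrary.

```c
#include <assert.h>

#define SKETCH_NR_CPUS   4
#define SKETCH_MAX_APIC  16
#define SKETCH_NO_NODE   (-1)

/* Mapping 3: cpuid -> apicid (made persistent by this series). */
static const int sketch_cpuid_to_apicid[SKETCH_NR_CPUS] = { 0, 2, 8, 10 };

/* Mapping 2: apicid -> nodeid (already persistent, derived from SRAT). */
static const int sketch_apicid_to_node[SKETCH_MAX_APIC] = {
	 0,  0,  0,  0, -1, -1, -1, -1,
	 1,  1,  1,  1, -1, -1, -1, -1,
};

/* Mapping 4: cpuid -> nodeid is the composition of mappings 3 and 2. */
int sketch_cpu_to_node(int cpu)
{
	int apicid;

	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return SKETCH_NO_NODE;

	apicid = sketch_cpuid_to_apicid[cpu];
	if (apicid < 0 || apicid >= SKETCH_MAX_APIC)
		return SKETCH_NO_NODE;

	return sketch_apicid_to_node[apicid];
}
```

Because the second table never changes, the composed result is stable as soon
as the first table is stable, which is why the series concentrates on fixing
the cpuid <-> apicid step.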

The apicid can be obtained via the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in
the following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control whether disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   also modify the way cpuid is calculated. Establish all possible
   cpuid <-> apicid mappings when registering the local apic, and store them
   in this array.

3. Enable the _MAT and MADT related APIs to return the apicid of non-present
   or disabled cpus. This is also done by introducing an extra parameter to
   these APIs to let the caller control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional ACPI namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed

[PATCH v9 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-07-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims to make the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, do not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicid of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm    (persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id)     <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from the MADT.

All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter is added to the MADT APIs so that the caller
can control whether disabled processors are ignored.
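
The effect of the new parameter can be illustrated with a stand-alone sketch.
It mirrors the shape of map_lapic_id() in the diff below, but it is not the
kernel function: struct demo_lapic_entry, DEMO_MADT_ENABLED and
demo_map_lapic_id() are simplified, hypothetical stand-ins so that the example
compiles in user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MADT_ENABLED 0x1u	/* plays the role of ACPI_MADT_ENABLED */

/* Simplified stand-in for a MADT local APIC entry. */
struct demo_lapic_entry {
	uint8_t processor_id;	/* ACPI processor ID */
	uint8_t id;		/* local APIC ID */
	uint32_t lapic_flags;
};

/*
 * Look up the APIC ID for @acpi_id in one entry. With @ignore_disabled set,
 * a disabled entry is skipped with -ENODEV (the old behaviour); with it
 * clear, the APIC ID is returned even for a disabled processor.
 */
int demo_map_lapic_id(const struct demo_lapic_entry *entry, uint32_t acpi_id,
		      uint32_t *apic_id, bool ignore_disabled)
{
	if (ignore_disabled && !(entry->lapic_flags & DEMO_MADT_ENABLED))
		return -ENODEV;

	if (entry->processor_id != acpi_id)
		return -EINVAL;

	*apic_id = entry->id;
	return 0;
}

/* Query a disabled processor (ACPI ID 5, APIC ID 0x20) in both modes. */
int demo_lookup_disabled(bool ignore_disabled, uint32_t *apic_id)
{
	const struct demo_lapic_entry disabled = {
		.processor_id = 5, .id = 0x20, .lapic_flags = 0,
	};

	return demo_map_lapic_id(&disabled, 5, apic_id, ignore_disabled);
}
```

Callers that want the old behaviour pass true and still get -ENODEV for a
disabled processor. The boot-time mapping code can pass false and learn the
apicid of a cpu that may be hot-added later.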

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENO

[PATCH v9 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-07-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch set aims to make the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, do not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicid of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

The cpuid <-> apicid mapping is established at local apic registration time,
but non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.
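
The allocation scheme can be exercised in user space. The sketch below follows
the logic of allocate_logical_cpuid() from the diff in this patch, with
hypothetical demo_ names and a tiny DEMO_NR_CPU_IDS in place of nr_cpu_ids. It
leaves out the warning and the boot-CPU handling, so slot 0 is simply reserved
and never filled here.

```c
#include <assert.h>

#define DEMO_NR_CPU_IDS 4	/* hypothetical stand-in for nr_cpu_ids */

/* Number of logical cpuids handed out so far; 0 is reserved for the boot CPU. */
static int demo_nr_logical_cpuids = 1;

/* Persistent cpuid -> apicid table; -1 means "not allocated yet". */
static int demo_cpuid_table[DEMO_NR_CPU_IDS] = { -1, -1, -1, -1 };

/*
 * Return the logical cpuid for @apicid. An apicid that already has a cpuid
 * gets the same one back; otherwise the next free cpuid is assigned.
 * Returns -1 when all cpuids are in use.
 */
int demo_allocate_logical_cpuid(int apicid)
{
	int i;

	/* The mapping is persistent: reuse an existing entry if there is one. */
	for (i = 0; i < demo_nr_logical_cpuids; i++)
		if (demo_cpuid_table[i] == apicid)
			return i;

	if (demo_nr_logical_cpuids >= DEMO_NR_CPU_IDS)
		return -1;

	demo_cpuid_table[demo_nr_logical_cpuids] = apicid;
	return demo_nr_logical_cpuids++;
}
```

Calling the function twice with the same apicid yields the same cpuid, even
if other cpus were registered in between. Entries are never released, so a
cpu that is hot-removed and hot-added again keeps its cpuid.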

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..37248c3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
+   if (!enabled)
++disabled_cpus;
-   return -EINVAL;
-   }
 
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e3c377..366fbbc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU IDs should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,

[PATCH v9 7/7] acpi: Provide the interface to validate the proc_id

2016-07-25 Thread Dou Liyang
When we want to determine whether a proc_id is unreasonable, we can call
acpi_processor_validate_proc_id(). It searches the duplicate IDs: if the
proc_id is found there, it returns true to the caller; otherwise it returns
false, meaning the proc_id is usable.

When we establish all possible cpuid <-> nodeid mappings to handle cpu
hotplug, we use the proc_id from the ACPI table.

We validate the proc_id when we get it. If the result is true, we stop the
mapping for that processor.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 
 drivers/acpi/processor_core.c |  4 
 include/linux/acpi.h  |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index b44675b..97adffb 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
if (ACPI_FAILURE(status))
return false;
acpi_id = object.processor.proc_id;
+
+   /* validate the acpi_id */
+   if(acpi_processor_validate_proc_id(acpi_id))
+   return false;
break;
case ACPI_TYPE_DEVICE:
status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 53b3014..94ceae1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/*validate the processor object's proc_id*/
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5





[PATCH v8 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-19 Thread Dou Liyang
Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build
   zonelists before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In
   patch 2.
3. Fix some comments.

Dou Liyang (2):
  Provide the mechanism to validate processors in the ACPI tables
  Provide the interface to validate the proc_id which they give

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c   |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  10 ++--
 arch/x86/kernel/apic/apic.c   |  85 +---
 arch/x86/mm/numa.c|  27 +
 drivers/acpi/acpi_processor.c | 105 ++-
 drivers/acpi/bus.c|   3 +
 drivers/acpi/processor_core.c | 126 +++---
 include/linux/acpi.h  |   5 ++
 9 files changed, 314 insertions(+), 51 deletions(-)

-- 
2.5.5





[PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-07-19 Thread Dou Liyang
From: Tang Chen <tangc...@cn.fujitsu.com>

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason for doing this
is to ensure each cpu is mapped to a node with memory, so that it will
be able to allocate local memory for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a 1-1 mapping. It means the
cpu will be mapped to the node it belongs to, and the mapping will never
change. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time. Then it builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation will automatically fall back to the proper zones in the
zonelists.
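
The fallback idea can be illustrated with a small user-space toy. This is
only a sketch and not the kernel's zonelist code: the distance table, the
free-page counts and every toy_* name are hypothetical, and the "nearest
node that has memory" rule is a simplification of what a zonelist encodes.

```c
#include <assert.h>
#include <limits.h>

#define TOY_NR_NODES 3

/* Hypothetical SLIT-style distances between nodes. */
static const int toy_distance[TOY_NR_NODES][TOY_NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 15 },
	{ 30, 15, 10 },
};

/* Node 1 is online but memory-less in this toy setup. */
static const unsigned long toy_free_pages[TOY_NR_NODES] = { 1024, 0, 512 };

/* Return the closest node that has memory, or -1 if no node has any. */
static int toy_pick_alloc_node(int local)
{
	int best = -1;
	int best_dist = INT_MAX;

	assert(local >= 0 && local < TOY_NR_NODES);

	for (int n = 0; n < TOY_NR_NODES; n++) {
		if (toy_free_pages[n] == 0)
			continue;	/* nothing to allocate from here */
		if (toy_distance[local][n] < best_dist) {
			best_dist = toy_distance[local][n];
			best = n;
		}
	}
	return best;
}
```

In this toy, a cpu that stays mapped to memory-less node 1 ends up
allocating from node 2, its nearest node with memory, while the cpu <-> node
mapping itself is left untouched.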

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
2.5.5





[PATCH v8 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-07-19 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. Currently,
generic_processor_info() simply finds the next unused cpuid, which is why
the cpuid <-> nodeid mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.
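
A user-space sketch of that allocation rule is shown below. It mirrors the
idea only: the toy_* names and the tiny table size are assumptions, and
unlike the patch it does not reserve cpuid 0 for the BSP.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* -1 marks a logical cpuid that has not been handed out yet. */
static int toy_cpuid_to_apicid[TOY_NR_CPUS] = { -1, -1, -1, -1 };
static int toy_nr_logical_cpuids;

/* Return the cpuid bound to apicid, allocating one on first sight. */
static int toy_allocate_logical_cpuid(int apicid)
{
	assert(apicid >= 0);	/* -1 is reserved as the "unused" marker */

	/* An apicid seen before keeps the cpuid it was given. */
	for (int i = 0; i < toy_nr_logical_cpuids; i++) {
		if (toy_cpuid_to_apicid[i] == apicid)
			return i;
	}

	/* Table full: the remaining processors are ignored. */
	if (toy_nr_logical_cpuids >= TOY_NR_CPUS)
		return -1;

	toy_cpuid_to_apicid[toy_nr_logical_cpuids] = apicid;
	return toy_nr_logical_cpuids++;
}
```

Calling the function twice with the same apicid returns the same cpuid,
which is the property the rest of the series relies on.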

And finally we will use this array to make cpuid <-> nodeid persistent.

cpuid <-> apicid mapping is established at local apic registration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..37248c3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
+   if (!enabled)
++disabled_cpus;
-   return -EINVAL;
-   }
 
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e3c377..366fbbc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU ID should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,

[PATCH v8 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-07-19 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is
established/destroyed, which means, cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
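
The scenario can be replayed in a small user-space model. This is a sketch
under stated assumptions: eight cpus instead of 120, a "lowest free cpuid
first" allocator, and a boot-time snapshot standing in for
wq_numa_possible_cpumask. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_NR_CPUS 8
#define TOY_NO_NODE (-1)

static int toy_cpu_to_node[TOY_NR_CPUS];	/* live mapping */
static int toy_cached_node[TOY_NR_CPUS];	/* boot-time snapshot */

/* Hot-add one cpu on a node: the lowest free cpuid is reused. */
static int toy_hotadd_cpu(int node)
{
	assert(node >= 0);

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_cpu_to_node[cpu] == TOY_NO_NODE) {
			toy_cpu_to_node[cpu] = node;
			return cpu;
		}
	}
	return -1;
}

/* Hot-remove a node: the cpuids of its cpus are released. */
static void toy_hotremove_node(int node)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_cpu_to_node[cpu] == node)
			toy_cpu_to_node[cpu] = TOY_NO_NODE;
	}
}

/* Boot with two cpus on each of nodes 0..3, then take the snapshot. */
static void toy_boot(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_cpu_to_node[cpu] = TOY_NO_NODE;
	for (int node = 0; node < 4; node++) {
		toy_hotadd_cpu(node);
		toy_hotadd_cpu(node);
	}
	memcpy(toy_cached_node, toy_cpu_to_node, sizeof(toy_cached_node));
}

/* Count present cpus whose snapshot disagrees with the live mapping. */
static int toy_count_stale(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_cpu_to_node[cpu] != TOY_NO_NODE &&
		    toy_cpu_to_node[cpu] != toy_cached_node[cpu])
			stale++;
	}
	return stale;
}
```

After removing nodes 2 and 3 and adding nodes 4 and 5 in this model, the
reused cpuids still point at the removed nodes in the snapshot, which is the
same kind of stale entry as cpu30 above.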

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an
offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

        ..

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <->
   pxm mapping is set up at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is set up using info in 1. The mapping is set up
   at boot time and CPU hotadd time, and cleared at CPU hotremove time. This
   mapping is also persistent.

3. cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuid
   is allocated, lower ids first, and released at CPU hotremove time, reused
   for other hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time,
   and cleared at CPU hotremove time. As a result of 3, this mapping is not
   persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <->
apicid mapping. So the key point is obtaining all cpus' apicid.
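
That dependency is a two-step lookup, sketched below in user space. The
table contents and all toy_* names are made up for illustration and do not
correspond to any real machine or kernel symbol.

```c
#include <assert.h>

#define TOY_NR_CPUS    4
#define TOY_MAX_APICID 16
#define TOY_NO_NODE    (-1)

/* Mapping 3: cpuid -> apicid, fixed once allocated. */
static const int toy_cpuid_to_apicid[TOY_NR_CPUS] = { 0, 2, 8, 10 };

/* Mapping 2: apicid -> nodeid, TOY_NO_NODE where no cpu exists. */
static const int toy_apicid_to_node[TOY_MAX_APICID] = {
	 0, -1,  0, -1, -1, -1, -1, -1,
	 1, -1,  1, -1, -1, -1, -1, -1,
};

/* Mapping 4 is derived: cpuid -> apicid -> nodeid. */
static int toy_cpu_to_node(int cpu)
{
	int apicid;

	if (cpu < 0 || cpu >= TOY_NR_CPUS)
		return TOY_NO_NODE;

	apicid = toy_cpuid_to_apicid[cpu];
	if (apicid < 0)
		return TOY_NO_NODE;	/* no apicid known for this cpuid */

	assert(apicid < TOY_MAX_APICID);	/* table invariant */
	return toy_apicid_to_node[apicid];
}
```

Because the second table is already persistent, the derived mapping becomes
persistent as soon as the first table stops changing, which is why every
cpu's apicid has to be known up front.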

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following
steps:

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And
   also modify the way cpuid is calculated. Establish all possible cpuid <->
   apicid mapping when registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid. This is also done by introducing an extra parameter to these APIs
   to let the caller control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed

[PATCH v8 6/7] Provide the mechanism to validate processors in the ACPI tables

2016-07-19 Thread Dou Liyang
[Problem]

When we set cpuid <-> nodeid mapping to be persistent, it will use the DSDT.
As we know, the ACPI tables are just like user's input in that respect, and
we must not crash if user's input is unreasonable.

For example, the mapping of the proc_id and pxm in some machine's ACPI table
is like this:

proc_id |  pxm
----------------
   0   <->  0
   1   <->  0
   2   <->  1
   3   <->  1
  89   <->  0
  89   <->  0
  89   <->  0
  89   <->  1
  89   <->  1
  89   <->  2
  89   <->  3
   .

We can't be sure which pxm is correct for proc_id 89, so we may map a cpu
to a wrong node. When pages are allocated, this may cause a kernel panic.

So, we should provide mechanisms to validate the ACPI tables, just like we
validate user's input in a web project.

The mechanism is that processor objects which have duplicate IDs are not
valid.

[Solution]

We add a validation function, like this:

 foreach Processor in DSDT
     proc_id = get_ACPI_Processor_number(Processor)
     if (the proc_id already exists)
         mark both of them as being unreasonable;

The function records the unique and the duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable IDs,
which means that the processor objects in question are not valid.
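
As a rough illustration, the bookkeeping can be modelled in user space as
follows. The toy_* names and the array size are assumptions made for this
sketch. Note that the "unique" list holds every ID on first sight, so an ID
that later turns out to be duplicated appears in both lists.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_IDS 16

static int toy_unique_ids[TOY_MAX_IDS];
static int toy_duplicate_ids[TOY_MAX_IDS];
static int toy_nr_unique;
static int toy_nr_duplicate;

/* Record one proc_id found while walking the processor objects. */
static void toy_record_proc_id(int proc_id)
{
	assert(proc_id >= 0);

	if (toy_nr_unique == TOY_MAX_IDS || toy_nr_duplicate == TOY_MAX_IDS)
		return;

	/* Already known to be a duplicate: nothing to do. */
	for (int i = 0; i < toy_nr_duplicate; i++) {
		if (toy_duplicate_ids[i] == proc_id)
			return;
	}

	/* Seen once before: it is a duplicate from now on. */
	for (int i = 0; i < toy_nr_unique; i++) {
		if (toy_unique_ids[i] == proc_id) {
			toy_duplicate_ids[toy_nr_duplicate++] = proc_id;
			return;
		}
	}

	/* First sighting. */
	toy_unique_ids[toy_nr_unique++] = proc_id;
}

/* True if proc_id was recorded more than once. */
static bool toy_proc_id_is_duplicate(int proc_id)
{
	for (int i = 0; i < toy_nr_duplicate; i++) {
		if (toy_duplicate_ids[i] == proc_id)
			return true;
	}
	return false;
}
```

Feeding the IDs 0, 1, 2, 3 and then 89 several times into this model leaves
89 as the only entry in the duplicate list, so only the objects with ID 89
would be rejected.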

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler 
processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+   return;
+
+   /*
+* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+* already in the IDs, do nothing.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return;
+   }
+
+   /*
+* Secondly, compare the proc_id with unique IDs, if the proc_id is in
+* the IDs, put it in the duplicate IDs.
+*/
+   for (i = 0; i < nr_unique_ids; i++) {
+   if (unique_processor_ids[i] == proc_id) {
+   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+   nr_duplicate_ids++;
+   return;
+   }
+   }
+
+   /*
+* Lastly, the proc_id is a unique ID, put it in the unique IDs.
+*/
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+   u32 lvl,
+   void *context,
+   void **rv)
+{
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(object.processor.proc_id);
+
+   return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+   /* Search all processor nodes in ACPI namespace */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+   ACPI_UINT32_MAX,
+   acpi_processor_ids_walk,
+   NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+   acpi_processor_duplication_valiate();
acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5





[PATCH v8 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-07-19 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm(persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id) <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id) <->   nodeid (not persistent)

So, in order to set up persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up cpuid <-> apicid mapping for all possible CPUs, which has been done
   in steps 1 and 2.
2. Set up cpuid <-> nodeid mapping for all possible CPUs. But before that, we
   should obtain all apicids from MADT.

All processors' apicids can be obtained by the _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that the caller
is able to control if disabled processors are ignored.
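
The effect of the new parameter can be sketched in user space as below. The
struct layout, the TOY_MADT_ENABLED flag and the toy_* names are stand-ins
chosen for this example, and returning -EINVAL on an ID mismatch is an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MADT_ENABLED 0x1u

/* Simplified stand-in for a local APIC entry in the MADT. */
struct toy_lapic_entry {
	uint8_t  processor_id;
	uint8_t  apic_id;
	uint32_t flags;
};

/*
 * Look up the apic id for acpi_id in one entry. Disabled entries are
 * skipped only when the caller asks for that via ignore_disabled.
 */
static int toy_map_lapic_id(const struct toy_lapic_entry *entry,
			    uint32_t acpi_id, uint32_t *apic_id,
			    bool ignore_disabled)
{
	assert(entry && apic_id);

	if (ignore_disabled && !(entry->flags & TOY_MADT_ENABLED))
		return -ENODEV;

	if (entry->processor_id != acpi_id)
		return -EINVAL;

	*apic_id = entry->apic_id;
	return 0;
}
```

With ignore_disabled set to false, a disabled entry still yields its apic id,
which is what the boot-time mapping of all possible CPUs needs. Callers that
pass true keep the old -ENODEV behaviour.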

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device 
*device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not be present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENO

[PATCH v8 7/7] Provide the interface to validate the proc_id which they give

2016-07-19 Thread Dou Liyang
When we want to identify whether a proc_id is unreasonable or not, we
can call the acpi_processor_validate_proc_id() function. It searches the
duplicate IDs. If the proc_id is found there, it returns true to the
caller. Conversely, false means the proc_id is usable.

When we establish all possible cpuid <-> nodeid mapping, we will use the
proc_id from the ACPI table.

We validate the proc_id as soon as we get it. If the result is true, we
stop the mapping.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 ++++++++++++++++
 drivers/acpi/processor_core.c |  4 ++++
 include/linux/acpi.h          |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 69fb027..b8fad20 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t 
*phys_id, int *cpuid)
if (ACPI_FAILURE(status))
return false;
acpi_id = object.processor.proc_id;
+
+   /* validate the acpi_id */
+   if(acpi_processor_validate_proc_id(acpi_id))
+   return false;
break;
case ACPI_TYPE_DEVICE:
status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 53b3014..94ceae1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/*validate the processor object's proc_id*/
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5





[PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-07-19 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 
 drivers/acpi/bus.c|  3 ++
 drivers/acpi/processor_core.c | 65 +++
 include/linux/acpi.h  |  2 ++
 6 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, 
int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, 
int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..d8b7272 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+   acpi_set_processor_mapping();
+#endif
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..69fb027 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,71 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 
acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int 
*cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+   return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = object.processor.proc_id;
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = tmp;
+   break;
+   defaul

Re: [PATCH v8 7/7] Provide the interface to validate the proc_id which they give

2016-07-19 Thread Dou Liyang



On 2016-07-20 02:53, Tejun Heo wrote:

On Tue, Jul 19, 2016 at 03:28:08PM +0800, Dou Liyang wrote:

When we want to identify whether the proc_id is unreasonable or not, we
can call the "acpi_processor_validate_proc_id" function. It will search
in the duplicate IDs. If we find the proc_id in the IDs, we return true
to the call function. Conversely, false represents available.

When we establish all possible cpuid <-> nodeid mapping, we will use the
proc_id from ACPI table.

We do validation when we get the proc_id. If the result is true, we will
stop the mapping.

The patch title probably should include "acpi:" header.  I can't tell
much about the specifics of the acpi changes but I think this is the
right approach for handling cpu hotplugs.


I will change the title in the next version.

Thanks.

Thanks.







Re: [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-07-19 Thread Dou Liyang



On 2016-07-20 04:06, Rafael J. Wysocki wrote:

On Tuesday, July 19, 2016 03:28:06 PM Dou Liyang wrote:

From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such 
as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' 
apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 4.

This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
  arch/ia64/kernel/acpi.c   |  3 +-
  arch/x86/kernel/acpi/boot.c   |  4 ++-
  drivers/acpi/acpi_processor.c |  5 
  drivers/acpi/bus.c|  3 ++
  drivers/acpi/processor_core.c | 65 +++
  include/linux/acpi.h  |  2 ++
  6 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
   *  ACPI based hotplug CPU support
   */
  #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
  {
  #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
  #endif
return 0;
  }
+EXPORT_SYMBOL(acpi_map_cpu2node);
  
  int additional_cpus __initdata = -1;
  
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
  #ifdef CONFIG_ACPI_HOTPLUG_CPU
  #include 
  
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
  {
  #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
  #endif
+   return 0;
  }
+EXPORT_SYMBOL(acpi_map_cpu2node);
  
  int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
  {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
  
  void __weak arch_unregister_cpu(int cpu) {}
  
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
  static int acpi_processor_hotadd_init(struct acpi_processor *pr)
  {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..d8b7272 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+   acpi_set_processor_mapping();
+#endif

This doesn't look nice.

What about providing an empty definition of acpi_set_processor_mapping()
for CONFIG_ACPI_HOTPLUG_CPU unset?


Good, I will do it.

Thanks,
Dou
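
The suggestion above follows a common kernel pattern, which can be sketched
in user space roughly as follows. This is only an illustration, not the code
from the next version of the series: the counter mapping_calls and the caller
acpi_init_sketch() are hypothetical names invented for the sketch.

```c
/* Hypothetical stand-in for the real work; counts how often mapping ran. */
static int mapping_calls;

#ifdef CONFIG_ACPI_HOTPLUG_CPU
/* Real implementation, only built when CPU hotplug support is configured. */
static void acpi_set_processor_mapping(void)
{
	mapping_calls++;
}
#else
/* Empty definition: callers need no #ifdef and the call compiles away. */
static inline void acpi_set_processor_mapping(void) { }
#endif

/* Illustrative caller, modelled on the tail of acpi_init(). */
static int acpi_init_sketch(void)
{
	acpi_set_processor_mapping();	/* no #ifdef at the call site */
	return 0;
}
```

With the option unset, the call site stays unconditional and the stub does
nothing, which is what keeps acpi_init() free of preprocessor blocks.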




return 0;
  }

Thanks,
Rafael









Re: [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-07-19 Thread Dou Liyang

On 2016-07-20 02:50, Tejun Heo wrote:


Hello,

On Tue, Jul 19, 2016 at 03:28:02PM +0800, Dou Liyang wrote:

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a 1-1 mapping. It means the

1-1 mapping means that each cpu is mapped to its own private node
which isn't the case.  Just call it a persistent mapping?


Yes, each cpu is mapped to one persistent node.
However, the reverse does not hold: a node is not mapped to a single cpu.

I will modify it.

Thanks.
Dou




[PATCH v11 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-08-08 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

Change log v10 -> v11:
  deal with the "disabled_cpus" parameter in the same place.

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

We then modify the cpuid calculation. Currently, generic_processor_info()
simply finds the next unused cpuid, which is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that the cpuid <-> apicid
mapping is persistent.

Finally, we will use this array to make the cpuid <-> nodeid mapping
persistent.

The cpuid <-> apicid mapping is established at local apic registration time,
but non-present or disabled cpus are currently ignored.

In this patch, we establish all possible cpuid <-> apicid mappings when
registering the local apic.
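
The allocation scheme described above can be tried out in user space with a
sketch like the one below. It mirrors the idea of allocate_logical_cpuid()
from the diff, but it is a simplified illustration: SKETCH_NR_CPUS is a
hypothetical limit standing in for nr_cpu_ids, and the warning is omitted.

```c
#define SKETCH_NR_CPUS 8	/* hypothetical small limit for the sketch */

/* cpuid -> apicid table; -1 marks a logical id that was never handed out. */
static int cpuid_to_apicid[SKETCH_NR_CPUS] = { -1, -1, -1, -1, -1, -1, -1, -1 };

/* Number of logical ids allocated so far; id 0 is reserved for the BSP. */
static int nr_logical_cpuids = 1;

static int allocate_logical_cpuid(int apicid)
{
	int i;

	/* An apicid seen before keeps the cpuid it was given the first time. */
	for (i = 0; i < nr_logical_cpuids; i++) {
		if (cpuid_to_apicid[i] == apicid)
			return i;
	}

	/* No free slot left for a new logical id. */
	if (nr_logical_cpuids >= SKETCH_NR_CPUS)
		return -1;

	cpuid_to_apicid[nr_logical_cpuids] = apicid;
	return nr_logical_cpuids++;
}
```

Asking twice for the same apicid returns the same cpuid, so a cpu that is
removed and added again does not pick up a different logical id.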

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  7 +
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..1f11463 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,10 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
-   ++disabled_cpus;
-   return -EINVAL;
-   }
-
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4a3ee90..d1aba32 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals the current maximum allocated logical CPU ID plus 1.
+ * All allocated CPU IDs should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot

[PATCH v11 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables

2016-08-08 Thread Dou Liyang
[Problem]

When we make the cpuid <-> nodeid mapping persistent, we use the DSDT.
As we know, the ACPI tables are just like user input in that respect, and
we must not crash if that input is unreasonable.

For example, the mapping of proc_id to pxm in some machines' ACPI tables
looks like this:

proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
89  <-> 0
89  <-> 0
89  <-> 0
89  <-> 1
89  <-> 1
89  <-> 2
89  <-> 3
.

We can't be sure which entry is correct for proc_id 89, so we may map a cpu
to the wrong node. When pages are allocated, this may cause a kernel panic.

So, we should provide a mechanism to validate the ACPI tables, just like we
validate user input in a web project.

The rule is that processor objects which have duplicate IDs are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
    proc_id = get_ACPI_Processor_number(Processor)
    if (the proc_id already exists)
        mark both of them as being unreasonable;

The function records each processor ID as either unique or duplicate.

Duplicate processor IDs such as 89 are regarded as unreasonable, which means
that the processor objects in question are not valid.
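
The bookkeeping described above can be sketched in user space as follows.
This is a simplified illustration and not the patch itself: SKETCH_MAX_IDS,
id_in(), record_proc_id() and proc_id_is_duplicate() are hypothetical names,
and the bounds handling differs from the kernel code.

```c
#include <stdbool.h>

#define SKETCH_MAX_IDS 16	/* hypothetical capacity for the sketch */

static int unique_ids[SKETCH_MAX_IDS];
static int duplicate_ids[SKETCH_MAX_IDS];
static int nr_unique, nr_duplicate;

/* Linear search for id in the first n entries of ids. */
static bool id_in(const int *ids, int n, int id)
{
	for (int i = 0; i < n; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Record one proc_id seen during the namespace walk. */
static void record_proc_id(int proc_id)
{
	if (id_in(duplicate_ids, nr_duplicate, proc_id))
		return;	/* already known to be duplicated */

	if (id_in(unique_ids, nr_unique, proc_id)) {
		/* Second sighting: remember it as a duplicate. */
		if (nr_duplicate < SKETCH_MAX_IDS)
			duplicate_ids[nr_duplicate++] = proc_id;
		return;
	}

	/* First sighting. */
	if (nr_unique < SKETCH_MAX_IDS)
		unique_ids[nr_unique++] = proc_id;
}

/* True when proc_id was declared more than once and must not be mapped. */
static bool proc_id_is_duplicate(int proc_id)
{
	return id_in(duplicate_ids, nr_duplicate, proc_id);
}
```

Feeding the IDs 0, 1, 2, 3, 89, 89, 89 through record_proc_id() leaves 89 as
the only entry in the duplicate list, however many times it is repeated.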

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+   return;
+
+   /*
+* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+* already in the IDs, do nothing.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return;
+   }
+
+   /*
+* Secondly, compare the proc_id with unique IDs, if the proc_id is in
+* the IDs, put it in the duplicate IDs.
+*/
+   for (i = 0; i < nr_unique_ids; i++) {
+   if (unique_processor_ids[i] == proc_id) {
+   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+   nr_duplicate_ids++;
+   return;
+   }
+   }
+
+   /*
+* Lastly, the proc_id is a unique ID, put it in the unique IDs.
+*/
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+   u32 lvl,
+   void *context,
+   void **rv)
+{
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(object.processor.proc_id);
+
+   return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+   /* Search all processor nodes in ACPI namespace */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+   ACPI_UINT32_MAX,
+   acpi_processor_ids_walk,
+   NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+   acpi_processor_duplication_valiate();
acpi_scan_add_handler_with_hotplug(_handler, "processor");
acpi_scan_add_handler(_container_handler);
 }
-- 
2.5.5





[PATCH v11 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-08-08 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 
 drivers/acpi/bus.c|  1 +
 drivers/acpi/processor_core.c | 67 +++
 include/linux/acpi.h  |  3 ++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 1f11463..69ebb10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -692,7 +692,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -703,7 +703,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..0fe5f54 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,7 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..e814cd4 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,73 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+   return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = object.processor.proc_id;
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = tmp;
+   break;
+   default:
+   return false;
+   }
+
+   

[PATCH v11 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-08-08 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm (persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id)     <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from the MADT.

All processors' apicids can be obtained by the _MAT method or from the MADT in
ACPI. The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter is added to the MADT APIs so that the caller
is able to control whether disabled processors are ignored.
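
The effect of the new parameter can be sketched in user space as below. This
is an illustration modelled on map_lapic_id() from the diff, not the kernel
code: struct sketch_lapic, SKETCH_ENABLED and sketch_map_lapic_id() are
hypothetical names, and the structure only carries the fields the sketch
needs.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_ENABLED 0x1	/* stands in for ACPI_MADT_ENABLED */

/* Hypothetical, trimmed-down local APIC entry. */
struct sketch_lapic {
	unsigned int processor_id;	/* ACPI processor id */
	unsigned int apic_id;		/* physical apicid */
	unsigned int flags;		/* SKETCH_ENABLED when usable */
};

static int sketch_map_lapic_id(const struct sketch_lapic *lapic,
			       unsigned int acpi_id, unsigned int *apic_id,
			       bool ignore_disabled)
{
	/* Old behaviour: a disabled entry is always skipped. */
	if (ignore_disabled && !(lapic->flags & SKETCH_ENABLED))
		return -ENODEV;

	if (lapic->processor_id != acpi_id)
		return -EINVAL;

	/* With ignore_disabled == false, disabled entries reach this point. */
	*apic_id = lapic->apic_id;
	return 0;
}
```

A caller that wants every possible cpu passes false and gets the apicid of a
disabled entry; a caller that passes true keeps the previous -ENODEV result.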

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not be present because
+*  the cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENO

[PATCH v11 0/7] Make cpuid <-> nodeid mapping persistent

2016-08-08 Thread Dou Liyang
212
https://lkml.org/lkml/2016/7/19/181
https://lkml.org/lkml/2016/7/25/99
https://lkml.org/lkml/2016/7/26/52

Change log v10 -> v11:
1. Reduce the number of repeated online/offline checks.
2. Separate out the functionality for the enabled and disabled cases.

Change log v9 -> v10:
1. Provide an empty definition of acpi_set_processor_mapping() for when
   CONFIG_ACPI_HOTPLUG_CPU is unset. In patch 5.
2. Fix auto build test ERROR on ia64/next. In patch 5.
3. Fix some comments.

Change log v8 -> v9:
1. Provide an empty definition of acpi_set_processor_mapping() for when
   CONFIG_ACPI_HOTPLUG_CPU is unset.

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build
   zonelists before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In
   patch 2.
3. Fix some comments.

Dou Liyang (2):
  acpi: Provide the mechanism to validate processors in the ACPI tables
  acpi: Provide the interface to validate the proc_id

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c   |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  11 ++--
 arch/x86/kernel/apic/apic.c   |  77 +++--
 arch/x86/mm/numa.c|  27 +
 drivers/acpi/acpi_processor.c | 105 +-
 drivers/acpi/bus.c|   1 +
 drivers/acpi/processor_core.c | 128 +++---
 include/linux/acpi.h  |   6 ++
 9 files changed, 309 insertions(+), 50 deletions(-)

-- 
2.5.5





[PATCH v11 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-08-08 Thread Dou Liyang
From: Tang Chen <tangc...@cn.fujitsu.com>

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to other online
nodes with memory in init_cpu_to_node(). The reason for doing this is to
ensure each cpu is mapped to a node with memory, so that local memory can
be allocated for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct the cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that each cpu will be mapped to the node it belongs to, and the mapping will
never change. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node, and that memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation automatically falls back to the proper zones in the zonelists.
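
The fallback behaviour described above can be pictured with a user-space
sketch. It is a deliberately simplified model, not the kernel's zonelist code:
the node count, the distance table, the free page counts and the names
build_fallback() and alloc_page_near() are all assumptions made for the
illustration.

```c
#define SKETCH_NODES 4

/* Free pages per node; node 2 is memory-less. Values are made up. */
static int free_pages[SKETCH_NODES] = { 100, 50, 0, 10 };

/* Hypothetical node distance table. */
static const int distance[SKETCH_NODES][SKETCH_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* Fallback order for each node, nearest first (a stand-in for a zonelist). */
static int fallback[SKETCH_NODES][SKETCH_NODES];

/* Build the fallback order for nid by repeatedly picking the nearest node. */
static void build_fallback(int nid)
{
	int used[SKETCH_NODES] = { 0 };

	for (int slot = 0; slot < SKETCH_NODES; slot++) {
		int best = -1;

		for (int n = 0; n < SKETCH_NODES; n++) {
			if (used[n])
				continue;
			if (best < 0 || distance[nid][n] < distance[nid][best])
				best = n;
		}
		used[best] = 1;
		fallback[nid][slot] = best;
	}
}

/* Take one page for a cpu on nid; returns the node that served it, or -1. */
static int alloc_page_near(int nid)
{
	for (int slot = 0; slot < SKETCH_NODES; slot++) {
		int n = fallback[nid][slot];

		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
	}
	return -1;
}
```

In this model a cpu on the memory-less node 2 keeps node 2 as its own node,
and its allocations are served by the nearest node that still has free pages.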

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
2.5.5





[PATCH v11 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-08-08 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

Change log v10 -> v11:
  Reduce the number of repeated online/offline checks.
  Separate out the functionality for the enabled and disabled cases.

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When a node goes online/offline, the cpuid <-> nodeid mapping is
established/destroyed, which means the cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.
So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
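
The stale cache can be reproduced with a small user-space model of the tables
above. This is only an illustration: the two arrays and the helper names
assign(), boot_mapping(), snapshot() and replug() are hypothetical and do not
correspond to workqueue code.

```c
#define SKETCH_CPUS 120

static int cpu_to_node_live[SKETCH_CPUS];	/* current mapping */
static int cpu_to_node_cached[SKETCH_CPUS];	/* snapshot taken once at boot */

/* Map cpus first..last (inclusive) to node in the live table. */
static void assign(int first, int last, int node)
{
	for (int cpu = first; cpu <= last; cpu++)
		cpu_to_node_live[cpu] = node;
}

/* The initial layout from the first table. */
static void boot_mapping(void)
{
	assign(0, 14, 0);
	assign(60, 74, 0);
	assign(15, 29, 1);
	assign(75, 89, 1);
	assign(30, 44, 2);
	assign(90, 104, 2);
	assign(45, 59, 3);
	assign(105, 119, 3);
}

/* What a boot-time cache does: copy the mapping once and never refresh it. */
static void snapshot(void)
{
	for (int cpu = 0; cpu < SKETCH_CPUS; cpu++)
		cpu_to_node_cached[cpu] = cpu_to_node_live[cpu];
}

/* Hot-remove node2 and node3, then hot-add node4 and node5. */
static void replug(void)
{
	assign(30, 59, 4);
	assign(90, 119, 5);
}
```

After boot_mapping(), snapshot() and replug(), the live table says cpu30 is on
node4 while the snapshot still says node2, which is the mismatch described
above.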

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
   wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}

Since wq_numa_possible_cpumask is not updated, pool->node could be mapped to an
offline node, which will lead to a memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); /* <-- the wrong node is used here */

..

return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is
   set up at boot time and CPU hotadd time, and cleared at CPU hotremove time.
   This mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   cpuids are allocated lowest first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(),
the cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
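
The dependency between the mappings can be sketched as a simple composition
of two lookups. This is a user-space illustration with made-up table contents;
the array names and cpu_to_node_sketch() are hypothetical and are not the
kernel's data structures.

```c
#define SKETCH_CPUS    4
#define SKETCH_APICS   8
#define SKETCH_NO_NODE (-1)

/* Persistent cpuid -> apicid table (values are made up). */
static const int sketch_cpuid_to_apicid[SKETCH_CPUS] = { 0, 2, 4, 6 };

/* apicid -> nodeid table; -1 means no node is known for that apicid. */
static const int sketch_apicid_to_node[SKETCH_APICS] = {
	0, -1, 0, -1, 1, -1, 1, -1
};

/* cpuid -> nodeid is the composition of the two tables above. */
static int cpu_to_node_sketch(int cpu)
{
	int apicid;

	if (cpu < 0 || cpu >= SKETCH_CPUS)
		return SKETCH_NO_NODE;

	apicid = sketch_cpuid_to_apicid[cpu];
	if (apicid < 0 || apicid >= SKETCH_APICS)
		return SKETCH_NO_NODE;

	return sketch_apicid_to_node[apicid];
}
```

If both tables are filled for every possible cpu and never change, the
composed cpuid <-> nodeid result cannot change either.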

The apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in
the following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control whether disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   also modify the way the cpuid is calculated. Establish all possible
   cpuid <-> apicid mappings when registering the local apic, and store them in
   this array.

3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicid. This is also done by introducing an extra parameter to these
   APIs to let the caller control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng <g

[PATCH v11 7/7] acpi: Provide the interface to validate the proc_id

2016-08-08 Thread Dou Liyang
When we want to check whether a proc_id is unreasonable, we can call the
acpi_processor_validate_proc_id() function. It searches the duplicate IDs.
If the proc_id is found there, it returns true to the caller; otherwise it
returns false, which means the proc_id is usable.

When we establish all possible cpuid <-> nodeid mappings to handle cpu
hotplug, we use the proc_id from the ACPI table.

We validate the proc_id when we get it. If the result is true, we do not
map that processor.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 
 drivers/acpi/processor_core.c |  4 
 include/linux/acpi.h  |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index e814cd4..830c7ac 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
if (ACPI_FAILURE(status))
return false;
acpi_id = object.processor.proc_id;
+
+   /* validate the acpi_id */
+   if(acpi_processor_validate_proc_id(acpi_id))
+   return false;
break;
case ACPI_TYPE_DEVICE:
status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30df63c..11bc794 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5
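
The lookup added by the patch above can be tried outside the kernel with a small stand-alone sketch. This is only an illustration: the table size, the record_duplicate() helper and the name proc_id_is_duplicate() are assumptions made for the example and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's duplicate-ID bookkeeping. */
#define MAX_DUPLICATE_IDS 8

static int duplicate_ids[MAX_DUPLICATE_IDS];
static size_t nr_duplicates;

/* Record an ID that was seen more than once (dropped when the table is full). */
static void record_duplicate(int proc_id)
{
	if (nr_duplicates < MAX_DUPLICATE_IDS)
		duplicate_ids[nr_duplicates++] = proc_id;
	assert(nr_duplicates <= MAX_DUPLICATE_IDS);
}

/* Return true when proc_id is a known duplicate and must not be mapped. */
static bool proc_id_is_duplicate(int proc_id)
{
	size_t i;

	for (i = 0; i < nr_duplicates; i++) {
		if (duplicate_ids[i] == proc_id)
			return true;
	}
	return false;
}
```

A caller would record the duplicates once during the namespace scan and then refuse to map any processor for which proc_id_is_duplicate() returns true, which mirrors the early "return false" added to map_processor().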





Re: [PATCH v10 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-08-02 Thread Dou Liyang

Hi tglx,

On 2016-07-29 21:36, Thomas Gleixner wrote:

On Tue, 26 Jul 2016, Dou Liyang wrote:


1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.


If I'm reading the patch correctly then the 'enabled' argument controls more
than the disabled cpus accounting. It also controls the modification of
num_processors and the present mask.


In the patch, both enabled and disabled cpus need to be mapped to a logical cpu.
As you said, 'enabled' also controls:

1. the num_processors counter
2. the physid_set() call
3. the set_cpu_present() call




-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2032,7 +2032,8 @@ int generic_processor_info(int apicid, int version)
   " Processor %d/0x%x ignored.\n",
   thiscpu, apicid);

-   disabled_cpus++;
+   if (enabled)
+   disabled_cpus++;
return -ENODEV;
}

@@ -2049,7 +2050,8 @@ int generic_processor_info(int apicid, int version)
" reached. Keeping one slot for boot cpu."
"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);

-   disabled_cpus++;
+   if (enabled)
+   disabled_cpus++;


This is utterly confusing. That code path cannot be reached when enabled is
false, because num_processors is 0 as we never increment it when enabled is
false.

That said, I really do not like this 'slap some argument on it and make it
work somehow' approach.

The proper solution for this is to separate out the functionality which you
need for the preparation run (enabled = false) and make sure that the
information you need for the real run (enabled = true) is properly cached
somewhere so we don't have to evaluate the same thing over and over.


Thank you very much for your advice. That solution works well for me.

I thought carefully about the differences between the two runs. At first
I intended to split the functionality into two functions. That is simple
but not good. Then I tried to keep them together and test 'enabled' only
once.

After considering how independent the checks are and the order of the
assignments, I removed all the scattered "if (enabled)" code and unified
the check like this:

@@ -2180,12 +2176,19 @@ int __generic_processor_info(int apicid, int version, bool enabled)
apic->x86_32_early_logical_apicid(cpu);
 #endif
set_cpu_possible(cpu, true);
-   if (enabled)
+
+   if (enabled){
+   num_processors++;
+   physid_set(apicid, phys_cpu_present_map);
set_cpu_present(cpu, true);
+   }else{
+   disabled_cpus++;
+   }

return cpu;
 }

I hope this change is consistent with your advice. I will submit
the detailed modification in the next version of the patches.

Thanks,

Dou.
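
The unified check proposed above can be modelled in user space as follows. This is a minimal sketch: struct cpu_accounting and register_cpu() are hypothetical names, and the arrays only stand in for the kernel's possible and present masks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical model of the counters and masks touched at registration. */
struct cpu_accounting {
	int num_processors;	/* CPUs registered as enabled */
	int disabled_cpus;	/* CPUs registered but not enabled */
	bool possible[NR_CPUS];
	bool present[NR_CPUS];
};

/* Register one logical CPU; returns the cpu number, or -1 if out of range. */
static int register_cpu(struct cpu_accounting *acct, int cpu, bool enabled)
{
	assert(acct != 0);
	if (cpu < 0 || cpu >= NR_CPUS)
		return -1;

	acct->possible[cpu] = true;	/* every registered CPU is possible */

	/* Single decision point: either count it as usable or as disabled. */
	if (enabled) {
		acct->num_processors++;
		acct->present[cpu] = true;
	} else {
		acct->disabled_cpus++;
	}
	return cpu;
}
```

With one branch at the end, a disabled CPU still gets a logical number and a possible bit, but it never changes num_processors or the present mask.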




Re: [PATCH v11 0/7] Make cpuid <-> nodeid mapping persistent

2016-08-17 Thread Dou Liyang

Ping ...
May I ask for some community attention to this series?
My purpose is to fix the memory allocation failure that sometimes
occurs when hot-plugging nodes.

Thanks in advance.
dou

At 08/08/2016 04:37 PM, Dou Liyang wrote:

[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is
established/destroyed, which means, cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and so on.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
	...
	/* if cpumask is contained inside a NUMA node, we belong to that node */
	if (wq_numa_enabled) {
		for_each_node(node) {
			if (cpumask_subset(pool->attrs->cpumask,
					   wq_numa_possible_cpumask[node])) {
				pool->node = node;
				break;
			}
		}
	}

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline
node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

..

return worker;
}
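
The stale-cache effect described above can be replayed with a small user-space model. This is a sketch under stated assumptions: the arrays and helper names are invented for the example, and a plain per-cpu array stands in for wq_numa_possible_cpumask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  120
#define NR_NODES 6
#define NO_NODE  (-1)

static int  cpu_to_node_now[NR_CPUS];    /* live mapping, changes on hotplug */
static int  cpu_to_node_cached[NR_CPUS]; /* snapshot taken once at boot */
static bool node_online[NR_NODES];

/* Assign the inclusive cpu range [first, last] to a node in the live map. */
static void assign_range(int first, int last, int node)
{
	int cpu;

	assert(first >= 0 && last < NR_CPUS);
	for (cpu = first; cpu <= last; cpu++)
		cpu_to_node_now[cpu] = node;
}

static void hot_add_node(int node, int first, int last)
{
	node_online[node] = true;
	assign_range(first, last, node);
}

static void hot_remove_node(int node)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_to_node_now[cpu] == node)
			cpu_to_node_now[cpu] = NO_NODE;
	}
	node_online[node] = false;
}

/* Boot-time layout from the first table, then the one-off snapshot. */
static void boot(void)
{
	int cpu;

	hot_add_node(0, 0, 14);   assign_range(60, 74, 0);
	hot_add_node(1, 15, 29);  assign_range(75, 89, 1);
	hot_add_node(2, 30, 44);  assign_range(90, 104, 2);
	hot_add_node(3, 45, 59);  assign_range(105, 119, 3);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_to_node_cached[cpu] = cpu_to_node_now[cpu];
}

/* True when the cached node for this cpu is no longer online. */
static bool cached_node_is_stale(int cpu)
{
	return !node_online[cpu_to_node_cached[cpu]];
}
```

Calling boot(), removing nodes 2 and 3, and then adding node 4 with cpus 30-59 and node 5 with cpus 90-119 leaves cpu 30 live on node 4 while the snapshot still says node 2, which is the situation that sends kzalloc_node() to an offline node.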


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also
   modify the way cpuid is calculated. Establish all possible cpuid <-> apicid
   mapping when registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT related apis to return non-present or disabled cpus'
   apicid. This is also done by introducing an extra parameter to these apis to
   let the caller control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.
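
The way the four steps fit together can be sketched as a composition of two lookup tables. This is an illustration only: the array names, the sizes and the sample apicid and node values are assumptions, not data taken from the series.

```c
#include <assert.h>

#define NR_CPUS    4
#define MAX_APICID 16
#define NO_NODE    (-1)

/* Step 2: cpuid -> apicid for every possible CPU, enabled or not. */
static int cpuid_to_apicid[NR_CPUS];
/* apicid -> nodeid, persistent because it comes from firmware data. */
static int apicid_to_node[MAX_APICID];
/* Step 4: the persistent cpuid -> nodeid result. */
static int cpuid_to_node[NR_CPUS];

/* Example firmware data; the values are made up for illustration. */
static void load_example_tables(void)
{
	int apicid;

	for (apicid = 0; apicid < MAX_APICID; apicid++)
		apicid_to_node[apicid] = NO_NODE;

	/* CPUs 0 and 1 are enabled, CPUs 2 and 3 are possible but disabled. */
	cpuid_to_apicid[0] = 0x0;  apicid_to_node[0x0] = 0;
	cpuid_to_apicid[1] = 0x2;  apicid_to_node[0x2] = 0;
	cpuid_to_apicid[2] = 0x8;  apicid_to_node[0x8] = 1;
	cpuid_to_apicid[3] = 0xa;  apicid_to_node[0xa] = 1;
}

/* Compose the two persistent tables once, for all possible CPUs. */
static void build_cpuid_to_node(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		int apicid = cpuid_to_apicid[cpu];

		assert(apicid >= 0 && apicid < MAX_APICID);
		cpuid_to_node[cpu] = apicid_to_node[apicid];
	}
}
```

Because both inputs are fixed at boot, the composed table does not change when a node is later removed and re-added, which is the property the series is after.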


For previous discussion, please refer to:
h

[PATCH] x86/apic: Fix a typo in a comment line

2017-02-06 Thread Dou Liyang
Add a missing character in the function description.
s/bringin /bringing /

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 5b7e43e..b9c282d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1245,7 +1245,7 @@ static void lapic_setup_esr(void)
 /**
  * setup_local_APIC - setup the local APIC
  *
- * Used to setup local APIC while initializing BSP or bringin up APs.
+ * Used to setup local APIC while initializing BSP or bringing up APs.
  * Always called with preemption disabled.
  */
 void setup_local_APIC(void)
-- 
2.5.5





Re: [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-21 Thread Dou Liyang

Hi, Xiaolong

At 02/21/2017 03:10 PM, Ye Xiaolong wrote:

On 02/21, Ye Xiaolong wrote:

On 02/20, Dou Liyang wrote:

Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
This keeps it consistent with the workqueue and avoids some bugs which may be
caused by dynamic assignment.
As we know, it is implemented by the following patches: 2532fc318d, f7c28833c2,
8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI tables. Simply speaking:

Step 1. Make the "Logical CPU ID <-> Processor ID/UID" mapping fixed using MADT:
We generate the logical CPU IDs in the order of the Local APIC/x2APIC IDs and
get the mapping of Processor ID/UID <-> Local Apic ID directly from MADT.
So, we get the mapping of
*Processor ID/UID <-> Local Apic ID <-> Logical CPU ID*

Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" mapping fixed using DSDT:
The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
each entity. We just use it directly.

So, at last we get the mapping of *Node ID <-> Logical CPU ID* according to
step 1 and step 2:
*Node ID(_PXM) <-> Processor ID/UID <-> Local Apic ID <-> Logical CPU ID*

But the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time. We have already found
two bugs:
1. Duplicated Processor IDs in DSDT.
It has been fixed by commits 8e089eaa19, fd74da217d.
2. The _PXM in DSDT is inconsistent with the one in MADT.
It may cause the bug, which is shown in:
https://lkml.org/lkml/2017/2/12/200
There may be more later. We shouldn't just fix them one at a time; we should
solve this problem at the source so that such problems do not happen again and
again.

Now a simple and easy way has been found: we revert our patches and do Step 2
at hot-plug time, not at boot time where we did some useless work.

This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

We have tested them on our box: Fujitsu PQ2000 with 2 nodes for hot-plug.
To Xiaolong:
Please help me test it on the special machine.


Got it, I'll queue the tests on the previous machine and let you know the result
once I get it.


Previous kernel panic and incomplete run issue (described in [1]) in 0day
system is gone with this series.



Thanks very much, I am glad to hear that!


Tested-by: Xiaolong Ye <xiaolong...@intel.com>



I will add it in my next version.

Thanks,
Liyang


Here is the comparison:

$ compare -at dc6db24d2476cd09c0ecf2b8d80313539f737a89 2e61bac54fad4c018afd23c118bce2399e504020
tests: 1
testcase/path_params/tbox_group/run: vm-scalability/300-never-never-1-1-swap-w-rand-performance/lkp-hsw-ep2

Here dc6db24d24 is previous first bad commit, 2e61bac54 is the head commit of
your series applied on top of latest tip of linus/master c945d0227d ("Merge
branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

dc6db24d2476cd09  2e61bac54fad4c018afd23c118
----------------  --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
            :12          12%          1:8     last_state.OOM
            :12          12%          1:8     dmesg.page_allocation_failure:order:#,mode:#(GFP_USER|GFP_DMA32|__GFP_ZERO)
            :12          12%          1:8     dmesg.Mem-Info
          12:12        -100%           :8     dmesg.BUG:unable_to_handle_kernel
          12:12        -100%           :8     dmesg.Oops
          12:12        -100%           :8     dmesg.RIP:get_partial_node
           9:12         -75%           :8     dmesg.RIP:_raw_spin_lock_irqsave
           3:12         -25%           :8     dmesg.general_protection_fault:#[##]SMP
           3:12         -25%           :8     dmesg.RIP:native_queued_spin_lock_slowpath
           3:12         -25%           :8     dmesg.Kernel_panic-not_syncing:Hard_LOCKUP
           2:12         -17%           :8     dmesg.RIP:load_balance
           2:12         -17%           :8     dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
           1:12          -8%           :8     dmesg.RIP:resched_curr
           1:12          -8%           :8     dmesg.Kernel_panic-not_syncing:Fatal_exception
           5:12         -42%           :8     dmesg.WARNING:at_include/linux/uaccess.h:#__probe_kernel_read
           1:12          -8%           :8     dmesg.WARNING:at_lib/list_debug.c:#__list_add

[1] https://lkml.org/lkml/2017/2/12/200

Thanks,
Xiaolong



Thanks,
Xiaolong


Change log:
 v1 -> v2: 1. fix some comments.
   2. add the verification of duplicate processor id.

Dou Liyang (4):
 Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
 Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
 acpi: Fix the check handl

Re: [PATCH 1/2] acpi: Fix the mapping handle in case of declaring processors using the Device operator

2017-02-16 Thread Dou Liyang



At 02/16/2017 09:06 PM, Hanjun Guo wrote:

On 2017/2/16 18:38, Dou Liyang wrote:

In the ACPI spec, processors can be declared using either the Processor
or the Device operator. But currently we only handle the mapping of
processors which are declared with the Processor operator.

Processors declared with the Device operator are missed.

This patch adds the Device operator case.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/processor_core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 611a558..1aab5b0 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -344,8 +344,10 @@ void __init acpi_set_processor_mapping(void)
 {
 /* Set persistent cpu <-> node mapping for all processors. */
 acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
-ACPI_UINT32_MAX, set_processor_node_mapping,
-NULL, NULL, NULL);
+ACPI_UINT32_MAX, set_processor_node_mapping,
+NULL, NULL, NULL);


no need to update the code above.


I fixed a formatting problem here, so it looks as if I didn't change
anything. I will modify it in the next version.




+acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, set_processor_node_mapping,
+NULL, NULL);


It makes sense to me to add support for Processor devices when setting
the persistent cpu <-> node mapping, but I'm just wondering: if there is
no Processor device or Processor operator for a processor entry in MADT
(such as a local apic; the spec doesn't say it's mandatory),



It is in DSDT. Declare processors like:

Processor (ProcessorName, ProcessorID, PBlockAddress, PblockLength) { ObjectList }

Or

Device (DeviceName) { ObjectList }

how do we set the mappings?


Step 1. we generate the logical CPU IDs by the Local APIC/x2APIC ID in
MADT. So, we have the mapping of CPU ID <-> Local Apic ID. We also can
get the mapping of Processor ID/UID <-> Local Apic ID directly in MADT.

 195 [0ECh 0236   1]                Subtable Type : 00 [Processor Local APIC]
 196 [0EDh 0237   1]                       Length : 08
 197 [0EEh 0238   1]                *Processor ID : 40*
 198 [0EFh 0239   1]               *Local Apic ID : 40*
 199 [0F0h 0240   4]        Flags (decoded below) : 0001
 200                            Processor Enabled : 1

So, at last, we get the mapping of

*Processor ID/UID <-> Local Apic ID <-> CPU ID*

Step 2. we can get *processorID/_UID <-> Node ID(_PXM)* in DSDT.

So, we get the mapping of *Node ID <-> CPU ID* according to

*Node ID(_PXM) <-> Processor ID/UID <-> Local Apic ID <-> CPU ID*
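
The Processor Local APIC entry quoted in Step 1 is an 8-byte MADT subtable (type, length, processor ID, APIC ID, 32-bit flags). A minimal decoding sketch, assuming that layout, follows; struct lapic_entry, decode_lapic() and the macro names are invented for the example and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MADT_TYPE_LOCAL_APIC 0x00
#define MADT_LAPIC_LENGTH    8
#define MADT_FLAG_ENABLED    0x1u

/* Hypothetical decoded view of one Processor Local APIC subtable. */
struct lapic_entry {
	uint8_t processor_id;
	uint8_t apic_id;
	bool    enabled;
};

/* Decode one subtable; returns false if raw is not a local APIC entry. */
static bool decode_lapic(const uint8_t *raw, size_t len, struct lapic_entry *out)
{
	uint32_t flags;

	assert(raw != NULL && out != NULL);
	if (len < MADT_LAPIC_LENGTH)
		return false;
	if (raw[0] != MADT_TYPE_LOCAL_APIC || raw[1] != MADT_LAPIC_LENGTH)
		return false;

	out->processor_id = raw[2];
	out->apic_id = raw[3];
	/* Flags are a little-endian 32-bit field at offset 4. */
	flags = (uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
		((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24);
	out->enabled = (flags & MADT_FLAG_ENABLED) != 0;
	return true;
}
```

Feeding it the bytes 00 08 40 40 01 00 00 00 gives processor ID 0x40, APIC ID 0x40 and enabled set, matching the dump above. A disabled entry decodes the same way with enabled cleared, so the caller decides whether to skip it.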



BTW, multiple places in the ACPI driver use the same pattern here
to scan all the processors; maybe we can add a function and call it
to save some lines of code?



Yes, I think so.

Thanks,
Liyang.


Thanks
Hanjun








[PATCH] x86/apic: Remove the extra judgement of skipped IO APIC setup

2017-02-23 Thread Dou Liyang
Commit 2e63ad4bd5dd ("x86/apic: Do not init irq remapping
if ioapic is disabled") added a check for skipped IO APIC setup
at the beginning of enable_IR_x2apic(), so checking it again when
we try to enable interrupt remapping is redundant.

So, remove the check in try_to_enable_IR() and refine the code for
better readability.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8567c85..86e7bd8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1610,24 +1610,15 @@ static inline void try_to_enable_x2apic(int remap_mode) { }
 static inline void __x2apic_enable(void) { }
 #endif /* !CONFIG_X86_X2APIC */
 
-static int __init try_to_enable_IR(void)
-{
-#ifdef CONFIG_X86_IO_APIC
-   if (!x2apic_enabled() && skip_ioapic_setup) {
-   pr_info("Not enabling interrupt remapping due to skipped IO-APIC setup\n");
-   return -1;
-   }
-#endif
-   return irq_remapping_enable();
-}
-
 void __init enable_IR_x2apic(void)
 {
unsigned long flags;
int ret, ir_stat;
 
-   if (skip_ioapic_setup)
+   if (skip_ioapic_setup) {
+   pr_info("Not init interrupt remapping due to skipped IO-APIC setup\n");
return;
+   }
 
ir_stat = irq_remapping_prepare();
if (ir_stat < 0 && !x2apic_supported())
@@ -1645,7 +1636,7 @@ void __init enable_IR_x2apic(void)
 
/* If irq_remapping_prepare() succeeded, try to enable it */
if (ir_stat >= 0)
-   ir_stat = try_to_enable_IR();
+   ir_stat = irq_remapping_enable();
/* ir_stat contains the remap mode or an error code */
try_to_enable_x2apic(ir_stat);
 
-- 
2.5.5





[PATCH 1/2] Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"

2017-02-19 Thread Dou Liyang
Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
This keeps it consistent with the workqueue and avoids some bugs which may be
caused by dynamic assignment.

But the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time.

Now, we revert our patches and do the last mapping of "cpuid <-> nodeid" at
hot-plug time, not at boot time where we did some useless work.
This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

This patch reverts commit dc6db24d24:
  "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting".

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c   |  2 +-
 drivers/acpi/acpi_processor.c |  5 ---
 drivers/acpi/bus.c|  1 -
 drivers/acpi/processor_core.c | 73 ---
 include/linux/acpi.h  |  3 --
 5 files changed, 1 insertion(+), 83 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 64422f8..32846a2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -709,7 +709,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 3de3b6b..f43a586 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,11 +182,6 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
-int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
-{
-   return -ENODEV;
-}
-
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 95855cb..d4455e4 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1207,7 +1207,6 @@ static int __init acpi_init(void)
acpi_wakeup_device_init();
acpi_debugger_init();
acpi_setup_sb_notify_handler();
-   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 611a558..a843862 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -278,79 +278,6 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
-static bool __init
-map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
-{
-   int type, id;
-   u32 acpi_id;
-   acpi_status status;
-   acpi_object_type acpi_type;
-   unsigned long long tmp;
-   union acpi_object object = { 0 };
-   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
-
-   status = acpi_get_type(handle, &acpi_type);
-   if (ACPI_FAILURE(status))
-   return false;
-
-   switch (acpi_type) {
-   case ACPI_TYPE_PROCESSOR:
-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   return false;
-   acpi_id = object.processor.proc_id;
-
-   /* validate the acpi_id */
-   if(acpi_processor_validate_proc_id(acpi_id))
-   return false;
-   break;
-   case ACPI_TYPE_DEVICE:
-   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
-   if (ACPI_FAILURE(status))
-   return false;
-   acpi_id = tmp;
-   break;
-   default:
-   return false;
-   }
-
-   type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
-
-   *phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
-   id = acpi_map_cpuid(*phys_id, acpi_id);
-
-   if (id < 0)
-   return false;
-   *cpuid = id;
-   return true;
-}
-
-static acpi_status __init
-set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
-  void **rv)
-{
-   phys_cpuid_t phys_id;
-   int cpu_id;
-
-   if (!map_processor(handle, &phys_id, &cpu_id))
-   return AE_ERROR;
-
-   acpi_map_cpu2node(handle, cpu_id, phys_id);
-   return AE_OK;
-}
-
-void __init acpi_set_processor_mapping(void)
-{
-   /* Set persistent cpu <-> node mapping for all processors. */
-   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
-   ACPI_UINT32_MAX, set_processor_node_mapping,
-   NULL, NULL, NULL);
-}
-#else
-void __init acpi_set_processor_mapping(void) {}
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
-
 #ifdef CONFIG_ACPI_

[PATCH 2/2] Revert"x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid"

2017-02-19 Thread Dou Liyang
Since we no longer do the last mapping of "cpuid <-> nodeid" at boot time, we
also no longer need the MADT APIs to return disabled apicids.

So, this patch reverts commit 8ad893faf2:
"x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid"

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/processor_core.c | 60 ---
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index a843862..b933061 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
+u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,13 +48,12 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -66,13 +65,12 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration) {
@@ -89,13 +87,12 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
 {
struct acpi_madt_generic_interrupt *gicc =
container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-   if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
+   if (!(gicc->flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
/* device_declaration means Device object in DSDT, in the
@@ -112,7 +109,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 }
 
 static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
-  int type, u32 acpi_id, bool ignore_disabled)
+  int type, u32 acpi_id)
 {
unsigned long madt_end, entry;
phys_cpuid_t phys_id = PHYS_CPUID_INVALID;  /* CPU hardware ID */
@@ -130,20 +127,16 @@ static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
struct acpi_subtable_header *header =
(struct acpi_subtable_header *)entry;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
-   if (!map_lapic_id(header, acpi_id, &phys_id,
- ignore_disabled))
+   if (!map_lapic_id(header, acpi_id, &phys_id))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
-   if (!map_x2apic_id(header, type, acpi_id, &phys_id,
-  ignore_disabled))
+   if (!map_x2apic_id(header, type, acpi_id, &phys_id))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
-   if (!map_lsapic_id(header, type, acpi_id, &phys_id,
-  ignore_disabled))
+   if (!map_lsapic_id(header, type, acpi_id, &phys_id))
   

[PATCH 0/2] Revert works for the mapping of cpuid <-> nodeid

2017-02-19 Thread Dou Liyang
Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
This keeps it consistent with the workqueue and avoids some bugs which may be
caused by dynamic assignment.
As we know, it is implemented by the following patches: 2532fc318d, f7c28833c2,
8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI tables. Simply speaking:

Step 1. Make the "Logical CPU ID <-> Processor ID/UID" mapping fixed using MADT:
We generate the logical CPU IDs in the order of the Local APIC/x2APIC IDs and
get the mapping of Processor ID/UID <-> Local Apic ID directly from MADT.
So, we get the mapping of
*Processor ID/UID <-> Local Apic ID <-> Logical CPU ID*

Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" mapping fixed using DSDT:
The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
each entity. We just use it directly.

So, at last we get the mapping of *Node ID <-> Logical CPU ID* according to
step 1 and step 2:
*Node ID(_PXM) <-> Processor ID/UID <-> Local Apic ID <-> Logical CPU ID*

But the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time. We have already found
two bugs:
1. Duplicated Processor IDs in DSDT.
It has been fixed by commits 8e089eaa19, fd74da217d.
2. The _PXM in DSDT is inconsistent with the one in MADT.
It may cause the bug, which is shown in:
https://lkml.org/lkml/2017/2/12/200
There may be more later. We shouldn't just fix them one at a time; we should
solve this problem at the source so that such problems do not happen again and
again.

Now a simple and easy way has been found: we revert our patches and do Step 2
at hot-plug time, not at boot time where we did some useless work.

This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

We have tested them on our box: Fujitsu PQ2000 with 2 nodes for hot-plug.
To Xiaolong:
Please help me test it on the special machine.

Dou Liyang (2):
  Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping
  Revert"x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled
apicid"

 arch/x86/kernel/acpi/boot.c   |   2 +-
 drivers/acpi/acpi_processor.c |   5 --
 drivers/acpi/bus.c|   1 -
 drivers/acpi/processor_core.c | 133 +++---
 include/linux/acpi.h  |   3 -
 5 files changed, 23 insertions(+), 121 deletions(-)

-- 
2.5.5





[PATCH v2 1/4] Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"

2017-02-20 Thread Dou Liyang
Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
This keeps it consistent with the workqueue and avoids some bugs which may be
caused by dynamic assignment.

But the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time.

Now, we revert our patches and do the last mapping of "cpuid <-> nodeid" at
hot-plug time, not at boot time where we did some useless work.
This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

This patch reverts commit dc6db24d24:
  "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting".

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c   |  2 +-
 drivers/acpi/acpi_processor.c |  5 ---
 drivers/acpi/bus.c|  1 -
 drivers/acpi/processor_core.c | 73 ---
 include/linux/acpi.h  |  3 --
 5 files changed, 1 insertion(+), 83 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 64422f8..32846a2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -709,7 +709,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 3de3b6b..f43a586 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,11 +182,6 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
-int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
-{
-   return -ENODEV;
-}
-
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 95855cb..d4455e4 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1207,7 +1207,6 @@ static int __init acpi_init(void)
acpi_wakeup_device_init();
acpi_debugger_init();
acpi_setup_sb_notify_handler();
-   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 611a558..a843862 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -278,79 +278,6 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
-static bool __init
-map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
-{
-   int type, id;
-   u32 acpi_id;
-   acpi_status status;
-   acpi_object_type acpi_type;
-   unsigned long long tmp;
-   union acpi_object object = { 0 };
-   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
-
-   status = acpi_get_type(handle, &acpi_type);
-   if (ACPI_FAILURE(status))
-   return false;
-
-   switch (acpi_type) {
-   case ACPI_TYPE_PROCESSOR:
-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   return false;
-   acpi_id = object.processor.proc_id;
-
-   /* validate the acpi_id */
-   if(acpi_processor_validate_proc_id(acpi_id))
-   return false;
-   break;
-   case ACPI_TYPE_DEVICE:
-   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
-   if (ACPI_FAILURE(status))
-   return false;
-   acpi_id = tmp;
-   break;
-   default:
-   return false;
-   }
-
-   type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
-
-   *phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
-   id = acpi_map_cpuid(*phys_id, acpi_id);
-
-   if (id < 0)
-   return false;
-   *cpuid = id;
-   return true;
-}
-
-static acpi_status __init
-set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
-  void **rv)
-{
-   phys_cpuid_t phys_id;
-   int cpu_id;
-
-   if (!map_processor(handle, &phys_id, &cpu_id))
-   return AE_ERROR;
-
-   acpi_map_cpu2node(handle, cpu_id, phys_id);
-   return AE_OK;
-}
-
-void __init acpi_set_processor_mapping(void)
-{
-   /* Set persistent cpu <-> node mapping for all processors. */
-   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
-   ACPI_UINT32_MAX, set_processor_node_mapping,
-   NULL, NULL, NULL);
-}
-#else
-void __init acpi_set_processor_mapping(void) {}
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
-
 #ifdef CONFIG_ACPI_

[PATCH v2 4/4] acpi: Move the verification of duplicate proc_id from booting time to hot-plug time

2017-02-20 Thread Dou Liyang
After we stop fixing the mapping of "cpuid <-> nodeid" at boot time and
do it at hot-plug time instead, we should also verify duplicate proc_ids
at hot-plug time.

This patch renames the verification function and calls it from
acpi_processor_get_info() in drivers/acpi/acpi_processor.c.
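
As an illustration only, the duplicate check can be modelled in plain C as below. This is a minimal sketch, not the kernel code: MAX_IDS, ids_update() and id_is_duplicate() are hypothetical names, and a small fixed bound stands in for NR_CPUS. It also shows why the duplicate list must survive past init: the query is issued at hot-plug time, long after the boot-time walk that filled the list.

```c
#include <stdbool.h>

#define MAX_IDS 16	/* hypothetical bound; the kernel arrays use NR_CPUS */

static int unique_ids[MAX_IDS];
static int duplicate_ids[MAX_IDS];	/* must outlive boot: queried at hot-plug */
static int nr_unique, nr_duplicate;

/* Record one processor ID seen during the boot-time namespace walk. */
static void ids_update(int id)
{
	int i;

	/* Already known to be duplicated: nothing more to record. */
	for (i = 0; i < nr_duplicate; i++)
		if (duplicate_ids[i] == id)
			return;

	/* Seen exactly once before: promote it to the duplicate list. */
	for (i = 0; i < nr_unique; i++) {
		if (unique_ids[i] == id) {
			if (nr_duplicate < MAX_IDS)
				duplicate_ids[nr_duplicate++] = id;
			return;
		}
	}

	/* First sighting. */
	if (nr_unique < MAX_IDS)
		unique_ids[nr_unique++] = id;
}

/* Hot-plug-time query: true if the ID was declared more than once. */
static bool id_is_duplicate(int id)
{
	int i;

	for (i = 0; i < nr_duplicate; i++)
		if (duplicate_ids[i] == id)
			return true;
	return false;
}
```

In this model, feeding the IDs 0, 1, 0xFF, 0xFF through ids_update() leaves only 0xFF in the duplicate list, so a later hot-add of a processor with ID 0xFF would be rejected.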

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 13 ++---
 include/linux/acpi.h  |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index eb500e1..2483383 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -280,6 +280,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
pr->acpi_id = value;
}
 
+   if(duplicate_processor_id(pr->acpi_id)) {
+   dev_err(&device->dev,
+   "Failed to get unique processor _UID (0x%x)\n",
+   pr->acpi_id);
+   return -ENODEV;
+   }
+
pr->phys_id = acpi_get_phys_id(pr->handle, device_declaration,
pr->acpi_id);
if (invalid_phys_cpuid(pr->phys_id))
@@ -580,7 +587,7 @@ static struct acpi_scan_handler processor_container_handler = {
 static int nr_unique_ids __initdata;
 
 /* The number of the duplicate processor IDs */
-static int nr_duplicate_ids __initdata;
+static int nr_duplicate_ids;
 
 /* Used to store the unique processor IDs */
 static int unique_processor_ids[] __initdata = {
@@ -588,7 +595,7 @@ static int unique_processor_ids[] __initdata = {
 };
 
 /* Used to store the duplicate processor IDs */
-static int duplicate_processor_ids[] __initdata = {
+static int duplicate_processor_ids[] = {
[0 ... NR_CPUS - 1] = -1,
 };
 
@@ -672,7 +679,7 @@ void __init acpi_processor_check_duplicates(void)
NULL, NULL);
 }
 
-bool __init acpi_processor_validate_proc_id(int proc_id)
+bool duplicate_processor_id(int proc_id)
 {
int i;
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index d180cbd..b692a70 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -287,7 +287,7 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 }
 
 /* Validate the processor object's proc_id */
-bool acpi_processor_validate_proc_id(int proc_id);
+bool duplicate_processor_id(int proc_id);
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
-- 
2.5.5





[PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-20 Thread Dou Liyang
Currently, we fix the mapping of "cpuid <-> nodeid" at boot time.
This keeps it consistent with the workqueue code and avoids some bugs which may
be caused by dynamic assignment.
As we know, this is implemented by the following patches: 2532fc318d, f7c28833c2,
8f54969dc8, 8ad893faf2 and dc6db24d24, which depend on the ACPI tables. Simply speaking:

Step 1. Make the "Logical CPU ID <-> Processor ID/UID" mapping fixed using MADT:
We generate the logical CPU IDs in the order of the Local APIC/x2APIC IDs and
get the mapping of Processor ID/UID <-> Local APIC ID directly from the MADT.
So, we get the mapping of
*Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*

Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" mapping fixed using DSDT:
The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
each entity; we just use it directly.

So, in the end we get the mapping of *Node ID <-> Logical CPU ID* from
step 1 and step 2:
*Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
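
The chain above can be sketched as a composition of lookup tables. This is only an illustrative model with made-up sample data: the three tables and cpuid_to_node() are hypothetical, and the kernel keeps this information in different structures.

```c
#define NR_IDS	4
#define NO_ID	(-1)

/* Step 1 (MADT), hypothetical sample data: logical CPU ID -> Local APIC ID. */
static const int cpuid_to_apicid[NR_IDS] = { 0, 4, 2, 6 };

/* Step 1 (MADT), hypothetical sample data: Processor ID/UID -> Local APIC ID. */
static const int uid_to_apicid[NR_IDS] = { 0, 2, 4, 6 };

/* Step 2 (DSDT _PXM), hypothetical sample data: Processor ID/UID -> node. */
static const int uid_to_node[NR_IDS] = { 0, 0, 1, 1 };

/* Walk Logical CPU ID -> Local APIC ID -> Processor ID/UID -> Node ID. */
static int cpuid_to_node(int cpuid)
{
	int apicid, uid;

	if (cpuid < 0 || cpuid >= NR_IDS)
		return NO_ID;

	apicid = cpuid_to_apicid[cpuid];
	for (uid = 0; uid < NR_IDS; uid++)
		if (uid_to_apicid[uid] == apicid)
			return uid_to_node[uid];

	return NO_ID;
}
```

The sketch makes the dependency visible: the final result is only as good as the _PXM data in the last table, which is the part the series moves from boot time to hot-plug time.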

However, the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time. We have already found
two bugs:
1. Duplicated Processor IDs in DSDT.
It has been fixed by commits 8e089eaa19 and fd74da217d.
2. The _PXM in DSDT is inconsistent with the one in MADT.
It may cause the bug shown in:
https://lkml.org/lkml/2017/2/12/200
There may be more later. Rather than fixing them one at a time, we should
solve the problem at its source so that such problems do not happen again and
again.

A simple way to do that is to revert our patches and do Step 2
at hot-plug time, not at boot time, where we did some useless work.

This still keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

We have tested the series on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
To Xiaolong:
Please help me test it on the special machine.

Change log:
  v1 -> v2: 1. fix some comments.
2. add the verification of duplicate processor id.

Dou Liyang (4):
  Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
  acpi: Fix the check handle in case of declaring processors using the
Device operator
  acpi: Move the verification of duplicate proc_id from booting time to
hot-plug time

 arch/x86/kernel/acpi/boot.c   |   2 +-
 drivers/acpi/acpi_processor.c |  50 +++-
 drivers/acpi/bus.c|   1 -
 drivers/acpi/processor_core.c | 133 +++---
 include/linux/acpi.h  |   5 +-
 5 files changed, 59 insertions(+), 132 deletions(-)

-- 
2.5.5





[PATCH v2 3/4] acpi: Fix the check handle in case of declaring processors using the Device operator

2017-02-20 Thread Dou Liyang
In the ACPI spec, processors can be declared using either the Processor or
the Device operator. Before we use the ACPI tables, we should check
the correctness of all processors in the ACPI namespace.

But currently, the check covers only the processors
which are declared by the Processor operator. It misses the processors
declared by the Device operator.

This patch adds the Device operator case.
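
A reduced model of the distinction, for illustration: the enum, the struct and object_processor_id() below are hypothetical stand-ins rather than ACPICA types. A Processor object carries its ID directly, while a Device object supplies it through _UID.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two ACPI object types that declare CPUs. */
enum obj_type { OBJ_PROCESSOR, OBJ_DEVICE, OBJ_OTHER };

struct ns_object {
	enum obj_type type;
	unsigned int proc_id;		/* meaningful for OBJ_PROCESSOR */
	unsigned long long uid;		/* _UID value, meaningful for OBJ_DEVICE */
};

/* Extract the processor ID according to how the processor was declared. */
static bool object_processor_id(const struct ns_object *obj, unsigned int *id)
{
	switch (obj->type) {
	case OBJ_PROCESSOR:
		*id = obj->proc_id;
		return true;
	case OBJ_DEVICE:
		*id = (unsigned int)obj->uid;
		return true;
	default:
		/* Not a processor declaration at all. */
		return false;
	}
}
```

A check that only handles the first case silently skips every processor declared as a Device, which is the gap this patch describes.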

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index f43a586..eb500e1 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -633,25 +633,43 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
  void **rv)
 {
acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long uid;
union acpi_object object = { 0 };
struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
 
-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   acpi_handle_info(handle, "Not get the processor object\n");
-   else
-   processor_validated_ids_update(object.processor.proc_id);
+   status = acpi_get_type(handle, &acpi_type);
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(
+   object.processor.proc_id);
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &uid);
+   if (ACPI_FAILURE(status))
+   return false;
+   processor_validated_ids_update(uid);
+   break;
+   default:
+   return false;
+   }
 
return AE_OK;
 }
 
-static void __init acpi_processor_check_duplicates(void)
+void __init acpi_processor_check_duplicates(void)
 {
-   /* Search all processor nodes in ACPI namespace */
+   /* check the correctness for all processors in ACPI namespace */
acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
ACPI_UINT32_MAX,
acpi_processor_ids_walk,
NULL, NULL, NULL);
+   acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
+   NULL, NULL);
 }
 
 bool __init acpi_processor_validate_proc_id(int proc_id)
-- 
2.5.5





[PATCH v2 2/4] Revert"x86/acpi: Enable MADT APIs to return disabled apicids"

2017-02-20 Thread Dou Liyang
Since we no longer do the last mapping of "cpuid <-> nodeid" at boot time, we
also no longer need the MADT APIs to return disabled apicids.

So this patch reverts commit 8ad893faf2:
"x86/acpi: Enable MADT APIs to return disabled apicids"

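The behaviour restored by this revert can be sketched in isolation as follows. This is a simplified model under stated assumptions: struct lapic_entry is a hypothetical, reduced form of a MADT Local APIC entry, map_lapic() is a hypothetical name, and the error constants are stand-ins for -ENODEV and -EINVAL.

```c
#include <stdint.h>

#define MADT_ENABLED	0x1u	/* bit 0 of the flags field: processor enabled */
#define ERR_NODEV	(-19)	/* stand-in for -ENODEV */
#define ERR_INVAL	(-22)	/* stand-in for -EINVAL */

/* Hypothetical, reduced form of a MADT Local APIC entry. */
struct lapic_entry {
	uint8_t processor_id;
	uint8_t apic_id;
	uint32_t flags;
};

/* Return 0 and the APIC ID only for an enabled entry that matches acpi_id. */
static int map_lapic(const struct lapic_entry *e, uint32_t acpi_id,
		     uint32_t *apic_id)
{
	/* Disabled entries are always skipped; there is no opt-out flag. */
	if (!(e->flags & MADT_ENABLED))
		return ERR_NODEV;

	if (e->processor_id != acpi_id)
		return ERR_INVAL;

	*apic_id = e->apic_id;
	return 0;
}
```

With the ignore_disabled parameter gone, a caller can no longer obtain the APIC ID of a disabled processor through this path.
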
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/processor_core.c | 60 ---
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index a843862..b933061 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
+u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,13 +48,12 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -66,13 +65,12 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration) {
@@ -89,13 +87,12 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
-   bool ignore_disabled)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
 {
struct acpi_madt_generic_interrupt *gicc =
container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-   if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
+   if (!(gicc->flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
/* device_declaration means Device object in DSDT, in the
@@ -112,7 +109,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 }
 
 static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
-  int type, u32 acpi_id, bool ignore_disabled)
+  int type, u32 acpi_id)
 {
unsigned long madt_end, entry;
phys_cpuid_t phys_id = PHYS_CPUID_INVALID;  /* CPU hardware ID */
@@ -130,20 +127,16 @@ static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
struct acpi_subtable_header *header =
(struct acpi_subtable_header *)entry;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
-   if (!map_lapic_id(header, acpi_id, &phys_id,
- ignore_disabled))
+   if (!map_lapic_id(header, acpi_id, &phys_id))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
-   if (!map_x2apic_id(header, type, acpi_id, &phys_id,
-  ignore_disabled))
+   if (!map_x2apic_id(header, type, acpi_id, &phys_id))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
-   if (!map_lsapic_id(header, type, acpi_id, &phys_id,
-  ignore_disabled))
+   if (!map_lsapic_id(header, type, acpi_id, &phys_id))
break;
  

Re: [lkp] [x86/acpi] dc6db24d24: BUG: unable to handle kernel paging request at 0000116007090008

2017-02-12 Thread Dou Liyang

Hi, Xiaolong

At 02/13/2017 09:37 AM, Ye Xiaolong wrote:

On 11/21, Dou Liyang wrote:

Hi, Xiaolong,

At 11/21/2016 09:31 AM, Ye Xiaolong wrote:

On 11/18, Dou Liyang wrote:

Hi xiaolong

At 11/18/2016 02:16 PM, Ye Xiaolong wrote:

Hi, liyang

Sorry for the late reply.

On 10/31, Dou Liyang wrote:

Hi, Xiaolong,

I researched the ACPI tables for a long time, and I found that
the reason for this bug is the duplicate ID "0xFF" in the DSDT.
It has already been fixed by commits
8e089eaa1999def4bb954caa91941f29b0672b6a and
fd74da217df7d4bd25e95411da64e0b92762842e, which come after
dc6db24d2476cd09c0ecf2b8d80313539f737a89.

Could you help me verify this in LKP?



I've queued the same test jobs for commit fd74da217d, I'll notify you
once I get the results.




Hi, Liyang,

Results show that the reported error is gone with commit
fd74da217df7d4bd25e95411da64e0b92762842e; below is the comparison.



Thanks a lot. That means it has been fixed.


Sorry for my oversight: the result for fd74da217df7d4bd25e95411da showed no dmesg
errors because it was an incomplete run and has no dmesg stat at all.


Does that mean:

you have already tested the Linux branch which contains the commit
fd74da217df7d, and it doesn't work well?

BTW, why was the test run incomplete?


The bug still persists in v4.9, v4.10-rcx and the latest kernel head.


If the dmesg and stat of the test are empty, how do you prove that the
bug still exists?


Could you help to check?



Yes, I think we should first make the test with commit fd74da217df7d
work on the specific test machine.

test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with
128G memory


Am I right? Waiting for your response.

Thanks,
Liyang


Thanks,
Xiaolong





compare -at dc6db24d2476cd09c0ecf2b8d80313539f737a89 fd74da217df7d4bd25e95411da64e0b92762842e
tests: 1
testcase/path_params/tbox_group/run: vm-scalability/300-never-never-1-1-swap-w-rand-performance/lkp-hsw-ep2

dc6db24d2476cd09  fd74da217df7d4bd25e95411da
----------------  --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         12:12        -100%            :3     dmesg.BUG:unable_to_handle_kernel
         12:12        -100%            :3     dmesg.Oops
         12:12        -100%            :3     dmesg.RIP:get_partial_node
          9:12         -75%            :3     dmesg.RIP:_raw_spin_lock_irqsave
          3:12         -25%            :3     dmesg.general_protection_fault:#[##]SMP
          3:12         -25%            :3     dmesg.RIP:native_queued_spin_lock_slowpath
          3:12         -25%            :3     dmesg.Kernel_panic-not_syncing:Hard_LOCKUP
          2:12         -17%            :3     dmesg.RIP:load_balance
          2:12         -17%            :3     dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
          1:12          -8%            :3     dmesg.RIP:resched_curr
          1:12          -8%            :3     dmesg.Kernel_panic-not_syncing:Fatal_exception
          5:12         -42%            :3     dmesg.WARNING:at_include/linux/uaccess.h:#__probe_kernel_read
          1:12          -8%            :3     dmesg.WARNING:at_lib/list_debug.c:#__list_add





2. About the LKP tests: I want to run the tests on my own PC.
I use Debian sid as the OS. The .yaml file can be installed and the
job split, but it can't be run correctly.

Must the Linux source code be in /tmp/?
And do I need to modify the .yaml file to fit my PC?



Could you paste the error log for me to analyze?


Yes, let me tidy it up. :)



And I am very interested in LKP tests. When I built it, I ran into some
problems.

Here is the error log:

root@debian:/home/douly/lkp-tests# lkp run 
./job-unlink2-performance-04c197c080f2ed7a022f79701455c6837f4b9573-debian-x86_64-2016-08-31.cgz.yaml

IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
2016-11-21 15:21:01 ./runtest.py unlink2 32 both 1 54 72
/home/douly/lkp-tests/bin/log_cmd: 7: exec: ./runtest.py: not found
kill 18805 vmstat -n 10
kill 18803 dmesg --follow --decode
kill 18829 /lkp/benchmarks/perf-stat/perf stat -a -I 1000 -x  -e 
cpu-clock,task-clock,page-faults,context-switches,cpu-migrations,minor-faults,major-faults
--log-fd 1 --
kill 18806 vmstat -n 1
wait for background monitors: 18811 18813 18830 18833 18832 18819
18821 18826 18818 18815 18810 18814 18825 18827 proc-stat meminfo
oom-killer uptime nfs-hang softirqs diskstats sched_debug
latency_stats interrupts proc-vmstat slabinfo turbostat perf-profile
Error:
The /tmp/lkp-root/perf.data file has no samples!

Thanks,

Dou.


Re: [lkp] [x86/acpi] dc6db24d24: BUG: unable to handle kernel paging request at 0000116007090008

2017-02-13 Thread Dou Liyang

Hi Xiaolong

[...]


Sorry for my oversight: the result for fd74da217df7d4bd25e95411da showed no dmesg
errors because it was an incomplete run and has no dmesg stat at all.


Does that mean:

you have already tested the Linux branch which contains the commit
fd74da217df7d, and it doesn't work well?

BTW, why was the test run incomplete?


Yes, we've got plenty of test results for kernels that contain fd74da217df7d,
such as v4.9, v4.10-rc1 and v4.10-rc2; they all have the same dmesg errors.


Understood! :)


An incomplete run may happen sometimes when the kernel panics during boot
and 0day fails to capture its dmesg stat.




The bug still persists in v4.9, v4.10-rcx and the latest kernel head.


If the dmesg and stat of the test are empty, how do you prove that the
bug still exists?


The "dmesg stat is empty" remark refers to the test of the kernel image whose
head commit is fd74da217df7d, not to all test results.




Could you help to check?



Yes, I think we should first make the test with commit fd74da217df7d
work on the specific test machine.

test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
with 128G memory

Am I right? Waiting for your response.


Yes, currently we have only found this issue on one specific machine, and I've
queued the same jobs to other machines to see whether they have the same issue.



I will investigate it.


[...]

Thanks,
Liyang.




[PATCH 2/2] acpi: Fix the check handle in case of declaring processors using the Device operator

2017-02-16 Thread Dou Liyang
In the ACPI spec, processors can be declared using either the Processor or
the Device operator. Before we use the ACPI tables, we should check
the correctness of all processors in the ACPI namespace.

But currently, the check covers only the processors
which are declared by the Processor operator. It misses the processors
declared by the Device operator.

This patch adds the Device operator case.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 3de3b6b..ff569cb 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -638,25 +638,43 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
  void **rv)
 {
acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long uid;
union acpi_object object = { 0 };
struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
 
-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   acpi_handle_info(handle, "Not get the processor object\n");
-   else
-   processor_validated_ids_update(object.processor.proc_id);
+   status = acpi_get_type(handle, &acpi_type);
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(
+   object.processor.proc_id);
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &uid);
+   if (ACPI_FAILURE(status))
+   return false;
+   processor_validated_ids_update(uid);
+   break;
+   default:
+   return false;
+   }
 
return AE_OK;
 }
 
-static void __init acpi_processor_check_duplicates(void)
+void __init acpi_processor_check_duplicates(void)
 {
-   /* Search all processor nodes in ACPI namespace */
+   /* check the correctness for all processors in ACPI namespace */
acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
ACPI_UINT32_MAX,
acpi_processor_ids_walk,
NULL, NULL, NULL);
+   acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
+   NULL, NULL);
 }
 
 bool __init acpi_processor_validate_proc_id(int proc_id)
-- 
2.5.5





[PATCH 1/2] acpi: Fix the mapping handle in case of declaring processors using the Device operator

2017-02-16 Thread Dou Liyang
In the ACPI spec, processors can be declared using either the Processor or
the Device operator. But currently we only handle the mapping of processors
which are declared by the Processor operator.

This misses the processors declared by the Device operator.

This patch adds the Device operator case.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/processor_core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 611a558..1aab5b0 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -344,8 +344,10 @@ void __init acpi_set_processor_mapping(void)
 {
/* Set persistent cpu <-> node mapping for all processors. */
acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
-   ACPI_UINT32_MAX, set_processor_node_mapping,
-   NULL, NULL, NULL);
+   ACPI_UINT32_MAX, set_processor_node_mapping,
+   NULL, NULL, NULL);
+   acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, set_processor_node_mapping,
+   NULL, NULL);
 }
 #else
 void __init acpi_set_processor_mapping(void) {}
-- 
2.5.5





[PATCH] x86/acpi: Fix a warning message in logical CPU IDs allocation

2017-02-27 Thread Dou Liyang
The current warning message treats "nr_cpu_ids - 1" as the limit on the
number of CPUs, which can be confusing. For example,
with two CPUs, nr_cpu_ids = 2, but the warning message
suggests that only 1 CPU is supported:
Only 1 processors supported.Processor 2/0x2 and the rest
are ignored.

Fix the warning message by replacing "nr_cpu_ids - 1" with "nr_cpu_ids".
The warning message then looks like this:
APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2
and the rest are ignored.
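
To make the difference concrete, here is a small user-space sketch that formats both variants with snprintf(). The helper names format_old_warning() and format_new_warning() are hypothetical; only the format strings are taken from the patch.

```c
#include <stdio.h>

/* Old wording: reports nr_cpu_ids - 1, which understates the limit by one. */
static int format_old_warning(char *buf, size_t len, int nr_cpu_ids,
			      int cpu, unsigned int apicid)
{
	return snprintf(buf, len,
			"Only %d processors supported."
			"Processor %d/0x%x and the rest are ignored.\n",
			nr_cpu_ids - 1, cpu, apicid);
}

/* New wording: reports the limit itself. */
static int format_new_warning(char *buf, size_t len, int nr_cpu_ids,
			      int cpu, unsigned int apicid)
{
	return snprintf(buf, len,
			"APIC: NR_CPUS/possible_cpus limit of %i reached. "
			"Processor %d/0x%x and the rest are ignored.\n",
			nr_cpu_ids, cpu, apicid);
}
```

Calling both helpers with nr_cpu_ids = 2, cpu = 2 and apicid = 0x2 reproduces the two messages quoted above.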

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8567c85..fcdd15e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2062,10 +2062,10 @@ static int allocate_logical_cpuid(int apicid)
 
/* Allocate a new cpuid. */
if (nr_logical_cpuids >= nr_cpu_ids) {
-   WARN_ONCE(1, "Only %d processors supported."
+   WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %i reached. "
 "Processor %d/0x%x and the rest are ignored.\n",
-nr_cpu_ids - 1, nr_logical_cpuids, apicid);
-   return -1;
+nr_cpu_ids, nr_logical_cpuids, apicid);
+   return -EINVAL;
}
 
cpuid_to_apicid[nr_logical_cpuids] = apicid;
-- 
2.5.5





Re: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:2066

2017-02-27 Thread Dou Liyang

Hi Marco,

After Linux 4.9, we also map logical CPU IDs for disabled CPUs.

The reason for the warning may be this:

The maximum number of CPUs on the "dl360g5" is 2 (NR_CPUS:2), but the
kernel mapped one of those IDs to a disabled CPU, so one of the
enabled CPUs never gets a cpu_id and will never come online.

You can dump the ACPI tables of the machine to confirm this.
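
The suspected failure mode can be modelled in a few lines of C. This is a sketch of the hypothesis only, not the kernel code: struct madt_cpu, alloc_cpuid_model(), count_lost_enabled() and the sample entry order used below are assumptions, and the real MADT of the machine has not been inspected.

```c
#include <stdbool.h>

#define NR_CPU_IDS 2	/* hypothetical limit, matching NR_CPUS:2 in the report */

struct madt_cpu {
	int apicid;
	bool enabled;
};

static int cpuid_to_apicid[NR_CPU_IDS];
static int nr_logical_cpuids;

/* Hand out logical CPU IDs in table order; -1 once the limit is reached. */
static int alloc_cpuid_model(int apicid)
{
	int i;

	/* Reuse the ID if this APIC ID has been seen before. */
	for (i = 0; i < nr_logical_cpuids; i++)
		if (cpuid_to_apicid[i] == apicid)
			return i;

	if (nr_logical_cpuids >= NR_CPU_IDS)
		return -1;

	cpuid_to_apicid[nr_logical_cpuids] = apicid;
	return nr_logical_cpuids++;
}

/* Count enabled CPUs that got no logical ID because disabled ones used it. */
static int count_lost_enabled(const struct madt_cpu *cpus, int n)
{
	int i, lost = 0;

	for (i = 0; i < n; i++)
		if (alloc_cpuid_model(cpus[i].apicid) < 0 && cpus[i].enabled)
			lost++;
	return lost;
}
```

If the table lists APIC ID 0 (enabled), 1 (disabled) and 2 (enabled) in that order, the disabled entry consumes the second slot and the enabled CPU with APIC ID 2 is the one left without an ID, which would match "Processor 2/0x2" in the warning.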


Thanks,

Liyang

At 02/27/2017 07:50 PM, Marco Berizzi wrote:

Hi Folks,

I'm getting this error with linux-4.10.1 on a very old hp netserver dl360g5
running slackware 14.2
The last working version is linux-4.8.17 (where I don't see this error, and
both cpu0 and cpu1 are enabled.)

[0.00] [ cut here ]
[0.00] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:2066
__generic_processor_info+0x289/0x350
[0.00] Only 1 processors supported.Processor 2/0x2 and the rest are
ignored.
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.1 #1
[0.00] Hardware name: HP ProLiant DL360 G5, BIOS P58 05/02/2011
[0.00] Call Trace:
[0.00]  ? dump_stack+0x46/0x5d
[0.00]  ? __warn+0xb4/0xd0
[0.00]  ? warn_slowpath_fmt+0x4a/0x50
[0.00]  ? __early_ioremap+0x13c/0x1b8
[0.00]  ? __generic_processor_info+0x289/0x350
[0.00]  ? acpi_register_lapic+0x3d/0x6c
[0.00]  ? acpi_parse_lapic+0x3e/0x43
[0.00]  ? acpi_parse_entries_array+0xf4/0x152
[0.00]  ? acpi_table_parse_entries_array+0x9f/0xb8
[0.00]  ? acpi_boot_init+0xde/0x494
[0.00]  ? acpi_parse_ioapic+0x74/0x74
[0.00]  ? dmi_ignore_irq0_timer_override+0x26/0x26
[0.00]  ? setup_arch+0x611/0x674
[0.00]  ? start_kernel+0x4d/0x31f
[0.00]  ? start_cpu+0x14/0x14
[0.00] ---[ end trace  ]---

Any response is welcome.
This is the full dmesg and config:

[0.00] Linux version 4.10.1 (root@Trappist) (gcc version 5.3.0 (GCC) )
#1 SMP Mon Feb 27 12:14:49 CET 2017
[0.00] Command line: BOOT_IMAGE=Linux ro root=6802 vt.default_utf8=0
[0.00] x86/fpu: Legacy x87 FPU detected.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f3ff] usable
[0.00] BIOS-e820: [mem 0x0009f400-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x7fe43fff] usable
[0.00] BIOS-e820: [mem 0x7fe44000-0x7fe4bfff] ACPI
data
[0.00] BIOS-e820: [mem 0x7fe4c000-0x7fe4cfff] usable
[0.00] BIOS-e820: [mem 0x7fe4d000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xe000-0xefff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfecf] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee0] reserved
[0.00] BIOS-e820: [mem 0xffc0-0x] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.4 present.
[0.00] DMI: HP ProLiant DL360 G5, BIOS P58 05/02/2011
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x7fe4d max_arch_pfn = 0x4
[0.00] MTRR default type: write-back
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 008000 mask 3F8000 uncachable
[0.00]   1 base 10 mask 30 uncachable
[0.00]   2 base 20 mask 20 uncachable
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
[0.00] found SMP MP-table at [mem 0x000f4f80-0x000f4f8f] mapped at
[880f4f80]
[0.00] Base memory trampoline at [88099000] 99000 size 24576
[0.00] BRK [0x01778000, 0x01778fff] PGTABLE
[0.00] BRK [0x01779000, 0x01779fff] PGTABLE
[0.00] BRK [0x0177a000, 0x0177afff] PGTABLE
[0.00] BRK [0x0177b000, 0x0177bfff] PGTABLE
[0.00] BRK [0x0177c000, 0x0177cfff] PGTABLE
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F4F00 24 (v02 HP)
[0.00] ACPI: XSDT 0x7FE447C0 7C (v01 HP ProLiant
0002 Ò?   162E)
[0.00] ACPI: FACP 0x7FE44840 F4 (v03 HP ProLiant
0002 Ò?   162E)
[0.00] ACPI BIOS Warning (bug): Invalid length for
FADT/Pm1aControlBlock: 32, using default 16 (20160930/tbfadt-708)
[

Re: [PATCH v2 3/4] acpi: Fix the check handle in case of declaring processors using the Device operator

2017-03-02 Thread Dou Liyang

Hi tglx,

At 03/01/2017 07:12 PM, Thomas Gleixner wrote:

On Mon, 20 Feb 2017, Dou Liyang wrote:


In the ACPI spec, processors can be declared using either the Processor or
the Device operator. Before we use the ACPI tables, we should check
the correctness of all processors in the ACPI namespace.

But currently, the check covers only the processors
which are declared by the Processor operator. It misses the processors
declared by the Device operator.

This patch adds the Device operator case.


See the comments in the previous mails. They apply here as well.

Though this changelog is actively confusing. The subject line says:

  acpi: Fix the check handle in case of declaring processors using the Device
operator

Aside of being a way too long subject, it suggests that there is just a
missing check for the case where a processor is declared via the Device
operator. But that's not what the patch is doing.

It implements the distinction between Device and Processor operator, which
is missing in acpi_processor_ids_walk() right now.

So the proper changelog (if I understand the patch correctly) would be:

Subject: acpi/processor: Implement DEVICE operator for processor enumeration

  ACPI allows to declare processors either with the PROCESSOR or with the
  DEVICE operator. The current implementation handles only the PROCESSOR
  operator.

  On a system which uses the DEVICE operator for processor enumeration the
  evaluation fails.

  Check for the ACPI type of the ACPI handle and evaluate PROCESSOR and
  DEVICE types separately.

Hmm?



Yes, you are right. I didn't explain it clearly.
I will fix it in my next version.


 {
acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long uid;
union acpi_object object = { 0 };
struct acpi_buffer buffer = { sizeof(union acpi_object), &object };

-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   acpi_handle_info(handle, "Not get the processor object\n");
-   else
-   processor_validated_ids_update(object.processor.proc_id);
+   status = acpi_get_type(handle, &acpi_type);


Shouldn't the status be checked here?


Oops, yes. It needs to be checked.




+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(
+   object.processor.proc_id);
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &uid);
+   if (ACPI_FAILURE(status))
+   return false;
+   processor_validated_ids_update(uid);
+   break;
+   default:
+   return false;


This is inconsistent vs. the failure handling in the PROCESSOR and DEVICE
case and the default case does not give any information either.

What about this:

    switch (acpi_type) {
    case ACPI_TYPE_PROCESSOR:
        status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
        if (ACPI_FAILURE(status))
            goto err;
        uid = object.processor.proc_id;
        break;

    case ACPI_TYPE_DEVICE:
        status = acpi_evaluate_integer(handle, "_UID", NULL, &uid);
        if (ACPI_FAILURE(status))
            goto err;
        break;
    default:
        goto err;
    }

    processor_validated_ids_update(uid);
    return true;

err:
    acpi_handle_info(handle, "Invalid processor object\n");
    return false;
}
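
The single-exit error handling suggested above can be tried out in userspace. This is only a sketch: the struct, the enum and record_id() are hypothetical stand-ins for the ACPI handle, the ACPI object types and processor_validated_ids_update(), and none of them are kernel definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the ACPI object types; not kernel definitions. */
enum demo_obj_type { DEMO_PROCESSOR, DEMO_DEVICE, DEMO_OTHER };

struct demo_obj {
	enum demo_obj_type type;
	bool eval_ok;           /* simulates whether evaluating the object succeeds */
	unsigned long long id;  /* proc_id or _UID, depending on the type */
};

static unsigned long long last_recorded_id;
static int nr_recorded;

/* Stand-in for processor_validated_ids_update(). */
static void record_id(unsigned long long id)
{
	last_recorded_id = id;
	nr_recorded++;
}

/* Every failure funnels through one label, so all cases report the same way. */
static bool check_one(const struct demo_obj *o)
{
	unsigned long long uid;

	switch (o->type) {
	case DEMO_PROCESSOR:
		if (!o->eval_ok)
			goto err;
		uid = o->id;
		break;
	case DEMO_DEVICE:
		if (!o->eval_ok)
			goto err;
		uid = o->id;
		break;
	default:
		goto err;
	}

	record_id(uid);
	return true;

err:
	fprintf(stderr, "Invalid processor object\n");
	return false;
}
```

The point of the shape is that the success path records the id exactly once, and the unknown-type case gets the same diagnostic as an evaluation failure.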



That looks better than mine.

Thanks,
Liyang.


Thanks,

tglx








Re: [PATCH v2 1/4] Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"

2017-03-02 Thread Dou Liyang

Hi tglx,

Thank you very much for your guidance! It gave me a much deeper
understanding of how to write a changelog, and you even rewrote my
changelog as an example.

I am grateful that you took the time to help me so carefully.
I had heard about the charm of the open source community; now I can
really feel it. I love it so much.

I will try to improve myself and help others.  :)

Thanks,
Liyang.

At 03/01/2017 06:51 PM, Thomas Gleixner wrote:

On Mon, 20 Feb 2017, Dou Liyang wrote:


Currently, We make the mapping of "cpuid <-> nodeid" fixed at the booting time.
It keeps consistent with the WorkQueue and avoids some bugs which may be caused
by the dynamic assignment.

But, The ACPI table is unreliable and it is very risky that we use the entity
which isn't related to a physical device at booting time.

Now, we revert our patches. Do the last mapping of "cpuid <-> nodeid" at
hot-plug time, not at booting time where we did some useless work.
It also can make the mapping of "cpuid <-> nodeid" fixed and avoid excessive
use of the ACPI table.

The patch revert the commit dc6db24d24:
  "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting".


That changelog needs some massaging. Something like this:

  The mapping of "cpuid <-> nodeid" is established at boot time via ACPI
  tables to keep associations of workqueues and other node related items
  consistent across cpu hotplug.

  But, ACPI tables are unreliable and failures with that boot time mapping
  have been reported on machines where the ACPI table and the physical
  information which is retrieved at actual hotplug is inconsistent.

  Revert the mapping implementation so it can be replaced with a less error
  prone approach.

This clearly describes:

  1) The context

  2) The problem

  3) The solution (revert)

You don't have to explain what the new solution will be in the changelog of
the revert. For the revert it's only relevant WHY we do the revert.

Please avoid writing changelogs in 'we' form. Write it pure technical, like
a manual.

Also avoid phrases like: "The patch/This patch". We all know already that
this is a patch, otherwise it wouldn't have been sent.

Documentation/process/submitting-patches.rst says:

  Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour.

Thanks,

tglx









Re: [PATCH v2 4/4] acpi: Move the verification of duplicate proc_id from booting time to hot-plug time

2017-03-02 Thread Dou Liyang

Hi tglx,

At 03/01/2017 07:26 PM, Thomas Gleixner wrote:

On Mon, 20 Feb 2017, Dou Liyang wrote:

Please make your subject line short and a precise summary phrase, not an
overlong sentence.


After we revert the the mapping of "cpuid <-> nodeid" fixed at the
booting time. and do it at the hot-plug time. we should also do the
verification of duplicate proc_id at the time.


The revert is completely irrelevant to this change, really. The reference
is just confusing.



Yes, Maybe I should split them like before.



The patch rename the verfication function and move it to
drivers/acpi::acpi_processor_get_info.


See previous mails 

Let me give you another changelog example:



Thanks again.


Subject: acpi/processor: Check for duplicate processor ids at hotplug time

  The check for duplicate processor ids happens at boot time based on the
  ACPI table contents, but the final sanity checks for a processor happen
  at hotplug time.

  At hotplug time, where the physical information is available, which might
  differ from the ACPI table information, a check for duplicate processor
  ids is missing.

  Add it to the hotplug checks and rename the function so it better
  reflects its purpose.

Hmm?


Yes, thanks again. I learned a lot in that patchset.





-bool __init acpi_processor_validate_proc_id(int proc_id)
+bool duplicate_processor_id(int proc_id)


Please keep the acpi_ prefix. acpi_duplicate_processor_id().


OK, I will.

Thanks,

Liyang.


Thanks,

tglx








Re: [PATCH v2 2/4] Revert"x86/acpi: Enable MADT APIs to return disabled apicids"

2017-03-02 Thread Dou Liyang

Hi tglx,

At 03/01/2017 06:52 PM, Thomas Gleixner wrote:

On Mon, 20 Feb 2017, Dou Liyang wrote:


After we never do the last mapping of "cpuid <-> nodeid" at booting time. we
also no need to enable MADT APIs to return disabled apicid.

So, The patch work for reverting the commit 8ad893faf2:
"x86/acpi: Enable MADT APIs to return disabled apicids"


Again, this changelog is confusing. A simple:

  Remove the leftovers of the boot time 'cpuid <-> nodeid' mapping approach.

would be sufficient and entirely clear.



Yes, I see, I will rewrite it in next version.

Thanks.

Liyang.


Thanks,

tglx









Re: [PATCH] x86/acpi: Fix a warning message in logical CPU IDs allocation

2017-03-01 Thread Dou Liyang

Hi Ingo,

At 03/01/2017 05:10 PM, Ingo Molnar wrote:


* Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Current warning message regarded the "nr_cpu_ids - 1" as the limit
number of the CPUs. It may be confused us, for example:
we have two CPUs, nr_cpu_ids = 2, but the warning message may
indicate that we just have 1 CPU, which likes that:
Only 1 processors supported.Processor 2/0x2 and the rest
are ignored.

Fix the warning message, replace "nr_cpu_ids - 1" with "nr_cpu_ids".
And the warning message can be like that:
APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2
and the rest are ignored.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>


The patch is correct, but the title is wrong (it's 'apic', not 'acpi'), plus the
changelog is unreadable. Furthermore the changelog does not declare the changing
of the return code to -EINVAL ...

I fixed all that in the commit below, but please be more careful in the future.



Got it! I see, I will be more careful. :)

Thanks,
Liyang.


Thanks,

Ingo

===>

From bb3f0a52630c84807fca9bdd76ac2f5dcec82689 Mon Sep 17 00:00:00 2001

From: Dou Liyang <douly.f...@cn.fujitsu.com>
Date: Tue, 28 Feb 2017 13:50:52 +0800
Subject: [PATCH] x86/apic: Fix a warning message in logical CPU IDs allocation

The current warning message in allocate_logical_cpuid() is somewhat confusing:

  Only 1 processors supported.Processor 2/0x2 and the rest are ignored.

As it might imply that there's only one CPU in the system - while what we ran
into here is a kernel limitation.

Fix the warning message to clarify all that:

  APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 and the rest are ignored.

( Also update the error return from -1 to -EINVAL, which is the more
  canonical return value. )
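
As a rough userspace check of the reworded message, assuming a hypothetical helper that only formats the string (the kernel uses WARN_ONCE() directly and has no such helper):

```c
#include <stdio.h>

/*
 * Format the reworded warning into buf. The parameter names mirror the
 * kernel variables, but this helper itself is hypothetical.
 */
static int demo_format_limit_warning(char *buf, size_t len, int nr_cpu_ids,
				     int nr_logical_cpuids, int apicid)
{
	return snprintf(buf, len,
			"APIC: NR_CPUS/possible_cpus limit of %i reached. "
			"Processor %d/0x%x and the rest are ignored.\n",
			nr_cpu_ids, nr_logical_cpuids, (unsigned int)apicid);
}
```

With nr_cpu_ids = 2, the limit printed is 2 rather than the misleading "nr_cpu_ids - 1" value of 1.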

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: b...@alien8.de
Cc: nicsta...@gmail.com
Cc: wanpeng...@hotmail.com
Link: http://lkml.kernel.org/r/1488261052-25753-1-git-send-email-douly.f...@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/apic/apic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 4261b3282ad9..11088b86e5c7 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2062,10 +2062,10 @@ static int allocate_logical_cpuid(int apicid)

    /* Allocate a new cpuid. */
    if (nr_logical_cpuids >= nr_cpu_ids) {
-       WARN_ONCE(1, "Only %d processors supported."
+       WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %i reached. "
             "Processor %d/0x%x and the rest are ignored.\n",
-            nr_cpu_ids - 1, nr_logical_cpuids, apicid);
-       return -1;
+            nr_cpu_ids, nr_logical_cpuids, apicid);
+       return -EINVAL;
    }

    cpuid_to_apicid[nr_logical_cpuids] = apicid;








Re: [PATCH] x86/apic: Remove the extra judgement of skipped IO APIC setup

2017-03-01 Thread Dou Liyang

Dear Ingo,

At 03/01/2017 05:04 PM, Ingo Molnar wrote:
[...]

+   pr_info("Not init interrupt remapping due to skipped IO-APIC setup\n");


So you replaced a perfectly readable kernel message:

 -  pr_info("Not enabling interrupt remapping due to skipped IO-APIC setup\n");

... with an unreadable one:

 +  pr_info("Not init interrupt remapping due to skipped IO-APIC setup\n");

Why?


I am very sorry.

Because of my weak English skills :) . I am trying to improve my
English ability.



Also, the changelog is pretty much unreadable as well:


As the commit 2e63ad4bd5dd ("x86/apic: Do not init irq remapping
if ioapic is disabled") added the judgement of skipped IO APIC
setup at the beginning of enable_IR_x2apic(). It may be redundant
that we check it again when we try to enable the interrupt mapping.

So, remove the one in try_to_enable_IR() and refine them for
better readability.


I edited it to:


Thanks very much ! it became very clear.



   The following commit:

 2e63ad4bd5dd ("x86/apic: Do not init irq remapping if ioapic is disabled")

   ... added a check for skipped IO-APIC setup to enable_IR_x2apic(), but this


Could you tell me what is the meaning of "..." . How to use it?


   check is also duplicated in try_to_enable_IR() - and it will never succeed in
   calling irq_remapping_enable().

   Remove the whole irq_remapping_enable() complication: if the IO-APIC is
   disabled we cannot enable IRQ remapping.

And I restored the original pr_info() message as well.


Yes. Thanks!

Sincerely,

Liyang





Thanks,

Ingo








[PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-08-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When a node goes online or offline, the cpuid <-> nodeid mapping is established
or destroyed, which means the mapping changes whenever node hotplug happens.
But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
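
The scenario above can be replayed with two plain arrays. This is a userspace sketch with hypothetical names: demo_cached_node plays the role of the boot-time snapshot and demo_current_node the role of the live mapping; neither is a kernel data structure.

```c
#include <string.h>

#define DEMO_NR_CPUS 120

static int demo_cached_node[DEMO_NR_CPUS];   /* snapshot taken once at boot */
static int demo_current_node[DEMO_NR_CPUS];  /* mapping the system uses now */

static void demo_map_range(int *map, int first, int last, int node)
{
	for (int cpu = first; cpu <= last; cpu++)
		map[cpu] = node;
}

/* Initial layout from the first table, then take the one-time snapshot. */
static void demo_boot(void)
{
	demo_map_range(demo_current_node, 0, 14, 0);
	demo_map_range(demo_current_node, 60, 74, 0);
	demo_map_range(demo_current_node, 15, 29, 1);
	demo_map_range(demo_current_node, 75, 89, 1);
	demo_map_range(demo_current_node, 30, 44, 2);
	demo_map_range(demo_current_node, 90, 104, 2);
	demo_map_range(demo_current_node, 45, 59, 3);
	demo_map_range(demo_current_node, 105, 119, 3);
	memcpy(demo_cached_node, demo_current_node, sizeof(demo_cached_node));
}

/* node2/node3 removed, node4/node5 added; the cpuids get reused. */
static void demo_replug(void)
{
	demo_map_range(demo_current_node, 30, 59, 4);
	demo_map_range(demo_current_node, 90, 119, 5);
}

/* Count the cpus whose cached node no longer matches the live one. */
static int demo_count_stale(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (demo_cached_node[cpu] != demo_current_node[cpu])
			stale++;
	return stale;
}
```

After demo_boot() followed by demo_replug(), the snapshot still reports node 2 for cpu30 while the live mapping reports node 4.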

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
    ...
    /* if cpumask is contained inside a NUMA node, we belong to that node */
    if (wq_numa_enabled) {
        for_each_node(node) {
            if (cpumask_subset(pool->attrs->cpumask,
                               wq_numa_possible_cpumask[node])) {
                pool->node = node;
                break;
            }
        }
    }

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an
offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
    struct worker *worker;

    worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

    ..

    return worker;
}
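
To illustrate how the node is picked before alloc_worker() is reached, here is a small sketch of the subset test using 64-bit masks. It assumes at most 64 cpus and 4 nodes, and the helper names are hypothetical; the kernel uses struct cpumask and cpumask_subset() instead.

```c
#include <stdint.h>

#define DEMO_NR_NODES 4
#define DEMO_NO_NODE  (-1)

/* True if every cpu set in a is also set in b. */
static int demo_mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* Return the first node whose possible-cpu mask contains the pool's mask. */
static int demo_pick_pool_node(uint64_t pool_mask,
			       const uint64_t node_masks[DEMO_NR_NODES])
{
	for (int node = 0; node < DEMO_NR_NODES; node++)
		if (demo_mask_subset(pool_mask, node_masks[node]))
			return node;
	return DEMO_NO_NODE;
}
```

The lookup only consults the table it is given. If that table is a stale boot-time copy, it keeps answering with a node that may already be offline, which is how the wrong node ends up in kzalloc_node().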

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up
   at boot time and CPU hotadd time, and cleared at CPU hotremove time. This
   mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   A cpuid is allocated, lower ids first, and released at CPU hotremove time,
   then reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(),
the cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
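
The dependency between the mappings can be sketched as a composition of two lookups. The array names and sizes below are hypothetical and chosen only for illustration; this is not how init_cpu_to_node() is written.

```c
#define DEMO_MAP_NR_CPUS    4
#define DEMO_MAP_MAX_APICID 16
#define DEMO_MAP_NO_NODE    (-1)

/*
 * cpuid -> nodeid is derived from cpuid -> apicid and apicid -> nodeid.
 * If the first table is persistent and complete, so is the result.
 */
static int demo_cpu_to_node(int cpu,
			    const int cpuid_to_apicid[DEMO_MAP_NR_CPUS],
			    const int apicid_to_node[DEMO_MAP_MAX_APICID])
{
	int apicid;

	if (cpu < 0 || cpu >= DEMO_MAP_NR_CPUS)
		return DEMO_MAP_NO_NODE;

	apicid = cpuid_to_apicid[cpu];
	if (apicid < 0 || apicid >= DEMO_MAP_MAX_APICID)
		return DEMO_MAP_NO_NODE;	/* no apicid known for this cpu */

	return apicid_to_node[apicid];
}
```

A cpu without a known apicid cannot be resolved to a node, which is why the apicids of disabled and non-present cpus have to be collected as well.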

apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or found
in the MADT (Multiple APIC Description Table). So we finish the job in the
following steps:

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And
   also modify the way cpuid is calculated. Establish all possible
   cpuid <-> apicid mapping when registering local apic. Store the mapping in
   this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid. This is also done by introducing an extra parameter to these APIs
   to let the caller control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-

[PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-08-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. Currently,
generic_processor_info() simply finds the next unused cpuid, which is also
why the cpuid <-> nodeid mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that the cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

The cpuid <-> apicid mapping is established at local apic registration time,
but non-present or disabled cpus are currently ignored.

In this patch, we establish all possible cpuid <-> apicid mappings when
registering local apic.
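
A userspace model of the allocation rule described above, assuming a fixed limit of 4 ids. The demo_ names are hypothetical; the real function is allocate_logical_cpuid() in the diff below.

```c
#define DEMO_NR_CPU_IDS 4

/* Slot 0 is reserved for the boot CPU, so counting starts at 1. */
static int demo_nr_logical_cpuids = 1;
static int demo_cpuid_to_apicid[DEMO_NR_CPU_IDS] = { -1, -1, -1, -1 };

/*
 * Return the cpuid already bound to apicid, or bind the next free one.
 * Returns -1 once all ids are used. An id is never released, which is
 * what makes the cpuid <-> apicid mapping persistent.
 */
static int demo_allocate_cpuid(int apicid)
{
	for (int i = 0; i < demo_nr_logical_cpuids; i++)
		if (demo_cpuid_to_apicid[i] == apicid)
			return i;

	if (demo_nr_logical_cpuids >= DEMO_NR_CPU_IDS)
		return -1;

	demo_cpuid_to_apicid[demo_nr_logical_cpuids] = apicid;
	return demo_nr_logical_cpuids++;
}
```

A cpu that is removed and added again presents the same apicid and therefore gets the same cpuid back, instead of whichever id happens to be free.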

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  7 +
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 90d84c3..abd939c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -176,15 +176,10 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
-   ++disabled_cpus;
-   return -EINVAL;
-   }
-
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   cpu = generic_processor_info(id, ver);
+   cpu = __generic_processor_info(id, ver, enabled);
if (cpu >= 0)
early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e5612a9..7aa9863 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2024,7 +2024,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU ID should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = phy

[PATCH v12 7/7] acpi: Provide the interface to validate the proc_id

2016-08-25 Thread Dou Liyang
To check whether a proc_id is usable, call acpi_processor_validate_proc_id().
It searches the list of duplicate IDs and returns true if the proc_id is
found there, i.e. the ID is a duplicate and must not be used. A return value
of false means the proc_id is available.

When we establish all possible cpuid <-> nodeid mappings to handle cpu
hotplug, we use the proc_id from the ACPI table.

We validate the proc_id when we get it. If the result is true, we stop the
mapping.
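
The return convention is easy to get backwards, so here is a minimal userspace sketch of it. The array, its size and demo_mark_duplicate() are assumptions made for the example; only the lookup mirrors the function added in the diff below.

```c
#include <stdbool.h>

#define DEMO_MAX_DUP_IDS 8

static int demo_dup_ids[DEMO_MAX_DUP_IDS];
static int demo_nr_dup_ids;

/* Remember an id that was seen more than once (hypothetical helper). */
static bool demo_mark_duplicate(int proc_id)
{
	if (demo_nr_dup_ids >= DEMO_MAX_DUP_IDS)
		return false;
	demo_dup_ids[demo_nr_dup_ids++] = proc_id;
	return true;
}

/* true: proc_id is a known duplicate and must be rejected. */
static bool demo_is_duplicate_id(int proc_id)
{
	for (int i = 0; i < demo_nr_dup_ids; i++)
		if (demo_dup_ids[i] == proc_id)
			return true;
	return false;
}
```

A caller therefore skips the mapping when the check returns true and proceeds when it returns false.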

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 
 drivers/acpi/processor_core.c |  4 
 include/linux/acpi.h  |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 7827c71..bf72097 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -301,6 +301,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
if (ACPI_FAILURE(status))
return false;
acpi_id = object.processor.proc_id;
+
+   /* validate the acpi_id */
+   if(acpi_processor_validate_proc_id(acpi_id))
+   return false;
break;
case ACPI_TYPE_DEVICE:
status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index ea67776..929ff8f 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -267,6 +267,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5





[PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-08-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm    (persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id)     <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from MADT.

All processors' apicids can be obtained by the _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that the caller
is able to control if disabled processors are ignored.
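
The effect of the new parameter can be sketched in userspace as follows. The struct and the flag constant are simplified, hypothetical stand-ins for struct acpi_madt_local_apic and ACPI_MADT_ENABLED; only the control flow follows map_lapic_id() in the diff below.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_ENTRY_ENABLED 0x1u

/* Simplified stand-in for one local APIC entry of the MADT. */
struct demo_lapic_entry {
	unsigned int processor_id;
	unsigned int apic_id;
	unsigned int flags;
};

/*
 * Look up the apic id for acpi_id in one entry. Disabled entries are
 * rejected only when the caller asks for that via ignore_disabled.
 */
static int demo_map_lapic_id(const struct demo_lapic_entry *entry,
			     unsigned int acpi_id, unsigned int *apic_id,
			     bool ignore_disabled)
{
	if (ignore_disabled && !(entry->flags & DEMO_ENTRY_ENABLED))
		return -ENODEV;

	if (entry->processor_id != acpi_id)
		return -EINVAL;

	*apic_id = entry->apic_id;
	return 0;
}
```

Passing ignore_disabled = true keeps the old behaviour, while false lets the boot-time mapping code see the apicid of a cpu that is not enabled yet.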

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 60 +++
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not be present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 9125d7d..fd59ae8 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENO

[PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-08-25 Thread Dou Liyang
From: Gu Zheng <guz.f...@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus'
   apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 
 drivers/acpi/bus.c|  1 +
 drivers/acpi/processor_core.c | 67 +++
 include/linux/acpi.h  |  3 ++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 92b7bc9..6534871 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index abd939c..807037c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -700,7 +700,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -711,7 +711,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..a760dac 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1193,6 +1193,7 @@ static int __init acpi_init(void)
acpi_wakeup_device_init();
acpi_debugger_init();
acpi_setup_sb_notify_handler();
+   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index fd59ae8..7827c71 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -280,6 +280,73 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+       return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+       status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+       if (ACPI_FAILURE(status))
+           return false;
+       acpi_id = object.processor.proc_id;
+       break;
+   case ACPI_TYPE_DEVICE:
+       status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+       if (ACPI_FAILURE(status))
+           return false;
+       acpi_id = tmp;
+       break;
+   default:
+       return false;
+   }
+
+   

[PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-08-25 Thread Dou Liyang
From: Tang Chen <tangc...@cn.fujitsu.com>

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to other online
nodes with memory in init_cpu_to_node(). The reason for doing this is to
ensure that each cpu is mapped to a node with memory, so that local memory
can be allocated for that cpu.

But we don't have to do it this way.

In this series of patches, we are going to construct the cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that each cpu will be mapped to the node it belongs to, and the mapping will
never change. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node, and that memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation will automatically fall back to the proper zones in the zonelists.

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fb68210..3f35b48 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -722,22 +722,19 @@ void __init x86_numa_init(void)
    numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-           val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-           if (val < min_val) {
-                   min_val = val;
-                   best_node = n;
-           }
-   }
-
-   return best_node;
+   /*
+    * All zonelists will be built later in start_kernel() after per cpu
+    * areas are initialized.
+    */
 }
 
 /*
@@ -766,8 +763,10 @@ void __init init_cpu_to_node(void)
 
            if (node == NUMA_NO_NODE)
                    continue;
+
            if (!node_online(node))
-                   node = find_near_online_node(node);
+                   init_memory_less_node(node);
+
            numa_set_node(cpu, node);
    }
 }
-- 
2.5.5
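
As a rough illustration of the fallback behaviour that the commit message above relies on, here is a minimal user-space sketch. It is not the kernel's zonelist code: `struct node_state`, `build_fallback_order()` and `alloc_page_near()` are hypothetical names, and the distance table is made-up data. The sketch only assumes that an online node without memory keeps its own identity and that allocations walk the online nodes in order of increasing distance.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical per-node state: an online flag and a free-page counter. */
struct node_state {
    bool online;
    unsigned long free_pages;
};

/*
 * Fill order[] with every online node, sorted by increasing distance
 * from nid. Returns the number of entries written.
 */
static size_t build_fallback_order(int nid, const struct node_state *nodes,
                                   const int distance[MAX_NODES][MAX_NODES],
                                   int order[MAX_NODES])
{
    size_t n = 0;

    for (int i = 0; i < MAX_NODES; i++)
        if (nodes[i].online)
            order[n++] = i;

    /* Insertion sort; the node count is tiny. */
    for (size_t i = 1; i < n; i++) {
        int key = order[i];
        size_t j = i;

        while (j > 0 && distance[nid][order[j - 1]] > distance[nid][key]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = key;
    }
    return n;
}

/*
 * Take one page from the first node in the fallback order that still
 * has free pages. Returns the node used, or -1 if every node is empty.
 */
static int alloc_page_near(int nid, struct node_state *nodes,
                           const int distance[MAX_NODES][MAX_NODES])
{
    int order[MAX_NODES];
    size_t n = build_fallback_order(nid, nodes, distance, order);

    for (size_t i = 0; i < n; i++) {
        if (nodes[order[i]].free_pages > 0) {
            nodes[order[i]].free_pages--;
            return order[i];
        }
    }
    return -1;
}
```

With node 2 online but holding no pages, a request made on behalf of node 2 is served by its nearest neighbour that has memory, while the cpu itself stays mapped to node 2.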





[PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent

2016-08-25 Thread Dou Liyang
apic. Store the mapping in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these APIs to let the caller
   control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.


For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209
https://lkml.org/lkml/2016/5/19/212
https://lkml.org/lkml/2016/7/19/181
https://lkml.org/lkml/2016/7/25/99
https://lkml.org/lkml/2016/7/26/52
https://lkml.org/lkml/2016/8/8/96

Change log v11 -> v12:
1. Rebase
2. Add a short summary

Change log v10 -> v11:
1. Reduce the number of repeated online/offline checks
2. Separate out the functionality for the enabled and disabled cases

Change log v9 -> v10:
1. Providing an empty definition of acpi_set_processor_mapping() for 
CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
2. Fix auto build test ERROR on ia64/next. In patch 5.
3. Fix some comment.

Change log v8 -> v9:
1. Providing an empty definition of acpi_set_processor_mapping() for 
CONFIG_ACPI_HOTPLUG_CPU unset.

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
   before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In patch 2.
3. Fix some comment.

Dou Liyang (2):
  acpi: Provide the mechanism to validate processors in the ACPI tables
  acpi: Provide the interface to validate the proc_id

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c       |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  11 ++--
 arch/x86/kernel/apic/apic.c   |  77 +++--
 arch/x86/mm/numa.c            |  27 +
 drivers/acpi/acpi_processor.c | 105 -
 drivers/acpi/bus.c            |   1 +
 drivers/acpi/processor_core.c | 131 +++---
 include/linux/acpi.h          |   6 ++
 9 files changed, 311 insertions(+), 51 deletions(-)

-- 
2.5.5





[PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables

2016-08-25 Thread Dou Liyang
[Problem]

When we set cpuid <-> nodeid mapping to be persistent, it will use the DSDT.
As we know, the ACPI tables are just like user's input in that respect, and
we must not crash if user's input is unreasonable.

For example, the mapping of the proc_id and pxm in some machine's ACPI table
is like this:

proc_id   |    pxm

0       <->     0
1       <->     0
2       <->     1
3       <->     1
89      <->     0
89      <->     0
89      <->     0
89      <->     1
89      <->     1
89      <->     2
89      <->     3
.

We can't be sure which pxm is the correct one for proc_id 89. We may map a
wrong node to a cpu. When pages are allocated, this may cause a kernel panic.

So, we should provide mechanisms to validate the ACPI tables, just like we
validate user's input in a web project.

The rule is that processor objects which have duplicate IDs are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
        proc_id = get_ACPI_Processor_number(Processor)
        if (the proc_id already exists)
                mark both of them as being unreasonable;

The function will record the unique and the duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable IDs,
which means that the processor objects in question are not valid.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
    .attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS || nr_duplicate_ids == NR_CPUS)
+           return;
+
+   /*
+    * Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+    * already in the IDs, do nothing.
+    */
+   for (i = 0; i < nr_duplicate_ids; i++) {
+           if (duplicate_processor_ids[i] == proc_id)
+                   return;
+   }
+
+   /*
+    * Secondly, compare the proc_id with unique IDs, if the proc_id is in
+    * the IDs, put it in the duplicate IDs.
+    */
+   for (i = 0; i < nr_unique_ids; i++) {
+           if (unique_processor_ids[i] == proc_id) {
+                   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+                   nr_duplicate_ids++;
+                   return;
+           }
+   }
+
+   /*
+    * Lastly, the proc_id is a unique ID, put it in the unique IDs.
+    */
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+                                           u32 lvl,
+                                           void *context,
+                                           void **rv)
+{
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+           acpi_handle_info(handle, "Not get the processor object\n");
+   else
+           processor_validated_ids_update(object.processor.proc_id);
+
+   return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+   /* Search all processor nodes in ACPI namespace */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+                       ACPI_UINT32_MAX,
+                       acpi_processor_ids_walk,
+                       NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+   acpi_processor_duplication_valiate();
    acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
    acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5
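
To see the bookkeeping from the patch above in isolation, the following user-space sketch mirrors the logic of processor_validated_ids_update() outside the kernel. `MAX_IDS` stands in for NR_CPUS, and `proc_id_is_duplicate()` is a hypothetical lookup helper that is not part of the patch; it is only there to show how a later consumer might query the result.

```c
#include <stdbool.h>

#define MAX_IDS 16   /* stand-in for NR_CPUS in this sketch */

static int nr_unique_ids;
static int nr_duplicate_ids;
static int unique_ids[MAX_IDS];
static int duplicate_ids[MAX_IDS];

/* Same three steps as the patch: known duplicate, new duplicate, new ID. */
static void validated_ids_update(int proc_id)
{
    int i;

    if (nr_unique_ids == MAX_IDS || nr_duplicate_ids == MAX_IDS)
        return;

    /* Already recorded as a duplicate: nothing to do. */
    for (i = 0; i < nr_duplicate_ids; i++)
        if (duplicate_ids[i] == proc_id)
            return;

    /* Seen exactly once before: record it as a duplicate. */
    for (i = 0; i < nr_unique_ids; i++) {
        if (unique_ids[i] == proc_id) {
            duplicate_ids[nr_duplicate_ids++] = proc_id;
            return;
        }
    }

    /* First time this ID is seen. */
    unique_ids[nr_unique_ids++] = proc_id;
}

/* Hypothetical helper: true if proc_id appeared more than once. */
static bool proc_id_is_duplicate(int proc_id)
{
    for (int i = 0; i < nr_duplicate_ids; i++)
        if (duplicate_ids[i] == proc_id)
            return true;
    return false;
}
```

Feeding in the proc_id column from the example table (0, 1, 2, 3 and then 89 seven times) leaves five entries in the first-seen list and a single entry, 89, in the duplicate list. Note that, as in the patch, a duplicated ID stays in the first list as well, so validity has to be decided from the duplicate list.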





Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent

2016-08-25 Thread Dou Liyang

Hi all,

These patches fix the memory allocation failure described below,
and they look fine from the ACPI perspective.

I hope that RJ<r...@rjwysocki.net> can apply them.

Because these patches also touch x86 and mm,
I need ACKs from the x86 and mm maintainers.   :)

Thanks,
Dou.

At 08/25/2016 04:35 PM, Dou Liyang wrote:

[Summary]

Use ACPI tables: MADT, DSDT.
1. Create cpuid in order based on Local Apic ID in MADT(apicid).
2. Obtain the nodeid by the proc_id in DSDT.
3. Make the cpuid <-> nodeid mapping persistent.

The mapping relations:

proc_id in DSDT <--> Processor ID in MADT(acpiid) <--> Local Apic ID in MADT(apicid)
       ^                                                        ^
       |                                                        |
       v                                                        v
  pxm in DSDT                                                 cpuid
       ^
       |
       v
    nodeid

[Problem]

cpuid <-> nodeid mapping is first established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
        ...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, pool->node could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

        ..

        return worker;
}
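
The stale-cache scenario described above can be reproduced with a small user-space simulation. This is only a sketch under simplifying assumptions: 8 cpus instead of 120, plain arrays instead of cpumasks, and hypothetical `sim_*` helpers that do not correspond to kernel functions. `cached_cpu_to_node[]` plays the role of wq_numa_possible_cpumask, i.e. a snapshot taken once at boot and never refreshed.

```c
#include <stdbool.h>

#define SIM_NR_CPUS  8
#define SIM_NR_NODES 6
#define SIM_NO_NODE  (-1)

static int  live_cpu_to_node[SIM_NR_CPUS];    /* current mapping */
static int  cached_cpu_to_node[SIM_NR_CPUS];  /* boot-time snapshot */
static bool node_online_sim[SIM_NR_NODES];

/* Bring a node online and attach cpus first_cpu..last_cpu to it. */
static void sim_hot_add_node(int node, int first_cpu, int last_cpu)
{
    node_online_sim[node] = true;
    for (int cpu = first_cpu; cpu <= last_cpu; cpu++)
        live_cpu_to_node[cpu] = node;
}

/* Take a node offline and drop the mapping of every cpu on it. */
static void sim_hot_remove_node(int node)
{
    node_online_sim[node] = false;
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        if (live_cpu_to_node[cpu] == node)
            live_cpu_to_node[cpu] = SIM_NO_NODE;
}

/* Snapshot the mapping once, as a boot-time cache would. */
static void sim_cache_mapping(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        cached_cpu_to_node[cpu] = live_cpu_to_node[cpu];
}

/* Would an allocation on the cached node of this cpu hit an online node? */
static bool sim_cached_node_is_online(int cpu)
{
    int node = cached_cpu_to_node[cpu];

    return node != SIM_NO_NODE && node_online_sim[node];
}
```

If nodes 0-3 are populated, the snapshot is taken, nodes 2 and 3 are removed and a new node 4 takes over cpus 4-7, then the live table maps cpu 4 to node 4 while the snapshot still points at the offline node 2. That is the situation in which kzalloc_node() is handed the wrong node.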


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registration flow to handle both enabled and disabled cpu
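
The idea of a persistent cpuid <-> apicid mapping from the description above can be sketched in user space as follows. This is a hedged illustration, not the code from the series: the table size, the `sim_*` names and the lookup-then-append policy are assumptions made for the example. The point is only that an apicid which has been seen once always gets the same logical cpuid back, instead of receiving whichever low id happens to be free.

```c
#define SIM_MAX_CPUS 8

/* Entries [0, sim_nr_logical_cpuids) are valid and are never removed. */
static int sim_cpuid_to_apicid[SIM_MAX_CPUS];
static int sim_nr_logical_cpuids;

/*
 * Return the logical cpuid for apicid. An apicid seen before keeps its
 * old cpuid; a new one gets the next unused slot. Returns -1 when the
 * table is full.
 */
static int sim_allocate_logical_cpuid(int apicid)
{
    for (int i = 0; i < sim_nr_logical_cpuids; i++)
        if (sim_cpuid_to_apicid[i] == apicid)
            return i;

    if (sim_nr_logical_cpuids >= SIM_MAX_CPUS)
        return -1;

    sim_cpuid_to_apicid[sim_nr_logical_cpuids] = apicid;
    return sim_nr_logical_cpuids++;
}
```

Because hot-remove never clears a slot in this sketch, a cpu with apicid 0x20 that was given cpuid 1 at boot receives cpuid 1 again after a remove/add cycle, even if another cpu was hot-added in between. With apicid <-> nodeid already persistent, that is what makes cpuid <-> nodeid persistent as well.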

Re: [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-08-25 Thread Dou Liyang

Hi tglx,

At 08/25/2016 04:35 PM, Dou Liyang wrote:

 arch/x86/kernel/apic/apic.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index cea4fc1..e5612a9 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2024,7 +2024,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }

-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2090,7 +2090,6 @@ int generic_processor_info(int apicid, int version)
return -EINVAL;
}

-   num_processors++;
if (apicid == boot_cpu_physical_apicid) {


I moved the "num_processors++" further down,
because I think that if "apicid == boot_cpu_physical_apicid" is true,
"disabled_cpus" may be incremented, which would conflict with the
"num_processors++".

Is my understanding right?


/*
 * x86_bios_cpu_apicid is required to have processors listed
@@ -2113,6 +2112,7 @@ int generic_processor_info(int apicid, int version)

pr_warning("APIC: Package limit reached. Processor %d/0x%x 
ignored.\n",
   thiscpu, apicid);
+
disabled_cpus++;
return -ENOSPC;
}
@@ -2132,7 +2132,6 @@ int generic_processor_info(int apicid, int version)
apic_version[boot_cpu_physical_apicid], cpu, version);
}

-   physid_set(apicid, phys_cpu_present_map);
if (apicid > max_physical_apicid)
max_physical_apicid = apicid;

@@ -2145,11 +2144,22 @@ int generic_processor_info(int apicid, int version)
apic->x86_32_early_logical_apicid(cpu);
 #endif
set_cpu_possible(cpu, true);
-   set_cpu_present(cpu, true);
+
+   if (enabled) {
+   num_processors++;
+   physid_set(apicid, phys_cpu_present_map);
+   set_cpu_present(cpu, true);
+   } else
+   disabled_cpus++;



I removed all the other "if (enabled)" checks and do a single,
unified check here.

Thanks,
Dou




[PATCH] x86: Put the num_processors++ code in a more suitable position

2016-09-05 Thread Dou Liyang
This is a code optimization.

If the topology package map check of the apicid and cpu fails, the kernel
stops generating the processor info for that apicid and increments
disabled_cpus by one. However, num_processors has already been incremented
above. That may make the number of processors incorrect.

Move the num_processors++ to a more suitable position, so that
num_processors does not conflict with disabled_cpus.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 50c95af..f3e9b2d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2093,7 +2093,6 @@ int generic_processor_info(int apicid, int version)
return -EINVAL;
}
 
-   num_processors++;
if (apicid == boot_cpu_physical_apicid) {
/*
 * x86_bios_cpu_apicid is required to have processors listed
@@ -2116,10 +2115,13 @@ int generic_processor_info(int apicid, int version)
 
pr_warning("APIC: Package limit reached. Processor %d/0x%x 
ignored.\n",
   thiscpu, apicid);
+
disabled_cpus++;
return -ENOSPC;
}
 
+   num_processors++;
+
/*
 * Validate version
 */
-- 
2.5.5
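
The accounting problem addressed by the patch above can be shown with two tiny user-space functions. This is a sketch, not the kernel function: `struct cpu_counts`, `register_cpu_early()` and `register_cpu_late()` are hypothetical, and the `package_ok` flag stands in for the topology package map check.

```c
#include <errno.h>
#include <stdbool.h>

struct cpu_counts {
    int num_processors;
    int disabled_cpus;
};

/* Ordering before the patch: count first, then possibly reject. */
static int register_cpu_early(struct cpu_counts *c, bool package_ok)
{
    c->num_processors++;
    if (!package_ok) {
        c->disabled_cpus++;   /* the same cpu is now counted twice */
        return -ENOSPC;
    }
    return 0;
}

/* Ordering after the patch: count only once the check has passed. */
static int register_cpu_late(struct cpu_counts *c, bool package_ok)
{
    if (!package_ok) {
        c->disabled_cpus++;
        return -ENOSPC;
    }
    c->num_processors++;
    return 0;
}
```

Registering three accepted cpus and one rejected cpu gives 4 + 1 with the early increment, i.e. five counted for four cpus, and 3 + 1 with the late increment.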





Re: [PATCH] x86: Put the num_processors++ code in a more suitable position

2016-09-06 Thread Dou Liyang

Hi David,

At 09/07/2016 05:23 AM, David Rientjes wrote:

On Tue, 6 Sep 2016, Dou Liyang wrote:


This is a code optimization.



Not sure that it's optimization, it's just for correctness.


Yes, I see. I will improve it in next version.

Thanks,
Dou




If the topology package map check of the apicid and cpu fails, the kernel
stops generating the processor info for that apicid and increments
disabled_cpus by one. However, num_processors has already been incremented
above. That may make the number of processors incorrect.

Move the num_processors++ to a more suitable position, so that
num_processors does not conflict with disabled_cpus.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>


Acked-by: David Rientjes <rient...@google.com>







[PATCH v2] x86: Put the num_processors++ code in a more suitable position

2016-09-06 Thread Dou Liyang
This code is just for correctness.

If the topology package map check of the apicid and cpu fails, the kernel
stops generating the processor info for that apicid and increments
disabled_cpus by one. However, num_processors has already been incremented
above. That may make the number of processors incorrect.

Move the num_processors++ to a more suitable position, so that
num_processors does not conflict with disabled_cpus.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Acked-by: David Rientjes <rient...@google.com>
---
 arch/x86/kernel/apic/apic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 50c95af..f3e9b2d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2093,7 +2093,6 @@ int generic_processor_info(int apicid, int version)
return -EINVAL;
}
 
-   num_processors++;
if (apicid == boot_cpu_physical_apicid) {
/*
 * x86_bios_cpu_apicid is required to have processors listed
@@ -2116,10 +2115,13 @@ int generic_processor_info(int apicid, int version)
 
pr_warning("APIC: Package limit reached. Processor %d/0x%x 
ignored.\n",
   thiscpu, apicid);
+
disabled_cpus++;
return -ENOSPC;
}
 
+   num_processors++;
+
/*
 * Validate version
 */
-- 
2.5.5





Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent

2016-09-02 Thread Dou Liyang

Ping...

At 08/25/2016 04:35 PM, Dou Liyang wrote:

[...]

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <

Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent

2016-09-13 Thread Dou Liyang

Ping...

At 09/02/2016 02:57 PM, Dou Liyang wrote:

Ping...

At 08/25/2016 04:35 PM, Dou Liyang wrote:

[...]

Re: [x86-tip] strange nr_cpus= boot regression

2016-09-26 Thread Dou Liyang

Hi tglx,

I'm sorry for the late reply.
Awfully sorry that I could not do anything help.

In fact, it's my fault.
I should re-base my patches after the commit c291b0151585 in time.

I learned a lot from it.
Thank a lot, and once again my apologies.

Thanks,

Dou

At 09/27/2016 01:36 AM, Thomas Gleixner wrote:

CC'ed: Dou Liyang

On Mon, 26 Sep 2016, Mike Galbraith wrote:


I've encountered a strange regression in tip, symptom is that if you
boot with nr_cpus=nr_you_have, what actually boots is nr_you_have/2.
 Do not pass nr_cpus=, and all is well.


What's the number of possible cpus in your system?


Bisection repeatedly goes as below, pointing to the nodeid merge,
despite both timers/core and x86/apic (nodeid) being fine.  Take tip
HEAD, extract all of the commits from nodeid (plus the fix), and revert
them in a quilt tree, the tree remains busted.


So you remove all the nodeid commits from tip/master and it's still broken?


Checkout the timers/core merge commit, and merge nodeid with that, it is
indeed bad.



Bisecting  takes you right to the merge commit, with no commit
being 'bad', see logs.


That's more than strange. An empty merge commit being the culprit.

Thanks,

tglx







Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

2016-10-06 Thread Dou Liyang

Hi Yinghai,

At 10/06/2016 12:53 PM, Yinghai Lu wrote:

On Wed, Oct 5, 2016 at 7:04 AM, Thomas Gleixner  wrote:

@@ -176,6 +177,11 @@ static int acpi_register_lapic(int id, u
 return -EINVAL;
 }

+if (!enabled && (id == disabled_id)) {
+++disabled_cpus;
+return -EINVAL;
+}


Why would you need that disabled_id thing at all? The proper fix is to let
the apic driver detect the issue and this boils down to a 5 lines
change. Does the patch below fix the issue for you?
8<
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2076,6 +2076,11 @@ int __generic_processor_info(int apicid,
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);

+   if (!apic->apic_id_valid(apicid)) {
+   disabled_cpus++;
+   return -EINVAL;
+   }
+
/*
 * boot_cpu_physical_apicid is designed to have the apicid
 * returned by read_apic_id(), i.e, the apicid of the




No, That does not fix the issue.

The system has x2apic pre-enabled by the BIOS, so at that time
apic is set to &apic_x2apic_cluster.

early_acpi_boot_init ==> early_acpi_process_madt ==> acpi_parse_madt
==> default_acpi_madt_oem_check

default_acpi_madt_oem_check
  ==> apic_x2apic_cluster/x2apic_acpi_madt_oem_check ==> x2apic_enabled
  ==> apic = &apic_x2apic_cluster

and
static int x2apic_apic_id_valid(int apicid)
{
        return 1;
}

To make your change work, may need to update x2apic_apic_id_valid to

static int x2apic_apic_id_valid(int apicid)
{
if (apicid == 0xff || apicid == -1)
return 0;



As far as I remember, the x2APIC spec allows the x2APIC ID to be 255 or
greater.
If we add that check, it may affect x2APIC operation in some other places.

Looking at the MADT, I think the main reason is that 0xff is used as the
acpi_id in LAPIC mode.
As you said, it looks like:
[   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
...

How about doing the acpi_id check when we parse it in
acpi_parse_lapic().

8<

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * 
header, const unsigned long end)


acpi_table_print_madt_entry(header);

+   if (processor->id >= 255) {
+   ++disabled_cpus;
+   return -EINVAL;
+   }
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size


Thanks

Dou


return 1;
}


Thanks

Yinghai







Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

2016-10-06 Thread Dou Liyang

Hi Yinghai

At 10/07/2016 05:20 AM, Yinghai Lu wrote:

On Thu, Oct 6, 2016 at 1:06 AM, Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As far as I remember, the x2APIC spec allows the x2APIC ID to be 255 or
greater.


Good to know. Maybe later, when one package has more cores, like 30 cores etc.


If we add that check, it may affect x2APIC operation in some other places.

Looking at the MADT, I think the main reason is that 0xff is used as the
acpi_id in LAPIC mode.
As you said, it looks like:
[   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
...

How about doing the acpi_id check when we parse it in
acpi_parse_lapic().

8<

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * header,
const unsigned long end)

acpi_table_print_madt_entry(header);

+   if (processor->id >= 255) {
+   ++disabled_cpus;
+   return -EINVAL;
+   }
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size


Yes, that should work, but we should do the same thing for x2apic.

acpi_parse_x2apic should have:


+   if (processor->local_apic_id == -1) {
+   ++disabled_cpus;
+   return -EINVAL;
+   }


That is the reason why I want to extend acpi_register_lapic()
to take an extra disabled_id (one is 0xff and another is 0x),
which would save some lines.



Yes, I understood.
But I think adding an extra disabled_id is not a good way to
validate the apic_id. If the disabled_id is not a single id (-1 or
255) but two or more, or even a range, how would we extend
our code?

Firstly, I am not sure that "-1" can appear in the MADT at all, even if
the ACPI tables are unreasonable.

Secondly, if we do need the check, the kernel already has validation hooks
such as "default_apic_id_valid" and "x2apic_apic_id_valid".
We should extend all of them and use them for the check.
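
The idea of asking the active APIC driver, rather than comparing against one
hard-coded sentinel, can be pictured with a small user-space model. This is
only a sketch: struct apic_model, model_flat, model_x2apic and id_usable() are
hypothetical names, and the rule "IDs below 255 are valid" for the default
driver is an assumption based on the 0xff discussion in this thread. The
x2APIC callback accepts every ID, as quoted earlier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-driver 'struct apic'. */
struct apic_model {
	const char *name;
	bool (*apic_id_valid)(uint32_t apicid);
};

/* Assumed xAPIC rule: IDs are 8 bits wide and 0xff is not usable. */
static bool default_id_valid(uint32_t apicid)
{
	return apicid < 255;
}

/* As quoted earlier in the thread: the x2APIC driver accepts every ID. */
static bool x2apic_id_valid(uint32_t apicid)
{
	(void)apicid;
	return true;
}

static const struct apic_model model_flat   = { "flat",   default_id_valid };
static const struct apic_model model_x2apic = { "x2apic", x2apic_id_valid };

/* Callers ask the active driver instead of hard-coding one sentinel. */
static bool id_usable(const struct apic_model *apic, uint32_t apicid)
{
	return apic->apic_id_valid(apicid);
}
```

With this shape, a new rule (a second sentinel, or a range) would only change
the callback of the driver it applies to.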


CC'ed: Rafael and Lv

May I ask a question?

Is it possible that the "-1/ox" could appear in the MADT which 
is one of the ACPI tables?




Thanks

Yinghai







[PATCH 1/2] x86/acpi: Fix the local APIC id validation in case of 0xff

2016-10-08 Thread Dou Liyang
In the MADT, 0xff is an invalid local APIC id.

Because the kernel handles both local APIC ids and x2apic ids, a check
placed in shared code may affect x2apic.

Add the validation only where the kernel parses the local APIC ids.

Reported-by: Yinghai Lu <ying...@kernel.org>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 32a7d70..d642c95 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,10 @@ acpi_parse_lapic(struct acpi_subtable_header * header, 
const unsigned long end)
 
acpi_table_print_madt_entry(header);
 
+   /* the 0xff is an invalid local APIC id */
+   if (processor->id == 0xff)
+   return -EINVAL;
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size
-- 
2.5.5





[PATCH 0/2] Fix the local APIC id validation in case of 0xff

2016-10-08 Thread Dou Liyang
The patches are for the problem which is in below link.

https://lkml.org/lkml/2016/10/4/39

Dou Liyang (2):
  x86/acpi: Fix the local APIC id validation in case of 0xff
  x86/acpi: Fix error handling steps in parsing the lapic/x2apic entry

 arch/x86/kernel/acpi/boot.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

-- 
2.5.5





[PATCH 2/2] x86/acpi: Fix error handling steps in parsing the lapic/x2apic entry

2016-10-08 Thread Dou Liyang
Originally, in acpi_parse_x2apic(), acpi_register_lapic() could still be
called when the apic_id was invalid and enabled was false.
This does not make sense.

Rework the decision logic so that no meaningless operations are performed
if the apic_id is invalid.
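
The difference between the old and the new decision logic can be modelled in
user space. This is a sketch under assumptions: parse_old(), parse_new() and
register_lapic_model() are hypothetical names, the counter only stands in for
acpi_register_lapic(), and the warning printk is reduced to a comment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter standing in for calls to acpi_register_lapic(). */
static int registered;

static void register_lapic_model(uint32_t apic_id, bool enabled)
{
	(void)apic_id;
	(void)enabled;
	registered++;
}

/* Old flow: an invalid ID was only skipped when the entry was enabled. */
static int parse_old(uint32_t apic_id, bool id_valid, bool enabled)
{
	if (!id_valid && enabled)
		return 0;	/* warning printed, entry ignored */
	register_lapic_model(apic_id, enabled);
	return 0;
}

/* New flow: an invalid ID is rejected before any registration. */
static int parse_new(uint32_t apic_id, bool id_valid, bool enabled)
{
	if (!id_valid)
		return -EINVAL;	/* the warning is printed only if enabled */
	register_lapic_model(apic_id, enabled);
	return 0;
}
```

In this model, an invalid and disabled entry still reaches the registration
step in parse_old(), while parse_new() returns before it.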

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/acpi/boot.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index d642c95..343e752 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -203,17 +203,20 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
apic_id = processor->local_apic_id;
enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
 #ifdef CONFIG_X86_X2APIC
+   if (!apic->apic_id_valid(apic_id)) {
+   if (enabled)
+   printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
+   return -EINVAL;
+   }
+
/*
-* We need to register disabled CPU as well to permit
-* counting disabled CPUs. This allows us to size
-* cpus_possible_map more accurately, to permit
-* to not preallocating memory for all NR_CPUS
-* when we use CPU hotplug.
-*/
-   if (!apic->apic_id_valid(apic_id) && enabled)
-   printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
-   else
-   acpi_register_lapic(apic_id, processor->uid, enabled);
+   * We need to register disabled CPU as well to permit
+   * counting disabled CPUs. This allows us to size
+   * cpus_possible_map more accurately, to permit
+   * to not preallocating memory for all NR_CPUS
+   * when we use CPU hotplug.
+   */
+   acpi_register_lapic(apic_id, processor->uid, enabled);
 #else
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
 #endif
@@ -225,6 +228,7 @@ static int __init
 acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
 {
struct acpi_madt_local_apic *processor = NULL;
+   u8 enabled;
 
processor = (struct acpi_madt_local_apic *)header;
 
@@ -233,9 +237,14 @@ acpi_parse_lapic(struct acpi_subtable_header * header, 
const unsigned long end)
 
acpi_table_print_madt_entry(header);
 
+   enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
+
/* the 0xff is an invalid local APIC id */
-   if (processor->id == 0xff)
+   if (processor->id == 0xff) {
+   if (enabled)
+   printk(KERN_WARNING PREFIX "lapic entry ignored\n");
return -EINVAL;
+   }
 
/*
 * We need to register disabled CPU as well to permit
@@ -246,7 +255,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, 
const unsigned long end)
 */
acpi_register_lapic(processor->id,  /* APIC ID */
processor->processor_id, /* ACPI ID */
-   processor->lapic_flags & ACPI_MADT_ENABLED);
+   enabled);
 
return 0;
 }
-- 
2.5.5





Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

2016-10-07 Thread Dou Liyang

Hi tglx,

At 10/07/2016 09:07 PM, Thomas Gleixner wrote:

On Thu, 6 Oct 2016, Dou Liyang wrote:


+   if (processor->id >= 255) {
+   ++disabled_cpus;


Incrementing disabled_cpus here is simply wrong because 0xff is an invalid
local APIC id. So we can simply return -EINVAL and be done with it.



Yes, It is.


+   return -EINVAL;


Thanks,

tglx




Thanks,

Dou




Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

2016-10-07 Thread Dou Liyang

Hi tglx,

At 10/07/2016 09:00 PM, Thomas Gleixner wrote:

On Fri, 7 Oct 2016, Thomas Gleixner wrote:

On Fri, 7 Oct 2016, Dou Liyang wrote:

Is it possible that the "-1/ox" could appear in the MADT which is one
of the ACPI tables?


According to the SDM the x2apic id is a 32bit ID, so 0x is a
legitimate value.


Yes, I see.



The ACPI spec says that bit 0 of the x2apic flags field tells whether the
logical processor is present or not. So the proper check for x2apic is that
flag.

The lapic structure has the same flag, but the kernel ignores the flags for
both lapic and x2apic.


It seems the kernel does use the flags, in this statement:

enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
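
A minimal user-space sketch of that flag test follows. The structure is a
hypothetical, trimmed-down view of a MADT local APIC entry, and
MADT_ENABLED_MODEL is an assumed stand-in for ACPI_MADT_ENABLED (bit 0 of the
flags field, as described above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the MADT processor flags: the processor is enabled. */
#define MADT_ENABLED_MODEL 0x1u

/* Hypothetical, trimmed-down view of a MADT local APIC entry. */
struct lapic_entry_model {
	uint8_t  processor_id;	/* ACPI ID */
	uint8_t  id;		/* local APIC ID */
	uint32_t lapic_flags;
};

/* True when the firmware marked the entry as enabled. */
static bool entry_enabled(const struct lapic_entry_model *e)
{
	return (e->lapic_flags & MADT_ENABLED_MODEL) != 0;
}
```

Because the flag lives in bit 0, the masked value is either 0 or 1, so it also
fits in a narrow type without losing information.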




I'm going to apply the minimal fix of checking for id == 0xff in
acpi_lapic_parse() for now, but this needs to be revisited and fixed
proper.


Yes, I will do it.


Thanks

Dou.




Re: [lkp] [x86/acpi] dc6db24d24: BUG: unable to handle kernel paging request at 0000116007090008

2016-10-20 Thread Dou Liyang

Hi xiaolong,

Thank you very much for the report.

I was just investigating a related problem in another patch series.


At 10/20/2016 09:16 AM, kernel test robot wrote:


FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit dc6db24d2476cd09c0ecf2b8d80313539f737a89 ("x86/acpi: Set persistent cpuid <-> 
nodeid mapping when booting")

in testcase: vm-scalability
with following parameters:

runtime: 300
thp_enabled: never
thp_defrag: never
nr_task: 1
nr_pmem: 1
test: swap-w-rand
cpufreq_governor: performance


The motivation behind this suite is to exercise functions and regions of the 
mm/ of the Linux kernel which are of interest to us.


on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G 
memory



I would like to reproduce this bug fully.
Could you send me the ACPI tables of the test machine above?

Thanks,

Dou.


caused below changes:


+------------------------------------------------------------------+------------+------------+
|                                                                  | 8ad893faf2 | dc6db24d24 |
+------------------------------------------------------------------+------------+------------+
| boot_successes                                                   | 7          | 0          |
| boot_failures                                                    | 9          | 16         |
| invoked_oom-killer:gfp_mask=0x                                   | 6          | 2          |
| Mem-Info                                                         | 6          | 2          |
| Out_of_memory:Kill_process                                       | 6          |            |
| page_allocation_failure:order:#,mode:#(GFP_KERNEL|__GFP_NORETRY) | 2          |            |
| warn_alloc_failed+0x                                             | 2          |            |
| BUG:kernel_hang_in_test_stage                                    | 2          | 2          |
| BUG:kernel_reboot-without-warning_in_test_stage                  | 1          |            |
| BUG:unable_to_handle_kernel                                      | 0          | 12         |
| Oops                                                             | 0          | 12         |
| RIP:get_partial_node                                             | 0          | 12         |
| calltrace:devtmpfsd                                              | 0          | 12         |
| RIP:_raw_spin_lock_irqsave                                       | 0          | 9          |
| general_protection_fault:#[##]SMP                                | 0          | 3          |
| RIP:native_queued_spin_lock_slowpath                             | 0          | 3          |
| Kernel_panic-not_syncing:Hard_LOCKUP                             | 0          | 3          |
| RIP:load_balance                                                 | 0          | 2          |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt            | 0          | 2          |
| WARNING:at_lib/list_debug.c:#__list_add                          | 0          | 1          |
| calltrace:_do_fork                                               | 0          | 1          |
| RIP:resched_curr                                                 | 0          | 1          |
| Kernel_panic-not_syncing:Fatal_exception                         | 0          | 1          |
| WARNING:at_include/linux/uaccess.h:#__probe_kernel_read          | 0          | 5          |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0          | 2          |
+------------------------------------------------------------------+------------+------------+



[9.531507] pci :80:02.2:   bridge window [mem 
0x387fffd0-0x387fffef 64bit pref]
[9.541378] pci_bus :80: on NUMA node 2
[9.546734] ACPI: Enabled 4 GPEs in block 00 to 3F
[9.586911] BUG: unable to handle kernel paging request at 116007090008
[9.595109] IP: [] get_partial_node+0x2c/0x1c0
[9.602933] PGD 0
[9.605503] Oops:  [#1] SMP
[9.609264] Modules linked in:
[9.613005] CPU: 24 PID: 585 Comm: kdevtmpfs Not tainted 
4.8.0-rc1-00300-gdc6db24d #1
[9.622193] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[9.634299] task: 88006804 task.stack: 880068024000
[9.641168] RIP: 0010:[]  [] 
get_partial_node+0x2c/0x1c0
[9.651890] RSP: :8800680279f0  EFLAGS: 00010006
[9.658079] RAX: 0002 RBX: 0246 RCX: 02098020
[9.666308] RDX: 882053b9cfc0 RSI: 11600709 RDI: 880076804dc0
[9.674535] RBP: 880068027a90 R08: 882053b9cfb0 R09: 
[9.682764] R10: 880068027c88 R11: 000b R12: 880076804dc0
[9.690994] R13:  R14: 

Re: [x86/acpi] 04c197c080: BUG: unable to handle kernel paging request at 0000003000000010

2016-11-01 Thread Dou Liyang

Hi xiaolong,

Sorry for the late reply.

I think I need to explain this.

Firstly, please ignore this bug, because this patch has been discarded.
The work of this patch is already upstream (f3bf1dbe64).

Secondly, I think the cause of the bug is that
I used "-EINVAL" incorrectly.

Thanks,
Dou.

This patch is repeated with him
At 10/11/2016 10:15 AM, kernel test robot wrote:

FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Dou-Liyang/Fix-the-local-APIC-id-validation-in-case-of-0xff/20161008-154907
commit 04c197c080f2ed7a022f79701455c6837f4b9573 ("x86/acpi: Fix the local APIC id 
validation in case of 0xff")

in testcase: will-it-scale
with following parameters:

test: unlink2
cpufreq_governor: performance


Will It Scale takes a testcase and runs it from 1 through to n parallel copies 
to see if the testcase will scale. It builds both a process and threads based 
test in order to see any differences between the two.


on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G 
memory

caused below changes:


+-------------------------------------------------------+------------+------------+
|                                                       | 1e1a4b0f54 | 04c197c080 |
+-------------------------------------------------------+------------+------------+
| boot_successes                                        | 2          | 4          |
| boot_failures                                         | 0          | 7          |
| BUG:unable_to_handle_kernel                           | 0          | 3          |
| Oops                                                  | 0          | 3          |
| RIP:check_timer                                       | 0          | 3          |
| calltrace:native_smp_prepare_cpus                     | 0          | 3          |
| Kernel_panic-not_syncing:Fatal_exception              | 0          | 3          |
| PANIC:double_fault                                    | 0          | 2          |
| Bad_pagetable                                         | 0          | 1          |
| RIP:copy_user_enhanced_fast_string                    | 0          | 1          |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 1          |
| Kernel_panic-not_syncing:Machine_halted               | 0          | 1          |
| RIP:vgacon_scroll                                     | 0          | 1          |
| invoked_oom-killer:gfp_mask=0x                        | 0          | 4          |
| Mem-Info                                              | 0          | 4          |
+-------------------------------------------------------+------------+------------+



[0.492621] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[0.499130] ...trying to set up timer (IRQ0) through the 8259A ...
[0.506027] . (found apic 2 pin 0) ...
[0.510601] BUG: unable to handle kernel paging request at 00300010
[0.518391] IP: [] check_timer+0x21d/0x61e
[0.524722] PGD 0
[0.526974] Oops:  [#1] SMP
[0.530477] Modules linked in:
[0.533901] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-00989-g04c197c #1
[0.541865] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[0.553521] task: 882023b0 task.stack: c9000c468000
[0.560129] RIP: 0010:[]  [] 
check_timer+0x21d/0x61e
[0.569170] RSP: :c9000c46bd90  EFLAGS: 00010082
[0.575095] RAX: 0030 RBX:  RCX: 81e5cb48
[0.583058] RDX: 0001 RSI: 0046 RDI: 0046
[0.591031] RBP: c9000c46be08 R08:  R09: 
[0.598994] R10: 0040 R11: 0208 R12: 0002
[0.606957] R13: 0002 R14: 0002 R15: 88103f00ae20
[0.614927] FS:  () GS:88103f40() 
knlGS:
[0.623958] CS:  0010 DS:  ES:  CR0: 80050033
[0.630368] CR2: 00300010 CR3: 00207ee06000 CR4: 001406f0
[0.638331] Stack:
[0.640574]   0246  
88103f002080
[0.648870]  88103f002080 0017 0001 
c9000c46bdd8
[0.657165]  8145869d c9000c46bde8 0718 
0001
[0.665459] Call Trace:
[0.668193]  [] ? radix_tree_lookup+0xd/0x10
[0.674710]  [] setup_IO_APIC+0x17d/0x1c5
[0.680937]  [] apic_bsp_setup+0xa1/0xac
[0.687059]  [] native_smp_prepare_cpus+0x297/0x317
[0.694259]  [] kernel_init_freeable+0xcf/0x225
[0.701072]  [] ? rest_init+0x90/0x90
[0.706911]  [] kernel_init+0xe/0x100
[0.712744]  [] ret_from_fork+0x25/0x30
[0.718776] Code: ff 48 c7 c7 50 6d c9 81 e8 31 54 17 ff 89 da 44 89 ee 48 c7 c7 
90 6d c9 81 e8 20 54 17 ff 48 8b 45 a8 48 8b 00 48 39 45 a8 74 1a <44>

[PATCH v2] x86/apic: Fix two typos in comments

2017-01-05 Thread Dou Liyang
s/ID/IDs/
s/inr_logical_cpuidi/nr_logical_cpuids/
s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 5b7e43e..5c4fdcf 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2028,8 +2028,8 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 /*
  * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
  * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
- * nr_logical_cpuids is nr_cpu_ids.
+ * All allocated CPU IDs should be in the [0, nr_logical_cpuids) range,
+ * so the maximum of nr_logical_cpuids is nr_cpu_ids.
  *
  * NOTE: Reserve 0 for BSP.
  */
@@ -2094,7 +2094,7 @@ int __generic_processor_info(int apicid, int version, 
bool enabled)
 * Since fixing handling of boot_cpu_physical_apicid requires
 * another discussion and tests on each platform, we leave it
 * for now and here we use read_apic_id() directly in this
-* function, generic_processor_info().
+* function, __generic_processor_info().
 */
if (disabled_cpu_apicid != BAD_APICID &&
disabled_cpu_apicid != read_apic_id() &&
-- 
2.5.5





Re: [PATCH] x86/apic: Fix two typos in comments

2017-01-05 Thread Dou Liyang

Hi, Ingo

At 01/05/2017 04:15 PM, Ingo Molnar wrote:


* Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


s/inr_logical_cpuidi/nr_logical_cpuids/
s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 5b7e43e..c32a3ad 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2028,7 +2028,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 /*
  * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
  * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * All allocated CPU ID should be in [0, nr_logical_cpuids), so the maximum of


There's another typo in that sentence as well, and the wording should be 
clarified
as well while at it. Something like this would work for me:


+ * All allocated CPU IDs should be in the [0, nr_logical_cpuids) range,
+ * so the maximum of




Yes, It is. :)

Thanks,

Dou




[PATCH] x86/apic: Fix two typos in comments

2016-12-25 Thread Dou Liyang
s/inr_logical_cpuidi/nr_logical_cpuids/
s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 5b7e43e..c32a3ad 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2028,7 +2028,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 /*
  * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
  * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * All allocated CPU ID should be in [0, nr_logical_cpuids), so the maximum of
  * nr_logical_cpuids is nr_cpu_ids.
  *
  * NOTE: Reserve 0 for BSP.
@@ -2094,7 +2094,7 @@ int __generic_processor_info(int apicid, int version, 
bool enabled)
 * Since fixing handling of boot_cpu_physical_apicid requires
 * another discussion and tests on each platform, we leave it
 * for now and here we use read_apic_id() directly in this
-* function, generic_processor_info().
+* function, __generic_processor_info().
 */
if (disabled_cpu_apicid != BAD_APICID &&
disabled_cpu_apicid != read_apic_id() &&
-- 
2.5.5





[RFC PATCH 2/6] x86/apic: Construct a framework for setuping APIC mode as soon as possible

2017-03-29 Thread Dou Liyang
Currently, there are two ways to set up the local APIC and I/O APIC on x86:
  1. On an SMP-capable system, it is done when preparing the
CPUs in native_smp_prepare_boot_cpu().
  2. If UP_LATE_INIT is y, it is done in smp_init().

There are also many switches in the kernel that determine how the
APIC mode is set up, as shown below:

  1. kconfig:
     CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_X86_IO_APIC
  2. kernel option: disable_apic; skip_ioapic_setup
  3. BIOS: boot_cpu_has(X86_FEATURE_APIC)
  4. MP table: smp_found_config
  5. ACPI: acpi_lapic; acpi_ioapic; nr_ioapic

The setup happens late, which causes the dump-capture kernel to hang when
the 'notsc' option is among the first kernel's options. The use of these
switches is also messy.

Before moving the APIC mode setup earlier, construct a framework first to
prepare for the work and make the logic clear.
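
As a rough illustration of how the switches above combine into one decision,
here is a simplified user-space model. It is a sketch, not the patch itself:
struct apic_switches, enum irq_mode_model and pick_mode() are hypothetical
names, only three of the switches are modelled, and the ordering follows the
64-bit path of the check added by this patch.

```c
#include <stdbool.h>

/* The three interrupt modes the framework chooses between. */
enum irq_mode_model {
	MODE_PIC,
	MODE_VIRTUAL_WIRE,
	MODE_SYMMETRIC_IO
};

/* Hypothetical snapshot of a few of the switches listed above. */
struct apic_switches {
	bool disable_apic;	/* kernel option */
	bool cpu_has_apic;	/* boot_cpu_has(X86_FEATURE_APIC) */
	bool smp_found_config;	/* an MP table was found */
};

/* Decide once, in one place, which mode the BSP should use. */
static enum irq_mode_model pick_mode(const struct apic_switches *s)
{
	/* No usable local APIC: stay with the 8259 PIC. */
	if (s->disable_apic || !s->cpu_has_apic)
		return MODE_PIC;

	/* Local APIC but no MP configuration: virtual wire only. */
	if (!s->smp_found_config)
		return MODE_VIRTUAL_WIRE;

	return MODE_SYMMETRIC_IO;
}
```

Keeping the decision in a single function is what allows the later patches to
call it from one early place instead of repeating the checks in each setup
path.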

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h |  2 ++
 arch/x86/kernel/apic/apic.c | 73 +
 arch/x86/kernel/irqinit.c   |  3 ++
 3 files changed, 78 insertions(+)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 20ac73c..c973f18 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -172,6 +172,8 @@ static inline void disable_local_APIC(void) { }
 # define setup_secondary_APIC_clock x86_init_noop
 static inline void lapic_update_tsc_freq(void) { }
 static inline void apic_virture_wire_mode_setup(void) {}
+static inline void init_bsp_APIC(void) {}
+
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f4fc949..bf4ccd0 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1153,6 +1153,63 @@ void __init sync_Arb_IDs(void)
APIC_INT_LEVELTRIG | APIC_DM_INIT);
 }
 
+enum apic_bsp_mode {
+   APIC_BSP_MODEL_PIC = 0,
+   APIC_BSP_MODEL_VIRTUAL_WIRE,
+   APIC_BSP_MODEL_SYMMETRIC_IO,
+   APIC_BSP_MODEL_COUNT
+};
+
+static int __init apic_bsp_mode_check(void)
+{
+
+   /* Check kernel option */
+   if (disable_apic) {
+   pr_info("APIC disabled by kernel option\n");
+   return APIC_BSP_MODEL_PIC;
+   }
+   /* Check BOIS */
+#ifdef CONFIG_X86_64
+   /* On 64-bit, The APIC is integrated, So, must have APIC feature */
+   if (!boot_cpu_has(X86_FEATURE_APIC)) {
+   disable_apic = 1;
+   pr_info("Apic disabled by BIOS\n");
+   return APIC_BSP_MODEL_PIC;
+   }
+#else
+   if (!boot_cpu_has(X86_FEATURE_APIC) &&
+   APIC_INTEGRATED(boot_cpu_apic_version)) {
+   pr_err("BIOS bug, local APIC #%d not detected!...\n",
+   boot_cpu_physical_apicid);
+   pr_err("... forcing use of dummy APIC emulation (tell your hw 
vendor)\n");
+   return APIC_BSP_MODEL_PIC;
+   }
+#endif
+   /*
+* Check MP table, if neither an integrated nor a separate chip
+* doesn't exist.
+*/
+   if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
+   pr_info("BOIS don't support APIC, and no SMP configuration.\n");
+   return APIC_BSP_MODEL_PIC;
+   }
+
+   /* Check MP table, ps: if the virtual wire has been setup */
+   if (!smp_found_config) {
+   disable_ioapic_support();
+
+   /* Check local APIC, if SMP_NO_CONFIG */
+   if (!acpi_lapic)
+   pr_info("SMP motherboard not detected\n");
+
+   return APIC_BSP_MODEL_VIRTUAL_WIRE;
+   }
+
+   /* Other checks of ACPI options will be done in each setup function */
+
+   return APIC_BSP_MODEL_SYMMETRIC_IO;
+}
+
 /*
  * Setup the through-local-APIC virtual wire mode.
  */
@@ -1202,6 +1259,22 @@ void apic_virture_wire_mode_setup(void)
apic_write(APIC_LVT1, value);
 }
 
+/* init the interrupt routing model for the BSP */
+void __init init_bsp_APIC(void)
+{
+   switch (apic_bsp_mode_check()) {
+   case APIC_BSP_MODEL_PIC:
+   pr_info("Keep in PIC mode(8259)\n");
+   return;
+   case APIC_BSP_MODEL_VIRTUAL_WIRE:
+   pr_info("switch to virtual wire model.\n");
+   return;
+   case APIC_BSP_MODEL_SYMMETRIC_IO:
+   pr_info("switch to symmectic I/O model.\n");
+   return;
+   }
+}
+
 static void lapic_setup_esr(void)
 {
unsigned int oldvalue, value, maxlvt;
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index b6ef4ea..f30fb16 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -197,4 +197,7 @@ void __init native_init_IRQ(void)
 #ifdef CONFIG_X86_32
irq_ctx_init(smp_processor_id());
 #endif
+
+   /* init the IRQ Mode for BSP */
+   init_bsp_APIC();
 }
-- 
2.5.5





[RFC PATCH 3/6] x86/apic: Extract APIC timer related code from apic_bsp_setup()

2017-03-29 Thread Dou Liyang
apic_bsp_setup() contains the APIC timer related code, which
makes it hard to reuse the local APIC and I/O APIC setup independently.

Extract that code into a separate function so that the APIC can be set up
earlier.
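
The ID that the extracted function returns comes from the Logical Destination
Register. A small user-space sketch of that selection is shown below. The
function names are hypothetical; the bit layout (logical ID in bits 31:24 in
xAPIC mode, the whole 32-bit register in x2APIC mode) is the assumption the
sketch rests on.

```c
#include <stdbool.h>
#include <stdint.h>

/* In xAPIC mode the logical ID occupies bits 31:24 of the LDR. */
static uint32_t xapic_logical_id(uint32_t ldr)
{
	return (ldr >> 24) & 0xffu;
}

/* Model of the ID selection done next to the timer setup. */
static uint32_t bsp_logical_id(bool x2apic_mode, uint32_t ldr)
{
	/* In x2APIC mode the register value is used unchanged. */
	return x2apic_mode ? ldr : xapic_logical_id(ldr);
}
```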

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/io_apic.h |  2 ++
 arch/x86/kernel/apic/apic.c| 28 +---
 arch/x86/kernel/apic/io_apic.c |  4 +---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 6cbf2cf..535ca00 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -189,6 +189,7 @@ static inline unsigned int io_apic_read(unsigned int apic, 
unsigned int reg)
return x86_io_apic_ops.read(apic, reg);
 }
 
+extern void check_timer(void);
 extern void setup_IO_APIC(void);
 extern void enable_IO_APIC(void);
 extern void disable_IO_APIC(void);
@@ -230,6 +231,7 @@ static inline void io_apic_init_mappings(void) { }
 #define native_io_apic_readNULL
 #define native_disable_io_apic NULL
 
+static inline void check_timer(void) { }
 static inline void setup_IO_APIC(void) { }
 static inline void enable_IO_APIC(void) { }
 static inline void setup_ioapic_dest(void) { }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index bf4ccd0..0ba8a85 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2324,6 +2324,25 @@ static void __init apic_bsp_up_setup(void)
physid_set_mask_of_physid(boot_cpu_physical_apicid, 
_cpu_present_map);
 }
 
+/* Setup local APIC timer and get the Id*/
+static int __init apic_bsp_timer_setup(void)
+{
+   int id;
+
+   if (x2apic_mode)
+   id = apic_read(APIC_LDR);
+   else
+   id = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
+
+   if (!skip_ioapic_setup && nr_ioapics && nr_legacy_irqs())
+   check_timer();
+
+   /* Setup local timer */
+   x86_init.timers.setup_percpu_clockev();
+
+   return id;
+}
+
 /**
  * apic_bsp_setup - Setup function for local apic and io-apic
  * @upmode:Force UP mode (for APIC_init_uniprocessor)
@@ -2340,17 +2359,12 @@ int __init apic_bsp_setup(bool upmode)
apic_bsp_up_setup();
setup_local_APIC();
 
-   if (x2apic_mode)
-   id = apic_read(APIC_LDR);
-   else
-   id = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
-
enable_IO_APIC();
end_local_APIC_setup();
irq_remap_enable_fault_handling();
setup_IO_APIC();
-   /* Setup local timer */
-   x86_init.timers.setup_percpu_clockev();
+
+   id = apic_bsp_timer_setup();
return id;
 }
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 347bb9f..e19b88f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2047,7 +2047,7 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
  *
  * FIXME: really need to revamp this for all platforms.
  */
-static inline void __init check_timer(void)
+void __init check_timer(void)
 {
struct irq_data *irq_data = irq_get_irq_data(0);
struct mp_chip_data *data = irq_data->chip_data;
@@ -2278,8 +2278,6 @@ void __init setup_IO_APIC(void)
sync_Arb_IDs();
setup_IO_APIC_irqs();
init_IO_APIC_traps();
-   if (nr_legacy_irqs())
-   check_timer();
 
ioapic_initialized = 1;
 }
-- 
2.5.5





[RFC PATCH 1/6] x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup()

2017-03-29 Thread Dou Liyang
init_bsp_APIC() sets up the virtual wire mode through the local
APIC.

The function name is unsuitable: it implies that the BSP's
APIC is initialized here, whereas that is actually done
almost at the end of start_kernel(). Also, CONFIG_X86_64
implies that X86_LOCAL_APIC is y.

Clarify the name and remove the redundant macros to improve
readability.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h | 2 ++
 arch/x86/kernel/apic/apic.c | 4 ++--
 arch/x86/kernel/irqinit.c   | 5 ++---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 730ef65..20ac73c 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -127,6 +127,7 @@ extern void disconnect_bsp_APIC(int virt_wire_setup);
 extern void disable_local_APIC(void);
 extern void lapic_shutdown(void);
 extern void sync_Arb_IDs(void);
+extern void apic_virture_wire_mode_setup(void);
 extern void init_bsp_APIC(void);
 extern void setup_local_APIC(void);
 extern void init_apic_mappings(void);
@@ -170,6 +171,7 @@ static inline void disable_local_APIC(void) { }
 # define setup_boot_APIC_clock x86_init_noop
 # define setup_secondary_APIC_clock x86_init_noop
 static inline void lapic_update_tsc_freq(void) { }
+static inline void apic_virture_wire_mode_setup(void) {}
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8ccb7ef..f4fc949 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1154,9 +1154,9 @@ void __init sync_Arb_IDs(void)
 }
 
 /*
- * An initial setup of the virtual wire mode.
+ * Setup the through-local-APIC virtual wire mode.
  */
-void __init init_bsp_APIC(void)
+void apic_virture_wire_mode_setup(void)
 {
unsigned int value;
 
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 1423ab1..b6ef4ea 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -72,9 +72,8 @@ void __init init_ISA_irqs(void)
struct irq_chip *chip = legacy_pic->chip;
int i;
 
-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
-   init_bsp_APIC();
-#endif
+   apic_virture_wire_mode_setup();
+
legacy_pic->init(0);
 
for (i = 0; i < nr_legacy_irqs(); i++)
-- 
2.5.5





[RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible

2017-03-29 Thread Dou Liyang
Following Ingo's and Eric's advice[1,2], this series tries to optimize the
initialization of the interrupt mode for x86.

The MP specification defines three different interrupt modes as follows:

 1. PIC Mode
 2. Virtual Wire Mode
 3. Symmetric I/O Mode

Currently, the kernel:

1. Sets up the Virtual Wire Mode during IRQ initialization
(step 1 in the following figure).
2. Enables and sets up the Symmetric I/O Mode either while the
SMP-capable system prepares its CPUs (step 2) or while the UP system
initializes itself (step 3).

  start_kernel
|
+--> ...
|
+--> setup_arch
|
+--> init_IRQ
|      |
|      +--> init_ISA_irqs
|             |
|             +--> [ 1. init_bsp_APIC ]
+--> ...
|
+--> rest_init
|      |
|      +--> kernel_init
|             |
|             +--> kernel_init_freeable
|                    |
|                    +--> smp_prepare_cpus
|                    |      |
|                    |      +--> [ 2. apic_bsp_setup ]
|                    |
|                    +--> smp_init
|                           |
|                           +--> [ 3. apic_bsp_setup ]
v

The purpose of this patchset is to unify these setup steps and execute them
as early as possible, as follows:

   start_kernel
|
+--> init_IRQ
|      |
|      +--> [ 4. init_bsp_APIC ]
v

The series also fixes a kexec bug[3].


Some open questions where I need help:

1. The patchset affects the IOMMU setup in enable_IR_x2apic(). I am not sure
whether that can be done earlier.

2. Due to

Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec")

the patchset also needs the TSC and uses "cpu_khz" in setup_local_APIC(),
and a warning[4] is triggered on crash/kexec. I am not sure how to
fix this.

[1]. https://lkml.org/lkml/2016/8/2/929
[2]. https://lkml.org/lkml/2016/8/1/506
[3]. https://lkml.org/lkml/2016/7/25/1118
[4]. WARN_ON(max_loops <= 0) in setup_local_APIC()

Dou Liyang (6):
  x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup()
  x86/apic: Construct a framework for setuping APIC mode as soon as
possible
  x86/apic: Extract APIC timer related code from apic_bsp_setup()
  x86/apic: Make the APIC mode setup earlier for SMP-capable system
  x86/apic: Make the APIC mode setup earlier for UP system
  x86/apic: Remove the apic_virture_wire_mode_setup()

 arch/x86/include/asm/apic.h|   7 +-
 arch/x86/include/asm/io_apic.h |   2 +
 arch/x86/kernel/apic/apic.c| 218 -
 arch/x86/kernel/apic/io_apic.c |   4 +-
 arch/x86/kernel/irqinit.c  |   6 +-
 arch/x86/kernel/smpboot.c  |  68 ++---
 6 files changed, 149 insertions(+), 156 deletions(-)

-- 
2.5.5





[RFC PATCH 4/6] x86/apic: Make the APIC mode setup earlier for SMP-capable system

2017-03-29 Thread Dou Liyang
On an SMP-capable system, the APIC mode is enabled and set up in
native_smp_prepare_boot_cpu(), which is called almost at the end
of start_kernel().

The MP table or ACPI has been read earlier, and time_init(), which
is called before the APIC mode setup, may need the IRQ.

Move the APIC mode setup code to init_IRQ() and do it at the end of
IRQ initialization on SMP-capable systems.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h |  3 ++-
 arch/x86/kernel/apic/apic.c | 39 ---
 arch/x86/kernel/smpboot.c   | 10 +-
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index c973f18..be2abc3 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -146,7 +146,8 @@ static inline int apic_force_enable(unsigned long addr)
 extern int apic_force_enable(unsigned long addr);
 #endif
 
-extern int apic_bsp_setup(bool upmode);
+extern int apic_bsp_timer_setup(void);
+extern void apic_bsp_setup(bool upmode);
 extern void apic_ap_setup(void);
 
 /*
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 0ba8a85..ce8f88d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1157,6 +1157,7 @@ enum apic_bsp_mode {
APIC_BSP_MODEL_PIC = 0,
APIC_BSP_MODEL_VIRTUAL_WIRE,
APIC_BSP_MODEL_SYMMETRIC_IO,
+   APIC_BSP_MODEL_SYMMETRIC_IO_NO_ROUTING,
APIC_BSP_MODEL_COUNT
 };
 
@@ -1207,7 +1208,29 @@ static int __init apic_bsp_mode_check(void)
 
/* Other checks of ACPI options will be done in each setup function */
 
+#ifdef CONFIG_SMP
+   if (read_apic_id() != boot_cpu_physical_apicid) {
+   pr_info("Boot APIC ID in local APIC unexpected (%d vs %d)\n",
+   read_apic_id(), boot_cpu_physical_apicid);
+
+   disable_ioapic_support();
+   /* Do nothing, just switch back to PIC here */
+   return APIC_BSP_MODEL_PIC;
+   }
+
+   /*
+* If SMP should be disabled, then really disable it!
+* No need setup apic routing ?
+*/
+   if (!setup_max_cpus) {
+   pr_info("SMP mode deactivated\n");
+   return APIC_BSP_MODEL_SYMMETRIC_IO_NO_ROUTING;
+   }
+
return APIC_BSP_MODEL_SYMMETRIC_IO;
+#else
+   return APIC_BSP_MODEL_PIC;
+#endif
 }
 
 /*
@@ -1271,6 +1294,12 @@ void __init init_bsp_APIC(void)
return;
case APIC_BSP_MODEL_SYMMETRIC_IO:
pr_info("switch to symmectic I/O model.\n");
+   default_setup_apic_routing();
+   apic_bsp_setup(false);
+   return;
+   case APIC_BSP_MODEL_SYMMETRIC_IO_NO_ROUTING:
+   pr_info("switch to symmectic I/O model with no apic routing.\n");
+   apic_bsp_setup(false);
return;
}
 }
@@ -2325,7 +2354,7 @@ static void __init apic_bsp_up_setup(void)
 }
 
 /* Setup local APIC timer and get the Id*/
-static int __init apic_bsp_timer_setup(void)
+int __init apic_bsp_timer_setup(void)
 {
int id;
 
@@ -2350,10 +2379,8 @@ static int __init apic_bsp_timer_setup(void)
  * Returns:
  * apic_id of BSP APIC
  */
-int __init apic_bsp_setup(bool upmode)
+void __init apic_bsp_setup(bool upmode)
 {
-   int id;
-
connect_bsp_APIC();
if (upmode)
apic_bsp_up_setup();
@@ -2363,9 +2390,6 @@ int __init apic_bsp_setup(bool upmode)
end_local_APIC_setup();
irq_remap_enable_fault_handling();
setup_IO_APIC();
-
-   id = apic_bsp_timer_setup();
-   return id;
 }
 
 /*
@@ -2404,6 +2428,7 @@ int __init APIC_init_uniprocessor(void)
 
default_setup_apic_routing();
apic_bsp_setup(true);
+   apic_bsp_timer_setup();
return 0;
 }
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..a556281 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1332,20 +1332,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
return;
case SMP_FORCE_UP:
disable_smp();
-   apic_bsp_setup(false);
return;
case SMP_OK:
break;
}
 
-   if (read_apic_id() != boot_cpu_physical_apicid) {
-   panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
-read_apic_id(), boot_cpu_physical_apicid);
-   /* Or can we switch back to PIC here? */
-   }
-
-   default_setup_apic_routing();
-   cpu0_logical_apicid = apic_bsp_setup(false);
+   cpu0_logical_apicid = apic_bsp_timer_setup();
 
pr_info("CPU0: ");
print_cpu_info(_data(0));
-- 
2.5.5





[RFC PATCH 5/6] x86/apic: Make the APIC mode setup earlier for UP system

2017-03-29 Thread Dou Liyang
On SMP-capable systems, the APIC mode is now enabled and set up as
early as possible.

Do the same for UP systems and make the code clearer.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h |  2 ++
 arch/x86/kernel/apic/apic.c | 81 +
 arch/x86/kernel/smpboot.c   | 58 
 3 files changed, 38 insertions(+), 103 deletions(-)
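
Note for readers: after this patch, init_bsp_APIC() dispatches on the
detected mode and falls through to a single apic_bsp_setup(upmode) call.
The following userspace sketch models that dispatch as read from the diff
below. It is an illustration under stated assumptions, not kernel code:
the sk_ names and the struct are hypothetical, and the three boolean/int
fields only record which kernel helpers would have been called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for enum apic_bsp_mode after this patch. */
enum sk_bsp_mode {
	SK_MODE_PIC = 0,
	SK_MODE_VIRTUAL_WIRE,
	SK_MODE_SYMMETRIC_IO,
	SK_MODE_SYMMETRIC_IO_NO_CONFIG,
	SK_MODE_SYMMETRIC_IO_NO_ROUTING,
};

/* Records what the modelled init_bsp_APIC() would have done. */
struct sk_bsp_actions {
	int disable_smp_reason; /* models disable_smp_by_APIC: 0, 1 or 2 */
	bool setup_routing;     /* models default_setup_apic_routing() */
	bool bsp_setup;         /* models apic_bsp_setup(upmode) */
	bool upmode;            /* value handed to apic_bsp_setup() */
};

static struct sk_bsp_actions sk_init_bsp(enum sk_bsp_mode mode, bool upmode)
{
	struct sk_bsp_actions a = { 0, false, false, false };

	assert(mode <= SK_MODE_SYMMETRIC_IO_NO_ROUTING);

	switch (mode) {
	case SK_MODE_PIC:
		/* Stay in 8259 PIC mode and remember why SMP is unavailable. */
		a.disable_smp_reason = 1;
		return a;
	case SK_MODE_VIRTUAL_WIRE:
		/* Returns early at this point in the series. */
		return a;
	case SK_MODE_SYMMETRIC_IO:
		a.setup_routing = true;
		break;
	case SK_MODE_SYMMETRIC_IO_NO_CONFIG:
		a.disable_smp_reason = 2;
		a.setup_routing = true;
		break;
	case SK_MODE_SYMMETRIC_IO_NO_ROUTING:
		break;
	}

	/* Common tail shared by the three symmetric I/O variants. */
	a.bsp_setup = true;
	a.upmode = upmode;
	return a;
}
```

The upmode argument corresponds to the value that apic_bsp_mode_check()
writes through its new int *upmode parameter.
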

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index be2abc3..fb06fe5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -55,6 +55,8 @@ extern unsigned int lapic_timer_frequency;
 
 #ifdef CONFIG_SMP
 extern void __inquire_remote_apic(int apicid);
+extern int disable_smp_by_APIC;
+
 #else /* CONFIG_SMP */
 static inline void __inquire_remote_apic(int apicid)
 {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ce8f88d..c93c33d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -169,6 +169,9 @@ __setup("apicpmtimer", setup_apicpmtimer);
 
 unsigned long mp_lapic_addr;
 int disable_apic;
+/* disable smp flag according to APIC configuration */
+int disable_smp_by_APIC;
+
 /* Disable local APIC timer from the kernel commandline or via dmi quirk */
 static int disable_apic_timer __initdata;
 /* Local APIC timer works in C2 */
@@ -1157,13 +1160,13 @@ enum apic_bsp_mode {
APIC_BSP_MODEL_PIC = 0,
APIC_BSP_MODEL_VIRTUAL_WIRE,
APIC_BSP_MODEL_SYMMETRIC_IO,
+   APIC_BSP_MODEL_SYMMETRIC_IO_NO_CONFIG,
APIC_BSP_MODEL_SYMMETRIC_IO_NO_ROUTING,
APIC_BSP_MODEL_COUNT
 };
 
-static int __init apic_bsp_mode_check(void)
+static int __init apic_bsp_mode_check(int *upmode)
 {
-
/* Check kernel option */
if (disable_apic) {
pr_info("APIC disabled by kernel option\n");
@@ -1200,8 +1203,11 @@ static int __init apic_bsp_mode_check(void)
disable_ioapic_support();
 
/* Check local APIC, if SMP_NO_CONFIG */
-   if (!acpi_lapic)
-   pr_info("SMP motherboard not detected\n");
+   if (!acpi_lapic) {
+   *upmode = true;
+   pr_info("SMP motherboard not detected.\n");
+   return APIC_BSP_MODEL_SYMMETRIC_IO_NO_CONFIG;
+   }
 
return APIC_BSP_MODEL_VIRTUAL_WIRE;
}
@@ -1229,7 +1235,13 @@ static int __init apic_bsp_mode_check(void)
 
return APIC_BSP_MODEL_SYMMETRIC_IO;
 #else
-   return APIC_BSP_MODEL_PIC;
+#ifdef CONFIG_UP_LATE_INIT
+   /* In UP system, If it supports late init */
+   *upmode = true;
+   return APIC_BSP_MODEL_SYMMETRIC_IO;
+#else
+   return APIC_BSP_MODEL_PIC;
+#endif
 #endif
 }
 
@@ -1285,9 +1297,12 @@ void apic_virture_wire_mode_setup(void)
 /* init the interrupt routing model for the BSP */
 void __init init_bsp_APIC(void)
 {
-   switch (apic_bsp_mode_check()) {
+   int upmode = false;
+
+   switch (apic_bsp_mode_check(&upmode)) {
case APIC_BSP_MODEL_PIC:
pr_info("Keep in PIC mode(8259)\n");
+   disable_smp_by_APIC = 1;
return;
case APIC_BSP_MODEL_VIRTUAL_WIRE:
pr_info("switch to virtual wire model.\n");
@@ -1295,13 +1310,17 @@ void __init init_bsp_APIC(void)
case APIC_BSP_MODEL_SYMMETRIC_IO:
pr_info("switch to symmectic I/O model.\n");
default_setup_apic_routing();
-   apic_bsp_setup(false);
-   return;
+   break;
+   case APIC_BSP_MODEL_SYMMETRIC_IO_NO_CONFIG:
+   pr_info("switch to symmectic I/O model with no SMP config.\n");
+   disable_smp_by_APIC = 2;
+   default_setup_apic_routing();
+   break;
case APIC_BSP_MODEL_SYMMETRIC_IO_NO_ROUTING:
pr_info("switch to symmectic I/O model with no apic routing.\n");
-   apic_bsp_setup(false);
-   return;
+   break;
}
+   apic_bsp_setup(upmode);
 }
 
 static void lapic_setup_esr(void)
@@ -2392,50 +2411,10 @@ void __init apic_bsp_setup(bool upmode)
setup_IO_APIC();
 }
 
-/*
- * This initializes the IO-APIC and APIC hardware if this is
- * a UP kernel.
- */
-int __init APIC_init_uniprocessor(void)
-{
-   if (disable_apic) {
-   pr_info("Apic disabled\n");
-   return -1;
-   }
-#ifdef CONFIG_X86_64
-   if (!boot_cpu_has(X86_FEATURE_APIC)) {
-   disable_apic = 1;
-   pr_info("Apic disabled by BIOS\n");
-   return -1;
-   }
-#else
-   if (!smp_found_config && !boot_cpu_has(X86_FEATURE_APIC))
-   return -1;
-
-   /*
-* Complain if the BIOS pretend

[RFC PATCH 6/6] x86/apic: Remove the apic_virture_wire_mode_setup()

2017-03-29 Thread Dou Liyang
The interrupt mode is now enabled and set up earlier, and that setup
already covers the virtual wire mode.

Remove apic_virture_wire_mode_setup(), which was originally used to set
up the virtual wire mode.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h |  2 --
 arch/x86/kernel/apic/apic.c | 51 +
 arch/x86/kernel/irqinit.c   |  2 --
 3 files changed, 1 insertion(+), 54 deletions(-)
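
Note for readers: the helper removed below programs two local APIC
registers. The following userspace sketch reproduces only the bit
arithmetic of the removed code so the values are easier to follow. It is
an illustration, not kernel code: the sk_ names are hypothetical, and the
constants are written down as recalled from the kernel's apicdef.h, so
treat them as assumptions and check them against the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register bit values (as recalled from apicdef.h). */
#define SK_APIC_VECTOR_MASK         0x000000FFu
#define SK_APIC_SPIV_APIC_ENABLED   (1u << 8)
#define SK_APIC_SPIV_FOCUS_DISABLED (1u << 9)
#define SK_SPURIOUS_APIC_VECTOR     0xFFu
#define SK_APIC_DM_NMI              0x00400u
#define SK_APIC_LVT_LEVEL_TRIGGER   (1u << 15)
#define SK_APIC_LVT_MASKED          (1u << 16)

/*
 * Value written to APIC_SPIV. focus_bit_reserved models the 32-bit
 * Intel family 15 (P4/Xeon) case, where the focus bit must be cleared.
 */
static uint32_t sk_spiv_value(uint32_t old, bool focus_bit_reserved)
{
	uint32_t value = old;

	value &= ~SK_APIC_VECTOR_MASK;
	value |= SK_APIC_SPIV_APIC_ENABLED;
	if (focus_bit_reserved)
		value &= ~SK_APIC_SPIV_FOCUS_DISABLED;
	else
		value |= SK_APIC_SPIV_FOCUS_DISABLED;
	value |= SK_SPURIOUS_APIC_VECTOR;

	assert((value & SK_APIC_SPIV_APIC_ENABLED) != 0);
	return value;
}

/*
 * Value written to APIC_LVT1 (NMI). integrated_lapic models
 * lapic_is_integrated(); extnmi_none models
 * apic_extnmi == APIC_EXTNMI_NONE.
 */
static uint32_t sk_lvt1_value(bool integrated_lapic, bool extnmi_none)
{
	uint32_t value = SK_APIC_DM_NMI;

	if (!integrated_lapic) /* 82489DX */
		value |= SK_APIC_LVT_LEVEL_TRIGGER;
	if (extnmi_none)
		value |= SK_APIC_LVT_MASKED;
	return value;
}
```
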

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index fb06fe5..a9f73f4 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -129,7 +129,6 @@ extern void disconnect_bsp_APIC(int virt_wire_setup);
 extern void disable_local_APIC(void);
 extern void lapic_shutdown(void);
 extern void sync_Arb_IDs(void);
-extern void apic_virture_wire_mode_setup(void);
 extern void init_bsp_APIC(void);
 extern void setup_local_APIC(void);
 extern void init_apic_mappings(void);
@@ -174,7 +173,6 @@ static inline void disable_local_APIC(void) { }
 # define setup_boot_APIC_clock x86_init_noop
 # define setup_secondary_APIC_clock x86_init_noop
 static inline void lapic_update_tsc_freq(void) { }
-static inline void apic_virture_wire_mode_setup(void) {}
 static inline void init_bsp_APIC(void) {}
 
 #endif /* !CONFIG_X86_LOCAL_APIC */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index c93c33d..06d87fd 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1245,55 +1245,6 @@ static int __init apic_bsp_mode_check(int *upmode)
 #endif
 }
 
-/*
- * Setup the through-local-APIC virtual wire mode.
- */
-void apic_virture_wire_mode_setup(void)
-{
-   unsigned int value;
-
-   /*
-* Don't do the setup now if we have a SMP BIOS as the
-* through-I/O-APIC virtual wire mode might be active.
-*/
-   if (smp_found_config || !boot_cpu_has(X86_FEATURE_APIC))
-   return;
-
-   /*
-* Do not trust the local APIC being empty at bootup.
-*/
-   clear_local_APIC();
-
-   /*
-* Enable APIC.
-*/
-   value = apic_read(APIC_SPIV);
-   value &= ~APIC_VECTOR_MASK;
-   value |= APIC_SPIV_APIC_ENABLED;
-
-#ifdef CONFIG_X86_32
-   /* This bit is reserved on P4/Xeon and should be cleared */
-   if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
-   (boot_cpu_data.x86 == 15))
-   value &= ~APIC_SPIV_FOCUS_DISABLED;
-   else
-#endif
-   value |= APIC_SPIV_FOCUS_DISABLED;
-   value |= SPURIOUS_APIC_VECTOR;
-   apic_write(APIC_SPIV, value);
-
-   /*
-* Set up the virtual wire mode.
-*/
-   apic_write(APIC_LVT0, APIC_DM_EXTINT);
-   value = APIC_DM_NMI;
-   if (!lapic_is_integrated()) /* 82489DX */
-   value |= APIC_LVT_LEVEL_TRIGGER;
-   if (apic_extnmi == APIC_EXTNMI_NONE)
-   value |= APIC_LVT_MASKED;
-   apic_write(APIC_LVT1, value);
-}
-
 /* init the interrupt routing model for the BSP */
 void __init init_bsp_APIC(void)
 {
@@ -1306,7 +1257,7 @@ void __init init_bsp_APIC(void)
return;
case APIC_BSP_MODEL_VIRTUAL_WIRE:
pr_info("switch to virtual wire model.\n");
-   return;
+   break;
case APIC_BSP_MODEL_SYMMETRIC_IO:
pr_info("switch to symmectic I/O model.\n");
default_setup_apic_routing();
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index f30fb16..dc2deca 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -72,8 +72,6 @@ void __init init_ISA_irqs(void)
struct irq_chip *chip = legacy_pic->chip;
int i;
 
-   apic_virture_wire_mode_setup();
-
legacy_pic->init(0);
 
for (i = 0; i < nr_legacy_irqs(); i++)
-- 
2.5.5





Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible

2017-03-29 Thread Dou Liyang

Hi Baoquan,

At 03/30/2017 10:08 AM, Baoquan He wrote:

Hi Liyang,

This is awesome. I planned to do this after kaslr back porting, glad to
see your posting. I like below diagram and the idea of patch 2/6
framework. Will review and see what I can do to help since rhel bug from
FJ is assigned to me.



Thanks very much for joining! We have been investigating this bug for
almost half a year. :)

In my opinion, if we plan to refactor the APIC initialization process to
fix the bug, there is a lot of work to be done. This patchset is just the
first step. While testing it, I have been thinking about:

1. The checks and logic in each function that enables and sets up the
LAPIC/IOAPIC.
2. The IRQ remapping process.
3. The checks and initialization of the APIC timer.
4. The relationships between the various switches, for example: if
smp_found_config is 1, acpi_lapic must be 1.

My next steps are:

1. Test with more test cases.
2. Learn the IOMMU code.
3. Trace the APIC timer code.
4. Make the check logic clearer.

I hope this is helpful to you.


Thanks for the effort!

And add Joerg to this thread since he knows IOMMU very well.


Oops, yes, I forgot that. Thanks!

Thanks
Liyang



Thanks
Baoquan

On 03/29/17 at 10:55pm, Dou Liyang wrote:

Following Ingo's and Eric's advice[1,2], this series tries to optimize the
initialization of the interrupt mode on x86.

The MP specification defines three different interrupt modes:

 1. PIC Mode
 2. Virtual Wire Mode
 3. Symmetric I/O Mode

Currently, the kernel:

1. Sets up the Virtual Wire Mode during IRQ initialization
(step 1 in the following figure).
2. Enables and sets up the Symmetric I/O Mode either when an
SMP-capable system prepares its CPUs (step 2) or when a UP system
initializes itself (step 3).

start_kernel
|
+--> ...
|
+--> setup_arch
|
+--> init_IRQ
|    |
|    +--> init_ISA_irqs
|         |
|         +--> 1. init_bsp_APIC
|
+--> ...
|
+--> rest_init
     |
     +--> kernel_init
          |
          +--> kernel_init_freeable
               |
               +--> smp_prepare_cpus
               |    |
               |    +--> 2. apic_bsp_setup
               |
               +--> smp_init
                    |
                    +--> 3. apic_bsp_setup

The purpose of this patchset is to unify these setup steps and execute
them as early as possible, as follows:

start_kernel
|
|
+--> init_IRQ
|    |
|    +--> 4. init_bsp_APIC
|
v

This series also fixes a kexec bug[3].


Open questions where I need help:

1. The patchset affects the IOMMU code in enable_IR_x2apic(). I am not sure
whether that call can be moved earlier.

2. Because of

Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on kexec")

the patchset also needs the TSC and uses "cpu_khz" in setup_local_APIC(),
so a warning[4] is triggered when crashed/on kexec. I am not sure how to
fix this.

[1]. https://lkml.org/lkml/2016/8/2/929
[2]. https://lkml.org/lkml/2016/8/1/506
[3]. https://lkml.org/lkml/2016/7/25/1118
[4]. WARN_ON(max_loops <= 0) in setup_local_APIC()

Dou Liyang (6):
  x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup()
  x86/apic: Construct a framework for setuping APIC mode as soon as
possible
  x86/apic: Extract APIC timer related code from apic_bsp_setup()
  x86/apic: Make the APIC mode setup earlier for SMP-capable system
  x86/apic: Make the APIC mode setup earlier for UP system
  x86/apic: Remove the apic_virture_wire_mode_setup()

 arch/x86/include/asm/apic.h|   7 +-
 arch/x86/include/asm/io_apic.h |   2 +
 arch/x86/kernel/apic/apic.c| 218 -
 arch/x86/kernel/apic/io_apic.c |   4 +-
 arch/x86/kernel/irqinit.c  |   6 +-
 arch/x86/kernel/smpboot.c  |  68 ++---
 6 files changed, 149 insertions(+), 156 deletions(-)

--
2.5.5












Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible

2017-03-29 Thread Dou Liyang



At 03/30/2017 11:03 AM, Dou Liyang wrote:

Hi Baoquan,

At 03/30/2017 10:08 AM, Baoquan He wrote:

Hi Liyang,

This is awesome. I planned to do this after kaslr back porting, glad to
see your posting. I like below diagram and the idea of patch 2/6
framework. Will review and see what I can do to help since rhel bug from
FJ is assigned to me.



Thanks very much for joining! We have been investigating this bug for
almost half a year. :)

In my opinion, if we plan to refactor the APIC initialization process to
fix the bug, there is a lot of work to be done. This patchset is just the
first step. While testing it, I have been thinking about:

1. The checks and logic in each function that enables and sets up the
LAPIC/IOAPIC.
2. The IRQ remapping process.
3. The checks and initialization of the APIC timer.
4. The relationships between the various switches, for example: if
smp_found_config is 1, acpi_lapic must be 1.

My next steps are:

1. Test with more test cases.
2. Learn the IOMMU code.
3. Trace the APIC timer code.
4. Make the check logic clearer.

I hope this is helpful to you.


Thanks for the effort!

And add Joerg to this thread since he knows IOMMU very well.




ahh,

--cc j...@8bytes.org, not j...@8types.org

Thanks
Liyang


Oops, yes, I forgot that. Thanks!

Thanks
Liyang



Thanks
Baoquan











