Re: [LKP] [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-03-16 Thread Aaron Lu
On Thu, Mar 16, 2017 at 09:28:55AM +0100, Thomas Gleixner wrote:
> On Thu, 16 Mar 2017, Aaron Lu wrote:
> > 
> > What is the status of the patch?
> > 
> > I still get an oops during boot on an EP machine with today's Linus tree,
> > head commit 69eea5a4ab9c ("Merge branch 'for-linus' of
> > git://git.kernel.dk/linux-block").
> 
> I have it applied to tip/x86/acpi. I was not aware that this is an urgent
> issue to be forwarded to Linus ASAP.
> 
> I'll send it Linuswards in the next few days.

That would be great, thanks.


Re: [LKP] [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-03-16 Thread Thomas Gleixner
On Thu, 16 Mar 2017, Aaron Lu wrote:
> 
> What is the status of the patch?
> 
> I still get an oops during boot on an EP machine with today's Linus tree,
> head commit 69eea5a4ab9c ("Merge branch 'for-linus' of
> git://git.kernel.dk/linux-block").

I have it applied to tip/x86/acpi. I was not aware that this is an urgent
issue to be forwarded to Linus ASAP.

I'll send it Linuswards in the next few days.

Thanks,

tglx


Re: [LKP] [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-03-16 Thread Aaron Lu
On Wed, Feb 22, 2017 at 09:56:51AM +0800, Dou Liyang wrote:
> Hi, Xiaolong
> 
> At 02/21/2017 03:10 PM, Ye Xiaolong wrote:
> > On 02/21, Ye Xiaolong wrote:
> > > On 02/20, Dou Liyang wrote:
> > > > Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
> > > > This keeps it consistent with the workqueue and avoids some bugs which
> > > > may be caused by dynamic assignment.
> > > > As we know, it is implemented by the following patches: 2532fc318d,
> > > > f7c28833c2, 8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI
> > > > tables. Simply speaking:
> > > > 
> > > > Step 1. Make the "Logical CPU ID <-> Processor ID/UID" fixed using MADT:
> > > > We generate the logical CPU IDs in Local APIC/x2APIC ID order and
> > > > get the mapping of Processor ID/UID <-> Local APIC ID directly from MADT.
> > > > So, we get the mapping of
> > > > *Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
> > > > 
> > > > Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" fixed using DSDT:
> > > > The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
> > > > each entity; we just use it directly.
> > > > 
> > > > So, at last we get the mapping of *Node ID <-> Logical CPU ID* according
> > > > to step 1 and step 2:
> > > > *Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
> > > > 
> > > > But the ACPI tables are unreliable, and it is very risky to use an entity
> > > > which isn't related to a physical device at boot time. We have already
> > > > found two bugs:
> > > > 1. Duplicated Processor IDs in DSDT.
> > > >    It has been fixed by commits 8e089eaa19, fd74da217d.
> > > > 2. The _PXM in DSDT is inconsistent with the one in MADT.
> > > >    It may cause the bug shown in:
> > > >    https://lkml.org/lkml/2017/2/12/200
> > > > There may be more later. We shouldn't just fix them one by one; we should
> > > > solve this problem at the source so that such problems do not happen
> > > > again and again.
> > > > 
> > > > Now a simple and easy way has been found: we revert our patches and do
> > > > Step 2 at hot-plug time, not at boot time where we did some useless work.
> > > > 
> > > > This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids
> > > > excessive use of the ACPI tables.
> > > > 
> > > > We have tested them on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
> > > > To Xiaolong:
> > > >    Please help me to test it on the special machine.
> > > 
> > > Got it, I'll queue the tests on the previous machine and let you know the
> > > result once I get it.
> > 
> > Previous kernel panic and incomplete run issue (described in [1]) in 0day
> > system is gone with this series.
> > 
> 
> Thanks very much, I am glad to hear that!
> 
> > Tested-by: Xiaolong Ye 
> > 
> 
> I will add it in my next version.

What is the status of the patch?

I still get an oops during boot on an EP machine with today's Linus tree,
head commit 69eea5a4ab9c ("Merge branch 'for-linus' of
git://git.kernel.dk/linux-block").

The first oops call trace:

... ...
[8.599850] pci_bus :80: on NUMA node 2
[8.605611] ACPI: Enabled 4 GPEs in block 00 to 3F
[8.645521] BUG: unable to handle kernel paging request at 0001f768
[8.653585] IP: get_partial_node+0x2c/0x1f0
[8.659302] PGD 0 
[8.659303] 
[8.663724] Oops:  [#1] SMP
[8.667499] Modules linked in:
[8.671181] CPU: 60 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1 #1
[8.678554] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[8.690672] task: 88202bc1 task.stack: c902c000
[8.697542] RIP: 0010:get_partial_node+0x2c/0x1f0
[8.703844] RSP: :c902fb20 EFLAGS: 00010006
[8.709944] RAX: 0002 RBX:  RCX: 014080c0
[8.718184] RDX: 88203281f740 RSI: 0001f760 RDI: 88202e548280
[8.726422] RBP: c902fbc0 R08:  R09: 000100220022
[8.734661] R10: ea0080a99600 R11:  R12: 88202e548280
[8.742896] R13: ea0080a991c0 R14: 88202e548280 R15: 88203281f730
[8.751144] FS:  () GS:88203280() knlGS:
[8.760633] CS:  0010 DS:  ES:  CR0: 80050033
[8.767312] CR2: 0001f768 CR3: 01e09000 CR4: 001406e0
[8.775550] Call Trace:
[8.778548]  ? acpi_os_release_lock+0xe/0x10
[8.783590]  ? acpi_ut_update_ref_count+0x5a/0x6b3
[8.789210]  ___slab_alloc+0x28a/0x4b0
[8.793660]  ? __kernfs_new_node+0x41/0xc0
[8.798505]  ? __kernfs_new_node+0x41/0xc0
[8.803348]  __slab_alloc+0x20/0x40
[8.807501]  kmem_cache_alloc+0x17f/0x1c0
[8.812231]  __kernfs_new_node+0x41/0xc0
[8.816882]  

Re: [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-21 Thread Dou Liyang

Hi, Xiaolong

At 02/21/2017 03:10 PM, Ye Xiaolong wrote:
> On 02/21, Ye Xiaolong wrote:
> > On 02/20, Dou Liyang wrote:
> > > Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
> > > This keeps it consistent with the workqueue and avoids some bugs which
> > > may be caused by dynamic assignment.
> > > As we know, it is implemented by the following patches: 2532fc318d,
> > > f7c28833c2, 8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI
> > > tables. Simply speaking:
> > >
> > > Step 1. Make the "Logical CPU ID <-> Processor ID/UID" fixed using MADT:
> > > We generate the logical CPU IDs in Local APIC/x2APIC ID order and
> > > get the mapping of Processor ID/UID <-> Local APIC ID directly from MADT.
> > > So, we get the mapping of
> > > *Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
> > >
> > > Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" fixed using DSDT:
> > > The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
> > > each entity; we just use it directly.
> > >
> > > So, at last we get the mapping of *Node ID <-> Logical CPU ID* according
> > > to step 1 and step 2:
> > > *Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
> > >
> > > But the ACPI tables are unreliable, and it is very risky to use an entity
> > > which isn't related to a physical device at boot time. We have already
> > > found two bugs:
> > > 1. Duplicated Processor IDs in DSDT.
> > >    It has been fixed by commits 8e089eaa19, fd74da217d.
> > > 2. The _PXM in DSDT is inconsistent with the one in MADT.
> > >    It may cause the bug shown in:
> > >    https://lkml.org/lkml/2017/2/12/200
> > > There may be more later. We shouldn't just fix them one by one; we should
> > > solve this problem at the source so that such problems do not happen
> > > again and again.
> > >
> > > Now a simple and easy way has been found: we revert our patches and do
> > > Step 2 at hot-plug time, not at boot time where we did some useless work.
> > >
> > > This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids
> > > excessive use of the ACPI tables.
> > >
> > > We have tested them on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
> > > To Xiaolong:
> > >    Please help me to test it on the special machine.
> >
> > Got it, I'll queue the tests on the previous machine and let you know the
> > result once I get it.
>
> Previous kernel panic and incomplete run issue (described in [1]) in 0day
> system is gone with this series.
>

Thanks very much, I am glad to hear that!

> Tested-by: Xiaolong Ye 
>

I will add it in my next version.

Thanks,
Liyang

> Here is the comparison:
>
> $ compare -at dc6db24d2476cd09c0ecf2b8d80313539f737a89 2e61bac54fad4c018afd23c118bce2399e504020
> tests: 1
> testcase/path_params/tbox_group/run: vm-scalability/300-never-never-1-1-swap-w-rand-performance/lkp-hsw-ep2
>
> Here dc6db24d24 is the previous first bad commit, and 2e61bac54 is the head
> commit of your series applied on top of the latest tip of linus/master,
> c945d0227d ("Merge branch 'x86-platform-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
>
> dc6db24d2476cd09  2e61bac54fad4c018afd23c118
> ----------------  --------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>              :12            12%          1:8  last_state.OOM
>              :12            12%          1:8  dmesg.page_allocation_failure:order:#,mode:#(GFP_USER|GFP_DMA32|__GFP_ZERO)
>              :12            12%          1:8  dmesg.Mem-Info
>            12:12          -100%           :8  dmesg.BUG:unable_to_handle_kernel
>            12:12          -100%           :8  dmesg.Oops
>            12:12          -100%           :8  dmesg.RIP:get_partial_node
>             9:12           -75%           :8  dmesg.RIP:_raw_spin_lock_irqsave
>             3:12           -25%           :8  dmesg.general_protection_fault:#[##]SMP
>             3:12           -25%           :8  dmesg.RIP:native_queued_spin_lock_slowpath
>             3:12           -25%           :8  dmesg.Kernel_panic-not_syncing:Hard_LOCKUP
>             2:12           -17%           :8  dmesg.RIP:load_balance
>             2:12           -17%           :8  dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
>             1:12            -8%           :8  dmesg.RIP:resched_curr
>             1:12            -8%           :8  dmesg.Kernel_panic-not_syncing:Fatal_exception
>             5:12           -42%           :8  dmesg.WARNING:at_include/linux/uaccess.h:#__probe_kernel_read
>             1:12            -8%           :8  dmesg.WARNING:at_lib/list_debug.c:#__list_add
>
> [1] https://lkml.org/lkml/2017/2/12/200
>
> Thanks,
> Xiaolong
>
> >
> > Thanks,
> > Xiaolong
> > >
> > > Change log:
> > >  v1 -> v2: 1. fix some comments.
> > >            2. add the verification of duplicate processor id.
> > >
> > > Dou Liyang (4):
> > >  Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
> > >  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
> > >  acpi: Fix the check handle in case of declaring processors using the
> > >    Device operator
> > >  acpi: Move the verification of duplicate proc_id from booting time to
> > >    hot-plug time


Re: [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-20 Thread Ye Xiaolong
On 02/21, Ye Xiaolong wrote:
>On 02/20, Dou Liyang wrote:
>>Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
>>This keeps it consistent with the workqueue and avoids some bugs which
>>may be caused by dynamic assignment.
>>As we know, it is implemented by the following patches: 2532fc318d,
>>f7c28833c2, 8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI
>>tables. Simply speaking:
>>
>>Step 1. Make the "Logical CPU ID <-> Processor ID/UID" fixed using MADT:
>>We generate the logical CPU IDs in Local APIC/x2APIC ID order and
>>get the mapping of Processor ID/UID <-> Local APIC ID directly from MADT.
>>So, we get the mapping of
>>*Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
>>
>>Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" fixed using DSDT:
>>The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
>>each entity; we just use it directly.
>>
>>So, at last we get the mapping of *Node ID <-> Logical CPU ID* according
>>to step 1 and step 2:
>>*Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
>>
>>But the ACPI tables are unreliable, and it is very risky to use an entity
>>which isn't related to a physical device at boot time. We have already
>>found two bugs:
>>1. Duplicated Processor IDs in DSDT.
>>   It has been fixed by commits 8e089eaa19, fd74da217d.
>>2. The _PXM in DSDT is inconsistent with the one in MADT.
>>   It may cause the bug shown in:
>>   https://lkml.org/lkml/2017/2/12/200
>>There may be more later. We shouldn't just fix them one by one; we should
>>solve this problem at the source so that such problems do not happen
>>again and again.
>>
>>Now a simple and easy way has been found: we revert our patches and do
>>Step 2 at hot-plug time, not at boot time where we did some useless work.
>>
>>This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids
>>excessive use of the ACPI tables.
>>
>>We have tested them on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
>>To Xiaolong:
>>   Please help me to test it on the special machine.
>
>Got it, I'll queue the tests on the previous machine and let you know the
>result once I get it.

Previous kernel panic and incomplete run issue (described in [1]) in 0day
system is gone with this series.

Tested-by: Xiaolong Ye 

Here is the comparison:

$ compare -at dc6db24d2476cd09c0ecf2b8d80313539f737a89 2e61bac54fad4c018afd23c118bce2399e504020
tests: 1
testcase/path_params/tbox_group/run: vm-scalability/300-never-never-1-1-swap-w-rand-performance/lkp-hsw-ep2

Here dc6db24d24 is the previous first bad commit, and 2e61bac54 is the head
commit of your series applied on top of the latest tip of linus/master,
c945d0227d ("Merge branch 'x86-platform-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").

dc6db24d2476cd09  2e61bac54fad4c018afd23c118
----------------  --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
             :12            12%          1:8  last_state.OOM
             :12            12%          1:8  dmesg.page_allocation_failure:order:#,mode:#(GFP_USER|GFP_DMA32|__GFP_ZERO)
             :12            12%          1:8  dmesg.Mem-Info
           12:12          -100%           :8  dmesg.BUG:unable_to_handle_kernel
           12:12          -100%           :8  dmesg.Oops
           12:12          -100%           :8  dmesg.RIP:get_partial_node
            9:12           -75%           :8  dmesg.RIP:_raw_spin_lock_irqsave
            3:12           -25%           :8  dmesg.general_protection_fault:#[##]SMP
            3:12           -25%           :8  dmesg.RIP:native_queued_spin_lock_slowpath
            3:12           -25%           :8  dmesg.Kernel_panic-not_syncing:Hard_LOCKUP
            2:12           -17%           :8  dmesg.RIP:load_balance
            2:12           -17%           :8  dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
            1:12            -8%           :8  dmesg.RIP:resched_curr
            1:12            -8%           :8  dmesg.Kernel_panic-not_syncing:Fatal_exception
            5:12           -42%           :8  dmesg.WARNING:at_include/linux/uaccess.h:#__probe_kernel_read
            1:12            -8%           :8  dmesg.WARNING:at_lib/list_debug.c:#__list_add


[1] https://lkml.org/lkml/2017/2/12/200

Thanks,
Xiaolong
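
The %reproduction column in the comparison above looks like the change in
failure rate between the two kernels, in percentage points: 12:12 against :8
gives -100%, and :12 against 1:8 gives 12.5, which the table shows as 12%.
A minimal C sketch of that arithmetic follows. It assumes a blank fail count
means zero failures; the function names are made up for illustration, and
this is not the code of the compare tool itself.

```c
#include <assert.h>

/*
 * Failure rate of one kernel as a percentage: fail / runs * 100.
 * A blank fail count in the table is assumed to mean zero failures.
 */
double fail_rate_pct(unsigned int fail, unsigned int runs)
{
	assert(runs > 0);	/* a row with no runs has no defined rate */
	return 100.0 * fail / runs;
}

/*
 * Change in failure rate, in percentage points, going from the base
 * commit to the patched commit. Negative means fewer failures.
 */
double reproduction_delta(unsigned int base_fail, unsigned int base_runs,
			  unsigned int new_fail, unsigned int new_runs)
{
	return fail_rate_pct(new_fail, new_runs) -
	       fail_rate_pct(base_fail, base_runs);
}
```

For the dmesg.Oops row, reproduction_delta(12, 12, 0, 8) is -100.0: the oops
hit every run on dc6db24d24 and none of the runs with the series applied.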

>
>Thanks,
>Xiaolong
>>
>>Change log:
>>  v1 -> v2: 1. fix some comments.
>>2. add the verification of duplicate processor id.
>>
>>Dou Liyang (4):
>>  Revert"x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
>>  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
>>  acpi: Fix the check handle in case of declaring processors using the
>>Device operator
>>  acpi: Move the verification of duplicate proc_id from booting time to
>>hot-plug time
>>
>> arch/x86/kernel/acpi/boot.c   |   2 +-
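
The two steps in the quoted cover letter amount to chaining two lookup
tables: Step 1 ties a logical CPU ID to a Processor ID/UID through the Local
APIC ID, and Step 2 ties that Processor ID/UID to a node. A rough userspace
C sketch of the composition follows. The struct layouts, the function name
lookup_cpu_node() and the sample tables are all hypothetical; the kernel's
real data structures are different.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)	/* returned when no mapping can be found */

/* Step 1 row (MADT): Processor ID/UID <-> Local APIC ID <-> logical CPU. */
struct cpu_map_entry {
	int proc_id;	/* ACPI Processor ID/UID */
	int apic_id;	/* Local APIC/x2APIC ID */
	int cpu;	/* logical CPU ID, handed out in APIC ID order */
};

/* Step 2 row (DSDT): Processor ID/UID <-> node ID taken from _PXM. */
struct node_map_entry {
	int proc_id;
	int node;
};

/* Made-up sample tables: three CPUs, the last one without a _PXM entry. */
const struct cpu_map_entry example_cpus[] = {
	{ .proc_id = 0, .apic_id = 0,  .cpu = 0 },
	{ .proc_id = 1, .apic_id = 16, .cpu = 1 },
	{ .proc_id = 2, .apic_id = 32, .cpu = 2 },
};
const struct node_map_entry example_nodes[] = {
	{ .proc_id = 0, .node = 0 },
	{ .proc_id = 1, .node = 1 },
};

/* Walk logical CPU -> Processor ID (Step 1), then Processor ID -> node (Step 2). */
int lookup_cpu_node(const struct cpu_map_entry *cpus, size_t ncpus,
		    const struct node_map_entry *nodes, size_t nnodes,
		    int cpu)
{
	size_t i, j;

	assert(cpus != NULL && nodes != NULL);
	for (i = 0; i < ncpus; i++) {
		if (cpus[i].cpu != cpu)
			continue;
		for (j = 0; j < nnodes; j++)
			if (nodes[j].proc_id == cpus[i].proc_id)
				return nodes[j].node;
		return NO_NODE;	/* CPU is known, but has no _PXM entry */
	}
	return NO_NODE;		/* unknown logical CPU */
}
```

The sketch also shows why the chain is fragile: if either table carries a
wrong or duplicated Processor ID, the lookup silently lands on the wrong
node, which is the kind of failure the series avoids by deferring Step 2 to
hot-plug time.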

Re: [PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-20 Thread Ye Xiaolong
On 02/20, Dou Liyang wrote:
>Currently, we make the mapping of "cpuid <-> nodeid" fixed at boot time.
>This keeps it consistent with the workqueue and avoids some bugs which
>may be caused by dynamic assignment.
>As we know, it is implemented by the following patches: 2532fc318d,
>f7c28833c2, 8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI
>tables. Simply speaking:
>
>Step 1. Make the "Logical CPU ID <-> Processor ID/UID" fixed using MADT:
>We generate the logical CPU IDs in Local APIC/x2APIC ID order and
>get the mapping of Processor ID/UID <-> Local APIC ID directly from MADT.
>So, we get the mapping of
>*Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
>
>Step 2. Make the "Processor ID/UID <-> Node ID(_PXM)" fixed using DSDT:
>The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
>each entity; we just use it directly.
>
>So, at last we get the mapping of *Node ID <-> Logical CPU ID* according
>to step 1 and step 2:
>*Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
>
>But the ACPI tables are unreliable, and it is very risky to use an entity
>which isn't related to a physical device at boot time. We have already
>found two bugs:
>1. Duplicated Processor IDs in DSDT.
>   It has been fixed by commits 8e089eaa19, fd74da217d.
>2. The _PXM in DSDT is inconsistent with the one in MADT.
>   It may cause the bug shown in:
>   https://lkml.org/lkml/2017/2/12/200
>There may be more later. We shouldn't just fix them one by one; we should
>solve this problem at the source so that such problems do not happen
>again and again.
>
>Now a simple and easy way has been found: we revert our patches and do
>Step 2 at hot-plug time, not at boot time where we did some useless work.
>
>This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids
>excessive use of the ACPI tables.
>
>We have tested them on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
>To Xiaolong:
>   Please help me to test it on the special machine.

Got it, I'll queue the tests on the previous machine and let you know the result
once I get it.

Thanks,
Xiaolong
>
>Change log:
>  v1 -> v2: 1. fix some comments.
>            2. add the verification of duplicate processor id.
>
>Dou Liyang (4):
>  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
>  Revert "x86/acpi: Enable MADT APIs to return disabled apicids"
>  acpi: Fix the check handle in case of declaring processors using the
>    Device operator
>  acpi: Move the verification of duplicate proc_id from booting time to
>    hot-plug time
>
> arch/x86/kernel/acpi/boot.c   |   2 +-
> drivers/acpi/acpi_processor.c |  50 +++-
> drivers/acpi/bus.c            |   1 -
> drivers/acpi/processor_core.c | 133 +++---
> include/linux/acpi.h          |   5 +-
> 5 files changed, 59 insertions(+), 132 deletions(-)
>
>-- 
>2.5.5
>
>
>


[PATCH v2 0/4] Revert works for the mapping of cpuid <-> nodeid

2017-02-20 Thread Dou Liyang
Currently, we fix the mapping of "cpuid <-> nodeid" at boot time.
This keeps it consistent with the workqueue and avoids some bugs which may be
caused by dynamic assignment.
As we know, it is implemented by the following patches: 2532fc318d, f7c28833c2,
8f54969dc8, 8ad893faf2, dc6db24d24, which depend on the ACPI tables. Simply speaking:

Step 1. Fix the "Logical CPU ID <-> Processor ID/UID" mapping using the MADT:
We generate the logical CPU IDs in the order of the Local APIC/x2APIC IDs and
get the mapping of Processor ID/UID <-> Local APIC ID directly from the MADT.
So, we get the mapping of
*Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*

Step 2. Fix the "Processor ID/UID <-> Node ID(_PXM)" mapping using the DSDT:
The mapping of "Processor ID/UID <-> Node ID(_PXM)" is ready-made in
each entity. We just use it directly.

So, finally we get the mapping of *Node ID <-> Logical CPU ID* from
step 1 and step 2:
*Node ID(_PXM) <-> Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
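
To make the chain above concrete, here is a minimal standalone sketch of how
the two tables compose. It is not the kernel code touched by this series: the
structs, the NO_NODE value and sketch_cpu_to_node() are hypothetical names
used only for illustration, and the tables are assumed to be plain arrays.

```c
#include <stddef.h>

#define NO_NODE (-1)	/* hypothetical "no node found" value */

/* Step 1 record (from MADT): Processor ID/UID <-> Local APIC ID <-> logical CPU ID */
struct cpu_entry {
	unsigned int proc_id;	/* ACPI Processor ID/UID */
	unsigned int apic_id;	/* Local APIC/x2APIC ID */
	int cpu;		/* logical CPU ID */
};

/* Step 2 record (from DSDT): Processor ID/UID <-> Node ID(_PXM) */
struct pxm_entry {
	unsigned int proc_id;	/* ACPI Processor ID/UID */
	int node;		/* node ID derived from _PXM */
};

/*
 * Compose both steps: find the Processor ID/UID that owns the logical CPU,
 * then look up the node recorded for that Processor ID/UID.
 */
int sketch_cpu_to_node(const struct cpu_entry *cpus, size_t ncpus,
		       const struct pxm_entry *pxms, size_t npxms, int cpu)
{
	size_t i, j;

	for (i = 0; i < ncpus; i++) {
		if (cpus[i].cpu != cpu)
			continue;
		for (j = 0; j < npxms; j++) {
			if (pxms[j].proc_id == cpus[i].proc_id)
				return pxms[j].node;
		}
		return NO_NODE;	/* CPU is known, but no _PXM entry for it */
	}
	return NO_NODE;		/* unknown logical CPU ID */
}
```

The point of the sketch is that the result is only as good as the Processor
ID/UID key shared by both tables, which is where the problems below come from.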

But the ACPI tables are unreliable, and it is very risky to use an entity
which isn't related to a physical device at boot time. We have already found
two bugs:
1. Duplicated Processor IDs in the DSDT.
   It has been fixed by commits 8e089eaa19 and fd74da217d.
2. The _PXM in the DSDT is inconsistent with the one in the MADT.
   It may cause the bug shown in:
   https://lkml.org/lkml/2017/2/12/200
There may be more later. We shouldn't just fix them one by one; we should
solve this problem at the source so that such problems don't happen again and
again.
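
As an illustration of the first bug class only, a duplicate Processor ID check
can be sketched as below. This is a simplified standalone example, not the
verification code in this series; has_duplicate_proc_id() is a hypothetical
helper and the IDs are assumed to be already collected into an array.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if any Processor ID appears more than once in ids[0..n-1].
 * A quadratic scan is fine for a sketch; the number of processor objects
 * is small.
 */
bool has_duplicate_proc_id(const unsigned int *ids, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = i + 1; j < n; j++) {
			if (ids[i] == ids[j])
				return true;
		}
	}
	return false;
}
```

If a table fails such a check, the Processor ID can no longer serve as a
unique key for the chain described above.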

Now we have found a simple and easy way: revert our patches and do Step 2
at hot-plug time, not at boot time, where we did some useless work.

This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids excessive
use of the ACPI tables.

We have tested the patches on our box: a Fujitsu PQ2000 with 2 nodes for hot-plug.
To Xiaolong:
Please help me test them on the special machine.

Change log:
  v1 -> v2: 1. fix some comments.
            2. add the verification of duplicate processor id.

Dou Liyang (4):
  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
  Revert "x86/acpi: Enable MADT APIs to return disabled apicids"
  acpi: Fix the check handle in case of declaring processors using the
    Device operator
  acpi: Move the verification of duplicate proc_id from booting time to
    hot-plug time

 arch/x86/kernel/acpi/boot.c   |   2 +-
 drivers/acpi/acpi_processor.c |  50 +++-
 drivers/acpi/bus.c            |   1 -
 drivers/acpi/processor_core.c | 133 +++---
 include/linux/acpi.h          |   5 +-
 5 files changed, 59 insertions(+), 132 deletions(-)

-- 
2.5.5




