Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-04 Thread Ni, Ray
> 
> (1) For more clarity, I would suggest
> 
>   lock xadd  dword [edi], ebx
> 
> (even though "ebx" already specifies the width, yes)
> 
> Applies to the X64 version too.

OK.


> 
> 
> > +incebx  ; EBX is CpuNumber
> >
> > +; program stack
> 
> (2) This change belongs in a separate patch. The X64 version already has
> this comment, and making both versions makes sense. But touching
> anything "stack-related" in the current patch is very confusing to me. 

OK.


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71246): https://edk2.groups.io/g/devel/message/71246
Mute This Topic: https://groups.io/mt/80372087/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-04 Thread Laszlo Ersek
On 02/04/21 12:24, Zeng, Star wrote:
> Hi All,
> 
> Do you think it worth or not to also update MpFuncs.nasm in 
> Edk2\UefiCpuPkg\PiSmmCpuDxeSmm?

I haven't done any measurements, so I couldn't say...



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71242): https://edk2.groups.io/g/devel/message/71242
Mute This Topic: https://groups.io/mt/80372087/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-04 Thread Zeng, Star
Hi All,

Do you think it worth or not to also update MpFuncs.nasm in 
Edk2\UefiCpuPkg\PiSmmCpuDxeSmm?


Thanks,
Star
> -Original Message-
> From: devel@edk2.groups.io  On Behalf Of Ni, Ray
> Sent: Thursday, February 4, 2021 10:59 AM
> To: devel@edk2.groups.io
> Cc: Laszlo Ersek ; Dong, Eric ;
> Kumar, Rahul1 
> Subject: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid
> lock acquire/release
> 
> When AP firstly wakes up, MpFuncs.nasm contains below logic to assign an
> unique ApIndex to each AP according to who comes first:
> ---ASM---
> TestLock:
> xchg   [edi], eax
> cmpeax, NotVacantFlag
> jz TestLock
> 
> movecx, esi
> addecx, ApIndexLocation
> incdword [ecx]
> movebx, [ecx]
> 
> Releaselock:
> moveax, VacantFlag
> xchg   [edi], eax
> ---ASM END---
> 
> "lock inc" cannot be used to increase ApIndex because not only the global
> ApIndex should be increased, but also the result should be stored to a local
> general purpose register EBX.
> 
> This patch learns from the NASM implementation of
> InternalSyncIncrement() to use "XADD" instruction which can increase the
> global ApIndex and store the original ApIndex to EBX in one instruction.
> 
> With this patch, OVMF when running in a 255 threads QEMU spends about
> one second to wakeup all APs. Original implementation needs more than
> 10 seconds.
> 
> Signed-off-by: Ray Ni 
> Cc: Laszlo Ersek 
> Cc: Eric Dong 
> Cc: Rahul1 Kumar 
> ---
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc   |  4 
>  .../Library/MpInitLib/Ia32/MpFuncs.nasm   | 21 +--
>  UefiCpuPkg/Library/MpInitLib/MpLib.c  |  3 +--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h  |  3 +--
>  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc|  4 
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 
>  6 files changed, 11 insertions(+), 42 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> index 244c1e72b7..5f1f0351d2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> @@ -12,16 +12,12 @@
>  ; 
> ;---
>  -
> VacantFlagequ00h-NotVacantFlag 
> equ0ffh-
> CPU_SWITCH_STATE_IDLE equ0 CPU_SWITCH_STATE_STORED
> equ1 CPU_SWITCH_STATE_LOADED   equ2
> MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd -
> RendezvousFunnelProcStart) struc MP_CPU_EXCHANGE_INFO-  .Lock:
> resd 1   .StackStart:   resd 1   .StackSize:resd 1   
> .CFunction:
> resd 1diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 908c2eb447..b7267393db 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -122,23 +122,12 @@ SkipEnableExecuteDisable:
>   ; AP init movedi, esi-addedi,
> MP_CPU_EXCHANGE_INFO_OFFSET-moveax, NotVacantFlag--
> TestLock:-xchg   [edi], eax-cmpeax, NotVacantFlag-jz
> TestLock--movecx, esi-addecx,
> MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex-
> incdword [ecx]-movebx, [ecx]--Releaselock:-mov
> eax,
> VacantFlag-xchg   [edi], eax+addedi,
> MP_CPU_EXCHANGE_INFO_OFFSET +
> MP_CPU_EXCHANGE_INFO.ApIndex+movebx, 1+lock xadd  [edi],
> ebx   ; EBX = ApIndex+++incebx
>   ; EBX is
> CpuNumber +; program stack movedi, esi addedi,
> MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.StackSize
> moveax, [edi]diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 8b1f7f84ba..32a3951742 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -1,7 +1,7 @@
>  /** @file   CPU MP Initialize Library common functions. -  Copyright (c) 
> 2016 -
> 2020, Intel Corporation. All rights reserved.+  Copyright (c) 2016 - 2021,
> Intel Corporation. All rights reserved.   Copyright (c) 2020, AMD Inc. All
> rights reserved.SPDX-License-Identifier: BSD-2-Clause-Patent@@ -
> 1012,7 +1012,6 @@ FillExchangeInfoData (
>IA32_CR4 Cr4;ExchangeInfo  = 
> CpuMpData-
> >MpCpuExchangeInfo;-  Exchange

Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-04 Thread Laszlo Ersek
On 02/04/21 03:59, Ni, Ray wrote:
> When AP firstly wakes up, MpFuncs.nasm contains below logic to assign
> an unique ApIndex to each AP according to who comes first:
> ---ASM---
> TestLock:
> xchg   [edi], eax
> cmpeax, NotVacantFlag
> jz TestLock
> 
> movecx, esi
> addecx, ApIndexLocation
> incdword [ecx]
> movebx, [ecx]
> 
> Releaselock:
> moveax, VacantFlag
> xchg   [edi], eax
> ---ASM END---
> 
> "lock inc" cannot be used to increase ApIndex because not only the
> global ApIndex should be increased, but also the result should be
> stored to a local general purpose register EBX.
> 
> This patch learns from the NASM implementation of
> InternalSyncIncrement() to use "XADD" instruction which can increase
> the global ApIndex and store the original ApIndex to EBX in one
> instruction.
> 
> With this patch, OVMF when running in a 255 threads QEMU spends about
> one second to wakeup all APs. Original implementation needs more than
> 10 seconds.
> 
> Signed-off-by: Ray Ni 
> Cc: Laszlo Ersek 
> Cc: Eric Dong 
> Cc: Rahul1 Kumar 
> ---
>  UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc   |  4 
>  .../Library/MpInitLib/Ia32/MpFuncs.nasm   | 21 +--
>  UefiCpuPkg/Library/MpInitLib/MpLib.c  |  3 +--
>  UefiCpuPkg/Library/MpInitLib/MpLib.h  |  3 +--
>  UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc|  4 
>  UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 
>  6 files changed, 11 insertions(+), 42 deletions(-)
> 
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> index 244c1e72b7..5f1f0351d2 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
> @@ -12,16 +12,12 @@
>  ;
>  
> ;---
>  
> -VacantFlagequ00h
> -NotVacantFlag equ0ffh
> -
>  CPU_SWITCH_STATE_IDLE equ0
>  CPU_SWITCH_STATE_STORED   equ1
>  CPU_SWITCH_STATE_LOADED   equ2
>  
>  MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd - 
> RendezvousFunnelProcStart)
>  struc MP_CPU_EXCHANGE_INFO
> -  .Lock: resd 1
>.StackStart:   resd 1
>.StackSize:resd 1
>.CFunction:resd 1
> diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
> b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> index 908c2eb447..b7267393db 100644
> --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
> @@ -122,23 +122,12 @@ SkipEnableExecuteDisable:
>  
>  ; AP init
>  movedi, esi
> -addedi, MP_CPU_EXCHANGE_INFO_OFFSET
> -moveax, NotVacantFlag
> -
> -TestLock:
> -xchg   [edi], eax
> -cmpeax, NotVacantFlag
> -jz TestLock
> -
> -movecx, esi
> -addecx, MP_CPU_EXCHANGE_INFO_OFFSET + 
> MP_CPU_EXCHANGE_INFO.ApIndex
> -incdword [ecx]
> -movebx, [ecx]
> -
> -Releaselock:
> -moveax, VacantFlag
> -xchg   [edi], eax
> +addedi, MP_CPU_EXCHANGE_INFO_OFFSET + 
> MP_CPU_EXCHANGE_INFO.ApIndex
> +movebx, 1
> +lock xadd  [edi], ebx   ; EBX = ApIndex++

(1) For more clarity, I would suggest

  lock xadd  dword [edi], ebx

(even though "ebx" already specifies the width, yes)

Applies to the X64 version too.


> +incebx  ; EBX is CpuNumber
>  
> +; program stack

(2) This change belongs in a separate patch. The X64 version already has
this comment, and making both versions makes sense. But touching
anything "stack-related" in the current patch is very confusing to me.

Thanks
Laszlo

>  movedi, esi
>  addedi, MP_CPU_EXCHANGE_INFO_OFFSET + 
> MP_CPU_EXCHANGE_INFO.StackSize
>  moveax, [edi]
> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c 
> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> index 8b1f7f84ba..32a3951742 100644
> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
> @@ -1,7 +1,7 @@
>  /** @file
>CPU MP Initialize Library common functions.
>  
> -  Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved.
> +  Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved.
>Copyright (c) 2020, AMD Inc. All rights reserved.
>  
>SPDX-License-Identifier: BSD-2-Clause-Patent
> @@ -1012,7 +1012,6 @@ FillExchangeInfoData (
>IA32_CR4 Cr4;
>  
>ExchangeInfo  = CpuMpData->MpCpuExchangeInfo;
> -  ExchangeInfo->Lock= 0;
>ExchangeInfo->StackStart  = CpuMpData->Buffer;
>ExchangeInfo->StackSize   = CpuMpData->CpuApStackSize;
>ExchangeInfo->BufferStart = 

[edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-03 Thread Ni, Ray
When AP firstly wakes up, MpFuncs.nasm contains below logic to assign
an unique ApIndex to each AP according to who comes first:
---ASM---
TestLock:
xchg   [edi], eax
cmpeax, NotVacantFlag
jz TestLock

movecx, esi
addecx, ApIndexLocation
incdword [ecx]
movebx, [ecx]

Releaselock:
moveax, VacantFlag
xchg   [edi], eax
---ASM END---

"lock inc" cannot be used to increase ApIndex because not only the
global ApIndex should be increased, but also the result should be
stored to a local general purpose register EBX.

This patch learns from the NASM implementation of
InternalSyncIncrement() to use "XADD" instruction which can increase
the global ApIndex and store the original ApIndex to EBX in one
instruction.

With this patch, OVMF when running in a 255 threads QEMU spends about
one second to wakeup all APs. Original implementation needs more than
10 seconds.

Signed-off-by: Ray Ni 
Cc: Laszlo Ersek 
Cc: Eric Dong 
Cc: Rahul1 Kumar 
---
 UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc   |  4 
 .../Library/MpInitLib/Ia32/MpFuncs.nasm   | 21 +--
 UefiCpuPkg/Library/MpInitLib/MpLib.c  |  3 +--
 UefiCpuPkg/Library/MpInitLib/MpLib.h  |  3 +--
 UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc|  4 
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 
 6 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc 
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
index 244c1e72b7..5f1f0351d2 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc
@@ -12,16 +12,12 @@
 ;
 
;---
 
-VacantFlagequ00h
-NotVacantFlag equ0ffh
-
 CPU_SWITCH_STATE_IDLE equ0
 CPU_SWITCH_STATE_STORED   equ1
 CPU_SWITCH_STATE_LOADED   equ2
 
 MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd - 
RendezvousFunnelProcStart)
 struc MP_CPU_EXCHANGE_INFO
-  .Lock: resd 1
   .StackStart:   resd 1
   .StackSize:resd 1
   .CFunction:resd 1
diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm 
b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 908c2eb447..b7267393db 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -122,23 +122,12 @@ SkipEnableExecuteDisable:
 
 ; AP init
 movedi, esi
-addedi, MP_CPU_EXCHANGE_INFO_OFFSET
-moveax, NotVacantFlag
-
-TestLock:
-xchg   [edi], eax
-cmpeax, NotVacantFlag
-jz TestLock
-
-movecx, esi
-addecx, MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex
-incdword [ecx]
-movebx, [ecx]
-
-Releaselock:
-moveax, VacantFlag
-xchg   [edi], eax
+addedi, MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex
+movebx, 1
+lock xadd  [edi], ebx   ; EBX = ApIndex++
+incebx  ; EBX is CpuNumber
 
+; program stack
 movedi, esi
 addedi, MP_CPU_EXCHANGE_INFO_OFFSET + 
MP_CPU_EXCHANGE_INFO.StackSize
 moveax, [edi]
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c 
b/UefiCpuPkg/Library/MpInitLib/MpLib.c
index 8b1f7f84ba..32a3951742 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
@@ -1,7 +1,7 @@
 /** @file
   CPU MP Initialize Library common functions.
 
-  Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved.
+  Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved.
   Copyright (c) 2020, AMD Inc. All rights reserved.
 
   SPDX-License-Identifier: BSD-2-Clause-Patent
@@ -1012,7 +1012,6 @@ FillExchangeInfoData (
   IA32_CR4 Cr4;
 
   ExchangeInfo  = CpuMpData->MpCpuExchangeInfo;
-  ExchangeInfo->Lock= 0;
   ExchangeInfo->StackStart  = CpuMpData->Buffer;
   ExchangeInfo->StackSize   = CpuMpData->CpuApStackSize;
   ExchangeInfo->BufferStart = CpuMpData->WakeupBuffer;
diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h 
b/UefiCpuPkg/Library/MpInitLib/MpLib.h
index 02652eaae1..0bd60388b1 100644
--- a/UefiCpuPkg/Library/MpInitLib/MpLib.h
+++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h
@@ -1,7 +1,7 @@
 /** @file
   Common header file for MP Initialize Library.
 
-  Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved.
+  Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved.
   Copyright (c) 2020, AMD Inc. All rights reserved.
 
   SPDX-License-Identifier: BSD-2-Clause-Patent
@@ -190,7 +190,6 @@ typedef struct _CPU_MP_DATA  CPU_MP_DATA;
 // into this structure are used in assembly code in this