Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release
> > (1) For more clarity, I would suggest > > lock xadd dword [edi], ebx > > (even though "ebx" already specifies the width, yes) > > Applies to the X64 version too. OK. > > > > +incebx ; EBX is CpuNumber > > > > +; program stack > > (2) This change belongs in a separate patch. The X64 version already has > this comment, and making both versions makes sense. But touching > anything "stack-related" in the current patch is very confusing to me. OK. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#71246): https://edk2.groups.io/g/devel/message/71246 Mute This Topic: https://groups.io/mt/80372087/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release
On 02/04/21 12:24, Zeng, Star wrote: > Hi All, > > Do you think it worth or not to also update MpFuncs.nasm in > Edk2\UefiCpuPkg\PiSmmCpuDxeSmm? I haven't done any measurements, so I couldn't say... -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#71242): https://edk2.groups.io/g/devel/message/71242 Mute This Topic: https://groups.io/mt/80372087/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release
Hi All, Do you think it worth or not to also update MpFuncs.nasm in Edk2\UefiCpuPkg\PiSmmCpuDxeSmm? Thanks, Star > -Original Message- > From: devel@edk2.groups.io On Behalf Of Ni, Ray > Sent: Thursday, February 4, 2021 10:59 AM > To: devel@edk2.groups.io > Cc: Laszlo Ersek ; Dong, Eric ; > Kumar, Rahul1 > Subject: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid > lock acquire/release > > When AP firstly wakes up, MpFuncs.nasm contains below logic to assign an > unique ApIndex to each AP according to who comes first: > ---ASM--- > TestLock: > xchg [edi], eax > cmpeax, NotVacantFlag > jz TestLock > > movecx, esi > addecx, ApIndexLocation > incdword [ecx] > movebx, [ecx] > > Releaselock: > moveax, VacantFlag > xchg [edi], eax > ---ASM END--- > > "lock inc" cannot be used to increase ApIndex because not only the global > ApIndex should be increased, but also the result should be stored to a local > general purpose register EBX. > > This patch learns from the NASM implementation of > InternalSyncIncrement() to use "XADD" instruction which can increase the > global ApIndex and store the original ApIndex to EBX in one instruction. > > With this patch, OVMF when running in a 255 threads QEMU spends about > one second to wakeup all APs. Original implementation needs more than > 10 seconds. > > Signed-off-by: Ray Ni > Cc: Laszlo Ersek > Cc: Eric Dong > Cc: Rahul1 Kumar > --- > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc | 4 > .../Library/MpInitLib/Ia32/MpFuncs.nasm | 21 +-- > UefiCpuPkg/Library/MpInitLib/MpLib.c | 3 +-- > UefiCpuPkg/Library/MpInitLib/MpLib.h | 3 +-- > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc| 4 > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 > 6 files changed, 11 insertions(+), 42 deletions(-) > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > index 244c1e72b7..5f1f0351d2 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > @@ -12,16 +12,12 @@ > ; > ;--- > - > VacantFlagequ00h-NotVacantFlag > equ0ffh- > CPU_SWITCH_STATE_IDLE equ0 CPU_SWITCH_STATE_STORED > equ1 CPU_SWITCH_STATE_LOADED equ2 > MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd - > RendezvousFunnelProcStart) struc MP_CPU_EXCHANGE_INFO- .Lock: > resd 1 .StackStart: resd 1 .StackSize:resd 1 > .CFunction: > resd 1diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > index 908c2eb447..b7267393db 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > @@ -122,23 +122,12 @@ SkipEnableExecuteDisable: > ; AP init movedi, esi-addedi, > MP_CPU_EXCHANGE_INFO_OFFSET-moveax, NotVacantFlag-- > TestLock:-xchg [edi], eax-cmpeax, NotVacantFlag-jz > TestLock--movecx, esi-addecx, > MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex- > incdword [ecx]-movebx, [ecx]--Releaselock:-mov > eax, > VacantFlag-xchg [edi], eax+addedi, > MP_CPU_EXCHANGE_INFO_OFFSET + > MP_CPU_EXCHANGE_INFO.ApIndex+movebx, 1+lock xadd [edi], > ebx ; EBX = ApIndex+++incebx > ; EBX is > CpuNumber +; program stack movedi, esi addedi, > MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.StackSize > moveax, [edi]diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > index 8b1f7f84ba..32a3951742 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -1,7 +1,7 @@ > /** @file CPU MP Initialize Library common functions. - Copyright (c) > 2016 - > 2020, Intel Corporation. All rights reserved.+ Copyright (c) 2016 - 2021, > Intel Corporation. All rights reserved. Copyright (c) 2020, AMD Inc. All > rights reserved.SPDX-License-Identifier: BSD-2-Clause-Patent@@ - > 1012,7 +1012,6 @@ FillExchangeInfoData ( >IA32_CR4 Cr4;ExchangeInfo = > CpuMpData- > >MpCpuExchangeInfo;- Exchange
Re: [edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release
On 02/04/21 03:59, Ni, Ray wrote: > When AP firstly wakes up, MpFuncs.nasm contains below logic to assign > an unique ApIndex to each AP according to who comes first: > ---ASM--- > TestLock: > xchg [edi], eax > cmpeax, NotVacantFlag > jz TestLock > > movecx, esi > addecx, ApIndexLocation > incdword [ecx] > movebx, [ecx] > > Releaselock: > moveax, VacantFlag > xchg [edi], eax > ---ASM END--- > > "lock inc" cannot be used to increase ApIndex because not only the > global ApIndex should be increased, but also the result should be > stored to a local general purpose register EBX. > > This patch learns from the NASM implementation of > InternalSyncIncrement() to use "XADD" instruction which can increase > the global ApIndex and store the original ApIndex to EBX in one > instruction. > > With this patch, OVMF when running in a 255 threads QEMU spends about > one second to wakeup all APs. Original implementation needs more than > 10 seconds. > > Signed-off-by: Ray Ni > Cc: Laszlo Ersek > Cc: Eric Dong > Cc: Rahul1 Kumar > --- > UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc | 4 > .../Library/MpInitLib/Ia32/MpFuncs.nasm | 21 +-- > UefiCpuPkg/Library/MpInitLib/MpLib.c | 3 +-- > UefiCpuPkg/Library/MpInitLib/MpLib.h | 3 +-- > UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc| 4 > UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 > 6 files changed, 11 insertions(+), 42 deletions(-) > > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > index 244c1e72b7..5f1f0351d2 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc > @@ -12,16 +12,12 @@ > ; > > ;--- > > -VacantFlagequ00h > -NotVacantFlag equ0ffh > - > CPU_SWITCH_STATE_IDLE equ0 > CPU_SWITCH_STATE_STORED equ1 > CPU_SWITCH_STATE_LOADED equ2 > > MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd - > RendezvousFunnelProcStart) > struc MP_CPU_EXCHANGE_INFO > - .Lock: resd 1 >.StackStart: resd 1 >.StackSize:resd 1 >.CFunction:resd 1 > diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > index 908c2eb447..b7267393db 100644 > --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm > @@ -122,23 +122,12 @@ SkipEnableExecuteDisable: > > ; AP init > movedi, esi > -addedi, MP_CPU_EXCHANGE_INFO_OFFSET > -moveax, NotVacantFlag > - > -TestLock: > -xchg [edi], eax > -cmpeax, NotVacantFlag > -jz TestLock > - > -movecx, esi > -addecx, MP_CPU_EXCHANGE_INFO_OFFSET + > MP_CPU_EXCHANGE_INFO.ApIndex > -incdword [ecx] > -movebx, [ecx] > - > -Releaselock: > -moveax, VacantFlag > -xchg [edi], eax > +addedi, MP_CPU_EXCHANGE_INFO_OFFSET + > MP_CPU_EXCHANGE_INFO.ApIndex > +movebx, 1 > +lock xadd [edi], ebx ; EBX = ApIndex++ (1) For more clarity, I would suggest lock xadd dword [edi], ebx (even though "ebx" already specifies the width, yes) Applies to the X64 version too. > +incebx ; EBX is CpuNumber > > +; program stack (2) This change belongs in a separate patch. The X64 version already has this comment, and making both versions makes sense. But touching anything "stack-related" in the current patch is very confusing to me. Thanks Laszlo > movedi, esi > addedi, MP_CPU_EXCHANGE_INFO_OFFSET + > MP_CPU_EXCHANGE_INFO.StackSize > moveax, [edi] > diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c > b/UefiCpuPkg/Library/MpInitLib/MpLib.c > index 8b1f7f84ba..32a3951742 100644 > --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c > +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c > @@ -1,7 +1,7 @@ > /** @file >CPU MP Initialize Library common functions. > > - Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved. > + Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved. >Copyright (c) 2020, AMD Inc. All rights reserved. > >SPDX-License-Identifier: BSD-2-Clause-Patent > @@ -1012,7 +1012,6 @@ FillExchangeInfoData ( >IA32_CR4 Cr4; > >ExchangeInfo = CpuMpData->MpCpuExchangeInfo; > - ExchangeInfo->Lock= 0; >ExchangeInfo->StackStart = CpuMpData->Buffer; >ExchangeInfo->StackSize = CpuMpData->CpuApStackSize; >ExchangeInfo->BufferStart =
[edk2-devel] [PATCH 2/2] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release
When AP firstly wakes up, MpFuncs.nasm contains below logic to assign an unique ApIndex to each AP according to who comes first: ---ASM--- TestLock: xchg [edi], eax cmpeax, NotVacantFlag jz TestLock movecx, esi addecx, ApIndexLocation incdword [ecx] movebx, [ecx] Releaselock: moveax, VacantFlag xchg [edi], eax ---ASM END--- "lock inc" cannot be used to increase ApIndex because not only the global ApIndex should be increased, but also the result should be stored to a local general purpose register EBX. This patch learns from the NASM implementation of InternalSyncIncrement() to use "XADD" instruction which can increase the global ApIndex and store the original ApIndex to EBX in one instruction. With this patch, OVMF when running in a 255 threads QEMU spends about one second to wakeup all APs. Original implementation needs more than 10 seconds. Signed-off-by: Ray Ni Cc: Laszlo Ersek Cc: Eric Dong Cc: Rahul1 Kumar --- UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc | 4 .../Library/MpInitLib/Ia32/MpFuncs.nasm | 21 +-- UefiCpuPkg/Library/MpInitLib/MpLib.c | 3 +-- UefiCpuPkg/Library/MpInitLib/MpLib.h | 3 +-- UefiCpuPkg/Library/MpInitLib/X64/MpEqu.inc| 4 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 6 files changed, 11 insertions(+), 42 deletions(-) diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc index 244c1e72b7..5f1f0351d2 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpEqu.inc @@ -12,16 +12,12 @@ ; ;--- -VacantFlagequ00h -NotVacantFlag equ0ffh - CPU_SWITCH_STATE_IDLE equ0 CPU_SWITCH_STATE_STORED equ1 CPU_SWITCH_STATE_LOADED equ2 MP_CPU_EXCHANGE_INFO_OFFSET equ (SwitchToRealProcEnd - RendezvousFunnelProcStart) struc MP_CPU_EXCHANGE_INFO - .Lock: resd 1 .StackStart: resd 1 .StackSize:resd 1 .CFunction:resd 1 diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm index 908c2eb447..b7267393db 100644 --- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm +++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm @@ -122,23 +122,12 @@ SkipEnableExecuteDisable: ; AP init movedi, esi -addedi, MP_CPU_EXCHANGE_INFO_OFFSET -moveax, NotVacantFlag - -TestLock: -xchg [edi], eax -cmpeax, NotVacantFlag -jz TestLock - -movecx, esi -addecx, MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex -incdword [ecx] -movebx, [ecx] - -Releaselock: -moveax, VacantFlag -xchg [edi], eax +addedi, MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.ApIndex +movebx, 1 +lock xadd [edi], ebx ; EBX = ApIndex++ +incebx ; EBX is CpuNumber +; program stack movedi, esi addedi, MP_CPU_EXCHANGE_INFO_OFFSET + MP_CPU_EXCHANGE_INFO.StackSize moveax, [edi] diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c b/UefiCpuPkg/Library/MpInitLib/MpLib.c index 8b1f7f84ba..32a3951742 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c @@ -1,7 +1,7 @@ /** @file CPU MP Initialize Library common functions. - Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved. + Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved. Copyright (c) 2020, AMD Inc. All rights reserved. SPDX-License-Identifier: BSD-2-Clause-Patent @@ -1012,7 +1012,6 @@ FillExchangeInfoData ( IA32_CR4 Cr4; ExchangeInfo = CpuMpData->MpCpuExchangeInfo; - ExchangeInfo->Lock= 0; ExchangeInfo->StackStart = CpuMpData->Buffer; ExchangeInfo->StackSize = CpuMpData->CpuApStackSize; ExchangeInfo->BufferStart = CpuMpData->WakeupBuffer; diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.h b/UefiCpuPkg/Library/MpInitLib/MpLib.h index 02652eaae1..0bd60388b1 100644 --- a/UefiCpuPkg/Library/MpInitLib/MpLib.h +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.h @@ -1,7 +1,7 @@ /** @file Common header file for MP Initialize Library. - Copyright (c) 2016 - 2020, Intel Corporation. All rights reserved. + Copyright (c) 2016 - 2021, Intel Corporation. All rights reserved. Copyright (c) 2020, AMD Inc. All rights reserved. SPDX-License-Identifier: BSD-2-Clause-Patent @@ -190,7 +190,6 @@ typedef struct _CPU_MP_DATA CPU_MP_DATA; // into this structure are used in assembly code in this