Re: [edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-25 Thread Laszlo Ersek
On 02/25/21 05:04, Ni, Ray wrote:
> Laszlo,
> Do you think that Mike's R-b to the first patch can be an ack to the V3 patch 
> set?

No, I don't think so. If an R-b is given in response to a specific patch
(not the cover letter), and the reviewer doesn't explicitly say "series
Reviewed-by" or "for the entire series:", then the R-b applies only to
the specific patch.

Now, a different question is whether you want or need Mike's R-b for
*all* of the patches. That's up to you and Mike to decide.

> Can you please review and provide comments?

Yes, I'll comment soon.

Thanks
Laszlo


Re: [edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-24 Thread Ni, Ray
Laszlo,
Do you think that Mike's R-b to the first patch can be an ack to the V3 patch 
set?

Can you please review and provide comments?

Thanks,
Ray



Re: [edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-23 Thread Michael D Kinney
Reviewed-by: Michael D Kinney 


Re: [edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-22 Thread Ni, Ray
Mike,
This patch follows your suggestion to fix the performance issue first and clean up
the code next.

Can you check this patch specifically?

Thanks,
Ray


Re: [edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-22 Thread Dong, Eric
Reviewed-by: Eric Dong 








[edk2-devel] [PATCH v3 1/4] UefiCpuPkg/MpInitLib: Use XADD to avoid lock acquire/release

2021-02-09 Thread Ni, Ray
When an AP first wakes up, MpFuncs.nasm uses the logic below to assign
a unique ApIndex to each AP in order of arrival:
---ASM---
TestLock:
    xchg       [edi], eax
    cmp        eax, NotVacantFlag
    jz         TestLock

    mov        ecx, esi
    add        ecx, ApIndexLocation
    inc        dword [ecx]
    mov        ebx, [ecx]

Releaselock:
    mov        eax, VacantFlag
    xchg       [edi], eax
---ASM END---

"lock inc" cannot be used to increment ApIndex, because the global
ApIndex must be incremented and the result must also be stored in a
local general-purpose register, EBX. INC has no register destination,
so reading the new value back would take a second, non-atomic
instruction.
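
For readers more at home in C, the lock-protected sequence above can be sketched with C11 atomics. This is only an illustration, not code from the patch: `ap_shared_t`, `assign_index_with_lock`, and the flag values are hypothetical stand-ins, and `atomic_exchange` plays the role of `xchg`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for VacantFlag / NotVacantFlag (values assumed). */
#define VACANT_FLAG     0u
#define NOT_VACANT_FLAG 0xFFu

/* Hypothetical shared state; the real fields live in the AP exchange area. */
typedef struct {
    atomic_uint lock;      /* plays the role of [edi] at LockLocation */
    uint32_t    ap_index;  /* plays the role of the global ApIndex    */
} ap_shared_t;

/* Old scheme: spin until the lock is vacant, bump the index, release. */
uint32_t assign_index_with_lock(ap_shared_t *s)
{
    /* TestLock: the exchange returns the previous value; retry while taken. */
    while (atomic_exchange(&s->lock, NOT_VACANT_FLAG) == NOT_VACANT_FLAG) {
        /* busy-wait */
    }

    s->ap_index += 1;                 /* inc dword [ecx] */
    uint32_t mine = s->ap_index;      /* mov ebx, [ecx]  */

    /* Releaselock: store the vacant flag back. */
    atomic_exchange(&s->lock, VACANT_FLAG);
    return mine;
}
```

Every caller that arrives while the lock is held spins on the exchange, so the increment is serialized behind an acquire and a release; that overhead is what the patch sets out to avoid.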

This patch follows the NASM implementation of
InternalSyncIncrement() and uses the "XADD" instruction, which
increments the global ApIndex and stores the original ApIndex in EBX
in a single instruction.
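
The XADD approach corresponds to an atomic fetch-and-add. The sketch below uses C11 `atomic_fetch_add`, which returns the value held before the addition; `assign_cpu_number` is a hypothetical name, not a function in MpInitLib. On x86, compilers commonly lower this call to `lock xadd`, though that choice is up to the compiler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: one atomic read-modify-write replaces the lock. */
uint32_t assign_cpu_number(atomic_uint *ap_index)
{
    /* Returns the value stored before the addition, mirroring how
       "lock xadd dword [edi], ebx" leaves the original ApIndex in EBX. */
    uint32_t original = atomic_fetch_add(ap_index, 1u);

    /* The patch follows XADD with "inc ebx", so the number handed to
       this AP is the original index plus one. */
    return original + 1u;
}
```

Called on a counter that starts at zero, successive calls yield 1, 2, 3, and so on, with no separate lock variable and no spinning.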

With this patch, OVMF running in a 255-thread QEMU takes about one
second to wake up all APs. The original implementation needs more than
10 seconds.

Signed-off-by: Ray Ni 
Cc: Eric Dong 
Cc: Laszlo Ersek 
Cc: Rahul Kumar 
---
 .../Library/MpInitLib/Ia32/MpFuncs.nasm   | 20 ++-
 UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 18 ++---
 2 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
index 7e81d24aa6..2eaddc93bc 100644
--- a/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/Ia32/MpFuncs.nasm
@@ -1,5 +1,5 @@
 
;-- 
;
-; Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.
+; Copyright (c) 2015 - 2021, Intel Corporation. All rights reserved.
 ; SPDX-License-Identifier: BSD-2-Clause-Patent
 ;
 ; Module Name:
@@ -125,19 +125,11 @@ SkipEnableExecuteDisable:
     add        edi, LockLocation
     mov        eax, NotVacantFlag
 
-TestLock:
-    xchg       [edi], eax
-    cmp        eax, NotVacantFlag
-    jz         TestLock
-
-    mov        ecx, esi
-    add        ecx, ApIndexLocation
-    inc        dword [ecx]
-    mov        ebx, [ecx]
-
-Releaselock:
-    mov        eax, VacantFlag
-    xchg       [edi], eax
+    mov        edi, esi
+    add        edi, ApIndexLocation
+    mov        ebx, 1
+    lock xadd  dword [edi], ebx                 ; EBX = ApIndex++
+    inc        ebx                              ; EBX is CpuNumber
 
     mov        edi, esi
     add        edi, StackSizeLocation
diff --git a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
index aecfd07bc0..5b588f2dcb 100644
--- a/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
+++ b/UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm
@@ -1,5 +1,5 @@
 
;-- 
;
-; Copyright (c) 2015 - 2019, Intel Corporation. All rights reserved.
+; Copyright (c) 2015 - 2021, Intel Corporation. All rights reserved.
 ; SPDX-License-Identifier: BSD-2-Clause-Patent
 ;
 ; Module Name:
@@ -161,18 +161,12 @@ LongModeStart:
     add        edi, LockLocation
     mov        rax, NotVacantFlag
 
-TestLock:
-    xchg       qword [edi], rax
-    cmp        rax, NotVacantFlag
-    jz         TestLock
-
-    lea        ecx, [esi + ApIndexLocation]
-    inc        dword [ecx]
-    mov        ebx, [ecx]
+    mov        edi, esi
+    add        edi, ApIndexLocation
+    mov        ebx, 1
+    lock xadd  dword [edi], ebx                 ; EBX = ApIndex++
+    inc        ebx                              ; EBX is CpuNumber
 
-Releaselock:
-    mov        rax, VacantFlag
-    xchg       qword [edi], rax
     ; program stack
     mov        edi, esi
     add        edi, StackSizeLocation
-- 
2.27.0.windows.1


