Re: [edk2] [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.

2018-10-16 Thread Dong, Eric
Hi Ruiyu,

> -Original Message-
> From: Ni, Ruiyu
> Sent: Tuesday, October 16, 2018 11:05 AM
> To: Dong, Eric ; edk2-devel@lists.01.org
> Cc: Laszlo Ersek 
> Subject: Re: [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to
> support semaphore type.
> 
> On 10/15/2018 10:49 AM, Eric Dong wrote:
> > In a system which has multiple cores, current set register value task costs
> huge times.
> > After investigation, current set MSR task costs most of the times. Current
> logic uses
> > SpinLock to let set MSR task as an single thread task for all cores. Because
> MSR has
> > scope attribute which may cause GP fault if multiple APs set MSR at the
> same time,
> > current logic use an easiest solution (use SpinLock) to avoid this issue, 
> > but it
> will
> > cost huge times.
> >
> > In order to fix this performance issue, new solution will set MSRs base on
> their scope
> > attribute. After this, the SpinLock will not needed. Without SpinLock, new
> issue raised
> > which is caused by MSR dependence. For example, MSR A depends on
> MSR B which means MSR A
> > must been set after MSR B has been set. Also MSR B is package scope level
> and MSR A is
> > thread scope level. If system has multiple threads, Thread 1 needs to set
> the thread level
> > MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs
> task for thread 1
> > and thread 2 like below:
> >
> >  Thread 1 Thread 2
> > MSR B  NY
> > MSR A  YY
> >
> > If driver don't control execute MSR order, for thread 1, it will execute MSR
> A first, but
> > at this time, MSR B not been executed yet by thread 2. system may trig
> exception at this
> > time.
> >
> > In order to fix the above issue, driver introduces semaphore logic to 
> > control
> the MSR
> > execute sequence. For the above case, a semaphore will be add between
> MSR A and B for
> > all threads. Semaphore has scope info for it. The possible scope value is
> core or package.
> > For each thread, when it meets a semaphore during it set registers, it will 
> > 1)
> release
> > semaphore (+1) for each threads in this core or package(based on the
> scope info for this
> > semaphore) 2) acquire semaphore (-1) for all the threads in this core or
> package(based
> > on the scope info for this semaphore). With these two steps, driver can
> control MSR
> > sequence. Sample code logic like below:
> >
> >//
> >// First increase semaphore count by 1 for processors in this package.
> >//
> >for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ;
> ProcessorIndex ++) {
> >  LibReleaseSemaphore ((UINT32 *) [PackageOffset +
> ProcessorIndex]);
> >}
> >//
> >// Second, check whether the count has reach the check number.
> >//
> >for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex
> ++) {
> >  LibWaitForSemaphore ([ApOffset]);
> >}
> >
> > Platform Requirement:
> > 1. This change requires register MSR setting base on MSR scope info. If 
> > still
> register MSR
> > for all threads, exception may raised.
> >
> > Known limitation:
> > 1. Current CpuFeatures driver supports DXE instance and PEI instance. But
> semaphore logic
> > requires Aps execute in async mode which is not supported by PEI driver.
> So CpuFeature
> > PEI instance not works after this change. We plan to support async mode
> for PEI in phase
> > 2 for this task.
> >
> > Cc: Ruiyu Ni 
> > Cc: Laszlo Ersek 
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Eric Dong 
> > ---
> >   .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 --
> -
> >   .../DxeRegisterCpuFeaturesLib.c|  71 +++-
> >   .../DxeRegisterCpuFeaturesLib.inf  |   3 +
> >   .../PeiRegisterCpuFeaturesLib.c|  55 ++-
> >   .../PeiRegisterCpuFeaturesLib.inf  |   1 +
> >   .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
> >   .../RegisterCpuFeaturesLib.c   | 452 
> > ++---
> >   7 files changed, 840 insertions(+), 117 deletions(-)
> >
> > diff --git
> a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > index ba3fb3250f..f820b4fed7 100644
> > --- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > +++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
> > @@ -145,6 +145,20 @@ CpuInitDataInitialize (
> > CPU_FEATURES_INIT_ORDER  *InitOrder;
> > CPU_FEATURES_DATA*CpuFeaturesData;
> > LIST_ENTRY   *Entry;
> > +  UINT32   Core;
> > +  UINT32   Package;
> > +  UINT32   Thread;
> > +  EFI_CPU_PHYSICAL_LOCATION*Location;
> > +  UINT32

Re: [edk2] [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.

2018-10-15 Thread Ni, Ruiyu

On 10/15/2018 10:49 AM, Eric Dong wrote:

In a system which has multiple cores, current set register value task costs 
huge times.
After investigation, current set MSR task costs most of the times. Current 
logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because 
MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same 
time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but 
it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on 
their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new 
issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which 
means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and 
MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the 
thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for 
thread 1
and thread 2 like below:

 Thread 1 Thread 2
MSR B  NY
MSR A  YY

If driver don't control execute MSR order, for thread 1, it will execute MSR A 
first, but
at this time, MSR B not been executed yet by thread 2. system may trig 
exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control 
the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and 
B for
all threads. Semaphore has scope info for it. The possible scope value is core 
or package.
For each thread, when it meets a semaphore during it set registers, it will 1) 
release
semaphore (+1) for each threads in this core or package(based on the scope info 
for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or 
package(based
on the scope info for this semaphore). With these two steps, driver can control 
MSR
sequence. Sample code logic like below:

   //
   // First increase semaphore count by 1 for processors in this package.
   //
   for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; 
ProcessorIndex ++) {
 LibReleaseSemaphore ((UINT32 *) [PackageOffset + 
ProcessorIndex]);
   }
   //
   // Second, check whether the count has reach the check number.
   //
   for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
 LibWaitForSemaphore ([ApOffset]);
   }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still 
register MSR
for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But 
semaphore logic
requires Aps execute in async mode which is not supported by PEI driver. So 
CpuFeature
PEI instance not works after this change. We plan to support async mode for 
PEI in phase
2 for this task.

Cc: Ruiyu Ni 
Cc: Laszlo Ersek 
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong 
---
  .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ---
  .../DxeRegisterCpuFeaturesLib.c|  71 +++-
  .../DxeRegisterCpuFeaturesLib.inf  |   3 +
  .../PeiRegisterCpuFeaturesLib.c|  55 ++-
  .../PeiRegisterCpuFeaturesLib.inf  |   1 +
  .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
  .../RegisterCpuFeaturesLib.c   | 452 ++---
  7 files changed, 840 insertions(+), 117 deletions(-)

diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
index ba3fb3250f..f820b4fed7 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
@@ -145,6 +145,20 @@ CpuInitDataInitialize (
CPU_FEATURES_INIT_ORDER  *InitOrder;
CPU_FEATURES_DATA*CpuFeaturesData;
LIST_ENTRY   *Entry;
+  UINT32   Core;
+  UINT32   Package;
+  UINT32   Thread;
+  EFI_CPU_PHYSICAL_LOCATION*Location;
+  UINT32   *CoreArray;
+  UINTNIndex;
+  UINT32   ValidCount;
+  UINTNCoreIndex;
+  ACPI_CPU_DATA*AcpiCpuData;
+  CPU_STATUS_INFORMATION   *CpuStatus;
+
+  Core= 0;
+  Package = 0;
+  Thread  = 0;
  
CpuFeaturesData = GetCpuFeaturesData ();

CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof 
(CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
@@ -163,6 +177,16 @@ CpuInitDataInitialize (
  Entry = Entry->ForwardLink;
}
  
+  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;

+
+  

[edk2] [Patch 3/4] UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type.

2018-10-14 Thread Eric Dong
In a system which has multiple cores, current set register value task costs 
huge times.
After investigation, current set MSR task costs most of the times. Current 
logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because 
MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same 
time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but 
it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on 
their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new 
issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which 
means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and 
MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the 
thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for 
thread 1
and thread 2 like below:

Thread 1 Thread 2
MSR B  NY
MSR A  YY

If driver don't control execute MSR order, for thread 1, it will execute MSR A 
first, but
at this time, MSR B not been executed yet by thread 2. system may trig 
exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control 
the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and 
B for
all threads. Semaphore has scope info for it. The possible scope value is core 
or package.
For each thread, when it meets a semaphore during it set registers, it will 1) 
release
semaphore (+1) for each threads in this core or package(based on the scope info 
for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or 
package(based
on the scope info for this semaphore). With these two steps, driver can control 
MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; 
ProcessorIndex ++) {
LibReleaseSemaphore ((UINT32 *) [PackageOffset + 
ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
LibWaitForSemaphore ([ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still 
register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But 
semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So 
CpuFeature
   PEI instance not works after this change. We plan to support async mode for 
PEI in phase
   2 for this task.

Cc: Ruiyu Ni 
Cc: Laszlo Ersek 
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong 
---
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ---
 .../DxeRegisterCpuFeaturesLib.c|  71 +++-
 .../DxeRegisterCpuFeaturesLib.inf  |   3 +
 .../PeiRegisterCpuFeaturesLib.c|  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  51 ++-
 .../RegisterCpuFeaturesLib.c   | 452 ++---
 7 files changed, 840 insertions(+), 117 deletions(-)

diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
index ba3fb3250f..f820b4fed7 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
@@ -145,6 +145,20 @@ CpuInitDataInitialize (
   CPU_FEATURES_INIT_ORDER  *InitOrder;
   CPU_FEATURES_DATA*CpuFeaturesData;
   LIST_ENTRY   *Entry;
+  UINT32   Core;
+  UINT32   Package;
+  UINT32   Thread;
+  EFI_CPU_PHYSICAL_LOCATION*Location;
+  UINT32   *CoreArray;
+  UINTNIndex;
+  UINT32   ValidCount;
+  UINTNCoreIndex;
+  ACPI_CPU_DATA*AcpiCpuData;
+  CPU_STATUS_INFORMATION   *CpuStatus;
+
+  Core= 0;
+  Package = 0;
+  Thread  = 0;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof 
(CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
@@ -163,6 +177,16 @@ CpuInitDataInitialize (
 Entry = Entry->ForwardLink;
   }
 
+  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
+
+  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
+