Re: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports

2016-04-08 Thread Laszlo Ersek
On 04/08/16 05:07, Ni, Ruiyu wrote:
> Laszlo,
> After I sent the mail to propose changing the CpuIo driver to improve IO 
> performance,
> I now saw your patch sent earlier than my last mail. A very interesting 
> feeling.

:)

Right; after we more or less identified the problem, I dove into the
core PciHostBridgeDxe driver, and saw that it relied on the CpuIo2
driver for the IO port access. I figured I could send a patch for that too.

> Reviewed-by: Ruiyu Ni 

Thanks! I'll send a new version with the .asm files as well, for the
MSFT toolchains' sake, as Jeff (and before him, Jordan) suggested.

Cheers
Laszlo

> 
> Regards,
> Ray
> 
> 
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of 
>> Laszlo Ersek
>> Sent: Friday, April 8, 2016 5:52 AM
>> To: edk2-devel-01 
>> Cc: Ni, Ruiyu ; Justen, Jordan L 
>> ; Fan, Jeff ; Mark
>> 
>> Subject: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and 
>> writes of IO ports
>>
>> * Short description:
>>
>>  The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer data
>>  between memory and IO ports with individual Io(Read|Write)(8|16|32)
>>  function calls, each in an appropriately set up loop.
>>
>>  On the Ia32 and X64 platforms however, FIFO reads and writes can be
>>  optimized, by coding them in assembly, and delegating the loop to the
>>  CPU, with the REP prefix.
>>
>>  On KVM virtualization hosts, this difference has a huge performance
>>  impact: if the loop is open-coded, then the virtual machine traps to the
>>  hypervisor on every single UINT8 / UINT16 / UINT32 transfer, whereas
>>  with the REP prefix, KVM can transfer up to a page of data per VM trap.
>>  This is especially noticeable with IDE PIO transfers, where all the data
>>  are squeezed through IO ports.
>>
>> * Long description:
>>
>>  The RootBridgeIoIoRW() function in
>>
>>PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c
>>
>>  used to have the exact same IO port access optimization, dating back
>>  verbatim to commit 1fd376d9792:
>>
>>PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write
>>  performance
>>
>>  OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for
>>  unrelated reasons), and inherited the optimization from PcAtChipsetPkg.
>>
>>  The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately removed in
>>  commit 111d79db47:
>>
>>PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver
>>
>>  and OvmfPkg too was rebased to the new core Pci Host Bridge Driver, in
>>  commit 4014885ffd:
>>
>>OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe
>>
>>  This caused the optimization to be lost. Namely, the
>>  RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the new core
>>  Pci Host Bridge Driver delegate IO port accesses to
>>  EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 / X64
>>  edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dxe",
>>  which lacks the optimization.
>>
>>  Therefore, this patch ports the C source code logic from commit
>>  1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports the
>>  NASM-converted assembly helper functions from OvmfPkg commits
>>  6026bf460037 and ace1d0517b65:
>>
>>OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM
>>
>>OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM
>>
>> * Notes about the port:
>>
>>  - The write and read branches from commit 1fd376d9792 are split to the
>>    separate functions CpuIoServiceWrite() and CpuIoServiceRead().
>>
>>  - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthUintXX.
>>
>>  - The cast expression "(UINTN) Address" is replaced with
>>    "(UINTN)Address" (i.e., no space), because that's how the receiving
>>    functions spell it as well.
>>
>>  - The labels in the switch statements are unindented by one level, to
>>    match the edk2 coding style (and the rest of UefiCpuPkg) better.
>>
>> * The first signoff belongs to Jordan, because he authored all of
>>  1fd376d9792, 6026bf460037 and ace1d0517b65.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Jordan Justen 
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Laszlo Ersek 
>> Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.html
>> Reported-by: Mark 
>> Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus=10432
>> Reported-by: Jordan Justen 
>> Cc: Jordan Justen 
>> Cc: Ruiyu Ni 
>> Cc: Jeff Fan 
>> Cc: Mark 
>> ---
>> UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf|   7 +
>> UefiCpuPkg/CpuIo2Dxe/IoFifo.h | 176 
>> 

Re: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports

2016-04-07 Thread Fan, Jeff
Laszlo,

Please also add an .asm file implementation to support the MSFT tool chain,
because NASM is not required for the MSFT tool chain now.
If you cannot verify MSFT build, I could help to verify your patch with .asm 
file.

Jeff
-----Original Message-----
From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Laszlo 
Ersek
Sent: Friday, April 08, 2016 5:52 AM
To: edk2-devel-01
Cc: Ni, Ruiyu; Justen, Jordan L; Fan, Jeff; Mark
Subject: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes 
of IO ports

* Short description:

  The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer data
  between memory and IO ports with individual Io(Read|Write)(8|16|32)
  function calls, each in an appropriately set up loop.

  On the Ia32 and X64 platforms however, FIFO reads and writes can be
  optimized, by coding them in assembly, and delegating the loop to the
  CPU, with the REP prefix.

  On KVM virtualization hosts, this difference has a huge performance
  impact: if the loop is open-coded, then the virtual machine traps to the
  hypervisor on every single UINT8 / UINT16 / UINT32 transfer, whereas
  with the REP prefix, KVM can transfer up to a page of data per VM trap.
  This is especially noticeable with IDE PIO transfers, where all the data
  are squeezed through IO ports.
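
To put rough numbers on the paragraph above, here is a minimal C sketch of the trap counts it describes. It is only a model, not code from the patch: the function names and the 4096-byte page size are assumptions, and the real number of traps depends on the hypervisor. It counts one trap per element for the open-coded loop and one trap per page of data for a REP-prefixed transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the "page" mentioned in the description is 4096 bytes. */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical model: an open-coded loop issues one port access, and
   therefore one VM trap, per transferred element. */
size_t ModelTrapsOpenCoded(size_t Count, size_t Width)
{
  assert(Width == 1 || Width == 2 || Width == 4);
  return Count;
}

/* Hypothetical model: a REP-prefixed string I/O instruction lets the
   hypervisor move up to one page of data per trap, so the trap count is
   the byte total divided by the page size, rounded up. */
size_t ModelTrapsRepPrefixed(size_t Count, size_t Width)
{
  size_t Bytes;

  assert(Width == 1 || Width == 2 || Width == 4);
  Bytes = Count * Width;
  return (Bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

Under this model, a 512-byte sector read as 256 UINT16 transfers costs 256 traps with the open-coded loop and a single trap with the REP prefix.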

* Long description:

  The RootBridgeIoIoRW() function in

PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c

  used to have the exact same IO port access optimization, dating back
  verbatim to commit 1fd376d9792:

PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write
  performance

  OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for
  unrelated reasons), and inherited the optimization from PcAtChipsetPkg.

  The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately removed in
  commit 111d79db47:

PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver

  and OvmfPkg too was rebased to the new core Pci Host Bridge Driver, in
  commit 4014885ffd:

OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe

  This caused the optimization to be lost. Namely, the
  RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the new core
  Pci Host Bridge Driver delegate IO port accesses to
  EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 / X64
  edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dxe",
  which lacks the optimization.

  Therefore, this patch ports the C source code logic from commit
  1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports the
  NASM-converted assembly helper functions from OvmfPkg commits
  6026bf460037 and ace1d0517b65:

OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM

OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM

* Notes about the port:

  - The write and read branches from commit 1fd376d9792 are split to the
    separate functions CpuIoServiceWrite() and CpuIoServiceRead().

  - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthUintXX.

  - The cast expression "(UINTN) Address" is replaced with
    "(UINTN)Address" (i.e., no space), because that's how the receiving
    functions spell it as well.

  - The labels in the switch statements are unindented by one level, to
    match the edk2 coding style (and the rest of UefiCpuPkg) better.
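
The notes above could look roughly like the following read-side dispatch. This is a hedged sketch, not the code in the patch: the SKETCH_WIDTH enum, the Stub* helpers and SketchServiceRead() are all hypothetical names, and the stubs only record which width was selected instead of touching IO ports. The switch labels sit at the same indentation as the switch itself, and the cast is written without a space, as the notes describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FIFO width constants of the protocol. */
typedef enum {
  SketchWidthFifoUint8,
  SketchWidthFifoUint16,
  SketchWidthFifoUint32,
  SketchWidthOther
} SKETCH_WIDTH;

/* Records the element size (in bits) of the last stub that ran. */
int LastFifoBits = 0;

/* Stubs standing in for the assembly FIFO helpers; a real helper would
   execute a REP-prefixed string I/O instruction here. */
void StubReadFifo8(uintptr_t Port, size_t Count, void *Buffer)
{
  (void)Port; (void)Count; (void)Buffer;
  LastFifoBits = 8;
}

void StubReadFifo16(uintptr_t Port, size_t Count, void *Buffer)
{
  (void)Port; (void)Count; (void)Buffer;
  LastFifoBits = 16;
}

void StubReadFifo32(uintptr_t Port, size_t Count, void *Buffer)
{
  (void)Port; (void)Count; (void)Buffer;
  LastFifoBits = 32;
}

/* Returns 1 if a FIFO helper handled the request, 0 if the caller has to
   fall back to the per-element loop. */
int SketchServiceRead(SKETCH_WIDTH Width, uint64_t Address, size_t Count,
                      void *Buffer)
{
  assert(Buffer != NULL);

  switch (Width) {
  case SketchWidthFifoUint8:
    StubReadFifo8((uintptr_t)Address, Count, Buffer);
    return 1;
  case SketchWidthFifoUint16:
    StubReadFifo16((uintptr_t)Address, Count, Buffer);
    return 1;
  case SketchWidthFifoUint32:
    StubReadFifo32((uintptr_t)Address, Count, Buffer);
    return 1;
  default:
    return 0;
  }
}
```

A write-side function would mirror this with its own set of helpers, which is the split into separate read and write functions that the first note refers to.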

* The first signoff belongs to Jordan, because he authored all of
  1fd376d9792, 6026bf460037 and ace1d0517b65.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jordan Justen 
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek 
Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.html
Reported-by: Mark 
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus=10432
Reported-by: Jordan Justen 
Cc: Jordan Justen 
Cc: Ruiyu Ni 
Cc: Jeff Fan 
Cc: Mark 
---
 UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf|   7 +
 UefiCpuPkg/CpuIo2Dxe/IoFifo.h | 176 
 UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.c  |  49 ++
 UefiCpuPkg/CpuIo2Dxe/Ia32/IoFifo.nasm | 136 +++
 UefiCpuPkg/CpuIo2Dxe/X64/IoFifo.nasm  | 125 ++
 5 files changed, 493 insertions(+)

diff --git a/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf b/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
index 8ef8b3d31cff..be79b1b3b992 100644
--- a/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
+++ b/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
@@ -30,7 +30,14 @@ [Defines]
 [Sources]
   CpuIo2Dxe.c
   CpuIo2Dxe.h
+  IoFifo.h
   
+[Sources.IA32]
+  Ia32/IoFifo.nasm
+
+[Sources.X64]
+  X64/IoFifo.nasm
+
 [Packages]
   MdePkg/MdePkg.dec
 
diff --git a/UefiCpuPkg/CpuIo2Dxe/IoFifo.h b/UefiCpuPkg/CpuIo2Dxe/IoFifo.h
new file mode 100644
index ..9978f8bfc39a
--- /dev/null
+++ b/UefiCpuPkg/CpuIo2Dxe/IoFifo.h
@@ -0,0 +1,176 

Re: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes of IO ports

2016-04-07 Thread Ni, Ruiyu
Laszlo,
After I sent the mail to propose changing the CpuIo driver to improve IO 
performance,
I now saw your patch sent earlier than my last mail. A very interesting feeling.

Reviewed-by: Ruiyu Ni 

Regards,
Ray


>-----Original Message-----
>From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Laszlo 
>Ersek
>Sent: Friday, April 8, 2016 5:52 AM
>To: edk2-devel-01 
>Cc: Ni, Ruiyu ; Justen, Jordan L 
>; Fan, Jeff ; Mark
>
>Subject: [edk2] [PATCH] UefiCpuPkg: CpuIo2Dxe: optimize FIFO reads and writes 
>of IO ports
>
>* Short description:
>
>  The CpuIoServiceRead() and CpuIoServiceWrite() functions transfer data
>  between memory and IO ports with individual Io(Read|Write)(8|16|32)
>  function calls, each in an appropriately set up loop.
>
>  On the Ia32 and X64 platforms however, FIFO reads and writes can be
>  optimized, by coding them in assembly, and delegating the loop to the
>  CPU, with the REP prefix.
>
>  On KVM virtualization hosts, this difference has a huge performance
>  impact: if the loop is open-coded, then the virtual machine traps to the
>  hypervisor on every single UINT8 / UINT16 / UINT32 transfer, whereas
>  with the REP prefix, KVM can transfer up to a page of data per VM trap.
>  This is especially noticeable with IDE PIO transfers, where all the data
>  are squeezed through IO ports.
>
>* Long description:
>
>  The RootBridgeIoIoRW() function in
>
>PcAtChipsetPkg/PciHostBridgeDxe/PciRootBridgeIo.c
>
>  used to have the exact same IO port access optimization, dating back
>  verbatim to commit 1fd376d9792:
>
>PcAtChipsetPkg/PciHostBridgeDxe: Improve KVM FIFO I/O read/write
>  performance
>
>  OvmfPkg cloned the "PcAtChipsetPkg/PciHostBridgeDxe" driver (for
>  unrelated reasons), and inherited the optimization from PcAtChipsetPkg.
>
>  The "PcAtChipsetPkg/PciHostBridgeDxe" driver was ultimately removed in
>  commit 111d79db47:
>
>PcAtChipsetPkg/PciHostBridge: Remove PciHostBridge driver
>
>  and OvmfPkg too was rebased to the new core Pci Host Bridge Driver, in
>  commit 4014885ffd:
>
>OvmfPkg: switch to MdeModulePkg/Bus/Pci/PciHostBridgeDxe
>
>  This caused the optimization to be lost. Namely, the
>  RootBridgeIoIoRead() and RootBridgeIoIoWrite() functions in the new core
>  Pci Host Bridge Driver delegate IO port accesses to
>  EFI_CPU_IO2_PROTOCOL. And, in OvmfPkg (and likely most other Ia32 / X64
>  edk2 platforms), this protocol is provided by "UefiCpuPkg/CpuIo2Dxe",
>  which lacks the optimization.
>
>  Therefore, this patch ports the C source code logic from commit
>  1fd376d9792 (see above) to "UefiCpuPkg/CpuIo2Dxe", plus it ports the
>  NASM-converted assembly helper functions from OvmfPkg commits
>  6026bf460037 and ace1d0517b65:
>
>OvmfPkg PciHostBridgeDxe: Convert Ia32/IoFifo.asm to NASM
>
>OvmfPkg PciHostBridgeDxe: Convert X64/IoFifo.asm to NASM
>
>* Notes about the port:
>
>  - The write and read branches from commit 1fd376d9792 are split to the
>    separate functions CpuIoServiceWrite() and CpuIoServiceRead().
>
>  - The EfiPciWidthUintXX constants are replaced with EfiCpuIoWidthUintXX.
>
>  - The cast expression "(UINTN) Address" is replaced with
>    "(UINTN)Address" (i.e., no space), because that's how the receiving
>    functions spell it as well.
>
>  - The labels in the switch statements are unindented by one level, to
>    match the edk2 coding style (and the rest of UefiCpuPkg) better.
>
>* The first signoff belongs to Jordan, because he authored all of
>  1fd376d9792, 6026bf460037 and ace1d0517b65.
>
>Contributed-under: TianoCore Contribution Agreement 1.0
>Signed-off-by: Jordan Justen 
>Contributed-under: TianoCore Contribution Agreement 1.0
>Signed-off-by: Laszlo Ersek 
>Ref: https://www.redhat.com/archives/vfio-users/2016-April/msg00029.html
>Reported-by: Mark 
>Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/10424/focus=10432
>Reported-by: Jordan Justen 
>Cc: Jordan Justen 
>Cc: Ruiyu Ni 
>Cc: Jeff Fan 
>Cc: Mark 
>---
> UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf|   7 +
> UefiCpuPkg/CpuIo2Dxe/IoFifo.h | 176 
> UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.c  |  49 ++
> UefiCpuPkg/CpuIo2Dxe/Ia32/IoFifo.nasm | 136 +++
> UefiCpuPkg/CpuIo2Dxe/X64/IoFifo.nasm  | 125 ++
> 5 files changed, 493 insertions(+)
>
>diff --git a/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf b/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
>index 8ef8b3d31cff..be79b1b3b992 100644
>--- a/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
>+++ b/UefiCpuPkg/CpuIo2Dxe/CpuIo2Dxe.inf
>@@ -30,7 +30,14 @@ [Defines]
> [Sources]
>   CpuIo2Dxe.c
>   CpuIo2Dxe.h
>+  IoFifo.h
>
>+[Sources.IA32]
>+  Ia32/IoFifo.nasm
>+
>+[Sources.X64]
>+