Re: [Intel-gfx] [PATCH v5] drm/i915: Use Transparent Hugepages when IOMMU is enabled

2021-09-10 Thread Tvrtko Ursulin



On 09/09/2021 17:17, Rodrigo Vivi wrote:

On Thu, Sep 09, 2021 at 12:44:48PM +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Usage of Transparent Hugepages was disabled in 9987da4b5dcf
("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
appears the majority of performance regressions reported with an enabled
IOMMU can be almost eliminated by turning them on, let's just do that.

To err on the side of safety we keep the current default in cases where
IOMMU is not active, and only when it is active do we default to the
"huge=within_size" mode. Although there would probably be wins from
enabling them throughout, more extensive testing across benchmarks and
platforms would need to be done.
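
The selection rule described above can be sketched as a small standalone C
function. This is only an illustration under stated assumptions, not the code
from the patch: the helper name pick_gemfs_mount_opts() and both of its
parameters are hypothetical, standing in for "THP support is built in" and
"IOMMU is active". Only the "huge=within_size" option string comes from the
commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): returns the tmpfs mount
 * option string for the driver's private mount, or NULL to keep the
 * current default, where THP stays off.
 */
static const char *pick_gemfs_mount_opts(bool thp_built_in, bool iommu_active)
{
	/* Opt in to huge pages only when the IOMMU is active. */
	if (thp_built_in && iommu_active)
		return "huge=within_size";

	/* IOMMU inactive, or no THP support: behaviour is unchanged. */
	return NULL;
}

/* Small self-check covering the combinations of the two inputs. */
static void pick_gemfs_mount_opts_selfcheck(void)
{
	assert(pick_gemfs_mount_opts(false, false) == NULL);
	assert(pick_gemfs_mount_opts(false, true) == NULL);
	assert(pick_gemfs_mount_opts(true, false) == NULL);
	assert(strcmp(pick_gemfs_mount_opts(true, true),
		      "huge=within_size") == 0);
}
```

Returning NULL for the unchanged case keeps the "no IOMMU" path identical to
today's behaviour, which matches the conservative intent stated above.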

With the patch and IOMMU enabled my local testing on a small Skylake part
shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
IOMMU off) to ~2% (same comparison but with THP on).

More detailed testing done in the below referenced Gitlab issue by Eero:

Skylake GT4e:

Performance drops from enabling IOMMU:

 30-35% SynMark CSDof
 20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
 ~20% GLB Egypt  (1/2 screen window)
 10-15% GLB T-Rex (1/2 screen window)
 8-10% GfxBench T-Rex, MemBW GPU blit
 7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
 6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
 5-6% GfxBench CarChase, Unigine Valley
 3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
  SynMark Fill*, Deferred, TerrainPan*
 1-2% Most of the other tests

With the patch drops become:

 20-25% SynMark TexMem*
 15-20% GLB Egypt (1/2 screen window)
 10-15% GLB T-Rex (1/2 screen window)
 4-7% GfxBench T-Rex, GpuTest Triangle
 1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
 3% GfxBench Manhattan 3.0, SynMark CSDof
 2-3% Unigine Heaven + Valley, MemBW GPU texture
 1-3% GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins

Broxton:

Performance drops from IOMMU, without patch:

 30% MemBW GPU write
 25% SynMark ZBuffer + Fill*
 20% MemBW GPU blit
 15% MemBW GPU blend, GpuTest Triangle
 10-15% MemBW GPU texture
 10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
 7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
  SynMark Deferred* + TexMem*
 6-8% GfxBench CarChase, Unigine Valley,
  SynMark CSCloth + ShMapVsm + TerrainPan*
 5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
  SynMark CSDof + TexFilterTri
 2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
  TexFilterAniso, GpuTest GiMark + 32-bit Julia

And with patch:

 15-20% MemBW GPU texture
 10% SynMark TexMem*
 8-9% GLB Egypt (1/2 screen window)
 4-5% GLB T-Rex (1/2 screen window)
 3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
  SynMark Deferred + TexFilterTri
 3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
 2-4% GpuTest Triangle, SynMark DeferredAA
 2-3% Unigine Heaven + Valley
 1-3% SynMark Terrain*
 1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer

Tigerlake-H:

 20-25% MemBW GPU texture
 15-20% GpuTest Triangle
 13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
 8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
 6-7% GfxBench Manhattan 3.0, SynMark TexMem*
 4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
  offscreen versions of T-Rex/Egypt)
 4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
  SynMark CSDof + DrvState
 3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
 1-7% Media tests
 2-3% MemBW GPU blit
 1-3% Most of the rest of 3D tests

With the patch:

 6-8% MemBW GPU blend => the only regression in these tests (compared
  to IOMMU without THP)
 4-6% SynMark DrvState (not impacted) + HdrBloom (improved)
 3-4% GLB T-Rex
 ~3% GLB Egypt, SynMark DrvRes
 1-3% GfxBench T-Rex + Egypt, SynMark TexFilterTri
 1-2% GfxBench CarChase + GLES AztecRuins, Unigine Valley,
  GpuTest Triangle
 ~1% GfxBench Manhattan 3.0/3.1, Unigine Heaven

Perf of several tests actually improved with IOMMU + THP, compared to no
IOMMU / no THP:

 10-15% SynMark Batch[0-3]
 5-10% MemBW GPU texture, SynMark ShMapVsm
 3-4% SynMark Fill* + Geom*
 2-3% SynMark TexMem512 + CSCloth
 1-2% SynMark TexMem128 + DeferredAA

As a summary across all platforms, these are the benchmarks where enabling
THP on top of IOMMU enabled brings regressions:

  * Skylake GT4e:
      20-25% SynMark TexMem*
      (whereas all MemBW GPU tests either improve or are not affected)

  * Broxton J4205:
      7% MemBW GPU texture
      2-3% SynMark TexMem*

  * Tigerlake-H:
      7% MemBW GPU blend

Other benchmarks show either lowering of regressions or improvements.

v2:
  * Add Kconfig dependency to transparent hugepages and some help text.
  * Move to helper for 

Re: [Intel-gfx] [PATCH v5] drm/i915: Use Transparent Hugepages when IOMMU is enabled

2021-09-09 Thread Rodrigo Vivi
On Thu, Sep 09, 2021 at 12:44:48PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Usage of Transparent Hugepages was disabled in 9987da4b5dcf
> ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
> appears the majority of performance regressions reported with an enabled
> IOMMU can be almost eliminated by turning them on, let's just do that.
> 
> To err on the side of safety we keep the current default in cases where
> IOMMU is not active, and only when it is active do we default to the
> "huge=within_size" mode. Although there would probably be wins from
> enabling them throughout, more extensive testing across benchmarks and
> platforms would need to be done.
> 
> With the patch and IOMMU enabled my local testing on a small Skylake part
> shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
> IOMMU off) to ~2% (same comparison but with THP on).
> 
> More detailed testing done in the below referenced Gitlab issue by Eero:
> 
> Skylake GT4e:
> 
> Performance drops from enabling IOMMU:
> 
> 30-35% SynMark CSDof
> 20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
> ~20% GLB Egypt  (1/2 screen window)
> 10-15% GLB T-Rex (1/2 screen window)
> 8-10% GfxBench T-Rex, MemBW GPU blit
> 7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
> 6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
> 5-6% GfxBench CarChase, Unigine Valley
> 3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
>  SynMark Fill*, Deferred, TerrainPan*
> 1-2% Most of the other tests
> 
> With the patch drops become:
> 
> 20-25% SynMark TexMem*
> 15-20% GLB Egypt (1/2 screen window)
> 10-15% GLB T-Rex (1/2 screen window)
> 4-7% GfxBench T-Rex, GpuTest Triangle
> 1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
> 3% GfxBench Manhattan 3.0, SynMark CSDof
> 2-3% Unigine Heaven + Valley, MemBW GPU texture
> 1-3% GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins
> 
> Broxton:
> 
> Performance drops from IOMMU, without patch:
> 
> 30% MemBW GPU write
> 25% SynMark ZBuffer + Fill*
> 20% MemBW GPU blit
> 15% MemBW GPU blend, GpuTest Triangle
> 10-15% MemBW GPU texture
> 10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
> 7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
>  SynMark Deferred* + TexMem*
> 6-8% GfxBench CarChase, Unigine Valley,
>  SynMark CSCloth + ShMapVsm + TerrainPan*
> 5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
>  SynMark CSDof + TexFilterTri
> 2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
>  TexFilterAniso, GpuTest GiMark + 32-bit Julia
> 
> And with patch:
> 
> 15-20% MemBW GPU texture
> 10% SynMark TexMem*
> 8-9% GLB Egypt (1/2 screen window)
> 4-5% GLB T-Rex (1/2 screen window)
> 3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
>  SynMark Deferred + TexFilterTri
> 3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
> 2-4% GpuTest Triangle, SynMark DeferredAA
> 2-3% Unigine Heaven + Valley
> 1-3% SynMark Terrain*
> 1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer
> 
> Tigerlake-H:
> 
> 20-25% MemBW GPU texture
> 15-20% GpuTest Triangle
> 13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
> 8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
> 6-7% GfxBench Manhattan 3.0, SynMark TexMem*
> 4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
>  offscreen versions of T-Rex/Egypt)
> 4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
>  SynMark CSDof + DrvState
> 3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
> 1-7% Media tests
> 2-3% MemBW GPU blit
> 1-3% Most of the rest of 3D tests
> 
> With the patch:
> 
> 6-8% MemBW GPU blend => the only regression in these tests (compared
>  to IOMMU without THP)
> 4-6% SynMark DrvState (not impacted) + HdrBloom (improved)
> 3-4% GLB T-Rex
> ~3% GLB Egypt, SynMark DrvRes
> 1-3% GfxBench T-Rex + Egypt, SynMark TexFilterTri
> 1-2% GfxBench CarChase + GLES AztecRuins, Unigine Valley,
>  GpuTest Triangle
> ~1% GfxBench Manhattan 3.0/3.1, Unigine Heaven
> 
> Perf of several tests actually improved with IOMMU + THP, compared to no
> IOMMU / no THP:
> 
> 10-15% SynMark Batch[0-3]
> 5-10% MemBW GPU texture, SynMark ShMapVsm
> 3-4% SynMark Fill* + Geom*
> 2-3% SynMark TexMem512 + CSCloth
> 1-2% SynMark TexMem128 + DeferredAA
> 
> As a summary across all platforms, these are the benchmarks where enabling
> THP on top of IOMMU enabled brings regressions:
> 
>  * Skylake GT4e:
>      20-25% SynMark TexMem*
>      (whereas all MemBW GPU tests either improve or are not affected)
> 
>  * Broxton J4205:
>      7% MemBW GPU texture
>      2-3% SynMark TexMem*
> 
>  * Tigerlake-H:
>      7% MemBW GPU blend
> 
> Other benchmarks show either