Re: [Intel-gfx] [PATCH v7 2/2] drm/i915: use pat_index instead of cache_level

2023-05-11 Thread Tvrtko Ursulin



On 09/05/2023 18:12, Yang, Fei wrote:

 > On 09/05/2023 00:48, fei.y...@intel.com wrote:
 >> From: Fei Yang 
 >>
 >> Currently the KMD is using enum i915_cache_level to set caching 
policy for
 >> buffer objects. This is flaky because the PAT index which really 
controls
 >> the caching behavior in PTE has far more levels than what's defined 
in the
 >> enum. In addition, the PAT index is platform dependent, having to 
translate
 >> between i915_cache_level and PAT index is not reliable, and makes 
the code

 >> more complicated.
 >>
 >> From UMD's perspective there is also a necessity to set caching 
policy for
 >> performance fine tuning. It's much easier for the UMD to directly 
use PAT
 >> index because the behavior of each PAT index is clearly defined in 
Bspec.
 >> Having the abstracted i915_cache_level sitting in between would only 
cause
 >> more ambiguity. PAT is expected to work much like MOCS already works 
today,

 >> and by design userspace is expected to select the index that exactly
 >> matches the desired behavior described in the hardware specification.
 >>
 >> For these reasons this patch replaces i915_cache_level with PAT 
index. Also
 >> note, the cache_level is not completely removed yet, because the KMD 
still
 >> has the need of creating buffer objects with simple cache settings 
such as
 >> cached, uncached, or writethrough. For kernel objects, cache_level 
is used
 >> for simplicity and backward compatibility. For Pre-gen12 platforms 
PAT can
 >> have 1:1 mapping to i915_cache_level, so these two are 
interchangeable. see

 >> the use of LEGACY_CACHELEVEL.
 >>
 >> One consequence of this change is that gen8_pte_encode is no longer 
working
 >> for gen12 platforms due to the fact that gen12 platforms has 
different PAT
 >> definitions. In the meantime the mtl_pte_encode introduced 
specfically for

 >> MTL becomes generic for all gen12 platforms. This patch renames the MTL
 >> PTE encode function into gen12_pte_encode and apply it to all gen12. 
Even
 >> though this change looks unrelated, but separating them would 
temporarily

 >> break gen12 PTE encoding, thus squash them in one patch.
 >>
 >> Special note: this patch changes the way caching behavior is 
controlled in
 >> the sense that some objects are left to be managed by userspace. For 
such

 >> objects we need to be careful not to change the userspace settings.There
 >> are kerneldoc and comments added around obj->cache_coherent, 
cache_dirty,

 >> and how to bypass the checkings by i915_gem_object_has_cache_level. For
 >> full understanding, these changes need to be looked at together with the
 >> two follow-up patches, one disables the {set|get}_caching ioctl's 
and the

 >> other adds set_pat extension to the GEM_CREATE uAPI.
 >>
 >> Bspec: 63019
 >>
 >> Cc: Chris Wilson 
 >> Signed-off-by: Fei Yang 
 >> Reviewed-by: Andi Shyti 
 >> Reviewed-by: Matt Roper 


[snip]


 >> +                                          node.start,
 >> +                                          i915_gem_get_pat_index(i915,
 >> + 
I915_CACHE_NONE), 0);
 >>                        wmb(); /* flush modifications to the GGTT 
(insert_page) */

 >>                } else {
 >>                        page_base += offset & PAGE_MASK;
 >> @@ -1142,6 +1148,19 @@ int i915_gem_init(struct drm_i915_private 
*dev_priv)

 >>        unsigned int i;
 >>        int ret;
 >>
 >> +     /*
 >> +      * In the proccess of replacing cache_level with pat_index a 
tricky
 >> +      * dependency is created on the definition of the enum 
i915_cache_level.

 >> +      * in case this enum is changed, PTE encode would be broken.
 >
 >_I_n

Sorry, what does this mean?


Start of sentence, capital 'i'.

[snip]


 > With a pinky promise to improve this all in the near future I won't
 > grumble to loudly. :) I haven't read all the details, I leave that to
 > other reviewers, and also assuming some final tweaks as indicated above
 > please.

Thanks for all the suggestions, really appreciated.
May I add your Acked-by?


I can't make myself do it since I really don't like the design that 
much. That's why I said I will not grumble too loudly.


Jira for follow up clean since we both agreed something more elegant is 
possible would be appreciated though.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v7 2/2] drm/i915: use pat_index instead of cache_level

2023-05-09 Thread Tvrtko Ursulin



On 09/05/2023 00:48, fei.y...@intel.com wrote:

From: Fei Yang 

Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.


From UMD's perspective there is also a necessity to set caching policy for

performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.

For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For Pre-gen12 platforms PAT can
have 1:1 mapping to i915_cache_level, so these two are interchangeable. see
the use of LEGACY_CACHELEVEL.

One consequence of this change is that gen8_pte_encode is no longer working
for gen12 platforms due to the fact that gen12 platforms has different PAT
definitions. In the meantime the mtl_pte_encode introduced specfically for
MTL becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function into gen12_pte_encode and apply it to all gen12. Even
though this change looks unrelated, but separating them would temporarily
break gen12 PTE encoding, thus squash them in one patch.

Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings.There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checkings by i915_gem_object_has_cache_level. For
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.

Bspec: 63019

Cc: Chris Wilson 
Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
Reviewed-by: Matt Roper 

To be squashed
---
  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 58 +
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 15 +++-
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  | 11 ++-
  drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 ++-
  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 44 +-
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  8 +-
  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 84 +--
  drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +-
  drivers/gpu/drm/i915/gt/intel_gtt.h   | 18 ++--
  drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
  drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  4 +-
  drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
  drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
  drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
  drivers/gpu/drm/i915/i915_debugfs.c   | 53 +---
  drivers/gpu/drm/i915/i915_gem.c   | 27 +-
  drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
  drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
  drivers/gpu/drm/i915/i915_vma.h   |  2 +-
  drivers/gpu/drm/i915/i915_vma_types.h |  2 -
  drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
  .../drm/i915/selftests/intel_memory_region.c  |  4 +-
  drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
  36 files changed, 451 insertions(+), 238 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index c5eacfdba1a5..7c5fddb203ba 100644
--- 

Re: [Intel-gfx] [PATCH v7 2/2] drm/i915: use pat_index instead of cache_level

2023-05-09 Thread Andi Shyti
Hi Fei,

On Mon, May 08, 2023 at 04:48:54PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index which really controls
> the caching behavior in PTE has far more levels than what's defined in the
> enum. In addition, the PAT index is platform dependent, having to translate
> between i915_cache_level and PAT index is not reliable, and makes the code
> more complicated.
> 
> >From UMD's perspective there is also a necessity to set caching policy for
> performance fine tuning. It's much easier for the UMD to directly use PAT
> index because the behavior of each PAT index is clearly defined in Bspec.
> Having the abstracted i915_cache_level sitting in between would only cause
> more ambiguity. PAT is expected to work much like MOCS already works today,
> and by design userspace is expected to select the index that exactly
> matches the desired behavior described in the hardware specification.
> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note, the cache_level is not completely removed yet, because the KMD still
> has the need of creating buffer objects with simple cache settings such as
> cached, uncached, or writethrough. For kernel objects, cache_level is used
> for simplicity and backward compatibility. For Pre-gen12 platforms PAT can
> have 1:1 mapping to i915_cache_level, so these two are interchangeable. see
> the use of LEGACY_CACHELEVEL.
> 
> One consequence of this change is that gen8_pte_encode is no longer working
> for gen12 platforms due to the fact that gen12 platforms has different PAT
> definitions. In the meantime the mtl_pte_encode introduced specfically for
> MTL becomes generic for all gen12 platforms. This patch renames the MTL
> PTE encode function into gen12_pte_encode and apply it to all gen12. Even
> though this change looks unrelated, but separating them would temporarily
> break gen12 PTE encoding, thus squash them in one patch.
> 
> Special note: this patch changes the way caching behavior is controlled in
> the sense that some objects are left to be managed by userspace. For such
> objects we need to be careful not to change the userspace settings.There
> are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
> and how to bypass the checkings by i915_gem_object_has_cache_level. For
> full understanding, these changes need to be looked at together with the
> two follow-up patches, one disables the {set|get}_caching ioctl's and the
> other adds set_pat extension to the GEM_CREATE uAPI.
> 
> Bspec: 63019
> 
> Cc: Chris Wilson 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> Reviewed-by: Matt Roper 
> 
> To be squashed

Ha! you forgot to remove this... I also do the same :)

No worries, if the patch is right I'll fix it before pushing it.

Tvrtko? Any opinion?

Andi


[Intel-gfx] [PATCH v7 2/2] drm/i915: use pat_index instead of cache_level

2023-05-08 Thread fei . yang
From: Fei Yang 

Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.

>From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.

For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For Pre-gen12 platforms PAT can
have 1:1 mapping to i915_cache_level, so these two are interchangeable. see
the use of LEGACY_CACHELEVEL.

One consequence of this change is that gen8_pte_encode is no longer working
for gen12 platforms due to the fact that gen12 platforms has different PAT
definitions. In the meantime the mtl_pte_encode introduced specfically for
MTL becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function into gen12_pte_encode and apply it to all gen12. Even
though this change looks unrelated, but separating them would temporarily
break gen12 PTE encoding, thus squash them in one patch.

Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings.There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checkings by i915_gem_object_has_cache_level. For
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.

Bspec: 63019

Cc: Chris Wilson 
Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
Reviewed-by: Matt Roper 

To be squashed
---
 drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
 drivers/gpu/drm/i915/gem/i915_gem_domain.c| 58 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 15 +++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  | 11 ++-
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 ++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 44 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  8 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 84 +--
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 18 ++--
 drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
 drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  4 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
 drivers/gpu/drm/i915/i915_debugfs.c   | 53 +---
 drivers/gpu/drm/i915/i915_gem.c   | 27 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
 drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
 drivers/gpu/drm/i915/i915_vma.h   |  2 +-
 drivers/gpu/drm/i915/i915_vma_types.h |  2 -
 drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
 .../drm/i915/selftests/intel_memory_region.c  |  4 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
 36 files changed, 451 insertions(+), 238 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index c5eacfdba1a5..7c5fddb203ba 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -43,24 +43,24