Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-08 Thread Yang, Fei
> On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
>>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
 From: Fei Yang 

 Currently the KMD is using enum i915_cache_level to set caching policy for
 buffer objects. This is flaky because the PAT index, which really controls
 the caching behavior in the PTE, has far more levels than what's defined in
 the enum. In addition, the PAT index is platform dependent, so translating
 between i915_cache_level and PAT index is not reliable and makes the code
 more complicated.

 From UMD's perspective there is also a need to set the caching policy for
 performance fine tuning. It's much easier for the UMD to use the PAT index
 directly, because the behavior of each PAT index is clearly defined in
 Bspec. Having the abstracted i915_cache_level sitting in between would only
 cause more ambiguity.
>>>
>>> It may be worth mentioning here that PAT is expected to work much like
>>> MOCS already works today --- the contract on the exact platform-specific
>>> meaning of each index is documented in the hardware spec and userspace
>>> is expected to select the index that exactly matches the behavior it
>>> desires.
>> will update.

Please review https://patchwork.freedesktop.org/series/117480/


 For these reasons this patch replaces i915_cache_level with PAT index. Also
 note that cache_level is not completely removed yet, because the KMD still
 needs to create buffer objects with simple cache settings such as cached,
 uncached, or writethrough. For such simple cases, using cache_level helps
 simplify the code.
>>>
>>> See my comment farther down, but the implementation of
>>> i915_gem_object_has_cache_level() seems a bit confusing and you may want
>>> to elaborate on it here.
>>>
>>> Also somewhat confusing from a high-level skim is if/how we're still
>>> using obj->cache_coherent once userspace has taken direct control of the
>>> cache behavior.  Some PAT indices give coherency and others don't (and
>>> the combinations will likely get more complicated on future platforms).
>>> If obj->cache_coherent is still being considered even once PAT indices
>>> are being controlled by userspace, I think we need some explanation of
>>> how that works in the commit message (and likely in the kerneldoc for
>>> that field too).
>> For the objects with pat_index set by user space, coherency is managed by
>> user space. Things like obj->cache_coherent and obj->cache_dirty are still
>> there for objects created by the kernel.
>
> Right, that's the challenge --- userspace is taking over control of this
> stuff, but the fields are still around and still used internally within
> the driver.  How we reconcile those two things needs to be clearly
> explained in the commit message and kerneldoc, otherwise we're going to
> wind up with confusing code that's very difficult to maintain down the
> road.
>
> All the cache behavior is complicated enough that it probably wouldn't
> be a bad idea to have a dedicated section of the kerneldoc focused on
> cache behavior.

Updated with kerneldoc and comments.
As discussed off-line, I have also squashed the pte_encode patch.
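
The rule spelled out in the kerneldoc is: once userspace has set the
pat_index, the KMD stops second-guessing coherency for that object, while
kernel-created objects keep using cache_coherent/cache_dirty as before. A
rough sketch of that gate (hypothetical helper, assuming a flag such as
obj->pat_set_by_user is recorded when userspace picks the index; this is
not the literal patch code):

static bool i915_object_coherency_is_kmd_managed(const struct drm_i915_gem_object *obj)
{
	/* Userspace picked the PAT index, so it also owns coherency/flushing. */
	if (obj->pat_set_by_user)
		return false;

	/* Kernel-created object: keep the existing cache_coherent tracking. */
	return true;
}

Call sites that flush based on obj->cache_dirty would then simply be
skipped for userspace-managed objects.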


 Cc: Chris Wilson 
 Cc: Matt Roper 
 Signed-off-by: Fei Yang 
 Reviewed-by: Andi Shyti 
 ---
  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 45 ++
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
  drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 +++-
  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 71 
  drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +--
  drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 ++---
  drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
  drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
  drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
  drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
  drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
  drivers/gpu/drm/i915/i915_debugfs.c   | 52

Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-08 Thread Matt Roper
On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>> buffer objects. This is flaky because the PAT index, which really controls
>>> the caching behavior in the PTE, has far more levels than what's defined in
>>> the enum. In addition, the PAT index is platform dependent, so translating
>>> between i915_cache_level and PAT index is not reliable and makes the code
>>> more complicated.
>>>
>>> From UMD's perspective there is also a need to set the caching policy for
>>> performance fine tuning. It's much easier for the UMD to use the PAT index
>>> directly, because the behavior of each PAT index is clearly defined in
>>> Bspec. Having the abstracted i915_cache_level sitting in between would only
>>> cause more ambiguity.
>>
>> It may be worth mentioning here that PAT is expected to work much like
>> MOCS already works today --- the contract on the exact platform-specific
>> meaning of each index is documented in the hardware spec and userspace
>> is expected to select the index that exactly matches the behavior it
>> desires.
>will update.
>>>
>>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>>> note that cache_level is not completely removed yet, because the KMD still
>>> needs to create buffer objects with simple cache settings such as cached,
>>> uncached, or writethrough. For such simple cases, using cache_level helps
>>> simplify the code.
>>
>> See my comment farther down, but the implementation of
>> i915_gem_object_has_cache_level() seems a bit confusing and you may want
>> to elaborate on it here.
>>
>> Also somewhat confusing from a high-level skim is if/how we're still
>> using obj->cache_coherent once userspace has taken direct control of the
>> cache behavior.  Some PAT indices give coherency and others don't (and
>> the combinations will likely get more complicated on future platforms).
>> If obj->cache_coherent is still being considered even once PAT indices
>> are being controlled by userspace, I think we need some explanation of
>> how that works in the commit message (and likely in the kerneldoc for
>> that field too).
>For the objects with pat_index set by user space, coherency is managed by
>user space. Things like obj->cache_coherent and obj->cache_dirty are still
>there for objects created by the kernel.

Right, that's the challenge --- userspace is taking over control of this
stuff, but the fields are still around and still used internally within
the driver.  How we reconcile those two things needs to be clearly
explained in the commit message and kerneldoc, otherwise we're going to
wind up with confusing code that's very difficult to maintain down the
road.

All the cache behavior is complicated enough that it probably wouldn't
be a bad idea to have a dedicated section of the kerneldoc focused on
cache behavior.

>>>
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andi Shyti 
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 45 ++
>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 51 +++-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +--
>>>  drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---
>>>  drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++-
>>>  drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-
>>>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-
>>>  drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++-
>>>  drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-
>>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>>>  drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-
>>>  drivers/gpu/drm/

Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-04 Thread Matt Roper
On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index, which really controls
> the caching behavior in the PTE, has far more levels than what's defined in
> the enum. In addition, the PAT index is platform dependent, so translating
> between i915_cache_level and PAT index is not reliable and makes the code
> more complicated.
> 
> From UMD's perspective there is also a need to set the caching policy for
> performance fine tuning. It's much easier for the UMD to use the PAT index
> directly, because the behavior of each PAT index is clearly defined in
> Bspec. Having the abstracted i915_cache_level sitting in between would only
> cause more ambiguity.

It may be worth mentioning here that PAT is expected to work much like
MOCS already works today --- the contract on the exact platform-specific
meaning of each index is documented in the hardware spec and userspace
is expected to select the index that exactly matches the behavior it
desires.
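
To make that contract concrete, here is a rough userspace-side sketch of
picking an index at BO creation, in the same spirit as choosing a MOCS
entry today. It uses the set-PAT create extension proposed elsewhere in
this series; names such as drm_i915_gem_create_ext_set_pat and
I915_GEM_CREATE_EXT_SET_PAT may differ in the final uAPI:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Sketch only: the pat_index value is platform-specific; consult Bspec. */
static int create_bo_with_pat(int drm_fd, uint64_t size, uint32_t pat_index,
			      uint32_t *handle)
{
	struct drm_i915_gem_create_ext_set_pat set_pat = {
		.base.name = I915_GEM_CREATE_EXT_SET_PAT,
		.pat_index = pat_index,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&set_pat,
	};
	int ret = ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);

	if (ret == 0)
		*handle = create.handle;
	return ret;
}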

> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note that cache_level is not completely removed yet, because the KMD still
> needs to create buffer objects with simple cache settings such as cached,
> uncached, or writethrough. For such simple cases, using cache_level helps
> simplify the code.

See my comment farther down, but the implementation of
i915_gem_object_has_cache_level() seems a bit confusing and you may want
to elaborate on it here.

Also somewhat confusing from a high-level skim is if/how we're still
using obj->cache_coherent once userspace has taken direct control of the
cache behavior.  Some PAT indices give coherency and others don't (and
the combinations will likely get more complicated on future platforms).
If obj->cache_coherent is still being considered even once PAT indices
are being controlled by userspace, I think we need some explanation of
how that works in the commit message (and likely in the kerneldoc for
that field too).

> 
> Cc: Chris Wilson 
> Cc: Matt Roper 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 45 ++
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 +++-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 71 
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +--
>  drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 ++---
>  drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
>  drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
>  drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
>  drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
>  drivers/gpu/drm/i915/i915_debugfs.c   | 52 +---
>  drivers/gpu/drm/i915/i915_gem.c   | 16 +++-
>  drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
>  drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
>  drivers/gpu/drm/i915/i915_vma.h   |  2 +-
>  drivers/gpu/drm/i915/i915_vma_types.h |  2 -
>  drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>  .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>  drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
>  36 files changed, 391 insertions(+), 240 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
> b/drivers/gpu/drm/i915/display/intel_dpt.c
> index c5eacfdba1a5..7c5fddb203ba 100644
> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t 
> pte)
>  static void dpt_insert_page(struct i915_address_space *vm,
>   dma_addr_t addr,
>   u64 offset,
> - enum i915_cache_level level,
> + 

[PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-03 Thread fei . yang
From: Fei Yang 

Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index, which really controls
the caching behavior in the PTE, has far more levels than what's defined in
the enum. In addition, the PAT index is platform dependent, so translating
between i915_cache_level and PAT index is not reliable and makes the code
more complicated.

From UMD's perspective there is also a need to set the caching policy for
performance fine tuning. It's much easier for the UMD to use the PAT index
directly, because the behavior of each PAT index is clearly defined in
Bspec. Having the abstracted i915_cache_level sitting in between would only
cause more ambiguity.

For these reasons this patch replaces i915_cache_level with PAT index. Also
note that cache_level is not completely removed yet, because the KMD still
needs to create buffer objects with simple cache settings such as cached,
uncached, or writethrough. For such simple cases, using cache_level helps
simplify the code.
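
For those simple kernel-internal cases the translation is just a small
per-platform lookup, roughly like the sketch below (placeholder values
only; the real table comes from the per-platform device info added by the
preparation patch and must match Bspec):

static const unsigned int example_cachelevel_to_pat[] = {
	[I915_CACHE_NONE]	= 0,	/* placeholder */
	[I915_CACHE_LLC]	= 1,	/* placeholder */
	[I915_CACHE_L3_LLC]	= 1,	/* placeholder */
	[I915_CACHE_WT]		= 2,	/* placeholder */
};

static unsigned int example_get_pat_index(enum i915_cache_level level)
{
	return example_cachelevel_to_pat[level];
}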

Cc: Chris Wilson 
Cc: Matt Roper 
Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
---
 drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
 drivers/gpu/drm/i915/gem/i915_gem_domain.c| 45 ++
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 71 
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +--
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 ++---
 drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
 drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
 drivers/gpu/drm/i915/i915_debugfs.c   | 52 +---
 drivers/gpu/drm/i915/i915_gem.c   | 16 +++-
 drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
 drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
 drivers/gpu/drm/i915/i915_vma.h   |  2 +-
 drivers/gpu/drm/i915/i915_vma_types.h |  2 -
 drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
 .../drm/i915/selftests/intel_memory_region.c  |  4 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
 36 files changed, 391 insertions(+), 240 deletions(-)
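
The mechanical shape of the change is visible in the hunks below: every
pte_encode / insert_page / insert_entries path now carries pat_index
instead of enum i915_cache_level. As a rough illustration of the encoder
side (placeholder PAT field masks, not the patch's per-platform layout):

static u64 example_pte_encode(dma_addr_t addr, unsigned int pat_index, u32 flags)
{
	u64 pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

	/* Scatter the PAT index bits into the platform's PTE PAT fields. */
	if (pat_index & BIT(0))
		pte |= EXAMPLE_PTE_PAT0;	/* placeholder mask */
	if (pat_index & BIT(1))
		pte |= EXAMPLE_PTE_PAT1;	/* placeholder mask */
	if (pat_index & BIT(2))
		pte |= EXAMPLE_PTE_PAT2;	/* placeholder mask */

	return pte;
}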

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index c5eacfdba1a5..7c5fddb203ba 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 static void dpt_insert_page(struct i915_address_space *vm,
dma_addr_t addr,
u64 offset,
-   enum i915_cache_level level,
+   unsigned int pat_index,
u32 flags)
 {
struct i915_dpt *dpt = i915_vm_to_dpt(vm);
gen8_pte_t __iomem *base = dpt->iomem;
 
gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
-vm->pte_encode(addr, level, flags));
+vm->pte_encode(addr, pat_index, flags));
 }
 
 static void dpt_insert_entries(struct i915_address_space *vm,
   struct i915_vma_resource *vma_res,
-  enum i915_cache_level level,
+  unsigned int pat_index,
   u32 flags)
 {
struct i915_dpt *dpt = i915_vm_to_dpt(vm);
gen8_pte_t __iomem *base = dpt->iomem;
-   const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
+   const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
struct sgt_iter sgt_iter;
dma_addr_t addr;
int i;
@@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
 static void dpt_bind_vma(struct i915_address_space *vm,
 struct i915_vm_pt_stash *stash,
 struct