RE: [RFC 4/8] drm/i915: Refactor PAT/object cache handling

2023-07-28 Thread Yang, Fei
[snip]
> @@ -41,14 +42,17 @@ static bool gpu_write_needs_clflush(struct 
> drm_i915_gem_object *obj)
>   return false;
>
>   /*
> -  * For objects created by userspace through GEM_CREATE with pat_index
> -  * set by set_pat extension, i915_gem_object_has_cache_level() will
> -  * always return true, because the coherency of such object is managed

i915_gem_object_has_cache_level() always returning true means this function
always returns false.

> -  * by userspace. Othereise the call here would fall back to checking
> -  * whether the object is un-cached or write-through.
> +  * Always flush cache for UMD objects with PAT index set.

(obj->pat_set_by_user == true) indicates the UMD knows how to handle the
coherency, so forcing a clflush in the KMD would be redundant.

>*/
> - return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> -  i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> + if (obj->pat_set_by_user)
> + return true;

return false;

> +
> + /*
> +  * Fully coherent cached access may end up with data in the CPU cache
> +  * which hasn't hit memory yet.
> +  */
> + return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> +i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W);

Why check COH2W here? The original logic was: return false for UC or WT,
otherwise return true. So as long as the cache mode is WB it should be
sufficient to return true here, right?

>  }
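For illustration, a minimal sketch of the check order being suggested here,
reusing the helper names from this RFC (so a sketch against unmerged code,
not the final implementation):

static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
{
        struct drm_i915_private *i915 = to_i915(obj->base.dev);

        /* Discrete GPUs never dirty the CPU cache. */
        if (IS_DGFX(i915))
                return false;

        /*
         * The UMD manages coherency for objects whose PAT index it set
         * itself, so a KMD-initiated clflush would be redundant.
         */
        if (obj->pat_set_by_user)
                return false;

        /*
         * Only cached (WB) GPU writes can be left sitting in the CPU
         * cache; UC and WT go straight to memory, and no COH2W check
         * is needed for this decision.
         */
        return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) == 1;
}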


RE: [RFC 1/8] drm/i915: Skip clflush after GPU writes on Meteorlake

2023-07-27 Thread Yang, Fei
> From: Tvrtko Ursulin 
>
> On Meteorlake CPU cache will not contain stale data after GPU
> access since write-invalidate protocol is used, which means
> there is no need to flush before potentially transitioning the
> buffer to a non-coherent domain.
>
> Use the opportunity to documet the situation on discrete too.

LGTM.
Reviewed-by: Fei Yang 

> Signed-off-by: Tvrtko Ursulin 
> Cc: Matt Roper 
> Cc: Fei Yang 
> Cc: Matthew Auld 
> Cc: Thomas Hellström 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index ffddec1d2a76..57db9c581bf6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -24,9 +24,22 @@ static bool gpu_write_needs_clflush(struct 
> drm_i915_gem_object *obj)  {
>   struct drm_i915_private *i915 = to_i915(obj->base.dev);
>
> + /*
> +  * Discrete GPUs never dirty the CPU cache.
> +  */
>   if (IS_DGFX(i915))
>   return false;
>
> + /*
> +  * Cache snooping on Meteorlake is using write-invalidate so GPU writes
> +  * never end up in the CPU cache.
> +  *
> +  * QQQ: Do other snooping platforms behave identicaly and could we
> +  *  therefore write this as "if !HAS_LLC(i915) && HAS_SNOOP(i915)"?
> +  */
> + if (IS_METEORLAKE(i915))
> + return false;
> +
>   /*
>* For objects created by userspace through GEM_CREATE with pat_index
>* set by set_pat extension, i915_gem_object_has_cache_level() will
> --
> 2.39.2


RE: [PATCH v3] drm/i915: Refactor PAT/object cache handling

2023-07-20 Thread Yang, Fei
>>> [snip]
> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct
> drm_i915_gem_object *obj)

 The code change here looks accurate, but while we're here, I have a
 side question about this function in general...it was originally
 introduced in commit 48004881f693 ("drm/i915: Mark CPU cache as
 dirty when used for
 rendering") which states that GPU rendering ends up in the CPU cache
 (and thus needs a clflush later to make sure it lands in memory).
 That makes sense to me for LLC platforms, but is it really true for
 non-LLC snooping platforms (like MTL) as the commit states?
>>>
>>> For non-LLC platforms objects can be set to 1-way coherent which
>>> means GPU rendering ending up in CPU cache as well, so for non-LLC
>>> platform the logic here should be checking 1-way coherent flag.
>>
>> That's the part that I'm questioning (and not just for MTL, but for
>> all of our other non-LLC platforms too).  Just because there's
>> coherency doesn't mean that device writes landed in the CPU cache.
>> Coherency is also achieved if device writes invalidate the contents of the 
>> CPU cache.
>> I thought our non-LLC snooping platforms were coherent due to
>> write-invalidate rather than write-update, but I can't find it
>> specifically documented anywhere at the moment.  If write-invalidate
>> was used, then there shouldn't be a need for a later clflush either.
>
> [Trying to consolidate by doing a combined reply to the discussion so far.]
>
> On the write-invalidate vs write-update I don't know. If you did not
> find it in bspec then I doubt I would. I can have a browse still.

Matt was correct. Quoting Ron Silvas from SW ARCH: "MTL GPU doesn't write to
CPU cache, it simply snoop CPU cache on its way to RAM."

 My understanding
 was that snooping platforms just invalidated the CPU cache to
 prevent future CPU reads from seeing stale data but didn't actually
 stick any new data in there?  Am I off track or is the original
 logic of this function not quite right?

 Anyway, even if the logic of this function is wrong, it's a mistake
 that would only hurt performance
>>>
>>> Yes, this logic will introduce performance impact because it's
>>> missing the checking for obj->pat_set_by_user. For objects with
>>> pat_set_by_user==true, even if the object is snooping or 1-way
>>> coherent, we don't want to enforce a clflush here since the
>>> coherency is supposed to be handled by user space.
>
> What should I add you think to fix it?

I think the simplest would be

if (obj->pat_set_by_user)
        return false;

because even checking for non-coherent WB is unnecessary; there is simply no
need for the KMD to initiate a flush if the PAT is set by user.

> Add a check for non-coherent WB in gpu_write_needs_clflush as an additional 
> condition for returning false?
>
> And then if Matt is correct write-invalidate is used also !HAS_LLC should 
> just return false?
>
 (flushing more often than we truly need to) rather than
 functionality, so not something we really need to dig into right now
 as part of this patch.

>   if (IS_DGFX(i915))
>   return false;
>
> -/*
> - * For objects created by userspace through GEM_CREATE with pat_index
> - * set by set_pat extension, i915_gem_object_has_cache_level() will
> - * always return true, because the coherency of such object is 
> managed
> - * by userspace. Othereise the call here would fall back to checking
> - * whether the object is un-cached or write-through.
> - */
> -return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> +return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 
> &&
> +   i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;
>   }
>>>
>>> [snip]
> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct 
> reloc_cache *cache,
>   if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>   return false;
>
> -/*
> - * For objects created by userspace through GEM_CREATE with pat_index
> - * set by set_pat extension, i915_gem_object_has_cache_level() always
> - * return true, otherwise the call would fall back to checking 
> whether
> - * the object is un-cached.
> - */
>   return (cache->has_llc ||
>   obj->cache_dirty ||
> -!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> +i915_gem_object_has_cache_mode(obj,
> + I915_CACHE_MODE_UC) != 1);

 Platforms with relocations and platforms with user-specified PAT
 have no overlap, right?  So a -1 return should be impossible here
 and this is one case where we could just treat the return value as
 a boolean, right?
>>>
>
> Hm no, or 

RE: [PATCH v3] drm/i915: Refactor PAT/object cache handling

2023-07-19 Thread Yang, Fei
[snip]
>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct 
>> drm_i915_gem_object *obj)
>
> The code change here looks accurate, but while we're here, I have a side
> question about this function in general...it was originally introduced
> in commit 48004881f693 ("drm/i915: Mark CPU cache as dirty when used for
> rendering") which states that GPU rendering ends up in the CPU cache
> (and thus needs a clflush later to make sure it lands in memory).  That
> makes sense to me for LLC platforms, but is it really true for non-LLC
> snooping platforms (like MTL) as the commit states?

For non-LLC platforms objects can be set to 1-way coherent, which means
GPU rendering can end up in the CPU cache as well, so for non-LLC platforms
the logic here should check the 1-way coherent flag.

> My understanding
> was that snooping platforms just invalidated the CPU cache to prevent
> future CPU reads from seeing stale data but didn't actually stick any
> new data in there?  Am I off track or is the original logic of this
> function not quite right?
>
> Anyway, even if the logic of this function is wrong, it's a mistake that
> would only hurt performance

Yes, this logic will introduce a performance impact because it's missing the
check for obj->pat_set_by_user. For objects with pat_set_by_user==true,
even if the object is snooping or 1-way coherent, we don't want to enforce
a clflush here since the coherency is supposed to be handled by user space.

> (flushing more often than we truly need to)
> rather than functionality, so not something we really need to dig into
> right now as part of this patch.
>
>>  if (IS_DGFX(i915))
>>  return false;
>>
>> -/*
>> - * For objects created by userspace through GEM_CREATE with pat_index
>> - * set by set_pat extension, i915_gem_object_has_cache_level() will
>> - * always return true, because the coherency of such object is managed
>> - * by userspace. Othereise the call here would fall back to checking
>> - * whether the object is un-cached or write-through.
>> - */
>> -return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>> +return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&
>> +   i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;
>>  }

[snip]
>> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct 
>> reloc_cache *cache,
>>  if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>>  return false;
>>
>> -/*
>> - * For objects created by userspace through GEM_CREATE with pat_index
>> - * set by set_pat extension, i915_gem_object_has_cache_level() always
>> - * return true, otherwise the call would fall back to checking whether
>> - * the object is un-cached.
>> - */
>>  return (cache->has_llc ||
>>  obj->cache_dirty ||
>> -!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>> +i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1);
>
> Platforms with relocations and platforms with user-specified PAT have no
> overlap, right?  So a -1 return should be impossible here and this is
> one case where we could just treat the return value as a boolean, right?

My understanding is that the condition here means to say: if GPU access is
uncached, don't use CPU relocations because the CPU cache might contain stale
data. This condition is sufficient for snooping platforms. But from MTL onward
the condition should be whether the GPU access is coherent with the CPU. So we
should be checking the 1-way coherent flag instead of the UC mode, because even
if the GPU access is WB it can still be non-coherent, and thus the CPU cache
could be stale.
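As a sketch of that idea, using the helper and flag names from this
refactoring series (the small wrapper function below is made up purely for
illustration):

static bool cpu_and_gpu_coherent(const struct drm_i915_gem_object *obj)
{
        /* GPU access is WB and at least 1-way coherent with the CPU. */
        return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) == 1 &&
               i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) == 1;
}

use_cpu_reloc() would then allow CPU relocations for
cache->has_llc || obj->cache_dirty || cpu_and_gpu_coherent(obj).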

[snip]
>> @@ -208,12 +230,6 @@ bool i915_gem_object_can_bypass_llc(struct 
>> drm_i915_gem_object *obj)
>>  if (!(obj->flags & I915_BO_ALLOC_USER))
>>  return false;
>>
>> -/*
>> - * Always flush cache for UMD objects at creation time.
>> - */
>> -if (obj->pat_set_by_user)
>> -return true;
>> -

I'm still worried that the removal of these lines would cause the MESA
failure seen before. I know you are checking the pat index below, but that
is only about GPU access. It doesn't tell you how the CPU is going to access
the memory. If user space sets an uncached PAT and then uses the copy engine
to zero out the memory, but on the CPU side the mapping is cacheable, you
could still see garbage data.

I agree the lines don't belong here because they don't have anything to do
with LLC, but they need to be moved to the right location instead of being
removed.

>>  /*
>>   * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>>   * possible for userspace to bypass the GTT caching bits set by the
>> @@ -226,7 +242,21 @@ bool i915_gem_object_can_bypass_llc(struct 
>> drm_i915_gem_object *obj)
>>   * it, but since i915 takes the stance of always zeroing 

RE: [RFC 2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc

2023-07-14 Thread Yang, Fei
> On 14/07/2023 06:43, Yang, Fei wrote:
>>> From: Tvrtko Ursulin 
>>>
>>> According to the comment in i915_gem_object_can_bypass_llc the
>>> purpose of the function is to return false if the platform/object has
>>> a caching mode where GPU can bypass the LLC.
>>>
>>> So far the only platforms which allegedly can do this are Jasperlake
>>> and Elkhartlake, and that via MOCS (not PAT).
>>>
>>> Instead of blindly assuming that objects where userspace has set the
>>> PAT index can (bypass the LLC), question is is there a such PAT index
>>> on a platform. Probably starting with Meteorlake since that one is
>>> the only one where set PAT extension can be currently used. Or if
>>> there is a MOCS entry which can achieve the same thing on Meteorlake.
>>>
>>> If there is such PAT, now that i915 can be made to understand them
>>> better, we can make the check more fine grained. Or if there is a
>>> MOCS entry then we probably should apply the blanket IS_METEORLAKE 
>>> condition.
>>>
>>> Signed-off-by: Tvrtko Ursulin 
>>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of
>>> cache_level")
>>> Cc: Chris Wilson 
>>> Cc: Fei Yang 
>>> Cc: Andi Shyti 
>>> Cc: Matt Roper 
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 --
>>>   1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 33a1e97d18b3..1e34171c4162 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct 
>>> drm_i915_gem_object *obj)
>>>if (!(obj->flags & I915_BO_ALLOC_USER))
>>>return false;
>>>
>>> - /*
>>> -  * Always flush cache for UMD objects at creation time.
>>> -  */
>>> - if (obj->pat_set_by_user)
>>
>> I'm afraid this is going to break MESA. Can we run MESA tests with this 
>> patch?
>
> I can't, but question is why it would break Mesa which would need a
> nice comment here?

For objects with the PAT index set by user, the KMD doesn't know whether user
space will map them as cacheable or not for CPU access. So we want to enforce
a cache flush at BO creation time before handing the BO over to user space.
I remember the issue we had before was that MESA saw garbage data in a BO
that was supposed to be initialized with zeros.

I understand your point: checking for obj->pat_set_by_user is not something
to be done in this function. I'm fine with the removal of these lines, but
the check needs to be done somewhere, maybe just one level up?

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 1df74f7aa3dc..39cd903ba223 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -258,6 +258,7 @@ static int i915_gem_object_get_pages_dmabuf(struct 
drm_i915_gem_object *obj)
 * the driver.
 */
if (i915_gem_object_can_bypass_llc(obj) ||
+   obj->pat_set_by_user ||
(!HAS_LLC(i915) && !IS_DG1(i915)))
wbinvd_on_all_cpus();

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..770e02a851f6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -254,7 +254,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
if (i915_gem_object_needs_bit17_swizzle(obj))
i915_gem_object_do_bit_17_swizzle(obj, st);

-   if (i915_gem_object_can_bypass_llc(obj))
+   if (i915_gem_object_can_bypass_llc(obj) ||
+   obj->pat_set_by_user)
obj->cache_dirty = true;

__i915_gem_object_set_pages(obj, st);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 1d3ebdf4069b..9e65e3324422 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -170,7 +170,8 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
}

WARN_ON_ONCE(!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE));
-   if (i915_gem_object_can_bypass_llc(obj))
+   if (i915_gem_object_can_bypass_llc(obj) ||
+   obj->pat_set_by_user)
obj->cache_dirty = true;

__i915_gem_object_set_pages(obj, st);
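If the obj->pat_set_by_user condition ends up duplicated across the call
sites like in the diff above, one purely illustrative option (the helper name
below is made up, not from any posted patch) would be a small predicate next
to i915_gem_object_can_bypass_llc():

static bool i915_gem_object_needs_creation_flush(struct drm_i915_gem_object *obj)
{
        /*
         * The KMD cannot tell how user space will map an object whose
         * PAT index user space picked itself, so treat its pages as
         * CPU-cache dirty and flush before handing the BO over.
         */
        return i915_gem_object_can_bypass_llc(obj) || obj->pat_set_by_user;
}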

-Fei

> For instance should the check be IS_METEORLAKE?
>
> Or should it be "is wb" && "not has 1-way coherent"?
>
> Or both?
>
> Or, given how Meteorlake does not have LLC, how can anything bypass it
> there? Or is it about snooping on Meteorlake and how?
>
> Regards,
>
> Tvrtko
>
>>
>>>/*
>>> * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>>> * possible for userspace to bypass the GTT caching bits set
>>> by the
>>> --
>>> 2.39.2


RE: [RFC 2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc

2023-07-13 Thread Yang, Fei
> From: Tvrtko Ursulin 
>
> According to the comment in i915_gem_object_can_bypass_llc the
> purpose of the function is to return false if the platform/object
> has a caching mode where GPU can bypass the LLC.
>
> So far the only platforms which allegedly can do this are Jasperlake
> and Elkhartlake, and that via MOCS (not PAT).
>
> Instead of blindly assuming that objects where userspace has set the
> PAT index can (bypass the LLC), question is is there a such PAT index
> on a platform. Probably starting with Meteorlake since that one is the
> only one where set PAT extension can be currently used. Or if there is
> a MOCS entry which can achieve the same thing on Meteorlake.
>
> If there is such PAT, now that i915 can be made to understand them
> better, we can make the check more fine grained. Or if there is a MOCS
> entry then we probably should apply the blanket IS_METEORLAKE condition.
>
> Signed-off-by: Tvrtko Ursulin 
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson 
> Cc: Fei Yang 
> Cc: Andi Shyti 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 33a1e97d18b3..1e34171c4162 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct 
> drm_i915_gem_object *obj)
>   if (!(obj->flags & I915_BO_ALLOC_USER))
>   return false;
>
> - /*
> -  * Always flush cache for UMD objects at creation time.
> -  */
> - if (obj->pat_set_by_user)

I'm afraid this is going to break MESA. Can we run MESA tests with this patch?

>   /*
>* EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>* possible for userspace to bypass the GTT caching bits set by the
> --
> 2.39.2


RE: [RFC 1/2] drm/i915: Refactor PAT/object cache handling

2023-07-13 Thread Yang, Fei
[snip]
> @@ -326,10 +330,10 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, 
> void *data,
>   goto out;
>   }
>
> - if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
> - i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
> + if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
> + i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) == 1)

Need to check L3 flag as well? The original code has L3_LLC.
if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
(i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) == 1 ||
 i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_L3) == 1))

>   args->caching = I915_CACHING_CACHED;

[snip]
> +int i915_cache_init(struct drm_i915_private *i915)
> +{
> + struct drm_printer p = drm_info_printer(i915->drm.dev);
> + char buf[I915_CACHE_NAME_LEN];
> + int ret;
> +
> + i915->cache = HAS_LLC(i915) ? I915_CACHE_CACHED : I915_CACHE_NONE;
> + i915_cache_print(buf, sizeof(buf), "", i915->cache);
> + drm_printf(&p, "Coherent cache mode %s", buf);
> +
> + ret = i915_cache_find_pat(i915, I915_CACHE_NONE);
> + if (ret < 0)
> + return -ENODEV;
> + drm_info(>drm, "Using PAT index %u for uncached access\n", ret);
> + i915->pat_uc = ret;
> +
> + ret = i915_cache_find_pat(i915, I915_CACHE_CACHED);
> + if (ret < 0)
> + return -ENODEV;
> + drm_info(>drm, "Using PAT index %u for write-back access\n", ret);
> + i915->pat_wb = ret;

I think i915->pat_wt is needed as well, because the display driver has code
like this:
HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE

How did you handle functions like initial_plane_vma() and 
i915_gem_object_pin_to_display_plane()?

> + return 0;
> +}
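For the write-through case, the extra lookup being asked for could look
something like the fragment below inside i915_cache_init() (assuming an
i915->pat_wt field is added and that I915_CACHE_WT exists in this scheme;
this is a sketch against the RFC, not merged code):

        ret = i915_cache_find_pat(i915, I915_CACHE_WT);
        if (ret < 0)
                return -ENODEV;
        drm_info(&i915->drm, "Using PAT index %u for write-through access\n", ret);
        i915->pat_wt = ret;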

[snip]
> -#define TGL_CACHELEVEL \
> - .cachelevel_to_pat = { \
> - [I915_CACHE_NONE]   = 3, \
> - [I915_CACHE_LLC]= 0, \
> - [I915_CACHE_L3_LLC] = 0, \
> - [I915_CACHE_WT] = 2, \
> +/*
> + * TODO/QQQ
> + *
> + * PAT 0 - is it 1-way or 2-way?

1-way

> + */
> +#define GEN12_CACHE_MODES \
> + .cache_modes = { \
> + [0] = _I915_CACHE(WB, COH1W), \
> + [1] = I915_CACHE(WC), \
> + [2] = I915_CACHE(WT), \
> + [3] = I915_CACHE(UC), \
>   }

[snip]
> -#define PVC_CACHELEVEL \
> - .cachelevel_to_pat = { \
> - [I915_CACHE_NONE]   = 0, \
> - [I915_CACHE_LLC]= 3, \
> - [I915_CACHE_L3_LLC] = 3, \
> - [I915_CACHE_WT] = 2, \
> +/*
> + * TODO/QQQ
> + *
> + * PAT 3 - is it 1-way or 2-way?

1-way

PVC access is always coherent; it should have 1-way coherency for [5] and [7] as well.

> + */
> +#define PVC_CACHE_MODES \
> + .cache_modes = { \
> + [0] = I915_CACHE(UC), \
> + [1] = I915_CACHE(WC), \
> + [2] = I915_CACHE(WT), \
> + [3] = _I915_CACHE(WB, COH1W), \
> + [4] = _I915_CACHE(WT, CLOS1), \
> + [5] = _I915_CACHE(WB, CLOS1), \
> + [6] = _I915_CACHE(WT, CLOS2), \
> + [7] = _I915_CACHE(WB, CLOS2), \
>   }
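Purely for illustration, the PVC table with that correction applied would
look roughly like this (assuming the RFC's _I915_CACHE() macro can OR several
flags together; if it cannot, a variant taking multiple flags would be needed):

#define PVC_CACHE_MODES \
        .cache_modes = { \
                [0] = I915_CACHE(UC), \
                [1] = I915_CACHE(WC), \
                [2] = I915_CACHE(WT), \
                [3] = _I915_CACHE(WB, COH1W), \
                [4] = _I915_CACHE(WT, CLOS1), \
                [5] = _I915_CACHE(WB, COH1W | CLOS1), \
                [6] = _I915_CACHE(WT, CLOS2), \
                [7] = _I915_CACHE(WB, COH1W | CLOS2), \
        }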


RE: [PATCH] drm/i915: Remove dead code from gen8_pte_encode

2023-07-07 Thread Yang, Fei
> From: Tvrtko Ursulin 
>
> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> added a dedicated gen12_pte_encode but forgot to remove the Gen12
> specific bit from gen8_pte_encode.

Reviewed-by: Fei Yang 

> Signed-off-by: Tvrtko Ursulin 
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Fei Yang 
> Cc: Andi Shyti 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index f948d33e5ec5..c8568e5d1147 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -37,9 +37,6 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>   if (unlikely(flags & PTE_READ_ONLY))
>   pte &= ~GEN8_PAGE_RW;
>
> - if (flags & PTE_LM)
> - pte |= GEN12_PPGTT_PTE_LM;
> -
>   /*
>* For pre-gen12 platforms pat_index is the same as enum
>* i915_cache_level, so the switch-case here is still valid.
> --
> 2.39.2


Re: [PATCH v2] drm/i915: Refactor PAT/cache handling

2023-07-06 Thread Yang, Fei
> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct  
> drm_i915_gem_object *obj)
>  if (IS_DGFX(i915))
>  return false;
> -   /*
> -* For objects created by userspace through GEM_CREATE with 
> pat_index
> -* set by set_pat extension, i915_gem_object_has_cache_level() 
> will
> -* always return true, because the coherency of such object is 
> managed
> -* by userspace. Othereise the call here would fall back to 
> checking
> -* whether the object is un-cached or write-through.
> -*/
> -   return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> -i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> +   return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 
> 1 &&
> +  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 
> 1;

 This logic was changed for objects with pat index set by user here. It
 used to return false regardless the cache mode. But now it returns true
 if its cache mode is neither UC nor WT.
>>>
>>> Yes, that was half of the motivation of the refactory. Before the PAT
>>> index series code was like this:
>>>
>>>return !(obj->cache_level == I915_CACHE_NONE ||
>>> obj->cache_level == I915_CACHE_WT);
>>> So kernel knew it needs to flush only if caching mode is neither UC or WT.
>>> With the PAT index series you changed it to:
>>>
>>>return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>> i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>> And as i915_gem_object_has_cache_level was changed to always return true
>>> if PAT was set by user, that made the check meaningless for such objects.
>>
>> But the point is that the KMD should not flush the cache for objects
>> with PAT set by user because UMD is handling the cache conherency.
>> That is the design. Well doing so wouldn't cause functional issue, but
>> it's unecessary and might have performance impact.
>
> Not all i915_gem_object_has_cache_level() are even about flushing the cache
> and if the kernel doesn't know what is behind a PAT index
> (i915_gem_object_has_cache_level lies by always returning true) are we 100%
> sure everything is functionally correct?
>
> flush_write_domain() for instance uses it to determine whether to set
> obj->cache_dirty after GPU activity. How does the UMD manage that?
>
> Then use_cpu_reloc(). Another pointless/misleading question.
>
> Finally vm_fault_gtt() rejects access based on it.
>
> Perhaps the question is moot since the set pat extension is restricted to
> MTL so some other conditions used in the above checks, like HAS_LLC and such,
> make for no practical difference. Even if so, what if the extension was 
> allowed
> on other platforms as it was the plan until it was discovered there is no
> userspace code for other platforms. Would the plan work on all platforms? And
> even if it would I think the implementation is very non-obvious.
>

I understand your point. Perhaps we should let i915_gem_object_has_cache_mode()
do what it is supposed to do, and add a separate check for obj->pat_set_by_user
in functions like gpu_write_needs_clflush(), use_cpu_reloc(), etc. Anyway,
the design is to let the UMD handle coherency for objects with the PAT set by user.

>>> With my refactoring it becomes meaningful again and restores to the
>>> original behaviour. That's the intent at least.
>>>
>  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> @@ -255,9 +249,9 @@ i915_gem_object_set_to_gtt_domain(struct 
> drm_i915_gem_object *obj, bool write)
>  }
>
>  /**
> - * i915_gem_object_set_cache_level - Changes the cache-level of an 
> object across all VMA.
> + * i915_gem_object_set_cache - Changes the cache-level of an object 
> across all VMA.

[...]

> -   if (i915_gem_object_has_cache_level(obj, cache_level))
> +   ret = i915_cache_find_pat(i915, cache);
> +   if (ret < 0) {
> +   struct drm_printer p =
> +drm_err_printer("Attempting to use unknown caching mode 
> ");
> +
> +   i915_cache_print(&p, cache);
> +   drm_puts(&p, "!\n");
> +
> +   return -EINVAL;
> +   } else if (ret == obj->pat_index) {
> return 0;
 We will have to do this conversion and checking again later in this
 function when calling i915_gem_object_set_cache_coherency().
>>>
>>> Yes the double lookup part is a bit naff. It is not super required
>>> apart for validating internal callers (could be a debug build only
>>> feature perhaps) and to see if PAT index has changed and so whether
>>> it needs to call i915_gem_object_wait before proceeding to
>>> i915_gem_object_set_cache_coherency...
>>>
 My thought was to simply remove the use of 

Re: [PATCH] drm/i915/mtl: Update cache coherency setting for context structure

2023-07-06 Thread Yang, Fei
> As context structure is shared memory for CPU/GPU, Wa_22016122933 is
> needed for this memory block as well.
>
> Signed-off-by: Zhanjun Dong 
> CC: Fei Yang 

Reviewed-by: Fei Yang 

> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index a4ec20aaafe2..1b710102390b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1092,8 +1092,15 @@ __lrc_alloc_state(struct intel_context *ce, struct 
> intel_engine_cs *engine)
>
>  obj = i915_gem_object_create_lmem(engine->i915, context_size,
>I915_BO_ALLOC_PM_VOLATILE);
> -   if (IS_ERR(obj))
> +   if (IS_ERR(obj)) {
>  obj = i915_gem_object_create_shmem(engine->i915, 
> context_size);
> +   /*
> +* Wa_22016122933: For MTL the shared memory needs to be 
> mapped
> +* as WC on CPU side and UC (PAT index 2) on GPU side
> +*/
> +   if (IS_METEORLAKE(engine->i915))
> +   i915_gem_object_set_cache_coherency(obj, 
> I915_CACHE_NONE);
> +   }
>  if (IS_ERR(obj))
>  return ERR_CAST(obj);
>
> --
> 2.34.1



Re: [PATCH v2] drm/i915: Refactor PAT/cache handling

2023-07-04 Thread Yang, Fei
>>> From: Tvrtko Ursulin 
>>>
>>> Informal commit message for now.
>>>
>>> I got a bit impatient and curious to see if the idea we discussed would
>>> work so sketched something out. I think it is what I was describing back
>>> then..
>>>
>>> So high level idea is to teach the driver what caching modes are hidden
>>> behind PAT indices. Given you already had that in static tables, if we
>>> just turn the tables a bit around and add a driver abstraction of caching
>>> modes this is what happens:
>>>
>>>  * We can lose the ugly runtime i915_gem_get_pat_index.
>>>
>>>  * We can have a smarter i915_gem_object_has_cache_level, which now can
>>>use the above mentioned table to understand the caching modes and so
>>>does not have to pessimistically return true for _any_ input when user
>>>has set the PAT index. This may improve things even for MTL.
>>>
>>>  * We can simplify the debugfs printout to be platform agnostic.
>>>
>>>  * We are perhaps opening the door to un-regress the dodgy addition
>>>made to i915_gem_object_can_bypass_llc? See QQQ/FIXME in the patch.
>>>
>>> I hope I did not forget anything, but anyway, please have a read and see
>>> what you think. I think it has potential.
>>>
>>> Proper commit message can come later.
>>>
>>> Signed-off-by: Tvrtko Ursulin 
>>> Cc: Fei Yang 
>>> ---
>>>  drivers/gpu/drm/i915/Makefile |   1 +
>>>  drivers/gpu/drm/i915/display/intel_dpt.c  |   2 +-
>>>  drivers/gpu/drm/i915/display/intel_fb_pin.c   |   2 +-
>>>  .../drm/i915/display/intel_plane_initial.c|   4 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c|  66 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.h|   7 +-
>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  13 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   6 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  10 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 116 +
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h|   9 +-
>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 117 +++---
>>>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  10 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  13 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  43 ---
>>>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>>>  .../drm/i915/gem/selftests/huge_gem_object.c  |   6 +-
>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   8 +-
>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   4 +-
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  19 +--
>>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  33 ++---
>>>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |   4 +-
>>>  drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
>>>  drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +-
>>>  drivers/gpu/drm/i915/gt/intel_migrate.c   |  11 +-
>>>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |   6 +-
>>>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   2 +-
>>>  drivers/gpu/drm/i915/gt/intel_timeline.c  |   2 +-
>>>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>>>  drivers/gpu/drm/i915/gt/selftest_migrate.c|   9 +-
>>>  drivers/gpu/drm/i915/gt/selftest_reset.c  |  14 +--
>>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |   1 +
>>>  drivers/gpu/drm/i915/gt/selftest_tlb.c|   5 +-
>>>  .../gpu/drm/i915/gt/selftest_workarounds.c|   2 +-
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.c|   2 +-
>>>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  |   8 +-
>>>  drivers/gpu/drm/i915/i915_cache.c |  72 +++
>>>  drivers/gpu/drm/i915/i915_cache.h |  49 
>>>  drivers/gpu/drm/i915/i915_debugfs.c   |  65 +++---
>>>  drivers/gpu/drm/i915/i915_driver.c|   5 +
>>>  drivers/gpu/drm/i915/i915_drv.h   |   3 +
>>>  drivers/gpu/drm/i915/i915_gem.c   |  21 +---
>>>  drivers/gpu/drm/i915/i915_gpu_error.c |   7 +-
>>>  drivers/gpu/drm/i915/i915_pci.c   |  83 +++--
>>>  drivers/gpu/drm/i915/i915_perf.c  |   2 +-
>>>  drivers/gpu/drm/i915/intel_device_info.h  |   6 +-
>>>  drivers/gpu/drm/i915/selftests/i915_gem.c |   5 +-
>>>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   8 +-
>>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  13 +-
>>>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>>>  .../drm/i915/selftests/intel_memory_region.c  |   4 +-
>>>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  12 +-
>>>  drivers/gpu/drm/i915/selftests/mock_region.c  |   2 +-
>>>  54 files changed, 440 insertions(+), 485 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/i915/i915_cache.c
>>>  create mode 100644 drivers/gpu/drm/i915/i915_cache.h
>>>
>>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>> index 2be9dd960540..2c3da8f0c78e 100644
>>> --- a/drivers/gpu/drm/i915/Makefile
>>> +++ 

Re: [PATCH v8 1/2] drm/i915: preparation for using PAT index

2023-06-27 Thread Yang, Fei
> On 27/06/2023 14:28, Jani Nikula wrote:
>> On Tue, 09 May 2023, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> This patch is a preparation for replacing enum i915_cache_level with
>>> PAT index. Caching policy for buffer objects is set through the PAT
>>> index in PTE, the old i915_cache_level is not sufficient to represent
>>> all caching modes supported by the hardware.
>>>
>>> Preparing the transition by adding some platform dependent data
>>> structures and helper functions to translate the cache_level to pat_index.
>>>
>>> cachelevel_to_pat: a platform dependent array mapping cache_level to
>>> pat_index.
>>>
>>> max_pat_index: the maximum PAT index recommended in hardware specification
>>> Needed for validating the PAT index passed in from user
>>> space.
>>>
>>> i915_gem_get_pat_index: function to convert cache_level to PAT index.
>>>
>>> obj_to_i915(obj): macro moved to header file for wider usage.
>>>
>>> I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
>>>convenience of coding.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andi Shyti 
>>> Reviewed-by: Andrzej Hajda 
>>
>> [snip]
>>
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index f6a7c0bd2955..0eda8b4ee17f 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -123,7 +123,9 @@ struct drm_i915_private *mock_gem_device(void)
>>> static struct dev_iommu fake_iommu = { .priv = (void *)-1 };
>>>   #endif
>>> struct drm_i915_private *i915;
>>> +   struct intel_device_info *i915_info;
>>> struct pci_dev *pdev;
>>> +   unsigned int i;
>>> int ret;
>>>
>>> pdev = kzalloc(sizeof(*pdev), GFP_KERNEL); @@ -180,6 +182,13 @@
>>> struct drm_i915_private *mock_gem_device(void)
>>> I915_GTT_PAGE_SIZE_2M;
>>>
>>> RUNTIME_INFO(i915)->memory_regions = REGION_SMEM;
>>> +
>>> +   /* simply use legacy cache level for mock device */
>>> +   i915_info = (struct intel_device_info *)INTEL_INFO(i915);
>>
>> This is not okay. It's not okay to modify device info at runtime. This
>> is why we've separated runtime info from device info. This is why
>> we've made device info const, and localized the modifications to a
>> couple of places.
>>
>> If you need to modify it, it belongs in runtime info. Even if it's
>> only ever modified for mock devices.
>>
>> We were at the brink of being able to finally convert INTEL_INFO()
>> into a pointer to const rodata [1], where you are unable to modify it,
>> but this having been merged as commit 5e352e32aec2 ("drm/i915:
>> preparation for using PAT index") sets us back. (With [1] this oopses
>> trying to modify read-only data.)
>>
>> This has been posted to the public list 20+ times, and nobody noticed
>> or pointed this out?!
>>
>> Throwing away const should be a huge red flag to any developer or
>> reviewer. Hell, *any* cast should be.
>>
>> I've got a patch ready moving cachelevel_to_pat and max_pat_index to
>> runtime info. It's not great, since we'd be doing it only for the mock
>> device. Better ideas? I'm not waiting long.
>>
>>
>> BR,
>> Jani.
>>
>>
>> [1]
>> https://patchwork.freedesktop.org/patch/msgid/0badc36ce6dd6b030507bdfd
>> 8a42ab984fb38d12.1686236840.git.jani.nik...@intel.com
>>
>>
>>> +   i915_info->max_pat_index = 3;
>>> +   for (i = 0; i < I915_MAX_CACHE_LEVEL; i++)
>>> +   i915_info->cachelevel_to_pat[i] = i;
>>> +
>
> I'd simply suggest having a local static const table for the mock device.
> It should be trivial once i915->__info becomes a pointer so in that series
> I guess.
>
> While I am here - Fei - any plans to work on the promised cleanup?
> Abstracting the caching modes with a hw agnostic sw/driver representation,
> if you remember what we discussed.

Yes, I'm still working on that as a side task. Hopefully I will be able to
post something to the mailing list after the July 4th holiday.

> Regards,
>
> Tvrtko



Re: [PATCH v8 1/2] drm/i915: preparation for using PAT index

2023-06-27 Thread Yang, Fei
> Hi Jani and Tvrtko,
>
 This patch is a preparation for replacing enum i915_cache_level with PAT
 index. Caching policy for buffer objects is set through the PAT index in
 PTE, the old i915_cache_level is not sufficient to represent all caching
 modes supported by the hardware.

 Preparing the transition by adding some platform dependent data structures
 and helper functions to translate the cache_level to pat_index.

 cachelevel_to_pat: a platform dependent array mapping cache_level to
 pat_index.

 max_pat_index: the maximum PAT index recommended in hardware specification
 Needed for validating the PAT index passed in from user
 space.

 i915_gem_get_pat_index: function to convert cache_level to PAT index.

 obj_to_i915(obj): macro moved to header file for wider usage.

 I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
convenience of coding.

 Cc: Chris Wilson 
 Cc: Matt Roper 
 Signed-off-by: Fei Yang 
 Reviewed-by: Andi Shyti 
 Reviewed-by: Andrzej Hajda 
>>>
>>> [snip]
>>>
 diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
 b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
 index f6a7c0bd2955..0eda8b4ee17f 100644
 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
 +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
 @@ -123,7 +123,9 @@ struct drm_i915_private *mock_gem_device(void)
static struct dev_iommu fake_iommu = { .priv = (void *)-1 };
   #endif
struct drm_i915_private *i915;
 + struct intel_device_info *i915_info;
struct pci_dev *pdev;
 + unsigned int i;
int ret;
pdev = kzalloc(sizeof(*pdev), GFP_KERNEL);
 @@ -180,6 +182,13 @@ struct drm_i915_private *mock_gem_device(void)
I915_GTT_PAGE_SIZE_2M;
RUNTIME_INFO(i915)->memory_regions = REGION_SMEM;
 +
 + /* simply use legacy cache level for mock device */
 + i915_info = (struct intel_device_info *)INTEL_INFO(i915);
>>>
>>> This is not okay. It's not okay to modify device info at runtime. This
>>> is why we've separated runtime info from device info. This is why we've
>>> made device info const, and localized the modifications to a couple of
>>> places.
>>>
>>> If you need to modify it, it belongs in runtime info. Even if it's only
>>> ever modified for mock devices.
>>>
>>> We were at the brink of being able to finally convert INTEL_INFO() into
>>> a pointer to const rodata [1], where you are unable to modify it, but
>>> this having been merged as commit 5e352e32aec2 ("drm/i915: preparation
>>> for using PAT index") sets us back. (With [1] this oopses trying to
>>> modify read-only data.)
>>>
>>> This has been posted to the public list 20+ times, and nobody noticed or
>>> pointed this out?!
>
> That's not cool, indeed.
>
>>> Throwing away const should be a huge red flag to any developer or
>>> reviewer. Hell, *any* cast should be.
>>>
>>> I've got a patch ready moving cachelevel_to_pat and max_pat_index to
>>> runtime info. It's not great, since we'd be doing it only for the mock
>>> device. Better ideas? I'm not waiting long.
>>>
>>>
>>> BR,
>>> Jani.
>>>
>>>
>>> [1] 
>>> https://patchwork.freedesktop.org/patch/msgid/0badc36ce6dd6b030507bdfd8a42ab984fb38d12.1686236840.git.jani.nik...@intel.com
>>>
>>>
 + i915_info->max_pat_index = 3;
 + for (i = 0; i < I915_MAX_CACHE_LEVEL; i++)
 + i915_info->cachelevel_to_pat[i] = i;
 +
>>
>> I'd simply suggest having a local static const table for the mock device. It
>> should be trivial once i915->__info becomes a pointer so in that series I
>> guess.
>
> Fei... do you have bandwidth to look into this or do you want me
> to try Tvrtko's suggestion out?

Please go ahead Andi if you have time to help on this.

> Thank you Jani for reporting it and thank you Tvrtko for
> proposing the fix.

Sorry for the trouble here.

> Andi



Re: [Intel-gfx] [PATCH] drm/i915/gt: Remove incorrect hard coded cache coherrency setting

2023-06-22 Thread Yang, Fei
> The previouse i915_gem_object_create_internal already set it with proper
> value before function return. This hard coded setting is incorrect for
> platforms like MTL, thus need to be removed.
>
> Signed-off-by: Zhanjun Dong 
> ---
>  drivers/gpu/drm/i915/gt/intel_timeline.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
> b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index b9640212d659..693d18e14b00 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -26,8 +26,6 @@ static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>  if (IS_ERR(obj))
>  return ERR_CAST(obj);
>
> -   i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
> -

Does this change really fix the coherency issue?
I consulted with Chris and he said that the hwsp is purposely set to be
cacheable. The mapping on the CPU side also indicates it's cacheable:

intel_timeline_pin_map(struct intel_timeline *timeline)
{
struct drm_i915_gem_object *obj = timeline->hwsp_ggtt->obj;
u32 ofs = offset_in_page(timeline->hwsp_offset);
void *vaddr;

vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
...
}

>  vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  if (IS_ERR(vma))
>  i915_gem_object_put(obj);
> --
> 2.34.1



Re: [PATCH v17 1/1] drm/i915: Allow user to set cache at BO creation

2023-06-09 Thread Yang, Fei
> Hi Carl,
>
 besides this, ask a dumb question.
 How we retrieve the pat_index from a shared resource though dma_buf fd?
 maybe we need to know whether it could be CPU cached if we want map it.
 Of course, looks there are no real usage to access it though CPU.
 Just use it directly without any pat related options ?
>>>
>>> I am not understanding. Do you want to ask the PAT table to the driver? Are
>>> you referring to the CPU PAT index?
>>>
>>> In any case, if I understood correctly, you don't necessarily always need to
>>> set the PAT options and the cache options will fall into the default values.
>>>
>>> Please let me know if I haven't answered the question.
>>>
>>
>> If mesa create a resource , then use DRM_IOCTL_PRIME_HANDLE_TO_FD convert it 
>> to a dma fd.
>> Then share it to media, media use DRM_IOCTL_PRIME_FD_TO_HANDLE convert it to 
>> a gem bo.
>> But media does not know the PAT index , because mesa create it and set it.
>> So, if media want to call DRM_IOCTL_I915_GEM_MMAP_OFFSET, media does not 
>> know whether it could be WB.
>
> That's a good point. To be honest I am not really sure how this
> is handled.
>
> Fei, Jordan? Do you have suggestion here?

Is it possible to pass the PAT setting when sharing the fd?
Or perhaps we should have kept the get_caching ioctl functional?
Joonas, could you chime in here?

> Andi


Re: [Intel-gfx] [PATCH v15 1/1] drm/i915: Allow user to set cache at BO creation

2023-06-06 Thread Yang, Fei
>> On 31/05/2023 18:10, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> To comply with the design that buffer objects shall have immutable
>>> cache setting through out their life cycle, {set, get}_caching ioctl's
>>> are no longer supported from MTL onward. With that change caching
>>> policy can only be set at object creation time. The current code
>>> applies a default (platform dependent) cache setting for all objects.
>>> However this is not optimal for performance tuning. The patch extends
>>> the existing gem_create uAPI to let user set PAT index for the object
>>> at creation time.
>>> The new extension is platform independent, so UMD's can switch to using
>>> this extension for older platforms as well, while {set, get}_caching are
>>> still supported on these legacy paltforms for compatibility reason.
>>>
>>> Note: The detailed description of PAT index is missing in current PRM
>>> even for older hardware and will be added by the next PRM update under
>>> chapter name "Memory Views".
>>>
>>> BSpec: 45101
>>>
>>> Mesa support has been submitted in this merge request:
>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>>
>>> The media driver is supported by the following commits:
>>> https://github.com/intel/media-driver/commit/92c00a857433ebb34ec575e9834f473c6fcb6341
>>> https://github.com/intel/media-driver/commit/fd375cf2c5e1f6bf6b43258ff797b3134aadc9fd
>>> https://github.com/intel/media-driver/commit/08dd244b22484770a33464c2c8ae85430e548000
>>
>> On which platforms will media-driver use the uapi? I couldn't easily
>> figure out myself from the links above and also in the master branch I
>> couldn't find the implementation of CachePolicyGetPATIndex.
>
> These commits look like platform independent. Carl, could you chime in here?

Confirmed with Carl and Lihao offline that the media driver calls the set_pat
extension in a common code path, so the use of the set_pat extension is
platform independent. The only problem right now is that the gmm library does
not return the correct PAT index for all hardware platforms, so on some
platforms the call is bypassed and falls back to the old way.
I think this is the correct implementation. It should be platform independent
as long as the application knows what PAT index to set. Updating the gmm
library to understand the PAT index for each hardware platform is a separate
issue.

>> Now that PRMs for Tigerlake have been published and Meteorlake situation
>> is documented indirectly in Mesa code, my only remaining concern is with
>> the older platforms. So if there is no particular reason to have the
>> extension working on those, I would strongly suggest we disable there.
>
> What's the concern? There is no change required for older platforms, existing
> user space code should continue to work. And this extension should be made
> available for any new development because the cache settings for BO's need
> to be immutable. And that is platform independent.
>
>> For a precedent see I915_CONTEXT_PARAM_SSEU and how it allows the
>> extension only on Gen11 and only for a very specific usecase (see
>> restrictions in set_sseu() and i915_gem_user_to_context_sseu()).
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> The IGT test related to this change is
>>> igt@gem_create@create-ext-set-pat
>>>
>>> Signed-off-by: Fei Yang 
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Cc: Andi Shyti 
>>> Reviewed-by: Andi Shyti 
>>> Acked-by: Jordan Justen 
>>> Tested-by: Jordan Justen 
>>> Acked-by: Carl Zhang 
>>> Tested-by: Lihao Gu 
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>>   include/uapi/drm/i915_drm.h| 41 ++
>>>   3 files changed, 83 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> index bfe1dbda4cb7..644a936248ad 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>>> @@ -245,6 +245,7 @@ struct create_ext {
>>>unsigned int n_placements;
>>>unsigned int placement_mask;
>>>unsigned long flags;
>>> + unsigned int pat_index;
>>>   };
>>>
>>>   static void repr_placements(char *buf, size_t size,
>>> @@ -394,11 +395,39 @@ static int ext_set_protected(struct 
>>> i915_user_extension __user *base, void *data
>>>return 0;
>>>   }
>>>
>>> +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
>>> +{
>>> + struct create_ext *ext_data = data;
>>> + struct drm_i915_private *i915 = ext_data->i915;
>>> + struct drm_i915_gem_create_ext_set_pat ext;
>>> + unsigned int max_pat_index;
>>> +
>>> + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
>>> +  offsetofend(struct drm_i915_gem_create_ext_set_pat, 
>>> rsvd));
>>> +
>>> + if (copy_from_user(&ext, base, sizeof(ext)))
>>> + return -EFAULT;

Re: [Intel-gfx] [PATCH v15 1/1] drm/i915: Allow user to set cache at BO creation

2023-06-05 Thread Yang, Fei
> On 31/05/2023 18:10, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> To comply with the design that buffer objects shall have immutable
>> cache setting through out their life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy paltforms for compatibility reason.
>>
>> Note: The detailed description of PAT index is missing in current PRM
>> even for older hardware and will be added by the next PRM update under
>> chapter name "Memory Views".
>>
>> BSpec: 45101
>>
>> Mesa support has been submitted in this merge request:
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>
>> The media driver is supported by the following commits:
>> https://github.com/intel/media-driver/commit/92c00a857433ebb34ec575e9834f473c6fcb6341
>> https://github.com/intel/media-driver/commit/fd375cf2c5e1f6bf6b43258ff797b3134aadc9fd
>> https://github.com/intel/media-driver/commit/08dd244b22484770a33464c2c8ae85430e548000
>
> On which platforms will media-driver use the uapi? I couldn't easily
> figure out myself from the links above and also in the master branch I
> couldn't find the implementation of CachePolicyGetPATIndex.

These commits look platform independent. Carl, could you chime in here?

> Now that PRMs for Tigerlake have been published and Meteorlake situation
> is documented indirectly in Mesa code, my only remaining concern is with
> the older platforms. So if there is no particular reason to have the
> extension working on those, I would strongly suggest we disable there.

What's the concern? There is no change required for older platforms; existing
user space code should continue to work. And this extension should be made
available for any new development because the cache settings for BOs need
to be immutable. And that is platform independent.

> For a precedent see I915_CONTEXT_PARAM_SSEU and how it allows the
> extension only on Gen11 and only for a very specific usecase (see
> restrictions in set_sseu() and i915_gem_user_to_context_sseu()).
>
> Regards,
>
> Tvrtko
>
>>
>> The IGT test related to this change is
>> igt@gem_create@create-ext-set-pat
>>
>> Signed-off-by: Fei Yang 
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Cc: Andi Shyti 
>> Reviewed-by: Andi Shyti 
>> Acked-by: Jordan Justen 
>> Tested-by: Jordan Justen 
>> Acked-by: Carl Zhang 
>> Tested-by: Lihao Gu 
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>   include/uapi/drm/i915_drm.h| 41 ++
>>   3 files changed, 83 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index bfe1dbda4cb7..644a936248ad 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -245,6 +245,7 @@ struct create_ext {
>>unsigned int n_placements;
>>unsigned int placement_mask;
>>unsigned long flags;
>> + unsigned int pat_index;
>>   };
>>
>>   static void repr_placements(char *buf, size_t size,
>> @@ -394,11 +395,39 @@ static int ext_set_protected(struct 
>> i915_user_extension __user *base, void *data
>>return 0;
>>   }
>>
>> +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
>> +{
>> + struct create_ext *ext_data = data;
>> + struct drm_i915_private *i915 = ext_data->i915;
>> + struct drm_i915_gem_create_ext_set_pat ext;
>> + unsigned int max_pat_index;
>> +
>> + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
>> +  offsetofend(struct drm_i915_gem_create_ext_set_pat, 
>> rsvd));
>> +
>> + if (copy_from_user(&ext, base, sizeof(ext)))
>> + return -EFAULT;
>> +
>> + max_pat_index = INTEL_INFO(i915)->max_pat_index;
>> +
>> + if (ext.pat_index > max_pat_index) {
>> + drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
>> + ext.pat_index);
>> + return -EINVAL;
>> + }
>> +
>> + ext_data->pat_index = ext.pat_index;
>> +
>> + return 0;
>> +}
>> +
>>   static const i915_user_extension_fn create_extensions[] = {
>>[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>> + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
>>   };
>>
>> +#define PAT_INDEX_NOT_SET0x
>>   /**
>>* i915_gem_create_ext_ioctl 

Re: [Intel-gfx] [PATCH v15 0/1] drm/i915: Allow user to set cache at BO creation

2023-06-05 Thread Yang, Fei
> On 05/06/2023 09:53, Tvrtko Ursulin wrote:
>> On 31/05/2023 18:10, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> This series introduce a new extension for GEM_CREATE,
>>> 1. end support for set caching ioctl [PATCH 1/2]
>>> 2. add set_pat extension for gem_create [PATCH 2/2]
>>>
>>> v2: drop one patch that was merged separately
>>>  commit 341ad0e8e254 ("drm/i915/mtl: Add PTE encode function")
>>> v3: rebased on https://patchwork.freedesktop.org/series/117082/
>>> v4: fix missing unlock introduced in v3, and
>>>  solve a rebase conflict
>>> v5: replace obj->cache_level with pat_set_by_user,
>>>  fix i915_cache_level_str() for legacy platforms.
>>> v6: rebased on https://patchwork.freedesktop.org/series/117480/
>>> v7: rebased on https://patchwork.freedesktop.org/series/117528/
>>> v8: dropped the two dependent patches that has been merged
>>>  separately. Add IGT link and Tested-by (MESA).
>>> v9: addressing comments (Andi)
>>> v10: acked-by and tested-by MESA
>>> v11: drop "end support for set caching ioctl" (merged)
>>>   remove tools/include/uapi/drm/i915_drm.h
>>> v12: drop Bspec reference in comment. add to commit message instead
>>> v13: sent to test with igt@gem_create@create-ext-set-pat
>>> v14: sent to test with igt@gem_create@create-ext-set-pat
>>> v15: update commit message with documentation note and t-b/a-b from
>>>   Media driver folks.
>>>
>>> Fei Yang (1):
>>>drm/i915: Allow user to set cache at BO creation
>>>
>>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>>   include/uapi/drm/i915_drm.h| 41 ++
>>>   3 files changed, 83 insertions(+)
>>>
>>
>> Try with:
>>
>> Test-with: 20230526172221.1438998-1-fei.y...@intel.com
>>
>> That is how it is supposed to be done, to do a CI run against a test
>> case not yet merged that is.

Yes, the result can be found at https://patchwork.freedesktop.org/series/116870/
under rev14; expand Fi.CI.IGT and you would see:

New IGT tests (1)
igt@gem_create@create-ext-set-pat:
Statuses : 6 pass(s)
Exec time: [0.0] s

> Or I see that IGT has been since merged so you probably have results
> already?

Seems like the last update ran into some random failure which caused CI to stop.

> Regards,
>
> Tvrtko



Re: [PATCH v15 1/1] drm/i915: Allow user to set cache at BO creation

2023-06-04 Thread Yang, Fei
> Hi Fei,
>
> On Wed, May 31, 2023 at 10:10:08AM -0700, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> To comply with the design that buffer objects shall have immutable
>> cache setting through out their life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy platforms for compatibility reasons.
>>
>> Note: The detailed description of PAT index is missing in current PRM
>> even for older hardware and will be added by the next PRM update under
>> chapter name "Memory Views".
>
> Documentation has been updated:
>
> https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/tiger-lake.html
>
> If it's OK with you, before pushing I can replace this Note with:
>
>"
>The documentation related to the PAT/MOCS tables is currently
>available for Tiger Lake here:
>
>https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/tiger-lake.html
>"

Looks good to me. Thank you Andi and Tvrtko for all your help.

-Fei
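
(For reference, a minimal sketch of how userspace chains the new extension. The
structure and extension names follow the uAPI added by this patch; the helper,
its error handling and the chosen headers are illustrative only, and the right
pat_index is platform specific -- it has to come from the Bspec PAT table for
the target GPU.)

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

#include <drm/i915_drm.h>   /* needs a header new enough to carry SET_PAT */

static int create_bo_with_pat(int drm_fd, uint64_t size,
                              uint32_t pat_index, uint32_t *handle)
{
        struct drm_i915_gem_create_ext_set_pat set_pat = {
                .base.name = I915_GEM_CREATE_EXT_SET_PAT,
                .pat_index = pat_index,
        };
        struct drm_i915_gem_create_ext create = {
                .size = size,
                .extensions = (uintptr_t)&set_pat,
        };

        if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
                return -errno;  /* e.g. -EINVAL if pat_index > max_pat_index */

        *handle = create.handle;
        return 0;
}

On kernels without the extension the ioctl is expected to fail, so a caller
would typically fall back to the default caching in that case.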

>Thank you Tvrtko for the intuition you had about the documentation
>and for pushing for the update. It is great to have this uAPI
>well documented!
>
>Andi
>
>> BSpec: 45101
>>
>> Mesa support has been submitted in this merge request:
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>
>> The media driver is supported by the following commits:
>> https://github.com/intel/media-driver/commit/92c00a857433ebb34ec575e9834f473c6fcb6341
>> https://github.com/intel/media-driver/commit/fd375cf2c5e1f6bf6b43258ff797b3134aadc9fd
>> https://github.com/intel/media-driver/commit/08dd244b22484770a33464c2c8ae85430e548000
>>
>> The IGT test related to this change is
>> igt@gem_create@create-ext-set-pat
>>
>> Signed-off-by: Fei Yang 
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Cc: Andi Shyti 
>> Reviewed-by: Andi Shyti 
>> Acked-by: Jordan Justen 
>> Tested-by: Jordan Justen 
>> Acked-by: Carl Zhang 
>> Tested-by: Lihao Gu 
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>  include/uapi/drm/i915_drm.h| 41 ++
>>  3 files changed, 83 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index bfe1dbda4cb7..644a936248ad 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -245,6 +245,7 @@ struct create_ext {
>>unsigned int n_placements;
>>unsigned int placement_mask;
>>unsigned long flags;
>> + unsigned int pat_index;
>>  };
>>
>>  static void repr_placements(char *buf, size_t size,
>> @@ -394,11 +395,39 @@ static int ext_set_protected(struct 
>> i915_user_extension __user *base, void *data
>>return 0;
>>  }
>>
>> +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
>> +{
>> + struct create_ext *ext_data = data;
>> + struct drm_i915_private *i915 = ext_data->i915;
>> + struct drm_i915_gem_create_ext_set_pat ext;
>> + unsigned int max_pat_index;
>> +
>> + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
>> +  offsetofend(struct drm_i915_gem_create_ext_set_pat, 
>> rsvd));
>> +
>> + if (copy_from_user(&ext, base, sizeof(ext)))
>> + return -EFAULT;
>> +
>> + max_pat_index = INTEL_INFO(i915)->max_pat_index;
>> +
>> + if (ext.pat_index > max_pat_index) {
>> + drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
>> + ext.pat_index);
>> + return -EINVAL;
>> + }
>> +
>> + ext_data->pat_index = ext.pat_index;
>> +
>> + return 0;
>> +}
>> +
>>  static const i915_user_extension_fn create_extensions[] = {
>>[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>> + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
>>  };
>>
>> +#define PAT_INDEX_NOT_SET0x
>>  /**
>>   * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle 
>> to it.
>>   * @dev: drm device pointer
>> @@ -418,6 +447,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
>> *data,
>>if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
>>return -EINVAL;
>>
>> + ext_data.pat_index = PAT_INDEX_NOT_SET;
>>ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
>>   

RE: [PATCH 2/2] drm/i915/gt: Fix parameter in gmch_ggtt_insert_{entries,page}()

2023-05-30 Thread Yang, Fei
> Subject: [PATCH 2/2] drm/i915/gt: Fix parameter in 
> gmch_ggtt_insert_{entries,page}()
>
> When building with clang's -Wincompatible-function-pointer-types-strict,
> the following warnings occur:
>
>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c:102:23: error: incompatible 
> function pointer types assigning to 'void (*)(struct i915_address_space *, 
> dma_addr_t, u64, unsigned int, u32)' (aka 'void (*)(struct i915_address_space 
> *, unsigned int, unsigned long long, unsigned int, unsigned int)') from 'void 
> (struct i915_address_space *, dma_addr_t, u64, enum i915_cache_level, u32)' 
> (aka 'void (struct i915_address_space *, unsigned int, unsigned long long, 
> enum i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.insert_page = gmch_ggtt_insert_page;
>^ ~
>   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c:103:26: error: incompatible 
> function pointer types assigning to 'void (*)(struct i915_address_space *, 
> struct i915_vma_resource *, unsigned int, u32)' (aka 'void (*)(struct 
> i915_address_space *, struct i915_vma_resource *, unsigned int, unsigned 
> int)') from 'void (struct i915_address_space *, struct i915_vma_resource *, 
> enum i915_cache_level, u32)' (aka 'void (struct i915_address_space *, struct 
> i915_vma_resource *, enum i915_cache_level, unsigned int)') [-Werror, 
> -Wincompatible-function-pointer-types-strict]
>   ggtt->vm.insert_entries = gmch_ggtt_insert_entries;
>   ^ 
>   2 errors generated.
>
> The warning is pointing out that while 'enum i915_cache_level' and 'unsigned 
> int' are ABI compatible, these indirect calls will fail clang's kernel 
> Control Flow Integrity (kCFI) checks, as the callback's signature does not 
> exactly match the prototype's signature.
>
> To fix this, replace the cache_level parameter with pat_index, as was done in 
> other places within i915 where there is no difference between cache_level and 
> pat_index on certain generations.
>
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Signed-off-by: Nathan Chancellor 

Reviewed-by: Fei Yang 
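
(For readers unfamiliar with this warning class, a standalone illustration of
the enum-vs-unsigned-int mismatch being fixed -- hypothetical types, not taken
from i915:)

/* clang -c -Wincompatible-function-pointer-types-strict repro.c */
enum level { LEVEL_A, LEVEL_B };

typedef void (*cb_fn)(unsigned int value);   /* prototype expects unsigned int */

static void handler(enum level value)        /* implementation takes the enum */
{
        (void)value;
}

/* ABI compatible, but not the identical type, so the strict warning fires and
 * a kCFI-enabled kernel would likewise reject an indirect call through cb. */
cb_fn cb = handler;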

> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> index d6a74ae2527b..866c416afb73 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
> @@ -18,10 +18,10 @@
>  static void gmch_ggtt_insert_page(struct i915_address_space *vm,
> dma_addr_t addr,
> u64 offset,
> -   enum i915_cache_level cache_level,
> +   unsigned int pat_index,
> u32 unused)
>  {
> - unsigned int flags = (cache_level == I915_CACHE_NONE) ?
> + unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>   AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>
>   intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags); @@ 
> -29,10 +29,10 @@ static void gmch_ggtt_insert_page(struct i915_address_space 
> *vm,
>
>  static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
>struct i915_vma_resource *vma_res,
> -  enum i915_cache_level cache_level,
> +  unsigned int pat_index,
>u32 unused)
>  {
> - unsigned int flags = (cache_level == I915_CACHE_NONE) ?
> + unsigned int flags = (pat_index == I915_CACHE_NONE) ?
>   AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
>
>   intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> 
> PAGE_SHIFT,
>
> --
> 2.40.1


RE: [PATCH 1/2] drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks

2023-05-30 Thread Yang, Fei
> Subject: [PATCH 1/2] drm/i915/gt: Fix second parameter type of pre-gen8 
> pte_encode callbacks
>
> When booting a kernel compiled with CONFIG_CFI_CLANG (kCFI), there is a CFI 
> failure in ggtt_probe_common() when trying to call hsw_pte_encode() via an 
> indirect call:
>
>   [5.030027] CFI failure at ggtt_probe_common+0xd1/0x130 [i915] (target: 
> hsw_pte_encode+0x0/0x30 [i915]; expected type: 0xf5c1d0fc)
>
> With kCFI, indirect calls are validated against their expected type versus 
> actual type and failures occur when the two types do not match.
>
> clang's -Wincompatible-function-pointer-types-strict can catch this at 
> compile time but it is not enabled for the kernel yet:
>
>   drivers/gpu/drm/i915/gt/intel_ggtt.c:1155:23: error: incompatible function 
> pointer types assigning to 'u64 (*)(dma_addr_t, unsigned int, u32)' (aka 
> 'unsigned long long (*)(unsigned int, unsigned int, unsigned int)') from 'u64 
> (dma_addr_t,
>   enum i915_cache_level, u32)' (aka 'unsigned long long (unsigned int, enum 
> i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.pte_encode = iris_pte_encode;
>   ^ ~~~
>   drivers/gpu/drm/i915/gt/intel_ggtt.c:1157:23: error: incompatible function 
> pointer types assigning to 'u64 (*)(dma_addr_t, unsigned int, u32)' (aka 
> 'unsigned long long (*)(unsigned int, unsigned int, unsigned int)') from 'u64 
> (dma_addr_t,
>   enum i915_cache_level, u32)' (aka 'unsigned long long (unsigned int, enum 
> i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.pte_encode = hsw_pte_encode;
>   ^ ~~
>   drivers/gpu/drm/i915/gt/intel_ggtt.c:1159:23: error: incompatible function 
> pointer types assigning to 'u64 (*)(dma_addr_t, unsigned int, u32)' (aka 
> 'unsigned long long (*)(unsigned int, unsigned int, unsigned int)') from 'u64 
> (dma_addr_t,
>   enum i915_cache_level, u32)' (aka 'unsigned long long (unsigned int, enum 
> i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.pte_encode = byt_pte_encode;
>   ^ ~~
>   drivers/gpu/drm/i915/gt/intel_ggtt.c:1161:23: error: incompatible function 
> pointer types assigning to 'u64 (*)(dma_addr_t, unsigned int, u32)' (aka 
> 'unsigned long long (*)(unsigned int, unsigned int, unsigned int)') from 'u64 
> (dma_addr_t,
>   enum i915_cache_level, u32)' (aka 'unsigned long long (unsigned int, enum 
> i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.pte_encode = ivb_pte_encode;
>   ^ ~~
>   drivers/gpu/drm/i915/gt/intel_ggtt.c:1163:23: error: incompatible function 
> pointer types assigning to 'u64 (*)(dma_addr_t, unsigned int, u32)' (aka 
> 'unsigned long long (*)(unsigned int, unsigned int, unsigned int)') from 'u64 
> (dma_addr_t,
>   enum i915_cache_level, u32)' (aka 'unsigned long long (unsigned int, enum 
> i915_cache_level, unsigned int)') 
> [-Werror,-Wincompatible-function-pointer-types-strict]
>   ggtt->vm.pte_encode = snb_pte_encode;
>   ^ ~~
>   5 errors generated.
>
> In this case, the pre-gen8 pte_encode functions have a second parameter type 
> of 'enum i915_cache_level' whereas the function pointer prototype in 'struct 
> i915_address_space' expects a second parameter type of 'unsigned int'.
>
> Update the second parameter of the callbacks and the comment above them 
> noting that these statements are still valid, which matches other functions 
> and files, to clear up the kCFI failures at run time.
>
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Signed-off-by: Nathan Chancellor 

Reviewed-by: Fei Yang 

> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 2a7942fac798..122197737ef2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -1015,16 +1015,16 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>
>  /*
>   * For pre-gen8 platforms pat_index is the same as enum i915_cache_level,
> - * so these PTE encode functions are left with using cache_level.
> + * so the switch-case statements in these PTE encode functions are still 
> valid.
>   * See translation table LEGACY_CACHELEVEL.
>   */
>  static u64 snb_pte_encode(dma_addr_t addr,
> -   enum i915_cache_level level,
> +   unsigned int pat_index,
> u32 flags)
>  {
>   gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | 

RE: [Intel-gfx] [PATCH v12 0/1] drm/i915: Allow user to set cache at BO creation

2023-05-25 Thread Yang, Fei
Sorry replied on top of wrong thread.

From: Yang, Fei 
Sent: Thursday, May 25, 2023 8:12 AM
To: Tvrtko Ursulin ; 
intel-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Vivi, Rodrigo 
Subject: Re: [Intel-gfx] [PATCH v12 0/1] drm/i915: Allow user to set cache at 
BO creation


Sounds weird to have a platform restriction on uAPI though. UMD not using this 
extension is not a problem, is it?




Re: [Intel-gfx] [PATCH v12 0/1] drm/i915: Allow user to set cache at BO creation

2023-05-25 Thread Yang, Fei
> On 24/05/2023 21:02, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> This series introduce a new extension for GEM_CREATE,
>> 1. end support for set caching ioctl [PATCH 1/2]
>> 2. add set_pat extension for gem_create [PATCH 2/2]
>>
>> v2: drop one patch that was merged separately
>>  commit 341ad0e8e254 ("drm/i915/mtl: Add PTE encode function")
>> v3: rebased on https://patchwork.freedesktop.org/series/117082/
>> v4: fix missing unlock introduced in v3, and
>>  solve a rebase conflict
>> v5: replace obj->cache_level with pat_set_by_user,
>>  fix i915_cache_level_str() for legacy platforms.
>> v6: rebased on https://patchwork.freedesktop.org/series/117480/
>> v7: rebased on https://patchwork.freedesktop.org/series/117528/
>> v8: dropped the two dependent patches that has been merged
>>  separately. Add IGT link and Tested-by (MESA).
>> v9: addressing comments (Andi)
>> v10: acked-by and tested-by MESA
>> v11: drop "end support for set caching ioctl" (merged)
>>   remove tools/include/uapi/drm/i915_drm.h
>> v12: drop Bspec reference in comment. add to commit message instead
>>
>> Fei Yang (1):
>>drm/i915: Allow user to set cache at BO creation
>>
>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>   include/uapi/drm/i915_drm.h| 41 ++
>>   3 files changed, 83 insertions(+)
>>
>
> Do you also have a Test-with: run against the new IGT somewhere?

I ran the test locally, but I'm not sure how to get an IGT result with an updated
kernel on patchwork -- looks like a chicken-and-egg problem.

> Regards,
>
> Tvrtko



Re: [Intel-gfx] [PATCH v12 0/1] drm/i915: Allow user to set cache at BO creation

2023-05-25 Thread Yang, Fei
Sounds weird to have a platform restriction on uAPI though. UMD not using this 
extension is not a problem, is it?



From: Tvrtko Ursulin 
Sent: Thursday, May 25, 2023 1:33 AM
To: Yang, Fei ; intel-...@lists.freedesktop.org 

Cc: dri-devel@lists.freedesktop.org ; Vivi, 
Rodrigo 
Subject: Re: [Intel-gfx] [PATCH v12 0/1] drm/i915: Allow user to set cache at 
BO creation


On 24/05/2023 21:02, fei.y...@intel.com wrote:
> From: Fei Yang 
>
> This series introduce a new extension for GEM_CREATE,
> 1. end support for set caching ioctl [PATCH 1/2]
> 2. add set_pat extension for gem_create [PATCH 2/2]
>
> v2: drop one patch that was merged separately
>  commit 341ad0e8e254 ("drm/i915/mtl: Add PTE encode function")
> v3: rebased on https://patchwork.freedesktop.org/series/117082/
> v4: fix missing unlock introduced in v3, and
>  solve a rebase conflict
> v5: replace obj->cache_level with pat_set_by_user,
>  fix i915_cache_level_str() for legacy platforms.
> v6: rebased on https://patchwork.freedesktop.org/series/117480/
> v7: rebased on https://patchwork.freedesktop.org/series/117528/
> v8: dropped the two dependent patches that has been merged
>  separately. Add IGT link and Tested-by (MESA).
> v9: addressing comments (Andi)
> v10: acked-by and tested-by MESA
> v11: drop "end support for set caching ioctl" (merged)
>   remove tools/include/uapi/drm/i915_drm.h
> v12: drop Bspec reference in comment. add to commit message instead
>
> Fei Yang (1):
>drm/i915: Allow user to set cache at BO creation
>
>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>   include/uapi/drm/i915_drm.h| 41 ++
>   3 files changed, 83 insertions(+)
>

Do you also have a Test-with: run against the new IGT somewhere?

Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v9 2/2] drm/i915: Allow user to set cache at BO creation

2023-05-17 Thread Yang, Fei
> On 16/05/2023 19:11, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> To comply with the design that buffer objects shall have immutable
>> cache setting through out their life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy platforms for compatibility reasons.
>>
>> Test igt@gem_create@create_ext_set_pat posted at
>> https://patchwork.freedesktop.org/series/117695/
>>
>> Tested with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>
>> Signed-off-by: Fei Yang 
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Cc: Andi Shyti 
>> Reviewed-by: Andi Shyti 
>> Tested-by: Jordan Justen 
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>>   include/uapi/drm/i915_drm.h| 42 ++
>>   tools/include/uapi/drm/i915_drm.h  | 42 ++
>>   4 files changed, 126 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> index bfe1dbda4cb7..644a936248ad 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
>> @@ -245,6 +245,7 @@ struct create_ext {
>>unsigned int n_placements;
>>unsigned int placement_mask;
>>unsigned long flags;
>> + unsigned int pat_index;
>>   };
>>
>>   static void repr_placements(char *buf, size_t size,
>> @@ -394,11 +395,39 @@ static int ext_set_protected(struct 
>> i915_user_extension __user *base, void *data
>>return 0;
>>   }
>>
>> +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
>> +{
>> + struct create_ext *ext_data = data;
>> + struct drm_i915_private *i915 = ext_data->i915;
>> + struct drm_i915_gem_create_ext_set_pat ext;
>> + unsigned int max_pat_index;
>> +
>> + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
>> +  offsetofend(struct drm_i915_gem_create_ext_set_pat, 
>> rsvd));
>> +
>> + if (copy_from_user(&ext, base, sizeof(ext)))
>> + return -EFAULT;
>> +
>> + max_pat_index = INTEL_INFO(i915)->max_pat_index;
>> +
>> + if (ext.pat_index > max_pat_index) {
>> + drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
>> + ext.pat_index);
>> + return -EINVAL;
>> + }
>> +
>> + ext_data->pat_index = ext.pat_index;
>> +
>> + return 0;
>> +}
>> +
>>   static const i915_user_extension_fn create_extensions[] = {
>>[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>>[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
>> + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
>>   };
>>
>> +#define PAT_INDEX_NOT_SET0x
>>   /**
>>* i915_gem_create_ext_ioctl - Creates a new mm object and returns a 
>> handle to it.
>>* @dev: drm device pointer
>> @@ -418,6 +447,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
>> *data,
>>if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
>>return -EINVAL;
>>
>> + ext_data.pat_index = PAT_INDEX_NOT_SET;
>>ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
>>   create_extensions,
>>   ARRAY_SIZE(create_extensions),
>> @@ -454,5 +484,11 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
>> *data,
>>if (IS_ERR(obj))
>>return PTR_ERR(obj);
>>
>> + if (ext_data.pat_index != PAT_INDEX_NOT_SET) {
>> + i915_gem_object_set_pat_index(obj, ext_data.pat_index);
>> + /* Mark pat_index is set by UMD */
>> + obj->pat_set_by_user = true;
>> + }
>> +
>>return i915_gem_publish(obj, file, >size, >handle);
>>   }
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 46a19b099ec8..97ac6fb37958 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -208,6 +208,12 @@ bool i915_gem_object_can_bypass_llc(struct 
>> drm_i915_gem_object *obj)
>>if (!(obj->flags & I915_BO_ALLOC_USER))
>>return false;
>>
>> + /*
>> +  * Always flush cache for UMD objects at creation time.
>> +  */
>> + if (obj->pat_set_by_user)
>> + return true;
>> +
>>/*
>> * EHL and JSL 

Re: [PATCH v8 2/2] drm/i915: Allow user to set cache at BO creation

2023-05-15 Thread Yang, Fei
> Hi Fei,
>
> On Fri, May 12, 2023 at 04:28:25PM -0700, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> To comply with the design that buffer objects shall have immutable
>> cache setting through out their life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy paltforms for compatibility reason.
>>
>> IGT posted at https://patchwork.freedesktop.org/series/117695/
>
> Test gem_create@create-ext-set-pat
>
>> Tested with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>
>> Tested-by: Jordan Justen 
>
> we need here an explicit ack to have the paper work in place. So
> that I still have to ask Jordan and Mesa folks to give an ack if
> things look right.

Will update once an a-b is in place.

> Thanks!
> Andi
>
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Cc: Andi Shyti 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>
>PS:
>
> nitnitnitpick: the tags need to come in chronological order. So
> that:
>
>  - first you wrote it (Sob: Fei...)
>  - then you sent it (Cc: ...)
>  - then it has been reviewd (R-b)
>  - finally tested (T-b)
>
> I see that many people put the "Cc:" before the "Sob:" and I
> consider it a matter of taste (which might mean "I first prepare
> the mail (Cc:) and then I send it (Sob:)").
>
> But... don't mind too much at these things.
>
>Andi



Re: [PATCH v7 4/4] drm/i915: Allow user to set cache at BO creation

2023-05-12 Thread Yang, Fei
> On 2023-05-10 15:14:16, Andi Shyti wrote:
>> Hi,
>>
>> On Tue, May 09, 2023 at 09:59:42AM -0700, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> To comply with the design that buffer objects shall have immutable
>>> cache setting through out their life cycle, {set, get}_caching ioctl's
>>> are no longer supported from MTL onward. With that change caching
>>> policy can only be set at object creation time. The current code
>>> applies a default (platform dependent) cache setting for all objects.
>>> However this is not optimal for performance tuning. The patch extends
>>> the existing gem_create uAPI to let user set PAT index for the object
>>> at creation time.
>>> The new extension is platform independent, so UMD's can switch to using
>>> this extension for older platforms as well, while {set, get}_caching are
>>> still supported on these legacy platforms for compatibility reasons.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Cc: Andi Shyti 
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andi Shyti 
>>
>> just for a matter of completeness, this is new uapi is tested
>> through the "create-ext-set-pat" test case from the "gem_create"
>> igt test[1]. Can any of the igt maintainers give it a look,
>> comment and ack?
>>
>> The mesa merge request is here [2]. As there is a merge request
>> in progress, would anyone from mesa be so kind to give an ack to
>> this patch, as well?
>>
>> With the mesa ack in place this patch should be ready to go and
>> I'm looking forward to having it in.
>
> I tested my MR [2] in our CI. There was some bad news, but I don't
> think it needs to block these patches.
>
> The good news was that I found that OpenGL testing with our iris
> driver appeared to have ok results when using this interface.
>
> But, our Vulkan Anvil driver was not stable with the current patches
> in the Mesa MR. We will need to debug this further before using the
> interface on Vulkan.
>
> I don't suspect that this is an issue with the kernel interface, so
> you can add:
>
> Tested-by: Jordan Justen 

v8 sent with updates.

> -Jordan

Thanks Jordan.

>>
>> Thanks,
>> Andi
>>
>> [1] https://patchwork.freedesktop.org/patch/534955/?series=117185=1
>> [2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22878
>>



Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-08 Thread Yang, Fei
> On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
>>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>>> buffer objects. This is flaky because the PAT index which really controls
>>>> the caching behavior in PTE has far more levels than what's defined in the
>>>> enum. In addition, the PAT index is platform dependent, having to translate
>>>> between i915_cache_level and PAT index is not reliable, and makes the code
>>>> more complicated.
>>>>
>>>> From UMD's perspective there is also a necessity to set caching policy for
>>>> performance fine tuning. It's much easier for the UMD to directly use PAT
>>>> index because the behavior of each PAT index is clearly defined in Bspec.
>>>> Having the abstracted i915_cache_level sitting in between would only cause
>>>> more ambiguity.
>>>
>>> It may be worth mentioning here that PAT is expected to work much like
>>> MOCS already works today --- the contract on the exact platform-specific
>>> meaning of each index is documented in the hardware spec and userspace
>>> is expected to select the index that exactly matches the behavior it
>>> desires.
>> will update.

Please review https://patchwork.freedesktop.org/series/117480/

>>>>
>>>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>>>> note, the cache_level is not completely removed yet, because the KMD still
>>>> has the need of creating buffer objects with simple cache settings such as
>>>> cached, uncached, or writethrough. For such simple cases, using cache_level
>>>> would help simplify the code.
>>>
>>> See my comment farther down, but the implementation of
>>> i915_gem_object_has_cache_level() seems a bit confusing and you may want
>>> to elaborate on it here.
>>>
>>> Also somewhat confusing from a high-level skim is if/how we're still
>>> using obj->cache_coherent once userspace has taken direct control of the
>>> cache behavior.  Some PAT indices give coherency and others don't (and
>>> the combinations will likely get more complicated on future platforms).
>>> If obj->cache_coherent is still being considered even once PAT indices
>>> are being controlled by userspace, I think we need some explanation of
>>> how that works in the commit message (and likely in the kerneldoc for
>>> that field too).
>> For the objects with pat_index set by user space, coherency is managed by
>> user space. Things like obj->cache_coherent and obj->cache_dirty are still
>> there for objects created by kernel.
>
> Right, that's the challenge --- userspace is taking over control of this
> stuff, but the fields are still around and still used internally within
> the driver.  How we reconcile those two things needs to be clearly
> explained in the commit message and kerneldoc, otherwise we're going to
> wind up with confusing code that's very difficult to maintain down the
> road.
>
> All the cache behavior is complicated enough that it probably wouldn't
> be a bad idea to have a dedicated section of the kerneldoc focused on
> cache behavior.

Updated with kerneldoc and comments.
As discussed off-line, I have also squashed the pte_encode patch.

>>>>
>>>> Cc: Chris Wilson 
>>>> Cc: Matt Roper 
>>>> Signed-off-by: Fei Yang 
>>>> Reviewed-by: Andi Shyti 
>>>> ---
>>>>  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
>>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 45 ++
>>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
>>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 +++-
>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
>>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
>>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
>>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>>>  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
>>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 71 
>>&

RE: [PATCH v5 2/5] drm/i915: use pat_index instead of cache_level

2023-05-06 Thread Yang, Fei

   static u64 mtl_pte_encode(dma_addr_t addr,
 -   enum i915_cache_level level,
 +   unsigned int pat_index,
 u32 flags)
>>> Prototype and implementation changed here for mtl_pte_encode.
>>>
>>> And we have:
>>>
>>>if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>>> ppgtt->vm.pte_encode = mtl_pte_encode;
>>>else
>>> ppgtt->vm.pte_encode = gen8_pte_encode;
>>> So should be same prototype. And:
>>>
>>> u64 (*pte_encode)(dma_addr_t addr,
>>>- enum i915_cache_level level,
>>>+ unsigned int pat_index,
>>>  u32 flags);
>>>/* Create a valid PTE */
>>> Patch relies on the compiler considering enum equal to unsigned int?
>>
>> yes, caller is passing in unsigned int and gets used as enum.
>>
>>> But the implementation of gen8_pte_encode and most ggtt
>>> counterparts is looking at the passed in pat index and thinks it is cache 
>>> level.
>>>
>>> How is that supposed to work?! Or I am blind and am missing something?
>>
>> For legacy platforms translation through LEGACY_CACHELEVEL would not
>> change the value of cache_level. The cache_level and pat_index are
>> basically the same for these platforms.
>
> Oh that is nasty little trick! And I did not spot it being described
> anywhere in the commit message or code comments.

Will add.

> So you are saying for legacy cache_level equals pat_index for what
> caching behaviour is concerned. Ie:
>
> +#define LEGACY_CACHELEVEL \
> + .cachelevel_to_pat = { \
> + [I915_CACHE_NONE]   = 0, \
> + [I915_CACHE_LLC]= 1, \
> + [I915_CACHE_L3_LLC] = 2, \
> + [I915_CACHE_WT] = 3, \
> + }
>
> And because:
>
> enum i915_cache_level {
>   I915_CACHE_NONE = 0,
>   I915_CACHE_LLC,
>   I915_CACHE_L3_LLC,
>   I915_CACHE_WT,
> };
>
> This indeed ends up a 1:1 reversible mapping.
>
> But it is hidden and fragile. What prevents someone from changing the
> enum i915_cache_level? There is no explicit linkage with hardware PAT
> values anywhere. Or at least I don't see it.
>
> I would say all pte_encode signatures have to be changed to pat index.
>
> Which means all pte encode implementations have to understand what pat 
> indices mean.
>
> Which brings us back to that idea of a 2nd table, I paraphrase:
>
> .legacy_pat_to_cache = {
>   [0] = I915_PAT_UC,
>   [1] = I915_PAT_WB,
>   [2] = I915_PAT_WB | I915_PAT_LLC /* not sure on this one */
>   [3] = I915_PAT_WT,
> };
>
> Pat_encode implementations then instead:
>
>   switch (level) {
>   case I915_CACHE_NONE:
>   pte |= PPAT_UNCACHED;
>   ...
>
> Do:
>
>   if (i915->pat_to_cache_flags[pat_index] & I915_PAT_UC)
>   pte |= PPAT_UNCACHED;
>   else if
>   ...
>
>
> But it would require i915 to be passed in which is admittedly a
> noisy diff. Hm.. benefit of hardware agnostic enum i915_cache_level..

That was one of the problems I ran into when creating this patch.

> Maybe convert pat_index to I915_PAT_.. flags in the callers? Like this:
>
> gen8_ppgtt_insert_pte(...)
> ...
>   const u32 pat_flags = i915->pat_to_cache_flags[pat_index];
>   const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_flags, flags);
>
> Etc. That would be smaller churn on the pte_encode signature.
>
> Maybe helper for i915->pat_to_cache_flags lookup so it can check the array 
> bounds?
>
> If this all sounds too much to you maybe we can do it as a followup.

It's getting more complicated...
But I believe it should be doable to define a PAT table (same in Bspec) and drop
the cache_level. Together with the PPAT bit masks (such as MTL_PPAT_L4_3_UC) 
defined
in intel_gtt.h, the code can be simpler without translation.

> Or perhaps it is actually pointing towards that obj->pat_index is not the most
> elegant choice to be used as a single point of truth.. perhaps 
> obj->cache_flags
> would be better. It would be set at same entry points and it would be hw 
> agnostic
> so could end up more elegant in the driver.
>
> But then I think we need at minimum something like the below in this patch, 
> somewhere:
>
> /*
>   * On pre-Gen12 platforms enum i915_cache_level happens to align
>   * with caching modes as specified in hardware PAT indices. Our
>   * implementation relies on that due tricks played (explain the
>   * tricks) in the pte_encode vfuncs.
>   * Ensure this trick keeps working until the driver can be fully
>   * refactored to support pat indices better.
>   */
> BUILD_BUG_ON(I915_CACHE_NONE != 0);
> ... etc for all enums ...

Will add
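
(Spelling out the "etc" for reference -- the enum values and the
LEGACY_CACHELEVEL mapping are the ones quoted earlier in this thread; the exact
placement of the checks is just a sketch:)

/*
 * On pre-Gen12 platforms the numeric values of enum i915_cache_level line up
 * 1:1 with the legacy PAT indices and the legacy pte_encode helpers rely on
 * that, so catch any reordering of the enum at compile time.
 */
BUILD_BUG_ON(I915_CACHE_NONE != 0);
BUILD_BUG_ON(I915_CACHE_LLC != 1);
BUILD_BUG_ON(I915_CACHE_L3_LLC != 2);
BUILD_BUG_ON(I915_CACHE_WT != 3);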

> if (gen < 12) {
>   GEM_WARN_ON(i915_gem_get_pat_index(i915, I915_CACHE_NONE) != 0);
>   ... etc for all enums ...

This should not be necessary as long as the enum is verified.

> }
>
>> It is broken for gen12 here. I was asked to separate the
>> gen12_pte_encode change to another patch in the 

Re: [PATCH v5 2/5] drm/i915: use pat_index instead of cache_level

2023-05-04 Thread Yang, Fei
> On 04/05/2023 00:02, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> Currently the KMD is using enum i915_cache_level to set caching policy for
>> buffer objects. This is flaky because the PAT index which really controls
>> the caching behavior in PTE has far more levels than what's defined in the
>> enum. In addition, the PAT index is platform dependent, having to translate
>> between i915_cache_level and PAT index is not reliable, and makes the code
>> more complicated.
>>
>>>From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use PAT
>> index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only cause
>> more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>> note, the cache_level is not completely removed yet, because the KMD still
>> has the need of creating buffer objects with simple cache settings such as
>> cached, uncached, or writethrough. For such simple cases, using cache_level
>> would help simplify the code.
>>
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>
> [snip]
>
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index bb6998d67133..f2334a713c4e 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -56,7 +56,7 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>   }
>
>^^^
>
> How come there are no changes to gen8_pte_encode?

For legacy platforms cache_level is equal to pat_index, so I was thinking
more about reducing the number of lines changed.

>vvv
>
>>
>>   static u64 mtl_pte_encode(dma_addr_t addr,
>> -   enum i915_cache_level level,
>> +   unsigned int pat_index,
>>  u32 flags)
>
> Prototype and implementation changed here for mtl_pte_encode.
>
> And we have:
>
>if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>ppgtt->vm.pte_encode = mtl_pte_encode;
>else
>ppgtt->vm.pte_encode = gen8_pte_encode;
>
> So should be same prototype. And:
>
> u64 (*pte_encode)(dma_addr_t addr,
>- enum i915_cache_level level,
>+ unsigned int pat_index,
>   u32 flags); /* Create a valid PTE */
>
> Patch relies on the compiler considering enum equal to unsigned int?

Yes, the caller passes in an unsigned int and it gets used as the enum.

> But the implementation of gen8_pte_encode and most ggtt counterparts is
> looking at the passed in pat index and thinks it is cache level.
>
> How is that supposed to work?! Or I am blind and am missing something?

For legacy platforms translation through LEGACY_CACHELEVEL would not
change the value of cache_level. The cache_level and pat_index are basically
the same for these platforms.

It is broken for gen12 here. I was asked to separate the gen12_pte_encode
change to another patch in the series, but that breaks bisect. Should I
squash 2/5 and 3/5?

> Regards,
>
> Tvrtko



RE: [Intel-gfx] [PATCH v4 2/3] drm/i915: use pat_index instead of cache_level

2023-05-03 Thread Yang, Fei
[...]

>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 8c70a0ec7d2f..27c948350b5b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -54,6 +54,25 @@ unsigned int i915_gem_get_pat_index(struct 
>> drm_i915_private *i915,
>>  return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>   }
>>
>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>> + enum i915_cache_level lvl)
>> +{
>> +/*
>> + * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
>> + * caching policy through pat_index, in which case the KMD should
>> + * leave the coherency to be managed by user space, simply return
>> + * true here.
>> + */
>> +if (obj->cache_level == I915_CACHE_INVAL)
>> +return true;
>> +
>> +/*
>> + * Otherwise the pat_index should have been converted from cache_level
>> + * so that the following comparison is valid.
>> + */
>> +return obj->pat_index == i915_gem_get_pat_index(obj_to_i915(obj), lvl);
>> +}
>> +
>>   struct drm_i915_gem_object *i915_gem_object_alloc(void)
>>   {
>>  struct drm_i915_gem_object *obj;
>> @@ -133,7 +152,7 @@ void i915_gem_object_set_cache_coherency(struct 
>> drm_i915_gem_object *obj,
>>   {
>>  struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>
>> -obj->cache_level = cache_level;
>> +obj->pat_index = i915_gem_get_pat_index(i915, cache_level);
>
> obj->cache_level is only ever set to "invalid" from the set pat
> extension? Doesn't that make it a boolean so there is no need for three
> bits to hold the enum, just the "pat has been externally set" bit really?

Will update.

>>
>>  if (cache_level != I915_CACHE_NONE)
>>  obj->cache_coherent = (I915_BO_CACHE_COHERENT_FOR_READ |

[...]

>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>> b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 41389a32e998..9a4922da3a71 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -139,21 +139,56 @@ static const char *stringify_vma_type(const struct 
>> i915_vma *vma)
>>  return "ppgtt";
>>   }
>>
>> -static const char *i915_cache_level_str(struct drm_i915_private *i915, int 
>> type)
>> -{
>> -switch (type) {
>> -case I915_CACHE_NONE: return " uncached";
>> -case I915_CACHE_LLC: return HAS_LLC(i915) ? " LLC" : " snooped";
>> -case I915_CACHE_L3_LLC: return " L3+LLC";
>> -case I915_CACHE_WT: return " WT";
>> -default: return "";
>> +static const char *i915_cache_level_str(struct drm_i915_gem_object *obj)
>> +{
>> +struct drm_i915_private *i915 = obj_to_i915(obj);
>> +
>> +if (IS_METEORLAKE(i915)) {
>> +switch (obj->pat_index) {
>> +case 0: return " WB";
>> +case 1: return " WT";
>> +case 2: return " UC";
>> +case 3: return " WB (1-Way Coh)";
>> +case 4: return " WB (2-Way Coh)";
>> +default: return " not defined";
>> +}
>> +} else if (IS_PONTEVECCHIO(i915)) {
>> +switch (obj->pat_index) {
>> +case 0: return " UC";
>> +case 1: return " WC";
>> +case 2: return " WT";
>> +case 3: return " WB";
>> +case 4: return " WT (CLOS1)";
>> +case 5: return " WB (CLOS1)";
>> +case 6: return " WT (CLOS2)";
>> +case 7: return " WT (CLOS2)";
>> +default: return " not defined";
>> +}
>> +} else if (GRAPHICS_VER(i915) >= 12) {
>> +switch (obj->pat_index) {
>> +case 0: return " WB";
>> +case 1: return " WC";
>> +case 2: return " WT";
>> +case 3: return " UC";
>> +default: return " not defined";
>> +}
>> +} else {
>> +if (i915_gem_object_has_cache_level(obj, I915_CACHE_NONE))
>> +return " uncached";
>
> This will print uncached for all legacy platforms if set pat extension
> has been used, regardless of the index set.

Will update. Should just use obj->pat_index here.

> Are we okay with that? I find it questionable and would say no. It
> diverges from >= 12 and so is confusing.
>
>> +else if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC))
>> +return HAS_LLC(i915) ? " LLC" : " snooped";
>> +else if (i915_gem_object_has_cache_level(obj, 
>> I915_CACHE_L3_LLC))
>> +return " L3+LLC";
>> +else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>> +return " WT";
>> +else
>> +return " not defined";
>
> Another thing is why use different names for caching modes between
> "legacy" and the rest?

For new platforms the string matches bspec. For legacy platforms I think 

Re: [Intel-gfx] [PATCH v2 0/5] drm/i915: Allow user to set cache at BO creation

2023-04-28 Thread Yang, Fei
> On 27/04/2023 17:07, Yang, Fei wrote:
>>> On 26/04/2023 16:41, Yang, Fei wrote:
>>>>> On 26/04/2023 07:24, fei.y...@intel.com wrote:
>>>>>> From: Fei Yang 
>>>>>>
>>>>>> The first three patches in this series are taken from
>>>>>> https://patchwork.freedesktop.org/series/116868/
>>>>>> These patches are included here because the last patch
>>>>>> has dependency on the pat_index refactor.
>>>>>>
>>>>>> This series is focusing on uAPI changes,
>>>>>> 1. end support for set caching ioctl [PATCH 4/5]
>>>>>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>>>>>
>>>>>> v2: drop one patch that was merged separately
>>>>>>  341ad0e8e254 drm/i915/mtl: Add PTE encode function
>>>>>
>>>>> Are the re-sends for stabilizing the series, or focusing on merge?
>>>>
>>>> v2 was sent just to drop one of patches that has already been merged.
>>>>
>>>>> If the latter then opens are:
>>>>>
>>>>> 1) Link to Mesa MR reviewed and ready to merge.
>>>>>
>>>>> 2) IGT reviewed.
>>>>>
>>>>> 3) I raised an open that get/set_caching should not "lie" but return an
>>>>> error if set pat extension has been used. I don't see a good reason not
>>>>> to do that.
>>>>
>>>> I don't think it's "lying" to the userspace. the comparison is only valid
>>>> for objects created by KMD because only such object uses the pat_index
>>>> from the cachelevel_to_pat table.
>>>
>>> Lets double check my understanding is correct. Userspace sequence of
>>> operations:
>>> 1)
>>> obj = gem_create()+set_pat(PAT_UC)
>>>
>>> 2a)
>>> get_caching(obj)
>>> What gets reported?
>>
>> I see your point here. nice catch.
>> That should be handled by,
>> if (obj->cachel_level == I915_CACHE_INVAL) /* indicated pat-index is set by 
>> userspace */
>>   return -EOPNOTSUPP;
>> Will update the patch.
>>
>>>
>>> 2b)
>>> set_caching(obj, I915_CACHE_LLC)
>>> What is the return code?
>>
>> This will either return -EOPNOTSUPP, or ignored because set_pat
>> extension was called.
>
> I see that you made it fail instead of fake success in the latest respin
> and I definitely agree with that.

Thanks for your picky eyes. :)

>>>
>>> If answer to 2a is I915_CACHING_CACHED and to 2b) success, then please
>>> state a good reason why both shouldn't return an error.
>>>
>>>>
>>>>> + Joonas on this one.
>>>>>
>>>>> 4) Refactoring as done is not very pretty and I proposed an idea for a
>>>>> nicer approach. Feasible or not, open for discussion.
>>>>
>>>> Still digesting your proposal. but not sure how would you define things
>>>> like PAT_UC as that is platform dependent, not a constant.
>>>
>>> "PAT_UC" in my outline was purely a driver specific value, similarly as
>>> I915_CACHE_... are.
>>
>> Okay. Then you were suggesting to add a translation table for
>> pat_index-to-(PAT-UC/WB/WT)?
>> It's going to be a N:1 mapping.
>
> PAT index to a value, probably a bitmask, built out of bits which define
> caching modes. Like "PAT_WB | PAT_1WAY_COHERENT", or just PAT_WB. Would
> that approach be enough to express everything?

I was thinking about dumping the PAT tables from Bspec into struct 
intel_device_info {}.
But that would only be useful to unify all those *_setup_private_ppat()
functions in
intel_gtt.c. The kernel doesn't really care much about the PAT index except making sure
the bits
are programmed correctly in PTE.
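
(Purely to illustrate the direction being discussed -- the names, bits and
layout below are hypothetical, not actual driver code:)

/* caching-mode bits in the spirit of the "PAT_WB | PAT_1WAY_COHERENT" idea */
#define PAT_UC                  BIT(0)
#define PAT_WT                  BIT(1)
#define PAT_WB                  BIT(2)
#define PAT_1WAY_COHERENT       BIT(3)
#define PAT_2WAY_COHERENT       BIT(4)

/* e.g. MTL: 0 = WB, 1 = WT, 2 = UC, 3 = WB 1-way coherent, 4 = WB 2-way coherent */
static const unsigned int mtl_pat_flags[] = {
        [0] = PAT_WB,
        [1] = PAT_WT,
        [2] = PAT_UC,
        [3] = PAT_WB | PAT_1WAY_COHERENT,
        [4] = PAT_WB | PAT_2WAY_COHERENT,
};

Both the *_setup_private_ppat() programming and any caching checks could then
be driven from one such table per platform.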

>>
>>> With the whole point that driver can ask if
>>> something is uncached, WT or whatever. Using the platform specific
>>> mapping table which converts platform specific obj->pat_index to driver
>>> representation of caching modes (PAT_UC etc).
>>
>> Are you suggesting completely remove i915_cache_level and use
>> (PAT-UC/WB/WT) instead?
>
> Not completely but throughout the most internal code paths, which would
> all just work on PAT index. Basically object always has PAT index set,
> with a separate boolean saying whether it came from gem_create_ext or
> set_cache_level.

What's in my mind as an improvement is to completely remove i915_cache_level.
KMD is 

Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation

2023-04-28 Thread Yang, Fei
>> On 4/28/23 17:19, Yang, Fei wrote:
>>> On 4/28/23 07:47, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> The first three patches in this series are taken from
>>>> https://patchwork.freedesktop.org/series/116868/
>>>> These patches are included here because the last patch
>>>> has dependency on the pat_index refactor.
>>>>
>>>> This series is focusing on uAPI changes,
>>>> 1. end support for set caching ioctl [PATCH 4/5]
>>>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>>>
>>>> v2: drop one patch that was merged separately
>>>>  341ad0e8e254 drm/i915/mtl: Add PTE encode function
>>>> v3: rebase on https://patchwork.freedesktop.org/series/117082/
>>>
>>> Hi, Fei.
>>>
>>> Does this uAPI update also affect any discrete GPUs supported by i915,
>>
>> It does.
>>
>>> And in that case, does it allow setting non-snooping PAT indices on
>>> those devices?
>>
>> It allows setting PAT indices specified in
>> https://gfxspecs.intel.com/Predator/Home/Index/45101.
>> KMD does a sanity check so that it won't go over the max recommended
>> by bspec.
>>
>>> If so, since the uAPI for discrete GPU devices doesn't allow incoherency
>>> between GPU and CPU (apart from write-combining buffering), the correct
>>> CPU caching mode matching the PAT index needs to be selected for the
>>> buffer object in i915_ttm_select_tt_caching().
>>
>> The patch doesn't affect the CPU caching mode setting logic though.
>> And the caching settings for objects created by kernel should remain
>> the same for both CPU and GPU, objects created by userspace are
>> managed completely by userspace.
>>
>> One question though, what do you mean by non-snooping PAT indices?
>
> Yes, that was actually the bottom question: What do these PAT settings
> allow you to do WRT the snooping on supported discrete devices (DG2) on
> i915?
> If they indeed don't allow disabling snooping, then that's not a problem.

When dGPUs access SysMem, the PCIe default is for that access to snoop the
host's caches. All of our current dGPUs do that -- independent of PAT setting.

> If they do, the ttm code there needs some modification.

I'm not familiar with TTM, but if your concern is that a certain PAT index
could disable snooping, that is not possible for current dGPUs.
I think it is possible for Xe2/3 though, because there will be COH_MODE
defined in the PAT registers going forward.

-Fei



Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation

2023-04-28 Thread Yang, Fei
> On 4/28/23 07:47, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> The first three patches in this series are taken from
>> https://patchwork.freedesktop.org/series/116868/
>> These patches are included here because the last patch
>> has dependency on the pat_index refactor.
>>
>> This series is focusing on uAPI changes,
>> 1. end support for set caching ioctl [PATCH 4/5]
>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>
>> v2: drop one patch that was merged separately
>>  341ad0e8e254 drm/i915/mtl: Add PTE encode function
>> v3: rebase on https://patchwork.freedesktop.org/series/117082/
>
> Hi, Fei.
>
> Does this uAPI update also affect any discrete GPUs supported by i915,

It does.

> And in that case, does it allow setting non-snooping PAT indices on
> those devices?

It allows setting PAT indices specified in 
https://gfxspecs.intel.com/Predator/Home/Index/45101.
KMD does a sanity check so that it won't go over the max recommended
by bspec.

> If so, since the uAPI for discrete GPU devices doesn't allow incoherency
> between GPU and CPU (apart from write-combining buffering), the correct
> CPU caching mode matching the PAT index needs to be selected for the
> buffer object in i915_ttm_select_tt_caching().

The patch doesn't affect the CPU caching mode setting logic though.
The caching settings for objects created by the kernel should remain
the same for both CPU and GPU, while objects created by userspace are
managed completely by userspace.

One question though, what do you mean by non-snooping PAT indices?
The PAT index registers didn't really control coherency mode in the past;
I believe MTL is the first one that has COH_MODE in the PAT registers.
Don't discrete GPUs snoop the CPU cache automatically?

-Fei

> Thanks,
> Thomas
>
>>
>> Fei Yang (5):
>>drm/i915: preparation for using PAT index
>>drm/i915: use pat_index instead of cache_level
>>drm/i915: make sure correct pte encode is used
>>drm/i915/mtl: end support for set caching ioctl
>>drm/i915: Allow user to set cache at BO creation
>>
>>   drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
>>   drivers/gpu/drm/i915/gem/i915_gem_create.c| 36 +
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c| 46 ++-
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
>>   drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c| 67 +++-
>>   drivers/gpu/drm/i915/gem/i915_gem_object.h|  8 ++
>>   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 26 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  9 ++-
>>   drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
>>   drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>   .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>   .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
>>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 73 +
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c  | 76 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 +++--
>>   drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
>>   drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
>>   drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 +--
>>   drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
>>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>>   drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
>>   drivers/gpu/drm/i915/i915_debugfs.c   | 55 ++---
>>   drivers/gpu/drm/i915/i915_gem.c   | 16 +++-
>>   drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
>>   drivers/gpu/drm/i915/i915_pci.c   | 79 ---
>>   drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
>>   drivers/gpu/drm/i915/i915_vma.h   |  2 +-
>>   drivers/gpu/drm/i915/i915_vma_types.h |  2 -
>>   drivers/gpu/drm/i915/intel_device_info.h  |  5 ++
>>   drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
>>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>>   .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 +++
>>   drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
>>   include/uapi/drm/i915_drm.h   | 36 +
>>   tools/include/uapi/drm/i915_drm.h | 36 +
>>   44 files changed, 621 insertions(+), 243 deletions(-)
>>



Re: [Intel-gfx] [PATCH v2 0/5] drm/i915: Allow user to set cache at BO creation

2023-04-27 Thread Yang, Fei
> On 26/04/2023 16:41, Yang, Fei wrote:
>>> On 26/04/2023 07:24, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> The first three patches in this series are taken from
>>>> https://patchwork.freedesktop.org/series/116868/
>>>> These patches are included here because the last patch
>>>> has dependency on the pat_index refactor.
>>>>
>>>> This series is focusing on uAPI changes,
>>>> 1. end support for set caching ioctl [PATCH 4/5]
>>>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>>>
>>>> v2: drop one patch that was merged separately
>>>>  341ad0e8e254 drm/i915/mtl: Add PTE encode function
>>>
>>> Are the re-sends for stabilizing the series, or focusing on merge?
>>
>> v2 was sent just to drop one of patches that has already been merged.
>>
>>> If the latter then opens are:
>>>
>>> 1) Link to Mesa MR reviewed and ready to merge.
>>>
>>> 2) IGT reviewed.
>>>
>>> 3) I raised an open that get/set_caching should not "lie" but return an
>>> error if set pat extension has been used. I don't see a good reason not
>>> to do that.
>>
>> I don't think it's "lying" to the userspace. the comparison is only valid
>> for objects created by KMD because only such object uses the pat_index
>> from the cachelevel_to_pat table.
>
> Lets double check my understanding is correct. Userspace sequence of
> operations:
> 1)
> obj = gem_create()+set_pat(PAT_UC)
>
> 2a)
> get_caching(obj)
> What gets reported?

I see your point here, nice catch.
That should be handled by:
if (obj->cache_level == I915_CACHE_INVAL) /* indicates pat_index was set by userspace */
        return -EOPNOTSUPP;
Will update the patch.

>
> 2b)
> set_caching(obj, I915_CACHE_LLC)
> What is the return code?

This will either return -EOPNOTSUPP, or be ignored because the set_pat extension
was called.
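
For illustration, a minimal sketch of the check both ioctls would share (assuming
the obj->pat_set_by_user flag used in later revisions of this series; not the
final patch):

	/* Caching is immutable and opaque to the KMD once userspace has
	 * picked a PAT index at object creation time.
	 */
	if (obj->pat_set_by_user)
		return -EOPNOTSUPP;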

>
> If answer to 2a is I915_CACHING_CACHED and to 2b) success, then please
> state a good reason why both shouldn't return an error.
>
>>
>>> + Joonas on this one.
>>>
>>> 4) Refactoring as done is not very pretty and I proposed an idea for a
>>> nicer approach. Feasible or not, open for discussion.
>>
>> Still digesting your proposal. but not sure how would you define things
>> like PAT_UC as that is platform dependent, not a constant.
>
> "PAT_UC" in my outline was purely a driver specific value, similarly as
> I915_CACHE_... are.

Okay. So you are suggesting to add a translation table from pat_index to
(PAT_UC/WB/WT)? That is going to be an N:1 mapping.

> With the whole point that driver can ask if
> something is uncached, WT or whatever. Using the platform specific
> mapping table which converts platform specific obj->pat_index to driver
> representation of caching modes (PAT_UC etc).

Are you suggesting we completely remove i915_cache_level and use (PAT_UC/WB/WT)
instead? Let's say a KMD object wants to be set as WB, how do you get the exact
pat_index to use? Note that there are multiple PAT indices whose caching mode is
WB, but they are not interchangeable: do you want plain WB, WB with 1-way
coherency, or WB with 2-way coherency?

Also, in cases where a pat_index check is needed, do we treat WB with 1-way
coherency as equal to plain WB or not?

BTW, isn't PAT_UC/WB/WT the same kind of abstraction as enum i915_cache_level?
I'm not sure how that would simplify anything.
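
For illustration, a hypothetical per-platform table along those lines could look
like the sketch below. The struct and names are made up for this example; the
entries follow the MTL PAT programming quoted elsewhere in this series (indices
0/1/2 are WB/WT/UC, indices 3/4 are WB plus 1-way/2-way coherency).

	enum i915_cache_mode { PAT_UC, PAT_WT, PAT_WB };

	struct pat_mode {
		enum i915_cache_mode mode;
		unsigned int coh;	/* 0 = non-coherent, 1 = 1-way, 2 = 2-way */
	};

	/* hypothetical pat_index -> caching mode table for MTL */
	static const struct pat_mode mtl_pat_modes[] = {
		[0] = { PAT_WB, 0 },
		[1] = { PAT_WT, 0 },
		[2] = { PAT_UC, 0 },
		[3] = { PAT_WB, 1 },
		[4] = { PAT_WB, 2 },
	};

The N:1 nature shows up in the three WB entries that differ only in coherency,
which is exactly the ambiguity I am asking about.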

> Quite possible I missed some detail, but if I haven't then it could be
> a neat and lightweight solution.
>
> Regards,
>
> Tvrtko


Re: [Intel-gfx] [PATCH v2 0/5] drm/i915: Allow user to set cache at BO creation

2023-04-26 Thread Yang, Fei
> On 26/04/2023 07:24, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> The first three patches in this series are taken from
>> https://patchwork.freedesktop.org/series/116868/
>> These patches are included here because the last patch
>> has dependency on the pat_index refactor.
>>
>> This series is focusing on uAPI changes,
>> 1. end support for set caching ioctl [PATCH 4/5]
>> 2. add set_pat extension for gem_create [PATCH 5/5]
>>
>> v2: drop one patch that was merged separately
>>  341ad0e8e254 drm/i915/mtl: Add PTE encode function
>
> Are the re-sends for stabilizing the series, or focusing on merge?

v2 was sent just to drop one of the patches that has already been merged.

> If the latter then opens are:
>
> 1) Link to Mesa MR reviewed and ready to merge.
>
> 2) IGT reviewed.
>
> 3) I raised an open that get/set_caching should not "lie" but return an
> error if set pat extension has been used. I don't see a good reason not
> to do that.

I don't think it's "lying" to the userspace. The comparison is only valid
for objects created by the KMD, because only such objects use the pat_index
from the cachelevel_to_pat table.

> + Joonas on this one.
>
> 4) Refactoring as done is not very pretty and I proposed an idea for a
> nicer approach. Feasible or not, open for discussion.

Still digesting your proposal, but I am not sure how you would define things
like PAT_UC, as that is platform dependent, not a constant.
- Fei

> At a push I can look past that and someone can attempt to tidy the
> driver later.
>
> But without 1-3 we cannot merge this.
>
> Regards,
>
> Tvrtko



Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-24 Thread Yang, Fei
> On Sun, Apr 23, 2023 at 12:37:27AM -0700, Yang, Fei wrote:
>>> On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>>>>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.y...@intel.com wrote:
>>>>>> From: Fei Yang 
>>>>>>
>>>>>> PTE encode functions are platform dependent. This patch implements
>>>>>> PTE functions for MTL, and ensures the correct PTE encode function
>>>>>> is used by calling pte_encode function pointer instead of the
>>>>>> hardcoded gen8 version of PTE encode.
>>>>>>
>>>>>> Signed-off-by: Fei Yang 
>>>>>> Reviewed-by: Andrzej Hajda 
>>>>>> Reviewed-by: Andi Shyti 
>>>>>> Acked-by: Nirmoy Das 
>>>>>
>>>>> Bspec: 45015, 45040
>>>>>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 
>>>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +--
>>>>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> index b8027392144d..c5eacfdba1a5 100644
>>>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>>>>vm->vma_ops.bind_vma= dpt_bind_vma;
>>>>>>vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>>>>
>>>>>> - vm->pte_encode = gen8_ggtt_pte_encode;
>>>>>> + vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>>>>
>>>>>>dpt->obj = dpt_obj;
>>>>>>dpt->obj->is_dpt = true;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> index 4daaa6f55668..11b91e0453c8 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>>>>return pte;
>>>>>>  }
>>>>>>
>>>>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>>>>> +   enum i915_cache_level level,
>>>>>> +   u32 flags)
>>>>>> +{
>>>>>> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>>>> +
>>>>>> + if (unlikely(flags & PTE_READ_ONLY))
>>>>>> + pte &= ~GEN8_PAGE_RW;
>>>>>> +
>>>>>> + if (flags & PTE_LM)
>>>>>> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>>>>
>>>>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>>>>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>>>>> this trying to do?
>>>>
>>>> This takes effect only for PTE_LM, doesn't affect MTL.
>>>> PTE_NC is needed for PVC (use of access counter).
>>>> I believe this function was writen based on the one for PVC. And this
>>>> function did get extended to cover all gen12 in a later patch.
>>>
>>> Even though MTL doesn't have local memory, PTE_LM is supposed to be
>>> used on MTL for access to BAR2 stolen memory.
>>
>> You were right, but I still think this code is fine because this bit is
>> ignored for MTL anyway and it is needed for other platforms with LMEM.
>> Otherwise this code would have some sort of platform checking which is
>> hard to do because we don't have platform info here.
>> Or we would have to define another PTE encode function for platforms
>> needing PTE_NC just for this one difference, then manage the function
>> pointer correctly.
>
> MTL is the only platform that uses this function right now:
>
>   +   if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
>   +   ppgtt->vm.pte_encode = mtl_pte_encode;
>   +   else
>   +   ppgtt->vm.pte_encode = gen8_pte_encode;
>
> If this is intended for PVC, then you have it in the wrong function to
> begin with (and it also shouldn't be in a patch labelled "mtl").  If
> you're trying to future-proof for some post-MTL discrete platform, then
> such code should be saved until we enable that platform so that it can
> be properly reviewed.

Dropped the GEN12_PPGTT_PTE_NC bit in v2 of
https://patchwork.freedesktop.org/series/116900/

> Matt
>
>>
>> -Fei
>>
>>> Matt
>>>
>>>> -Fei
>>>>> Matt
>>>>>
>>>>>> +
>>>>>> + switch (level) {
>>>>>> + case I915_CACHE_NONE:
>>>>>> + pte |= GEN12_PPGTT_PTE_PAT1;
>>>>>> + break;


RE: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-23 Thread Yang, Fei
> On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> PTE encode functions are platform dependent. This patch implements
>>>> PTE functions for MTL, and ensures the correct PTE encode function
>>>> is used by calling pte_encode function pointer instead of the
>>>> hardcoded gen8 version of PTE encode.
>>>>
>>>> Signed-off-by: Fei Yang 
>>>> Reviewed-by: Andrzej Hajda 
>>>> Reviewed-by: Andi Shyti 
>>>> Acked-by: Nirmoy Das 
>>>
>>> Bspec: 45015, 45040
>>>
>>>> ---
>>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 
>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +--
>>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
>>b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> index b8027392144d..c5eacfdba1a5 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>>vm->vma_ops.bind_vma= dpt_bind_vma;
>>>>vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>>
>>>> - vm->pte_encode = gen8_ggtt_pte_encode;
>>>> + vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>>
>>>>dpt->obj = dpt_obj;
>>>>dpt->obj->is_dpt = true;
>>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> index 4daaa6f55668..11b91e0453c8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>>return pte;
>>>>  }
>>>>
>>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>>> +   enum i915_cache_level level,
>>>> +   u32 flags)
>>>> +{
>>>> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>>> +
>>>> + if (unlikely(flags & PTE_READ_ONLY))
>>>> + pte &= ~GEN8_PAGE_RW;
>>>> +
>>>> + if (flags & PTE_LM)
>>>> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>>
>>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>>> this trying to do?
>>
>> This takes effect only for PTE_LM, doesn't affect MTL.
>> PTE_NC is needed for PVC (use of access counter).
>> I believe this function was writen based on the one for PVC. And this
>> function did get extended to cover all gen12 in a later patch.
>
> Even though MTL doesn't have local memory, PTE_LM is supposed to be
> used on MTL for access to BAR2 stolen memory.

You were right, but I still think this code is fine because this bit is
ignored for MTL anyway and it is needed for other platforms with LMEM.
Otherwise this code would need some sort of platform check, which is hard
to do because we don't have the platform info here. Or we would have to
define another PTE encode function just for this one difference, for the
platforms that need PTE_NC, and then manage the function pointer correctly.

-Fei

> Matt
>
>> -Fei
>>> Matt
>>>
>>>> +
>>>> + switch (level) {
>>>> + case I915_CACHE_NONE:
>>>> + pte |= GEN12_PPGTT_PTE_PAT1;
>>>> + break;


RE: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level

2023-04-23 Thread Yang, Fei
> On 20/04/2023 00:00, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> Currently the KMD is using enum i915_cache_level to set caching policy for
>> buffer objects. This is flaky because the PAT index which really controls
>> the caching behavior in PTE has far more levels than what's defined in the
>> enum. In addition, the PAT index is platform dependent, having to translate
>> between i915_cache_level and PAT index is not reliable, and makes the code
>> more complicated.
>>
>> From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use PAT
>> index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only cause
>> more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>> note, the cache_level is not completely removed yet, because the KMD still
>> has the need of creating buffer objects with simple cache settings such as
>> cached, uncached, or writethrough. For such simple cases, using cache_level
>> would help simplify the code.
>>
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>
> I think have some ideas no how to perhaps make this simpler, please bear
> with me.
>
> In my mind get/set caching ioctls need to be failing once explicit pat
> index has been set by userspace. Or at least not return false information.

By design we are ending the support for set caching ioctl. The patch is included
in this series, "drm/i915/mtl: end support for set caching ioctl"

+   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+   return -EOPNOTSUPP;
+

> And I don't like i915_gem_object_has_cache_level and
> i915_gem_get_pat_index as a refactoring step.
>
> It also seems that the driver has a need to query the caching mode set
> regardless of the route (of setting).

Only for the objects created by the KMD. For UMD-created objects with the PAT
index set, the KMD should never touch the setting.

> So how about this.
>
> Three callers which query the caching mode: use_cpu_reloc, vm_fault_gtt,
> gpu_write_needs_clflush.
>
> We convert them to be like:
>
> i915_gem_object_has_caching_mode(obj, PAT_UC / PAT_WT / ...);

PAT_UC/WT/WB are platform dependent
(https://gfxspecs.intel.com/Predator/Home/Index/45101). To perform this check
you would have to do something like:

if (MTL)
...
else if (PVC)
...
else if (GEN12)
...
else
...

> Then apart from the per platform tables for mapping between cache level
> to pat index, you add tables which map pat index to caching modes
> (PAT_UC, etc, naming TBD, just enums or bitmasks also TBD, I haven't
> looked at the bspec to see how exactly it works).
>
> You would use that table in the i915_gem_object_has_caching_mode helper,
> called from the above three functions instead of obj->cache_level direct
> comparison.
>
> I am assuming at least for instance cache_level != I915_CACHE_NONE would
> be equivalent to i915_gem_object_has_caching_mode(obj, PAT_UC), etc.

So far the kernel only needs the 4 cache levels defined in enum
i915_cache_level; it doesn't need to understand all PAT indices. By design, if
userspace is setting the PAT index directly, the kernel only needs to pass the
setting through to the PTE.

For objects created by the kernel (including objects created by userspace
without specifying a pat_index), there are only 4 options (defined in the
cachelevel_to_pat table).
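
As a rough sketch, the MTL flavour of that table would look something like this
(indices taken from the MTL PAT programming in this series; treat the exact
values as an assumption of this example):

	.cachelevel_to_pat = {
		[I915_CACHE_NONE]   = 2,	/* L4 UC */
		[I915_CACHE_LLC]    = 3,	/* L4 WB + 1-way coherent */
		[I915_CACHE_L3_LLC] = 3,
		[I915_CACHE_WT]     = 1,	/* L4 WT */
	},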

For objects created by userspace with the PAT index set (GEM_CREATE + set_pat
extension), the kernel should not touch the setting, just pass it through to
the PAT index bits in the PTE.

That's why I was only checking cache_level. Handling the PAT index is much more
complicated because of its platform-dependent nature; even the number of PAT
indices varies from platform to platform. Fortunately the kernel doesn't need
to understand that.

-Fei

> Same mapping table could also be used in debugfs (i915_cache_level_str)
> to universally describe any obj->pat_index, with no need to have
> anything platform dependend there.
>
> In set caching set you always set obj->pat_index and so low level code
> can always just use that.
>
> Unless I am missing something (possible) I think like that we end up
> with no i915_gem_get_pat_index sprinkled around and also no confusing
> i915_gem_object_has_cache_level.
>
> Obj->pat_index would be a single point of truth, while obj->cache_level
> is just a legacy field for get/set_caching ioctl - not used in the
> internal driver flows.
>
> We would need an additional field for storing the boolean of whether
> userspace had overriden the PAT.
>
> Regards,
>
> Tvrtko


RE: [Intel-gfx] [PATCH 7/8] drm/i915: use pat_index instead of cache_level

2023-04-23 Thread Yang, Fei
> On 20/04/2023 00:00, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> Currently the KMD is using enum i915_cache_level to set caching policy
>> for buffer objects. This is flaky because the PAT index which really
>> controls the caching behavior in PTE has far more levels than what's
>> defined in the enum. In addition, the PAT index is platform dependent,
>> having to translate between i915_cache_level and PAT index is not
>> reliable, and makes the code more complicated.
>>
>> From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use
>> PAT index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only
>> cause more ambiguity.
>>
>> For these reasons this patch replaces i915_cache_level with PAT index.
>> Also note, the cache_level is not completely removed yet, because the
>> KMD still has the need of creating buffer objects with simple cache
>> settings such as cached, uncached, or writethrough. For such simple
>> cases, using cache_level would help simplify the code.
>>
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>
> [snip]
>
>>
>>   bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object
>> *obj) @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct 
>> drm_i915_gem_object *obj,
>>   {
>>  int ret;
>>
>> -if (obj->cache_level == cache_level)
>> +if (i915_gem_object_has_cache_level(obj, cache_level))
>>  return 0;
>
> When userspace calls i915_gem_set_caching_ioctl

We are ending support for the set_caching ioctl.

> after having set the PAT index explicitly this will make it silently succeed
> regardless of the cache level passed in, no? Because of:

Yes, that's the point. For objects created by userspace with PAT index set,
KMD is not supposed to touch the setting.

> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
> +  enum i915_cache_level lvl)
> +{
> + /*
> +  * cache_level == I915_CACHE_INVAL indicates the UMD's have set the
> +  * caching policy through pat_index, in which case the KMD should
> +  * leave the coherency to be managed by user space, simply return
> +  * true here.
> +  */
> + if (obj->cache_level == I915_CACHE_INVAL)
> + return true;
>
> I think we need to let it know it is doing it wrong with an error.

This is not an error; by design userspace should know exactly what it's doing.
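
For context, the userspace side of this contract looks roughly like the sketch
below (based on the GEM_CREATE set_pat extension added later in this series;
the struct and flag names here are from my recollection and may differ slightly
from the final uapi header, and bo_size/fd are just placeholders):

	struct drm_i915_gem_create_ext_set_pat set_pat = {
		.base.name = I915_GEM_CREATE_EXT_SET_PAT,
		.pat_index = 3,		/* e.g. MTL WB + 1-way coherent */
	};
	struct drm_i915_gem_create_ext create = {
		.size = bo_size,
		.extensions = (uintptr_t)&set_pat,
	};

	ret = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);

Once userspace has taken this path it owns the coherency decisions for the
object, which is why the KMD leaves the setting alone.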

-Fei

> Regards,
>
> Tvrtko


Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-21 Thread Yang, Fei
> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> PTE encode functions are platform dependent. This patch implements
>> PTE functions for MTL, and ensures the correct PTE encode function
>> is used by calling pte_encode function pointer instead of the
>> hardcoded gen8 version of PTE encode.
>>
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andrzej Hajda 
>> Reviewed-by: Andi Shyti 
>> Acked-by: Nirmoy Das 
>
> Bspec: 45015, 45040
>
>> ---
>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 
>>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +--
>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
>> b/drivers/gpu/drm/i915/display/intel_dpt.c
>> index b8027392144d..c5eacfdba1a5 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>vm->vma_ops.bind_vma= dpt_bind_vma;
>>vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>
>> - vm->pte_encode = gen8_ggtt_pte_encode;
>> + vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>
>>dpt->obj = dpt_obj;
>>dpt->obj->is_dpt = true;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 4daaa6f55668..11b91e0453c8 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>return pte;
>>  }
>>
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> +   enum i915_cache_level level,
>> +   u32 flags)
>> +{
>> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
>> + if (unlikely(flags & PTE_READ_ONLY))
>> + pte &= ~GEN8_PAGE_RW;
>> +
>> + if (flags & PTE_LM)
>> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>
> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
> this trying to do?

This takes effect only with PTE_LM, so it doesn't affect MTL.
PTE_NC is needed for PVC (use of the access counter).

I believe this function was written based on the one for PVC. And this function
did get extended to cover all gen12 in a later patch.

-Fei

> Matt
>
>> +
>> + switch (level) {
>> + case I915_CACHE_NONE:
>> + pte |= GEN12_PPGTT_PTE_PAT1;
>> + break;



Re: [Intel-gfx] [PATCH 8/8] drm/i915: Allow user to set cache at BO creation

2023-04-20 Thread Yang, Fei
> On 20/04/2023 12:39, Andi Shyti wrote:
>> Hi Fei,
>>
>>> To comply with the design that buffer objects shall have immutable
>>> cache setting through out their life cycle, {set, get}_caching ioctl's
>>> are no longer supported from MTL onward. With that change caching
>>> policy can only be set at object creation time. The current code
>>> applies a default (platform dependent) cache setting for all objects.
>>> However this is not optimal for performance tuning. The patch extends
>>> the existing gem_create uAPI to let user set PAT index for the object
>>> at creation time.
>>> The new extension is platform independent, so UMD's can switch to using
>>> this extension for older platforms as well, while {set, get}_caching are
>>> still supported on these legacy paltforms for compatibility reason.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Cc: Andi Shyti 
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andi Shyti 
>>
>> because this is an API change, we need some more information
>> here.
>>
>> First of all you need to CC the userspace guys that have been
>> working on top of your series and get their ack's.
>
> Yes, and a link to a Mesa merge request which uses the uapi should be
> included.

Working with the Mesa team on this, stay tuned.

> IGTs should be ready to before we can merge. I glanced over igt-dev but
> did not spot anything.

There is an IGT patch posted to gfx-internal-devel, titled "test/gem_create:
exercise gem_create_ext_set_pat".

> Regards,
>
> Tvrtko
>
>>
>> I also believe that this series has also been tested on a
>> separate repository, would you link it in the commit message?
>>
>> Thanks,
>> Andi



RE: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-19 Thread Yang, Fei
> Hi Fei,
>
>> +#define MTL_PPGTT_PTE_PAT3  BIT_ULL(62)
>>  #define GEN12_PPGTT_PTE_LM  BIT_ULL(11)
>> +#define GEN12_PPGTT_PTE_PAT2BIT_ULL(7)
>> +#define GEN12_PPGTT_PTE_NC  BIT_ULL(5)
>> +#define GEN12_PPGTT_PTE_PAT1BIT_ULL(4)
>> +#define GEN12_PPGTT_PTE_PAT0BIT_ULL(3)
>>
>> -#define GEN12_GGTT_PTE_LM   BIT_ULL(1)
>> +#define GEN12_GGTT_PTE_LM   BIT_ULL(1)
>> +#define MTL_GGTT_PTE_PAT0   BIT_ULL(52)
>> +#define MTL_GGTT_PTE_PAT1   BIT_ULL(53)
>> +#define GEN12_GGTT_PTE_ADDR_MASKGENMASK_ULL(45, 12)
>> +#define MTL_GGTT_PTE_PAT_MASK   GENMASK_ULL(53, 52)
>>
>>  #define GEN12_PDE_64K BIT(6)
>>  #define GEN12_PTE_PS64 BIT(8)
>> @@ -147,6 +156,15 @@ typedef u64 gen8_pte_t;
>>  #define GEN8_PDE_IPS_64K BIT(11)
>>  #define GEN8_PDE_PS_2M   BIT(7)
>>
>> +#define MTL_PPAT_L4_CACHE_POLICY_MASK   REG_GENMASK(3, 2)
>> +#define MTL_PAT_INDEX_COH_MODE_MASK REG_GENMASK(1, 0)
>> +#define MTL_PPAT_L4_3_UCREG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 3)
>> +#define MTL_PPAT_L4_1_WTREG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 1)
>> +#define MTL_PPAT_L4_0_WBREG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 0)
>> +#define MTL_3_COH_2WREG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 3)
>> +#define MTL_2_COH_1WREG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>> +#define MTL_0_COH_NON   REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>
> BTW, are all these defines needed? Not all of them look to be used.

Yes, these are all being used, though not in this patch but in the next patch
defining the pte_encode functions.
I think the only one that might not be used is MTL_GGTT_PTE_PAT_MASK, because I
ended up checking each bit instead of extracting the PAT field and comparing it
against possible values.
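
To illustrate what checking each bit means here, the GGTT encode ends up doing
something like this (a sketch based on the defines above; the pat_index
parameter comes from the later patches in this series):

	/* MTL GGTT PTEs carry a 2-bit PAT index in bits 53:52 */
	if (pat_index & BIT(0))
		pte |= MTL_GGTT_PTE_PAT0;
	if (pat_index & BIT(1))
		pte |= MTL_GGTT_PTE_PAT1;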

-Fei

> Andi


RE: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Set has_llc=0

2023-04-19 Thread Yang, Fei
> Hi Fei,
>
> On Wed, Apr 19, 2023 at 02:12:12PM -0700, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> On MTL, LLC is not shared between GT and CPU, set has_llc=0.
>>
>> Signed-off-by: Fei Yang 
>
> just an unanswered questino from Nirmoy:
>
> This statement is bit unclear to me.  I would say "On MTL, LLC is not shared 
> between GT and CPU"

I have updated the commit message accordingly in this version, see above.

> Reviewed-by: Andi Shyti 
> Reviewed-by: Andrzej Hajda 
> Reviewed-by: Nirmoy Das 
>
> Andi


RE: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

2023-04-19 Thread Yang, Fei
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 1803a633ed64..98e682b7df07 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  }
>>>  GEM_BUG_ON(tail > size);
>>>
>>> -/*
>>> - * make sure H2G buffer update and LRC tail update (if this triggering 
>>> a
>>> - * submission) are visible before updating the descriptor tail
>>> - */
>>> -intel_guc_write_barrier(ct_to_guc(ct));
>>> -
>>>  /* update local copies */
>>>  ctb->tail = tail;
>>>  GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
>>> @@ -429,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
>>>  /* now update descriptor */
>>>  WRITE_ONCE(desc->tail, tail);
>>>
>>> +/*
>>> + * make sure H2G buffer update and LRC tail update (if this triggering 
>>> a
>>> + * submission) are visible before updating the descriptor tail
>>> + */
>>> +intel_guc_write_barrier(ct_to_guc(ct));
>>
>> The comment above needs update,

Never mind, I decided to revert this change because it's not necessary. There
is an MMIO write following the ct_write() call which would flush the
write-combining buffer anyway, so the barrier is redundant here.
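
Roughly, the flow I am relying on is the following (a simplified sketch of the
send path as I read it, not the exact code):

	/* fill the H2G buffer and bump desc->tail (write-combined writes) */
	err = ct_write(ct, action, len, fence, flags);
	...
	/* ring the doorbell: an uncached MMIO write, which also flushes the WC buffer */
	intel_guc_notify(ct_to_guc(ct));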

>
>Will update the comment.
>
>> if this is correct change. The question is why it is correct? If yes,
>> it implies that old barrier is incorrect, maybe it should be then separate 
>> fix?
>
> There is WRITE_ONCE(desc->tail, tail) right after the H2G buffer update which 
> is also
> seen by the GuC firmware, the barrier is needed for both, thus moved it down 
> a few
> lines to cover them all.
>
>> I am not an expert, but previous location of the barrier seems sane to
>> me - assure GuC will see proper buffer, before updating buffer's tail.
>
> That is correct, but the barrier is needed for both H2G buffer and 
> descriptor, as they are all shared with GuC firmware.
>
> -Fei
>
>> And according to commit message this new barrier should flush WC
>> buffer, so for me it seems to be different thing.
>> Am I missing something?
>>
>>
>> Regards
>> Andrzej


RE: [Intel-gfx] [PATCH 5/8] drm/i915/mtl: end support for set caching ioctl

2023-04-19 Thread Yang, Fei
> On 17.04.2023 08:25, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> The design is to keep Buffer Object's caching policy immutable through
>> out its life cycle. This patch ends the support for set caching ioctl
>> from MTL onward. While doing that we also set BO's to be 1-way
>> coherent at creation time because GPU is no longer automatically
>> snooping CPU cache.
>>
>> Signed-off-by: Fei Yang 
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c  | 9 -
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> index d2d5a24301b2..bb3575b1479f 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>> @@ -337,6 +337,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, 
>> void *data,
>>  if (IS_DGFX(i915))
>>  return -ENODEV;
>>
>> +if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
>> +return -EOPNOTSUPP;
>> +
>>  switch (args->caching) {
>>  case I915_CACHING_NONE:
>>  level = I915_CACHE_NONE;
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> index 37d1efcd3ca6..e602c323896b 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> @@ -601,7 +601,14 @@ static int shmem_object_init(struct intel_memory_region 
>> *mem,
>>  obj->write_domain = I915_GEM_DOMAIN_CPU;
>>  obj->read_domains = I915_GEM_DOMAIN_CPU;
>>
>> -if (HAS_LLC(i915))
>> +/*
>> + * MTL doesn't snooping CPU cache by default for GPU access (namely
>
> Sounds strange, "doesn't snoop" ?

Will update.

>> + * 1-way coherency). However some UMD's are currently depending on
>> + * that. Make 1-way coherent the default setting for MTL. A follow
>> + * up patch will extend the GEM_CREATE uAPI to allow UMD's specify
>> + * caching mode at BO creation time
>
> Shouldn't such comment be a part of patch description? or at least removed by
> follow-up patch.

Will update.

-Fei

> Reviewed-by: Andrzej Hajda 
>
> Regards
> Andrzej
>
>
>> + */
>> +if (HAS_LLC(i915) || (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)))
>>  /* On some devices, we can have the GPU use the LLC (the CPU
>>   * cache) for about a 10% performance improvement
>>   * compared to uncached.  Graphics requests other than


RE: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

2023-04-19 Thread Yang, Fei
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index 1803a633ed64..98e682b7df07 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>>  }
>>  GEM_BUG_ON(tail > size);
>>
>> -/*
>> - * make sure H2G buffer update and LRC tail update (if this triggering a
>> - * submission) are visible before updating the descriptor tail
>> - */
>> -intel_guc_write_barrier(ct_to_guc(ct));
>> -
>>  /* update local copies */
>>  ctb->tail = tail;
>>  GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
>> @@ -429,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
>>  /* now update descriptor */
>>  WRITE_ONCE(desc->tail, tail);
>>
>> +/*
>> + * make sure H2G buffer update and LRC tail update (if this triggering a
>> + * submission) are visible before updating the descriptor tail
>> + */
>> +intel_guc_write_barrier(ct_to_guc(ct));
>
> The comment above needs update,

Will update the comment.

> if this is correct change. The question is why it is correct? If yes, it 
> implies
> that old barrier is incorrect, maybe it should be then separate fix?

There is a WRITE_ONCE(desc->tail, tail) right after the H2G buffer update which
is also seen by the GuC firmware. The barrier is needed for both, so I moved it
down a few lines to cover them all.

> I am not an expert, but previous location of the barrier seems sane to me - 
> assure
> GuC will see proper buffer, before updating buffer's tail.

That is correct, but the barrier is needed for both the H2G buffer and the
descriptor, as they are both shared with the GuC firmware.

-Fei

> And according to commit message this new barrier should flush WC buffer, so 
> for me
> it seems to be different thing.
> Am I missing something?
>
>
> Regards
> Andrzej


RE: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-19 Thread Yang, Fei
>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>> This, along with addition of support for L4 cache calls a
> s/calls a/calls for a

Will update

>> MOCS/PAT table update.
>> Alos the PAT index registers are multicasted for primary GT,
> s/Alos/Also
>> and there is an address jump from index 7 to 8. This patch makes sure 
>> these registers are programmed in the proper way.
>
> "Makes sure that"

Will update.


RE: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-19 Thread Yang, Fei
>> void setup_private_pat(struct intel_gt *gt)
>>
>>  GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
>>
>> -if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>> -xehp_setup_private_ppat(gt);
>> -else if (GRAPHICS_VER(i915) >= 12)
>> -tgl_setup_private_ppat(uncore);
>> -else if (GRAPHICS_VER(i915) >= 11)
>> -icl_setup_private_ppat(uncore);
>> -else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
>> -chv_setup_private_ppat(uncore);
>> -else
>> -bdw_setup_private_ppat(uncore);
>> +if (gt->type == GT_MEDIA) {
>> +xelpmp_setup_private_ppat(gt->uncore);
>> +} else {
>> +if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
>
> You could just use "else if" here to avoid indendation, for now it would 
> work, up to you.

Will update.

>> +xelpg_setup_private_ppat(gt);
>> +else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>> +xehp_setup_private_ppat(gt);
>> +else if (GRAPHICS_VER(i915) >= 12)
>> +tgl_setup_private_ppat(uncore);
>> +else if (GRAPHICS_VER(i915) >= 11)
>> +icl_setup_private_ppat(uncore);
>> +else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
>> +chv_setup_private_ppat(uncore);
>> +else
>> +bdw_setup_private_ppat(uncore);
>> +}
>>   }
..



RE: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-19 Thread Yang, Fei
> Hi Fei,
>
> On Sun, Apr 16, 2023 at 11:24:57PM -0700, fei.y...@intel.com wrote:
>> From: Madhumitha Tolakanahalli Pradeep
>> 
>>
>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>> This, along with addition of support for L4 cache calls a MOCS/PAT
>> table update.
>> Alos the PAT index registers are multicasted for primary GT,
>
> /Alos/Also/

Will update

>[...]
>
>> +static void xelpmp_setup_private_ppat(struct intel_uncore *uncore) {
>> +intel_uncore_write(uncore, XELPMP_PAT_INDEX(0), MTL_PPAT_L4_0_WB);
>> +intel_uncore_write(uncore, XELPMP_PAT_INDEX(1), MTL_PPAT_L4_1_WT);
>> +intel_uncore_write(uncore, XELPMP_PAT_INDEX(2), MTL_PPAT_L4_3_UC);
>> +intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
>> +   MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
>> +intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
>> +   MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
>
> nit: I think it's more readable if we either keep everything in one
> line or we break the line for everyone. Even if we break the 80
> characters rule.

Will update

>[...]
>
>> @@ -603,16 +639,22 @@ void setup_private_pat(struct intel_gt *gt)
>>
>>  GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
>>
>> -if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>> -xehp_setup_private_ppat(gt);
>> -else if (GRAPHICS_VER(i915) >= 12)
>> -tgl_setup_private_ppat(uncore);
>> -else if (GRAPHICS_VER(i915) >= 11)
>> -icl_setup_private_ppat(uncore);
>> -else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
>> -chv_setup_private_ppat(uncore);
>> -else
>> -bdw_setup_private_ppat(uncore);
>> +if (gt->type == GT_MEDIA) {
>> +xelpmp_setup_private_ppat(gt->uncore);
>
>nit: if you add a return here you save the else.

Will update.

>> +} else {
>> +if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
>> +xelpg_setup_private_ppat(gt);
>> +else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>> +xehp_setup_private_ppat(gt);
>> +else if (GRAPHICS_VER(i915) >= 12)
>> +tgl_setup_private_ppat(uncore);
>> +else if (GRAPHICS_VER(i915) >= 11)
>> +icl_setup_private_ppat(uncore);
>> +else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
>> +chv_setup_private_ppat(uncore);
>> +else
>> +bdw_setup_private_ppat(uncore);
>> +}
>
> [...]
>
>> -static u32 global_mocs_offset(void)
>> +static u32 global_mocs_offset(struct intel_gt *gt)
>>  {
>> -return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0));
>> +return i915_mmio_reg_offset(GEN12_GLOBAL_MOCS(0)) +
>> +gt->uncore->gsi_offset;
>
> There is one open question here coming from one of previous Matt's reviews.
> Would it make sense to have this in a different patch?

I would prefer keeping it in this patch because the function is called by
intel_mocs_init(). Without it the MOCS initialization would be broken for
media GT.

> Andi


RE: [Intel-gfx] [PATCH 0/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-17 Thread Yang, Fei
> fei.y...@intel.com kirjoitti 17.4.2023 klo 9.24:
>> From: Fei Yang 
>> 
>> The series includes patches needed to enable MTL.
>> Also add new extension for GEM_CREATE uAPI to let user space set cache 
>> policy for buffer objects.
>
> if I'm counting right, this would be version 5 of the series, yet that is not 
> shown anywhere nor the changes between series..

Yes, mostly addressing minor issues; I just want to keep the commit messages
clean since this is the enablement patch set for a new platform.

> --
> t



RE: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-13 Thread Yang, Fei
> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO 
> creation
>
> On 2023-04-05 13:26:43, Jordan Justen wrote:
>> On 2023-04-05 00:45:24, Lionel Landwerlin wrote:
>>> On 04/04/2023 19:04, Yang, Fei wrote:
>>>>> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set 
>>>>> cache at BO creation
>>>>>
>>>>> Just like the protected content uAPI, there is no way for 
>>>>> userspace to tell this feature is available other than trying using it.
>>>>>
>>>>> Given the issues with protected content, is it not thing we could want to 
>>>>> add?
>>>> Sorry I'm not aware of the issues with protected content, could you 
>>>> elaborate?
>>>> There was a long discussion on teams uAPI channel, could you 
>>>> comment there if any concerns?
>>>>
>>> 
>>> We wanted to have a getparam to detect protected support and were 
>>> told to detect it by trying to create a context with it.
>>> 
>> 
>> An extensions system where the detection mechanism is "just try it", 
>> and assume it's not supported if it fails. ??
>> 
>
> I guess no one wants to discuss the issues with this so-called detection
> mechanism for i915 extensions. (Just try it and if it fails, it must not
 be supported.)
>
> I wonder how many ioctls we will be making a couple years down the road
> just to see what the kernel supports.
>
> Maybe we'll get more fun 8 second timeouts to deal with. Maybe these probing
> ioctls failing or succeeding will alter the kmd's state in some unexpected 
> way.

For this SET_PAT extension, I can assure you there is no 8 second wait :)
This is definitely a non-blocking call.

> It'll also be fun to debug cases where the driver is not starting up with the
> noise of a bunch of probing ioctls flying by.
>
> I thought about suggesting at least something like 
> I915_PARAM_CMD_PARSER_VERSION,
> but I don't know if that could have prevented this 8 second timeout for 
> creating
> a protected content context. Maybe it's better than nothing though.
>
> Of course, there was also the vague idea I threw out below for getting a list 
> of
> supported extentions.

The detection mechanism itself is an uAPI change; I don't think it's a good
idea to combine that with this SET_PAT extension patch.
I suggest we start a discussion in the "i915 uAPI changes" Teams channel, just
like how we sorted out a solution for the cache policy setting issue. Would
that work?

https://teams.microsoft.com/l/channel/19%3af1767bda6734476ba0a9c7d147b928d1%40thread.skype/i915%2520uAPI%2520changes?groupId=379f3ae1-d138-4205-bb65-d4c7d38cb481=46c98d88-e344-4ed4-8496-4ed7712e255d

-Fei

> -Jordan
>
>> 
>> This seem likely to get more and more problematic as a detection 
>> mechanism as more extensions are added.
>> 
>> > 
>> > Now it appears trying to create a protected context can block for 
>> > several seconds.
>> > 
>> > Since we have to report capabilities to the user even before it 
>> > creates protected contexts, any app is at risk of blocking.
>> > 
>> 
>> This failure path is not causing any re-thinking about using this as 
>> the extension detection mechanism?
>> 
>> Doesn't the ioctl# + input-struct-size + u64-extension# identify the 
>> extension such that the kernel could indicate if it is supported or 
>> not. (Or, perhaps return an array of the supported extensions so the 
>> umd doesn't have to potentially make many ioctls for each extension of
>> interest.)
>> 
>> -Jordan


RE: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-11 Thread Yang, Fei
> Subject: Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables 
> for MTL
>
> On Mon, Apr 10, 2023 at 08:55:16PM -0700, Yang, Fei wrote:
> ...
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> index 69ce55f517f5..b632167eaf2e 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> @@ -88,9 +88,18 @@ typedef u64 gen8_pte_t;
>>>>  #define BYT_PTE_SNOOPED_BY_CPU_CACHES REG_BIT(2)
>>>>  #define BYT_PTE_WRITEABLE             REG_BIT(1)
>>>>
>>>> +#define GEN12_PPGTT_PTE_PAT3  BIT_ULL(62)
>>>>  #define GEN12_PPGTT_PTE_LM    BIT_ULL(11)
>>>> +#define GEN12_PPGTT_PTE_PAT2  BIT_ULL(7)
>>>> +#define GEN12_PPGTT_PTE_NC    BIT_ULL(5)
>>>> +#define GEN12_PPGTT_PTE_PAT1  BIT_ULL(4)
>>>> +#define GEN12_PPGTT_PTE_PAT0  BIT_ULL(3)
>>> Which bspec page is this from?  The PPGTT descriptions in
>>> the bspec are kind of hard to track down.
>>
>> https://gfxspecs.intel.com/Predator/Home/Index/45101
>
> The bspec tagging is a bit bizarre in this area, but I don't believe
> this page is intended to apply to MTL. Note that this page is inside
> a section specifically listed as "57b VA Support" --- i.e., this
> general section is for platforms like PVC rather than MTL.  MTL only
> has 48b virtual address space (bspec 55416), so I think one of the
> pages in the 48b sections is what we should be referencing instead.
>
> If they screwed up and put MTL-specific details only on a PVC-specific
> page of the bspec, we should probably file a bspec issue about that to
> get it fixed.

The Bspec is a bit confusing on these. I looked at the Bspec with the filter set
to TGL/ADL/MTL/ALL respectively. Here are the differences:
>>PAT_Index[2:0] = {PAT, PCD, PWT} = BIT(7, 4, 3)

These 3 PAT index bits are defined for all gen12+.
>>PAT_Index[3] = BIT(62)

PAT_Index[3] is defined for MTL/ARL; I will update this one to MTL_xxx.

>>PAT_Index[4] = BIT(61)

PAT_Index[4] shows up only when no filter is set. And this bit is marked as
[NOT VALID FOR SPEC: GENERALASSETSXE]; I am not sure how to interpret this, but
it seems like it should not be used at all. Any suggestions?

>>
>>
>>> But if these only apply to MTL, why are they labelled as "GEN12?"
>>
>>These apply to GEN12plus.
>
> That's not what your patch is doing; you're using them in a function
> that only gets called on MTL.

That PTE encode function will be generalized to gen12 in a patch after the
pat_index change.

> So the question is whether these
> definitions truly applied to older platforms like TGL/RKL/ADL/etc too
> (and we need to go back and fix that code), or whether they're
> Xe_LPG-specific, in which case the "GEN12_" prefix is incorrect.

The only difference is that MTL has PAT[3] defined, so we can still apply
the same PTE encode function for all gen12+.
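
For reference, scattering a pat_index into the PPGTT PTE bits then looks
roughly like this (a sketch based on the defines quoted in this thread; the
helper name is made up for this example):

static u64 pte_apply_pat_index(u64 pte, unsigned int pat_index)
{
	if (pat_index & BIT(0))
		pte |= GEN12_PPGTT_PTE_PAT0;	/* bit 3 */
	if (pat_index & BIT(1))
		pte |= GEN12_PPGTT_PTE_PAT1;	/* bit 4 */
	if (pat_index & BIT(2))
		pte |= GEN12_PPGTT_PTE_PAT2;	/* bit 7 */
	if (pat_index & BIT(3))
		pte |= MTL_PPGTT_PTE_PAT3;	/* bit 62, MTL/ARL only */

	return pte;
}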

> Also, handling the MTL-specific PTE encoding later in the series, after
> the transition from cache_level to PAT index, might be best since then
> you can just implement it correctly at the time the code is introduced;
> no need to add the cache_level implementation first (which can't even
> use all the bits) just to come back a few patches later and replace it
> all with PAT code.

I will squash the PTE encode patches.

>>>> -#define GEN12_GGTT_PTE_LM   BIT_ULL(1)
>>>> +#define GEN12_GGTT_PTE_LM BIT_ULL(1)
>>>> +#define MTL_GGTT_PTE_PAT0  BIT_ULL(52)
>>>> +#define MTL_GGTT_PTE_PAT1  BIT_ULL(53)
>>>> +#define GEN12_GGTT_PTE_ADDR_MASK   GENMASK_ULL(45, 12)
>>>> +#define MTL_GGTT_PTE_PAT_MASK  GENMASK_ULL(53, 52)
>>>>
>>>>  #define GEN12_PDE_64K BIT(6)
>>>>  #define GEN12_PTE_PS64 BIT(8)
>>>> @@ -147,6 +156,15 @@ typedef u64 gen8_pte_t;
>>>>  #define GEN8_PDE_IPS_64K BIT(11)
>>>>  #define GEN8_PDE_PS_2M   BIT(7)
>>>>
>>>> +#define MTL_PPAT_L4_CACHE_POLICY_MASK  REG_GENMASK(3, 2)
>>>> +#define MTL_PAT_INDEX_COH_MODE_MASK    REG_GENMASK(1, 0)
>>>> +#define MTL_PPAT_L4_3_UC   REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 3)
>>>> +#define MTL_PPAT_L4_1_WT   REG_FIELD_PREP(MTL_PPAT_L

RE: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-10 Thread Yang, Fei
>Subject: Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables 
>for MTL
>
> On Fri, Apr 07, 2023 at 12:12:29AM -0700, 
> fei.y...@intel.com wrote:
>> From: Fei Yang fei.y...@intel.com
>>
>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>> This, along with addition of support for ADM/L4 cache calls a MOCS/PAT
>> table update. Also defines PTE encode functions for MTL as it has
>> different PAT index definition than previous platforms.
>
> It might be best to keep the PTE encoding as a separate patch from the
> MOCS/PAT tables.  It's a different enough topic that it probably deserves
> a patch of its own.

Will update in the next revision.

>>
>> BSpec: 44509, 45101, 44235
>>
>> Cc: Matt Roper matthew.d.ro...@intel.com
>> Cc: Lucas De Marchi lucas.demar...@intel.com
>> Signed-off-by: Madhumitha Tolakanahalli Pradeep
>> madhumitha.tolakanahalli.prad...@intel.com
>> Signed-off-by: Aravind Iddamsetty 
>> aravind.iddamse...@intel.com
>> Signed-off-by: Fei Yang fei.y...@intel.com
>> ---
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c| 28 +
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h|  3 +
>>  drivers/gpu/drm/i915/gt/intel_ggtt.c| 27 +
>>  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 +++-
>>  drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++-
>>  drivers/gpu/drm/i915/gt/intel_mocs.c| 76 +++--
>>  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
>>  drivers/gpu/drm/i915/i915_pci.c |  1 +
>>  8 files changed, 173 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 4daaa6f55668..df4073d32114 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>  return pte;
>>  }
>>
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> +   enum i915_cache_level level,
>> +   u32 flags)
>> +{
>> +   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
>> +   if (unlikely(flags & PTE_READ_ONLY))
>> +  pte &= ~GEN8_PAGE_RW;
>> +
>> +   if (flags & PTE_LM)
>> +  pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>> +
>> +   switch (level) {
>> +   case I915_CACHE_NONE:
>> +  pte |= GEN12_PPGTT_PTE_PAT1;
>> +  break;
>> +   case I915_CACHE_LLC:
>> +   case I915_CACHE_L3_LLC:
>> +  pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>> +  break;
>> +   case I915_CACHE_WT:
>> +  pte |= GEN12_PPGTT_PTE_PAT0;
>> +  break;
>> +   }
>> +
>> +   return pte;
>> +}
>> +
>>  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool
>> create)  {
>>  struct drm_i915_private *i915 = ppgtt->vm.i915; diff --git
>> a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> index f541d19264b4..6b8ce7f4d25a 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> @@ -18,5 +18,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt
>> *gt,
>>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>> enum i915_cache_level level,
>> u32 flags);
>> +u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>> + unsigned int pat_index,
>> + u32 flags);
>>
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index 3c7f1ed92f5b..4a16bfcde1de 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>>  }
>>  }
>>
>> +u64 mtl_ggtt_pte_encode(dma_addr_t addr,
>> + enum i915_cache_level level,
>> + u32 flags)
>> +{
>> +   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
>> +
>> +   GEM_BUG_ON(addr & ~GEN12_GGTT_PTE_ADDR_MASK);
>> +
>> +   if (flags & PTE_LM)
>> +  pte |= GEN12_GGTT_PTE_LM;
>> +
>> +   switch (level) {
>> +   case I915_CACHE_NONE:
>> +  pte |= MTL_GGTT_PTE_PAT1;
>> +  break;
>> +   case I915_CACHE_LLC:
>> +   case I915_CACHE_L3_LLC:
>> +  pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
>> +  break;
>> +   case I915_CACHE_WT:
>> +  pte |= MTL_GGTT_PTE_PAT0;
>> 

RE: [Intel-gfx] [PATCH 1/7] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-06 Thread Yang, Fei
>Subject: Re: [Intel-gfx] [PATCH 1/7] drm/i915/mtl: Define MOCS and PAT tables 
>for MTL
>
> Hi Fei,
>
> On Mon, Apr 03, 2023 at 03:50:26PM +0300, Jani Nikula wrote:
>> On Fri, 31 Mar 2023, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>>> This, along with addition of support for ADM/L4 cache calls a 
>>> MOCS/PAT table update.
>>> Also add PTE encode functions for MTL as it has different PAT index 
>>> definition than previous platforms.
>> 
>> As a general observation, turning something into a function pointer 
>> and extending it to more platforms should be two separate changes.
>
> Agree with Jani. Fei, would you mind splitting this patch? It eases the 
> review, as well.

Yes, I'm working on this. Still need to address another comment from Ville.
Will send an update soon.

>Thanks,
>Andi
>
>> BR,
>> Jani.
>> 
>>>
>>> BSpec: 44509, 45101, 44235
>>>
>>> Cc: Matt Roper 
>>> Cc: Lucas De Marchi 
>>> Signed-off-by: Madhumitha Tolakanahalli Pradeep 
>>> 
>>> Signed-off-by: Aravind Iddamsetty 
>>> Signed-off-by: Fei Yang 
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 43 --
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h |  3 +
>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 ++-
>>>  drivers/gpu/drm/i915/gt/intel_gtt.c  | 23 ++-
>>>  drivers/gpu/drm/i915/gt/intel_gtt.h  | 20 ++-
>>>  drivers/gpu/drm/i915/gt/intel_mocs.c | 76 ++--
>>>  drivers/gpu/drm/i915/gt/selftest_mocs.c  |  2 +-
>>>  drivers/gpu/drm/i915/i915_pci.c  |  1 +
>>>  9 files changed, 189 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
>>> b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> index b8027392144d..c5eacfdba1a5 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>> vm->vma_ops.bind_vma= dpt_bind_vma;
>>> vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>  
>>> -   vm->pte_encode = gen8_ggtt_pte_encode;
>>> +   vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>  
>>> dpt->obj = dpt_obj;
>>> dpt->obj->is_dpt = true;
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 4daaa6f55668..4197b43150cc 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>> return pte;
>>>  }
>>>  
>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>> + enum i915_cache_level level,
>>> + u32 flags)
>>> +{
>>> +   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>> +
>>> +   if (unlikely(flags & PTE_READ_ONLY))
>>> +   pte &= ~GEN8_PAGE_RW;
>>> +
>>> +   if (flags & PTE_LM)
>>> +   pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>> +
>>> +   switch (level) {
>>> +   case I915_CACHE_NONE:
>>> +   pte |= GEN12_PPGTT_PTE_PAT1;
>>> +   break;
>>> +   case I915_CACHE_LLC:
>>> +   case I915_CACHE_L3_LLC:
>>> +   pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>>> +   break;
>>> +   case I915_CACHE_WT:
>>> +   pte |= GEN12_PPGTT_PTE_PAT0;
>>> +   break;
>>> +   }
>>> +
>>> +   return pte;
>>> +}
>>> +
>>>  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool 
>>> create)  {
>>> struct drm_i915_private *i915 = ppgtt->vm.i915; @@ -427,7 +455,7 
>>> @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>>>   u32 flags)
>>>  {
>>> struct i915_page_directory *pd;
>>> -   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
>>> +   const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, 
>>> +flags);
>>> gen8_pte_t *vaddr;
>>>  
>>> pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); @@ -580,7 +608,7 
>>> @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>>>enum i915_cache_level cache_level,
>>>u32 flags)
>>>  {
>>> -   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
>>> +   const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, 
>>> +flags);
>>> unsigned int rem = sg_dma_len(iter->sg);
>>> u64 start = vma_res->start;
>>>  
>>> @@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct 
>>> i915_address_space *vm,
>>> GEM_BUG_ON(pt->is_compact);
>>>  
>>> vaddr = px_vaddr(pt);
>>> -   vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
>>> +   vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>>> drm_clflush_virt_range([gen8_pd_index(idx, 0)], 
>>> sizeof(*vaddr));  }
>>>  
>>> @@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct 
>>> i915_address_space *vm,
>>> 

Re: [Intel-gfx] [PATCH 1/7] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-06 Thread Yang, Fei
> On 4/1/2023 8:38 AM, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>> This, along with addition of support for ADM/L4 cache calls a
>> MOCS/PAT table update.
>> Also add PTE encode functions for MTL as it has different PAT
>> index definition than previous platforms.
>>
>> BSpec: 44509, 45101, 44235
>>
>> Cc: Matt Roper 
>> Cc: Lucas De Marchi 
>> Signed-off-by: Madhumitha Tolakanahalli Pradeep 
>> 
>> Signed-off-by: Aravind Iddamsetty 
>> Signed-off-by: Fei Yang 
>> ---
>>   drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 43 --
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h |  3 +
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 ++-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c  | 23 ++-
>>   drivers/gpu/drm/i915/gt/intel_gtt.h  | 20 ++-
>>   drivers/gpu/drm/i915/gt/intel_mocs.c | 76 ++--
>>   drivers/gpu/drm/i915/gt/selftest_mocs.c  |  2 +-
>>   drivers/gpu/drm/i915/i915_pci.c  |  1 +
>>   9 files changed, 189 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
>> b/drivers/gpu/drm/i915/display/intel_dpt.c
>> index b8027392144d..c5eacfdba1a5 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>vm->vma_ops.bind_vma= dpt_bind_vma;
>>vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>
>> - vm->pte_encode = gen8_ggtt_pte_encode;
>> + vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>
>>dpt->obj = dpt_obj;
>>dpt->obj->is_dpt = true;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
>> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 4daaa6f55668..4197b43150cc 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>return pte;
>>   }
>>
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> +   enum i915_cache_level level,
>> +   u32 flags)
>> +{
>> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
>> + if (unlikely(flags & PTE_READ_ONLY))
>> + pte &= ~GEN8_PAGE_RW;
>> +
>> + if (flags & PTE_LM)
>> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>> +
>> + switch (level) {
>> + case I915_CACHE_NONE:
>> + pte |= GEN12_PPGTT_PTE_PAT1;
>> + break;
>> + case I915_CACHE_LLC:
>> + case I915_CACHE_L3_LLC:
>> + pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>> + break;
>> + case I915_CACHE_WT:
>> + pte |= GEN12_PPGTT_PTE_PAT0;
>> + break;
>> + }
>> +
>> + return pte;
>> +}
>> +
>>   static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
>>   {
>>struct drm_i915_private *i915 = ppgtt->vm.i915;
>> @@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>>  u32 flags)
>>   {
>>struct i915_page_directory *pd;
>> - const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
>> + const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, 
>> flags);
>>gen8_pte_t *vaddr;
>>
>>pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>> @@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct 
>> i915_address_space *vm,
>>   enum i915_cache_level cache_level,
>>   u32 flags)
>>   {
>> - const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
>> + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
>>unsigned int rem = sg_dma_len(iter->sg);
>>u64 start = vma_res->start;
>>
>> @@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct 
>> i915_address_space *vm,
>>GEM_BUG_ON(pt->is_compact);
>>
>>vaddr = px_vaddr(pt);
>> - vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
>> + vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>>drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>>   }
>>
>> @@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct 
>> i915_address_space *vm,
>>}
>>
>>vaddr = px_vaddr(pt);
>> - vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, 
>> flags);
>> + vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
>>   }
>>
>>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>> @@ -820,7 +848,7 @@ static int gen8_init_scratch(struct i915_address_space 
>> *vm)
>>pte_flags |= PTE_LM;
>>
>>vm->scratch[0]->encode =
>> - gen8_pte_encode(px_dma(vm->scratch[0]),
>> + 

RE: [PATCH 5/7] drm/i915: use pat_index instead of cache_level

2023-04-06 Thread Yang, Fei
> On Mon, Apr 03, 2023 at 07:39:37PM +0000, Yang, Fei wrote:
>>> Subject: Re: [PATCH 5/7] drm/i915: use pat_index instead of cache_level
>>>
>>> On Mon, Apr 03, 2023 at 04:57:21PM +, Yang, Fei wrote:
>>>>> Subject: Re: [PATCH 5/7] drm/i915: use pat_index instead of
>>>>> cache_level
>>>>>
>>>>> On Fri, Mar 31, 2023 at 11:38:28PM -0700, fei.y...@intel.com wrote:
>>>>>> From: Fei Yang 
>>>>>>
>>>>>> Currently the KMD is using enum i915_cache_level to set caching
>>>>>> policy for buffer objects. This is flaky because the PAT index
>>>>>> which really controls the caching behavior in PTE has far more
>>>>>> levels than what's defined in the enum.
>>>>>
>>>>> Then just add more enum values.
>>>>
>>>> That would be really messy because PAT index is platform dependent,
>>>> you would have to maintain many tables for the the translation.
>>>>
>>>>> 'pat_index' is absolutely meaningless to the reader, it's just an
>>>>> arbitrary number. Whereas 'cache_level' conveys how the thing is
>>>>> actually going to get used and thus how the caches should behave.
>>>>
>>>> By design UMD's understand PAT index. Both UMD and KMD should stand
>>>> on the same ground, the Bspec, to avoid any potential ambiguity.
>>>>
>>>>>> In addition, the PAT index is platform dependent, having to
>>>>>> translate between i915_cache_level and PAT index is not reliable,
>>>>>
>>>>> If it's not realiable then the code is clearly broken.
>>>>
>>>> Perhaps the word "reliable" is a bit confusing here. What I really
>>>> meant to say is 'difficult to maintain', or 'error-prone'.
>>>>
>>>>>> and makes the code more complicated.
>>>>>
>>>>> You have to translate somewhere anyway. Looks like you're now
>>>>> adding translations the other way (pat_index->cache_level). How is that 
>>>>> better?
>>>>
>>>> No, there is no pat_index->cache_level translation.
>>>
>>> i915_gem_object_has_cache_level() is exactly that. And that one does
>>> look actually fragile since it assumes only one PAT index maps to
>>> each cache level. So if the user picks any other pat_index anything
>>> using
>>> i915_gem_object_has_cache_level() is likely to do the wrong thing.
>>
>> That is still a one-way translation, from cache_level to pat_index.
>
> Not really. The actual input to the thing is obj->pat_index.
> And as stated, the whole thing is simply broken whenever
> obj->pat_index isn't one of the magic numbers that you get
> back from i915_gem_get_pat_index().

I proposed a patch for diic which is directly applicable to drm-tip as well.
Could you review http://intel-gfx-pw.fi.intel.com/series/19405/ and let me
know if that would address your concern here?

-Fei

> --
> Ville Syrjälä
> Intel


RE: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Yang, Fei
>Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO 
>creation
>
>On 04/04/2023 19:04, Yang, Fei wrote:
>>> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set
>>> cache at BO creation
>>>
>>> On 01/04/2023 09:38, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> To comply with the design that buffer objects shall have immutable
>>>> cache setting throughout its life cycle, {set, get}_caching ioctl's
>>>> are no longer supported from MTL onward. With that change caching
>>>> policy can only be set at object creation time. The current code
>>>> applies a default (platform dependent) cache setting for all objects.
>>>> However this is not optimal for performance tuning. The patch
>>>> extends the existing gem_create uAPI to let user set PAT index for
>>>> the object at creation time.
>>>> The new extension is platform independent, so UMD's can switch to
>>>> using this extension for older platforms as well, while {set,
>>>> get}_caching are still supported on these legacy platforms for
>>>> compatibility reasons.
>>>>
>>>> Cc: Chris Wilson 
>>>> Cc: Matt Roper 
>>>> Signed-off-by: Fei Yang 
>>>> Reviewed-by: Andi Shyti 
>>>
>>> Just like the protected content uAPI, there is no way for userspace
>>> to tell this feature is available other than trying to use it.
>>>
>>> Given the issues with protected content, is it not something we would want to add?
>> Sorry I'm not aware of the issues with protected content, could you 
>> elaborate?
>> There was a long discussion on teams uAPI channel, could you comment
>> there if any concerns?
>>
>> https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b92
>> 8d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed771
>> 2e255d=379f3ae1-d138-4205-bb65-d4c7d38cb481=16
>> 75860924675=GSE%20OSGC=i915%20uAPI%20changes
>> tedTime=1675860924675=false
>>
>> Thanks,
>> -Fei
>
>
> We wanted to have a getparam to detect protected support and were told
> to detect it by trying to create a context with it.
>
> Now it appears trying to create a protected context can block for several
> seconds.
>
> Since we have to report capabilities to the user even before it creates
> protected contexts, any app is at risk of blocking.

Can we detect this capability by creating a buffer object? This extension is
not blocking, it just provides a way to set caching policy, and should complete
very fast. There is an IGT test I created for this extension (not merged yet),
please take a look at http://intel-gfx-pw.fi.intel.com/series/19149/

I'm not familiar with getparam, will take a look there as well. But I think it
would be easier to just create an object.
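Roughly what I have in mind for the probe, assuming the extension lands with the
struct/ioctl names from the posted series (drm_i915_gem_create_ext_set_pat and
I915_GEM_CREATE_EXT_SET_PAT); untested sketch, header paths depend on the libdrm
setup:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <i915_drm.h> /* from libdrm, adjust to your include path */

/* Probe SET_PAT support by trying to create a small BO with it. */
static bool has_set_pat(int drm_fd)
{
        struct drm_i915_gem_create_ext_set_pat set_pat = {
                .base.name = I915_GEM_CREATE_EXT_SET_PAT,
                .pat_index = 0, /* any index valid on the platform */
        };
        struct drm_i915_gem_create_ext create = {
                .size = 4096,
                .extensions = (uintptr_t)&set_pat,
        };
        struct drm_gem_close close_bo;

        if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
                return false; /* EINVAL here means the extension is unknown */

        /* don't leak the probe object */
        memset(&close_bo, 0, sizeof(close_bo));
        close_bo.handle = create.handle;
        ioctl(drm_fd, DRM_IOCTL_GEM_CLOSE, &close_bo);
        return true;
}

So the capability check is one CREATE_EXT ioctl plus a GEM_CLOSE, nothing that
can block.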

-Fei

>-Lionel
>
>
>>
>>> Thanks,
>>>
>>> -Lionel
>>>
>>>
>>>> ---
>>>>drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
>>>>include/uapi/drm/i915_drm.h| 36 ++
>>>>tools/include/uapi/drm/i915_drm.h  | 36 ++
>>>>3 files changed, 105 insertions(+)
>>>>


RE: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-04 Thread Yang, Fei
> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO 
> creation
>
> On 01/04/2023 09:38, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> To comply with the design that buffer objects shall have immutable
>> cache setting throughout its life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to
>> using this extension for older platforms as well, while {set,
>> get}_caching are still supported on these legacy platforms for compatibility
>> reasons.
>>
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>
>
> Just like the protected content uAPI, there is no way for userspace to tell
> this feature is available other than trying to use it.
>
> Given the issues with protected content, is it not something we would want to add?

Sorry I'm not aware of the issues with protected content, could you elaborate?
There was a long discussion on teams uAPI channel, could you comment there if
any concerns?

https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b928d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed7712e255d=379f3ae1-d138-4205-bb65-d4c7d38cb481=1675860924675=GSE%20OSGC=i915%20uAPI%20changes=1675860924675=false

Thanks,
-Fei

>Thanks,
>
>-Lionel
>
>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
>>   include/uapi/drm/i915_drm.h| 36 ++
>>   tools/include/uapi/drm/i915_drm.h  | 36 ++
>>   3 files changed, 105 insertions(+)
>>


RE: [PATCH 5/7] drm/i915: use pat_index instead of cache_level

2023-04-03 Thread Yang, Fei
>Subject: Re: [PATCH 5/7] drm/i915: use pat_index instead of cache_level
>
>On Mon, Apr 03, 2023 at 04:57:21PM +0000, Yang, Fei wrote:
>>> Subject: Re: [PATCH 5/7] drm/i915: use pat_index instead of
>>> cache_level
>>>
>>> On Fri, Mar 31, 2023 at 11:38:28PM -0700, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> Currently the KMD is using enum i915_cache_level to set caching
>>>> policy for buffer objects. This is flaky because the PAT index
>>>> which really controls the caching behavior in PTE has far more
>>>> levels than what's defined in the enum.
>>>
>>> Then just add more enum values.
>>
>> That would be really messy because PAT index is platform dependent,
>> you would have to maintain many tables for the translation.
>>
>>> 'pat_index' is absolutely meaningless to the reader, it's just an
>>> arbitrary number. Whereas 'cache_level' conveys how the thing is
>>> actually going to get used and thus how the caches should behave.
>>
>> By design UMD's understand PAT index. Both UMD and KMD should stand on
>> the same ground, the Bspec, to avoid any potential ambiguity.
>>
>>>> In addition, the PAT index is platform dependent, having to
>>>> translate between i915_cache_level and PAT index is not reliable,
>>>
>>> If it's not realiable then the code is clearly broken.
>>
>> Perhaps the word "reliable" is a bit confusing here. What I really
>> meant to say is 'difficult to maintain', or 'error-prone'.
>>
>>>> and makes the code more complicated.
>>>
>>> You have to translate somewhere anyway. Looks like you're now adding
>>> translations the other way (pat_index->cache_level). How is that better?
>>
>> No, there is no pat_index->cache_level translation.
>
> i915_gem_object_has_cache_level() is exactly that. And that one does look
> actually fragile since it assumes only one PAT index maps to each cache
> level. So if the user picks any other pat_index anything using
> i915_gem_object_has_cache_level() is likely to do the wrong thing.

That is still a one-way translation, from cache_level to pat_index.
The cache_level is only a KMD concept now. Inside the KMD we have one table to
translate from cache_level to pat_index, and only the KMD would ever trigger a
comparison on pat_index for a KMD-allocated BO.
The user is not allowed to set pat_index dynamically any more. By design the
cache setting for user space BO's is immutable. That's why even the set caching
ioctl has been killed (from MTL onward).

> If we do switch to pat_index then I think cache_level should be made a
> purely uapi concept,

UMD's directly use pat_index because they are supposed to follow the b-spec.
The abstracted cache_level is no longer exposed to user space.

-Fei

> and all the internal code should instead be made to
> query various aspects of the caching behaviour of the current pat_index
> (eg. is LLC caching enabled, and thus do I need to clflush?).
>
> --
> Ville Syrjälä
> Intel


RE: [PATCH 5/7] drm/i915: use pat_index instead of cache_level

2023-04-03 Thread Yang, Fei
> Subject: Re: [PATCH 5/7] drm/i915: use pat_index instead of cache_level
>
> On Fri, Mar 31, 2023 at 11:38:28PM -0700, fei.y...@intel.com wrote:
>> From: Fei Yang 
>> 
>> Currently the KMD is using enum i915_cache_level to set caching policy for
>> buffer objects. This is flaky because the PAT index which really controls
>> the caching behavior in PTE has far more levels than what's defined in the
>> enum.
>
> Then just add more enum values.

That would be really messy because PAT index is platform dependent, you would
have to maintain many tables for the translation.

> 'pat_index' is absolutely meaningless to the reader, it's just an
> arbitrary number. Whereas 'cache_level' conveys how the thing is
> actually going to get used and thus how the caches should behave.

By design UMD's understand PAT index. Both UMD and KMD should stand on the
same ground, the Bspec, to avoid any potential ambiguity.

>> In addition, the PAT index is platform dependent, having to translate
>> between i915_cache_level and PAT index is not reliable,
>
>If it's not realiable then the code is clearly broken.

Perhaps the word "reliable" is a bit confusing here. What I really meant to
say is 'difficult to maintain', or 'error-prone'.

>> and makes the code more complicated.
>
> You have to translate somewhere anyway. Looks like you're now adding
> translations the other way (pat_index->cache_level). How is that better?

No, there is no pat_index->cache_level translation.
There is only a small table for cache_level->pat_index translation. That is
added for the convenience of KMD coding, no exposure to UMD.
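Just for reference, the idea is a static per-platform table plus a one-way
lookup helper, roughly like this (sketch only; the macro and field names follow
the posted series as I remember it, and the per-platform table contents come
from Bspec):

/* per-platform table carried in the device info, legacy platforms shown */
#define LEGACY_CACHELEVEL \
        .cachelevel_to_pat = { \
                [I915_CACHE_NONE]   = 0, \
                [I915_CACHE_LLC]    = 1, \
                [I915_CACHE_L3_LLC] = 2, \
                [I915_CACHE_WT]     = 3, \
        }

/* one-way cache_level -> pat_index lookup, used only inside the KMD */
unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
                                    enum i915_cache_level level)
{
        /* level comes only from KMD-internal callers, so a table walk is enough */
        return INTEL_INFO(i915)->cachelevel_to_pat[level];
}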

-Fei

>> 
>> From UMD's perspective there is also a necessity to set caching policy for
>> performance fine tuning. It's much easier for the UMD to directly use PAT
>> index because the behavior of each PAT index is clearly defined in Bspec.
>> Having the abstracted i915_cache_level sitting in between would only cause
>> more ambiguity.
>> 
>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>> note, the cache_level is not completely removed yet, because the KMD still
>> has the need of creating buffer objects with simple cache settings such as
>> cached, uncached, or writethrough. For these simple cases, using cache_level
>> would help simplify the code.
>> 
>> Cc: Chris Wilson 
>> Cc: Matt Roper 
>> Signed-off-by: Fei Yang 
>> Reviewed-by: Andi Shyti 
>> ---
>>  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 27 ++
>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 39 -
>>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 18 ++--
>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 76 -
>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
>>  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +--
>>  drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 ++---
>>  drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
>>  drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
>>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
>>  drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
>>  drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>>  drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
>>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
>>  drivers/gpu/drm/i915/i915_debugfs.c   | 55 ++---
>>  drivers/gpu/drm/i915/i915_gem.c   | 16 +++-
>>  drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
>>  drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
>>  drivers/gpu/drm/i915/i915_vma.h   |  2 +-
>>  drivers/gpu/drm/i915/i915_vma_types.h |  2 -
>>  drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
>>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>>  .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>>  drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
>>  36 files changed, 361 insertions(+), 241 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
>> b/drivers/gpu/drm/i915/display/intel_dpt.c
>> index c5eacfdba1a5..7c5fddb203ba 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t 
>> pte)
>>  static 

RE: [PATCH] drm/i915/selftests: keep same cache settings as timeline

2023-03-16 Thread Yang, Fei
>> From: Fei Yang 
>>
>> On MTL, objects allocated through i915_gem_object_create_internal() are
>> mapped as uncached in GPU by default because HAS_LLC is false. However
>> in the live_hwsp_read selftest these watcher objects are mapped as WB
>> on CPU side. The conseqence is that the updates done by the GPU are not
>> immediately visible to CPU, thus the selftest is randomly failing due to
>> the stale data in CPU cache. Solution can be either setting WC for CPU +
>> UC for GPU, or WB for CPU + 1-way coherent WB for GPU.
>> To keep the consistency, let's simply inherit the same cache settings
>> from the timeline, which is WB for CPU + 1-way coherent WB for GPU,
>> because this test is supposed to emulate the behavior of the timeline
>> anyway.
>>
>> Signed-off-by: Fei Yang 
>
> It looks like there might be an indentation mistake on the second line
> of the i915_gem_object_pin_map_unlocked() call, but we can fix that up
> while applying; no need to re-send.
>
> Reviewed-by: Matt Roper 

Thanks for reviewing.

> Is there an FDO issue # for the random failures that were being seen?
> If so, we should add a Closes: or References: tag here.

I'm not aware of an FDO issue filed for this failure. That might be because the
issue is reproduced on MTL which might not be widely available to the
community yet.

> Matt
>> ---
>>  drivers/gpu/drm/i915/gt/selftest_timeline.c | 14 +++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c 
>> b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> index 522d0190509c..631aaed9bc3d 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
>> @@ -825,7 +825,8 @@ static bool cmp_gte(u32 a, u32 b)
>>return a >= b;
>>  }
>>
>> -static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt)
>> +static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
>> +  struct intel_timeline *tl)
>>  {
>>struct drm_i915_gem_object *obj;
>>struct i915_vma *vma;
>> @@ -834,7 +835,10 @@ static int setup_watcher(struct hwsp_watcher *w, struct 
>> intel_gt *gt)
>>if (IS_ERR(obj))
>>return PTR_ERR(obj);
>>
>> - w->map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
>> + /* keep the same cache settings as timeline */
>> + i915_gem_object_set_cache_coherency(obj, 
>> tl->hwsp_ggtt->obj->cache_level);
>> + w->map = i915_gem_object_pin_map_unlocked(obj,
>> + page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
>>if (IS_ERR(w->map)) {
>>i915_gem_object_put(obj);
>>return PTR_ERR(w->map);
>> @@ -1004,8 +1008,10 @@ static int live_hwsp_read(void *arg)
>>if (!tl->has_initial_breadcrumb)
>>goto out_free;
>>
>> + selftest_tl_pin(tl);
>> +
>>for (i = 0; i < ARRAY_SIZE(watcher); i++) {
>> - err = setup_watcher(&watcher[i], gt);
>> + err = setup_watcher(&watcher[i], gt, tl);
>>if (err)
>>goto out;
>>}
>> @@ -1160,6 +1166,8 @@ static int live_hwsp_read(void *arg)
>>for (i = 0; i < ARRAY_SIZE(watcher); i++)
>>cleanup_watcher(&watcher[i]);
>>
>> + intel_timeline_unpin(tl);
>> +
>>if (igt_flush_test(gt->i915))
>>err = -EIO;
>>
>> --
>> 2.25.1


RE: [PATCH 1/2] drm/i915/xehp: Add compute engine ABI

2022-04-25 Thread Yang, Fei
>> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>> > @@ -1175,6 +1175,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>> >[VIDEO_DECODE_CLASS]= GEN12_VD_TLB_INV_CR,
>> >[VIDEO_ENHANCEMENT_CLASS]   = GEN12_VE_TLB_INV_CR,
>> >[COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR,
>> > +  [COMPUTE_CLASS] = GEN12_GFX_TLB_INV_CR,
>> 
>> Do you know what 0xcf04 is?

Looks like that is the TLB invalidation register for each compute context.

>> 
>> Or if GEN12_GFX_TLB_INV_CR is correct then I think get_reg_and_bit() 
>> might need adjusting to always select bit 0 for any compute engine 
>> instance. Not sure how hardware would behave if value other than '1' 
>> would be written into 0xced8.
> 
> I think Prathap and Fei have more familiarity with the MMIO TLB invalidation; 
> adding them for their thoughts.

I believe GEN12_GFX_TLB_INV_CR is the right one to use because we are
invalidating the TLB for each engine.
I'm not sure if we could narrow it down to exactly which compute context the
TLB needs to be invalidated for, though. If that's possible it might be a bit
more efficient.

> Matt

>> 
>> Regards,
>> 
>> Tvrtko


RE: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-28 Thread Yang, Fei
>> +u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>> -static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>
> I think all helpers which emit to ring take cs as the first argument so it 
> would be good to make this consistent.

Updated the patch, please review rev10.
This helper function has been there for a long while, so I was hesitant to
change it. But I agree cs should be the first argument, and since I removed the
"static" anyway, I might as well change the order altogether.

>> @@ -329,15 +306,10 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
>> mode)
>>  *cs++ = 0; /* value */
>>   
>>  if (aux_inv) { /* hsdes: 1809175790 */
>> -struct intel_engine_cs *engine;
>> -unsigned int tmp;
>> -
>> -*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
>> -for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
>> -*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
>> -*cs++ = AUX_INV;
>> -}
>> -*cs++ = MI_NOOP;
>> +if (rq->engine->class == VIDEO_DECODE_CLASS)
>> +cs = gen12_emit_aux_table_inv(GEN12_VD0_AUX_NV, cs);
>> +else
>> +cs = gen12_emit_aux_table_inv(GEN12_VE0_AUX_NV, cs);
>
> Not sure if, here and below, it would be worth putting the register in a local
> and then having a single function call - up to you.

I feel it's easier to check the code correctness in the *_rcs/*_xcs functions 
and leave the helper function as simple as possible.

>
> Apart from the cs re-order looks good to me.

If there are no other problems with rev10, would you please help push it
upstream? I don't have commit rights, so I will need to find someone to help
take it further.

Thanks,
-Fei

> Reviewed-by: Tvrtko Ursulin 



RE: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-18 Thread Yang, Fei
>>   static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>>   {
>>  *cs++ = MI_LOAD_REGISTER_IMM(1);
>> @@ -296,7 +272,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
>> mode)
>>  if (!HAS_FLAT_CCS(rq->engine->i915)) {
>>  aux_inv = rq->engine->mask & ~BIT(BCS0);
>>  if (aux_inv)
>> -cmd += 2 * hweight32(aux_inv) + 2;
>> +cmd += 4;
>>  }
>>  }
>>   
>> @@ -329,14 +305,16 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
>> mode)
>>  *cs++ = 0; /* value */
>>   
>>  if (aux_inv) { /* hsdes: 1809175790 */
>> -struct intel_engine_cs *engine;
>> -unsigned int tmp;
>> -
>> -*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
>> -for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
>> -*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
>> -*cs++ = AUX_INV;
>> +*cs++ = MI_LOAD_REGISTER_IMM(1) | MI_LRI_MMIO_REMAP_EN;
> 
> Cool, I didn't know this exists. The first Bspec link I found did not mention
> these registers, but the second did.
> That one however (29545) has a worrying "removed by" tag which seems to point 
> to a story suggesting the
> remapping table might be gone on machines with flat ccs?! Could you double 
> check please?

The variable aux_inv is set only if (!HAS_FLAT_CCS(rq->engine->i915)).

>> +if (rq->engine->class == VIDEO_DECODE_CLASS) {
>> +*cs++ = i915_mmio_reg_offset(GEN12_VD0_AUX_NV);
>> +} else if (rq->engine->class == VIDEO_ENHANCEMENT_CLASS) {
>> +*cs++ = i915_mmio_reg_offset(GEN12_VE0_AUX_NV);
>> +} else {
>> +GEM_BUG_ON("unknown aux_inv reg\n");
>> +*cs++ = i915_mmio_reg_offset(INVALID_MMIO_REG);
>
> I'd consider not emitting the LRI if we don't know what to put in unless 
> there is some hidden point to do it?

That's true. I was following the original logic flow here. I think it would be 
better to check for engine class before setting the variable aux_inv.
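Something along these lines in gen12_emit_flush_xcs, i.e. only let VD/VE set
aux_inv in the first place (just a sketch of the idea, not the final diff):

        if (!HAS_FLAT_CCS(rq->engine->i915) &&
            (rq->engine->class == VIDEO_DECODE_CLASS ||
             rq->engine->class == VIDEO_ENHANCEMENT_CLASS)) {
                aux_inv = rq->engine->mask;
                if (aux_inv)
                        cmd += 4;
        }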

>
>>  }
>> +*cs++ = AUX_INV;
>>  *cs++ = MI_NOOP;
>>  }
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
>> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> index d112ffd56418..2d150eec5c65 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> @@ -144,6 +144,7 @@
>>   #define MI_LOAD_REGISTER_IMM(x)MI_INSTR(0x22, 2*(x)-1)
>>   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
>>   #define   MI_LRI_LRM_CS_MMIO   REG_BIT(19)
>> +#define   MI_LRI_MMIO_REMAP_EN  (1 << 17)
>>   #define   MI_LRI_FORCE_POSTED  (1<<12)
>
> Passing observation - three bits, three flavours of expressing them, sigh...
Haha, REG_BIT(17) it is. The other one causes a CHECK:SPACING, but I don't want
to touch that in this patch.

>>   #define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
>>   #define MI_STORE_REGISTER_MEMMI_INSTR(0x24, 1)


RE: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-16 Thread Yang, Fei
>> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c 
>> b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> index 1c82caf525c3..0ec4986e4805 100644
>> --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>> @@ -37,6 +37,9 @@ int gen2_emit_flush(struct i915_request *rq, u32 
>> mode)
>>   
>>  intel_ring_advance(rq, cs);
>>   
>> +/* hsdes: 1809175790. No fixup needed for gen2 */
>> +rq->aux_inv_fixup = NULL;
>
> Same thing that Stuart mentioned - would it not work for instance to 
> initialize this in __i915_request_create?

I didn't try __i915_request_create because there is code like the following in
the driver, and I'm not sure how many such allocations there are. I will give
it a shot.
struct measure_breadcrumb {
        struct i915_request rq;
        struct intel_ring ring;
        u32 cs[2048];
};

static int measure_breadcrumb_dw(struct intel_context *ce)
{
        struct intel_engine_cs *engine = ce->engine;
        struct measure_breadcrumb *frame;
        int dw;

        GEM_BUG_ON(!engine->gt->scratch);

        frame = kzalloc(sizeof(*frame), GFP_KERNEL);
        if (!frame)
                return -ENOMEM;

        frame->rq.engine = engine;
        frame->rq.context = ce;
        ...
}
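If __i915_request_create covers the dynamically allocated requests, a hand-built
frame like the one above would still need an explicit init next to the other rq
fields, i.e. roughly (sketch, assuming the field keeps the aux_inv_fixup name):

        /* in __i915_request_create(), next to the other per-request resets */
        rq->aux_inv_fixup = NULL;

        /* plus explicitly for hand-built requests like measure_breadcrumb_dw() */
        frame->rq.engine = engine;
        frame->rq.context = ce;
        frame->rq.aux_inv_fixup = NULL; /* hsdes: 1809175790 */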
>> +
>>  return 0;
>>   }
>>   


RE: [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-16 Thread Yang, Fei
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index e1470bb60f34..7e8552414275 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request 
>>> *rq)
>>> return __i915_request_is_complete(rq);  }
>>>  
>>> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) {
>>> +   static const i915_reg_t vd[] = {
>>> +   GEN12_VD0_AUX_NV,
>>> +   GEN12_VD1_AUX_NV,
>>> +   GEN12_VD2_AUX_NV,
>>> +   GEN12_VD3_AUX_NV,
>>> +   };
>>> +
>>> +   static const i915_reg_t ve[] = {
>>> +   GEN12_VE0_AUX_NV,
>>> +   GEN12_VE1_AUX_NV,
>>> +   };
>>> +
>>> +   if (engine->class == VIDEO_DECODE_CLASS) {
>>> +   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
>>> +   return vd[engine->instance];
>>> +   }
>>> +
>>> +   if (engine->class == VIDEO_ENHANCEMENT_CLASS) {
>>> +   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve));
>>> +   return ve[engine->instance];
>>> +   }
>>> +
>>> +   GEM_BUG_ON("unknown aux_inv reg\n");
>>> +   return INVALID_MMIO_REG;
>>> +}
>>> +
>>>  static void execlists_dequeue(struct intel_engine_cs *engine)
>> 
>> So in the previous implementation, this "worked" for both execlists and guc 
>> submission. But how will this work now for GuC based submission?
>> This flow and the address of the engine is owned by the GuC.
>>
>> If we are going to say this is an execlist only requirement (e.g.
>> platforms using GuC submission don't need this workaround), you should add 
>> an if (!using guc submission) in the sequence you added to the various 
>> emit_flush() routines above.
>
> Good point.
> I didn't consider GuC submission because Chrome doesn't enable GuC for TGL. 
> But it is true that the implementation will have a problem with GuC submission.
> I'm not sure if it's possible for i915 to know which engine will eventually 
> carry out the request because it might be scheduled by GuC. I will need to 
> investigate.

I think the same can be done in intel_guc_submission.c after 
__i915_request_submit(rq) calls.
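Roughly what I have in mind for the GuC path, assuming aux_inv_fixup and the
aux_inv_reg() helper are visible from intel_guc_submission.c (sketch only,
untested):

        if (__i915_request_submit(rq)) {
                /* hsdes: 1809175790 */
                if (rq->aux_inv_fixup) {
                        *rq->aux_inv_fixup =
                                i915_mmio_reg_offset(aux_inv_reg(rq->engine));
                        rq->aux_inv_fixup = NULL;
                }
                /* existing submission bookkeeping continues here */
        }

Whether rq->engine at that point is really the engine that ends up executing
the request (e.g. with virtual engines) is exactly the part I still need to
investigate.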

>> Thanks,
>> Stuart



RE: [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-15 Thread Yang, Fei
>> @@ -157,6 +163,9 @@ int gen11_emit_flush_rcs(struct i915_request *rq,
>> u32 mode)
>>  intel_ring_advance(rq, cs);
>>  }
>>  
>> +/* hsdes: 1809175790. No fixup needed for gen11 rcs */
>> +rq->aux_inv_fixup = NULL;
>
> This is a little ugly to me. Can we just set this to 0 or 0xdeadbeef by 
> default maybe and check that value below instead of retroactively adding all 
> of these assignments?

The problem is there are many code paths through which an i915_request could be
allocated, and I'm not aware of a unified routine where I could initialize the
pointer for all i915_requests.

>> +
>>  return 0;
>>  }
>>  
>> +/*
>> + * We don't know which engine will eventually carry out
>> + * this request, so the mmio aux_inv register address
>> is
>> + * unknown at this moment. We save the cs pointer
>> supposed
>> + * to hold the aux_inv address in rq->aux_inv_fixup and
>> set
>> + * it in execlists_dequeue() when the engine instance
>> + * carrying out this request becomes certain
>> + */
>> +*cs++ = MI_LOAD_REGISTER_IMM(1);
>> +rq->aux_inv_fixup = cs; /* save the pointer to aux_inv
>> */
>> +*cs++ = 0; /* mmio addr to be set at submission to HW
>> */
>
>Maybe MI_NOOP instead?

This is supposed to be the mmio address field of the MI_LOAD_REGISTER_IMM
instruction, so setting it to 0 makes more sense?

>> +*cs++ = AUX_INV;
>>  *cs++ = MI_NOOP;
>> -}
>> +} else
>
> Can you add the brackets here on the else:
> } else {
>aux_inv_fixup = NULL
> }
>
>Also good to run checkpatch. I see this showing up as a warning in the 
>checkpatch results.

I noticed the warning, will update.

>> +rq->aux_inv_fixup = NULL;
>>  
>>  if (mode & EMIT_INVALIDATE)
>>  *cs++ = preparser_disable(false);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> index e1470bb60f34..7e8552414275 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request 
>> *rq)
>>  return __i915_request_is_complete(rq);  }
>>  
>> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) {
>> +static const i915_reg_t vd[] = {
>> +GEN12_VD0_AUX_NV,
>> +GEN12_VD1_AUX_NV,
>> +GEN12_VD2_AUX_NV,
>> +GEN12_VD3_AUX_NV,
>> +};
>> +
>> +static const i915_reg_t ve[] = {
>> +GEN12_VE0_AUX_NV,
>> +GEN12_VE1_AUX_NV,
>> +};
>> +
>> +if (engine->class == VIDEO_DECODE_CLASS) {
>> +GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
>> +return vd[engine->instance];
>> +}
>> +
>> +if (engine->class == VIDEO_ENHANCEMENT_CLASS) {
>> +GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve));
>> +return ve[engine->instance];
>> +}
>> +
>> +GEM_BUG_ON("unknown aux_inv reg\n");
>> +return INVALID_MMIO_REG;
>> +}
>> +
>>  static void execlists_dequeue(struct intel_engine_cs *engine)
> 
> So in the previous implementation, this "worked" for both execlists and guc 
> submission. But how will this work now for GuC based submission?
> This flow and the address of the engine is owned by the GuC.
>
> If we are going to say this is an execlist only requirement (e.g.
> platforms using GuC submission don't need this workaround), you should add an 
> if (!using guc submission) in the sequence you added to the various 
> emit_flush() routines above.

Good point.
I didn't consider GuC submission because Chrome doesn't enable GuC for TGL. But 
it is true that the implementation will have a problem with GuC submission.
I'm not sure if it's possible for i915 to know which engine will eventually 
carry out the request because it might be scheduled by GuC. I will need to 
investigate.

> Thanks,
> Stuart



RE: [intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv (rev3)

2022-03-02 Thread Yang, Fei
Hi Chris, for some reason I didn't receive the review email, so I copied your 
comments from patchwork and faked this email.

>>  static void execlists_dequeue(struct intel_engine_cs *engine)
>>  {
>> struct intel_engine_execlists * const execlists = &engine->execlists;
>> @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs 
>> *engine)
>> }
>>
>> if (__i915_request_submit(rq)) {
>> +   /* hsdes: 1809175790 */
>> +   if ((GRAPHICS_VER(engine->i915) == 12) &&
>> +   rq->vd_ve_aux_inv &&
>> +   (engine->class == VIDEO_DECODE_CLASS ||
>> +engine->class == 
>> VIDEO_ENHANCEMENT_CLASS)) {

> We do not need the extra checks here; we just do as we are told. We only
> tell ourselves to apply the fixup when required.

Without checking GRAPHICS_VER, I'm seeing a lot of regressions on older 
platforms in the CI result.
This workaround was only implemented for gen12 (gen12_emit_flush_xcs).
Without checking engine->class, I'm seeing boot issues due to GEM_BUG_ON() in 
aux_inv_reg().
Any rq will go through this code regardless of engine class and gen version, so 
the checking seems to be
necessary.

>> +   *rq->vd_ve_aux_inv = 
>> i915_mmio_reg_offset

> Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or
> wa_aux_iv, fixup_aux_inv).

I wanted to be specific because the workaround was only implemented for vd/ve 
engines.
But I'm ok with your proposal.

>> +   (aux_inv_reg(engine));
>> +   rq->vd_ve_aux_inv = NULL;

> Move this to i915_request initialisation so that we only set aux_inv
> when required, which probably explains the extra defence.

The pointer is currently initialized with 0x5a5a. I set it to NULL in
gen12_emit_flush_xcs, otherwise the rq would enter that if-statement with an
invalid pointer.
I'm not familiar with the code; there seem to be multiple functions allocating
the structure. I agree it's better to set it to NULL at initialization, but I
need some guidance on where the best place to do so is.

>> +   rq->execution_mask = engine->mask;
>> +   }
>> if (!merge) {
>> *port++ = i915_request_get(last);
>> last = NULL;
>> diff --git a/drivers/gpu/drm/i915/i915_request.h 
>> b/drivers/gpu/drm/i915/i915_request.h
>> index 28b1f9db5487..69de32e5e15d 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -350,6 +350,8 @@ struct i915_request {
>> struct list_head link;
>> unsigned long delay;
>> } mock;)
>> +
>> +   u32 *vd_ve_aux_inv;

> Not at the end of the struct; that's where we put things in the dungeon.
> The selftest struct should be last; I do hope no one has been putting
> things at random places in the struct without considering the layout and
> semantics. From the flow, this is akin to batch, capture_list; before
> emitted_jiffies would be a good spot.

Got it, will change. I thought adding at the end would be safer, thanks for the 
explanation.

> -Chris