Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
On Wed, Apr 19, 2023 at 04:00:54PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).

Is this description accurate? This patch doesn't insert an mfence
instruction itself, it just calls intel_guc_write_barrier(). On
platforms like MTL that aren't using local memory, that issues a wmb()
barrier, which I believe is implemented as an sfence, not mfence. You'd
need to be doing a mb() call to get an mfence.

I think in general this level of explanation is unnecessary; you can
just give a high-level description indicating that we force the
write-combine buffer to be flushed and not give the low-level specifics
of what instruction that translates to at the x86 level.

Aside from simplifying the commit message,

Reviewed-by: Matt Roper

>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang
> Reviewed-by: Andi Shyti
> Acked-by: Nirmoy Das
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +
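The distinction raised in the review above (a store fence versus a full fence) can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_wmb()`, `demo_mb()` and the two helpers are hypothetical names, not kernel APIs, and the non-x86-64 branch uses C11 fences as rough stand-ins so the sketch still builds elsewhere. On x86-64 the kernel's wmb() expands to sfence and mb() to mfence, which is the point the reviewer is making.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <xmmintrin.h>	/* _mm_sfence() */
#include <emmintrin.h>	/* _mm_mfence() */
/* Store fence: the instruction wmb() maps to on x86-64 */
#define demo_wmb()	_mm_sfence()
/* Full fence: the instruction mb() maps to on x86-64 */
#define demo_mb()	_mm_mfence()
#else
#include <stdatomic.h>
/* Portable approximations, assumed here only so the sketch compiles */
#define demo_wmb()	atomic_thread_fence(memory_order_release)
#define demo_mb()	atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical helper: store a value, then order it with a store fence */
static uint32_t demo_write_then_store_fence(volatile uint32_t *slot, uint32_t value)
{
	*slot = value;
	demo_wmb();	/* earlier stores are ordered before later stores */
	return *slot;
}

/* Hypothetical helper: same store, but followed by a full fence */
static uint32_t demo_write_then_full_fence(volatile uint32_t *slot, uint32_t value)
{
	*slot = value;
	demo_mb();	/* orders earlier loads and stores before later ones */
	return *slot;
}
```

On ordinary cacheable memory both helpers behave identically from the caller's point of view; the difference only matters for what the CPU is allowed to reorder, which is why the commit message can describe the effect ("flush the write-combining buffer") without naming an instruction.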
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
This is an important fix and can be pushed without depending on this series.
I will send this out to mailing list separately for CI.

Regards,
Nirmoy

On 4/20/2023 1:00 AM, fei.y...@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang
> Reviewed-by: Andi Shyti
> Acked-by: Nirmoy Das
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>
> +	/*
> +	 * Wa_22016122933: Ma
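The coherency problem described in the commit message (a writer that updates one word but writes back the whole cache line) can be modelled in plain C. This is a simplified single-threaded simulation, not driver code: `struct demo_ctb_desc`, its field layout, the 64-byte line size and all `demo_` functions are assumptions made for illustration only. It shows how a full-line write based on an older snapshot discards the other side's update to a neighbouring field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_LINE 64	/* assumed cache line size in bytes */

/* Hypothetical descriptor: head and tail are adjacent words in one line */
struct demo_ctb_desc {
	uint32_t head;		/* written by one side (GuC in the thread) */
	uint32_t tail;		/* written by the other side (the host) */
	uint32_t status;
	uint32_t reserved[13];
};

/* True when two byte offsets fall into the same cache line */
static int demo_shares_cache_line(size_t off_a, size_t off_b)
{
	return off_a / DEMO_CACHE_LINE == off_b / DEMO_CACHE_LINE;
}

/*
 * Model of a partial write that is carried out as a full-line write:
 * the writer changes only head, but stores back every byte of the
 * line as it looked when the writer last read it.
 */
static void demo_full_line_write_head(struct demo_ctb_desc *mem,
				      const struct demo_ctb_desc *snapshot,
				      uint32_t new_head)
{
	struct demo_ctb_desc line = *snapshot;

	line.head = new_head;
	memcpy(mem, &line, sizeof(line));
}

/* Returns the tail value left in memory after the two racing updates */
static uint32_t demo_lost_update(void)
{
	struct demo_ctb_desc mem;
	struct demo_ctb_desc snapshot;

	memset(&mem, 0, sizeof(mem));
	snapshot = mem;		/* device-side view taken before the host write */

	mem.tail = 8;		/* host advances tail */
	demo_full_line_write_head(&mem, &snapshot, 4);	/* device advances head */

	return mem.tail;	/* the host's tail update has been overwritten */
}
```

In this model `demo_lost_update()` returns 0 rather than 8, which is the kind of lost update the workaround avoids by keeping the shared memory out of the CPU cache.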
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
On 20.04.2023 01:00, fei.y...@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang
> Reviewed-by: Andi Shyti
> Acked-by: Nirmoy Das

Reviewed-by: Andrzej Hajda

Regards
Andrzej

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barr
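The effect of the i915_gem_pages.c hunk quoted above can be restated as a small standalone decision function. This is a sketch, not the driver's code: `enum demo_map_type`, `struct demo_platform` and `demo_coherent_map_type()` are hypothetical stand-ins for the i915 types and macros, and the final fallback to WC is an assumption, since the quoted hunk does not show what follows the WB return.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for I915_MAP_WB / I915_MAP_WC */
enum demo_map_type { DEMO_MAP_WB, DEMO_MAP_WC };

/* Hypothetical platform description replacing IS_METEORLAKE() / HAS_LLC() */
struct demo_platform {
	bool is_meteorlake;
	bool has_llc;
};

/*
 * Mirrors the patched decision order: local memory or MTL always
 * maps write-combined, regardless of always_coherent.
 */
static enum demo_map_type demo_coherent_map_type(const struct demo_platform *p,
						 bool obj_is_lmem,
						 bool always_coherent)
{
	if (obj_is_lmem || p->is_meteorlake)
		return DEMO_MAP_WC;
	if (p->has_llc || always_coherent)
		return DEMO_MAP_WB;
	return DEMO_MAP_WC;	/* assumed fallback, not visible in the hunk */
}
```

The case worth noticing is the GSC firmware load, which passes `true` for the coherent argument: before the patch that call would have been given a WB mapping on a system-memory object, while with the MTL check placed first it gets WC.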
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
Hi Fei,

On Wed, Apr 19, 2023 at 02:12:15PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang

Reviewed-by: Andi Shyti
Acked-by: Nirmoy Das

Still one comment below.

[...]

> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));

I thought you were going to revert this. Is this really needed, BTW?

I'm fine with leaving it.

Andi
RE: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 1803a633ed64..98e682b7df07 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>>> }
>>> GEM_BUG_ON(tail > size);
>>>
>>> -/*
>>> - * make sure H2G buffer update and LRC tail update (if this triggering a
>>> - * submission) are visible before updating the descriptor tail
>>> - */
>>> -intel_guc_write_barrier(ct_to_guc(ct));
>>> -
>>> /* update local copies */
>>> ctb->tail = tail;
>>> GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
>>> @@ -429,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
>>> /* now update descriptor */
>>> WRITE_ONCE(desc->tail, tail);
>>>
>>> +/*
>>> + * make sure H2G buffer update and LRC tail update (if this triggering a
>>> + * submission) are visible before updating the descriptor tail
>>> + */
>>> +intel_guc_write_barrier(ct_to_guc(ct));
>>
>> The comment above needs update,

Never mind, I decided to revert this change because it's not necessary.
There is a MMIO write following the ct_write() call which would flush the
write combining buffer anyway, so the barrier is redundant here.

>
> Will update the comment.
>
>> if this is correct change. The question is why it is correct? If yes,
>> it implies that old barrier is incorrect, maybe it should be then separate
>> fix?
>
> There is WRITE_ONCE(desc->tail, tail) right after the H2G buffer update which
> is also seen by the GuC firmware, the barrier is needed for both, thus moved
> it down a few lines to cover them all.
>
>> I am not an expert, but previous location of the barrier seems sane to
>> me - assure GuC will see proper buffer, before updating buffer's tail.
>
> That is correct, but the barrier is needed for both H2G buffer and
> descriptor, as they are all shared with GuC firmware.
>
> -Fei
>
>> And according to commit message this new barrier should flush WC
>> buffer, so for me it seems to be different thing.
>> Am I missing something?
>>
>>
>> Regards
>> Andrzej
RE: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index 1803a633ed64..98e682b7df07 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>> }
>> GEM_BUG_ON(tail > size);
>>
>> -/*
>> - * make sure H2G buffer update and LRC tail update (if this triggering a
>> - * submission) are visible before updating the descriptor tail
>> - */
>> -intel_guc_write_barrier(ct_to_guc(ct));
>> -
>> /* update local copies */
>> ctb->tail = tail;
>> GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
>> @@ -429,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct,
>> /* now update descriptor */
>> WRITE_ONCE(desc->tail, tail);
>>
>> +/*
>> + * make sure H2G buffer update and LRC tail update (if this triggering a
>> + * submission) are visible before updating the descriptor tail
>> + */
>> +intel_guc_write_barrier(ct_to_guc(ct));
>
> The comment above needs update,

Will update the comment.

> if this is correct change. The question is why it is correct? If yes, it
> implies that old barrier is incorrect, maybe it should be then separate fix?

There is WRITE_ONCE(desc->tail, tail) right after the H2G buffer update which
is also seen by the GuC firmware, the barrier is needed for both, thus moved
it down a few lines to cover them all.

> I am not an expert, but previous location of the barrier seems sane to me -
> assure GuC will see proper buffer, before updating buffer's tail.

That is correct, but the barrier is needed for both H2G buffer and
descriptor, as they are all shared with GuC firmware.

-Fei

> And according to commit message this new barrier should flush WC buffer, so
> for me it seems to be different thing.
> Am I missing something?
>
>
> Regards
> Andrzej
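The ordering question debated above (barrier before the tail update versus after it) can be illustrated with a userspace ring buffer. This is a minimal sketch under stated assumptions: `struct demo_ring`, `DEMO_RING_WORDS` and `demo_ring_write()` are hypothetical, and a C11 release fence is used as a rough analogue of the kernel's write barrier; it does not model write-combining buffers or MMIO. It shows the original placement, where the payload is ordered before the tail is published.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_WORDS 16	/* assumed ring size, in 32-bit words */

/* Hypothetical producer/consumer ring: payload words plus a published tail */
struct demo_ring {
	uint32_t cmds[DEMO_RING_WORDS];
	_Atomic uint32_t tail;
};

/* Write one payload word, then publish the new tail; returns the new tail */
static uint32_t demo_ring_write(struct demo_ring *ring, uint32_t word)
{
	uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);

	ring->cmds[tail] = word;

	/*
	 * Barrier in the original position: the payload store is ordered
	 * before the tail store, so a consumer that observes the new tail
	 * (with matching acquire ordering) also observes the payload.
	 */
	atomic_thread_fence(memory_order_release);

	tail = (tail + 1) % DEMO_RING_WORDS;
	atomic_store_explicit(&ring->tail, tail, memory_order_relaxed);

	return tail;
}
```

Moving the fence below the tail store, as the first revision of the patch did, would change its purpose from ordering payload against tail to pushing both stores out promptly; the thread concludes that this is unnecessary in ct_write() because a following MMIO write already has that effect.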
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
On 17.04.2023 08:24, fei.y...@intel.com wrote:
> From: Fei Yang
>
> This patch implements Wa_22016122933.
>
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).
>
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>
> BSpec: 45101
>
> Signed-off-by: Fei Yang
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++++++++++++------
>  4 files changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index d76508fa3af7..f9bddaa876d9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -743,6 +743,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..98e682b7df07 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
>  	}
>  	GEM_BUG_ON(tail > size);
>
> -	/*
> -	 * make sure H2G buffer update and LRC tail update (if this triggering a
> -	 * submission) are visible before updating the descriptor tail
> -	 */
> -	intel_guc_write_barrier(ct_to_guc(ct));
> -
>  	/* update local copies */
>  	ctb->tail = tail;
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
Hi Fei,

On Wed, Apr 19, 2023 at 12:59:09PM +0200, Andi Shyti wrote:
> Hi Fei,
>
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -743,6 +743,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
> >  	if (IS_ERR(obj))
> >  		return ERR_CAST(obj);
> >
> > +	/*
> > +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> > +	 * as WC on CPU side and UC (PAT index 2) on GPU side
>
> Isn't UC PAT index 3?

Sorry, it's 2... I was reading the wrong table.

Reviewed-by: Andi Shyti

Andi
Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media
Hi Fei,

> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -743,6 +743,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side

Isn't UC PAT index 3?

Andi