Re: [Intel-gfx] [PATCH v14 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines

2023-10-13 Thread Andi Shyti
Hi John,

...

> > @@ -560,6 +562,17 @@ static int __uc_init_hw(struct intel_uc *uc)
> > guc_info(guc, "submission %s\n", 
> > str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> > guc_info(guc, "SLPC %s\n", 
> > str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> > +   /*
> > +* The full GT reset will have cleared the TLB caches and flushed the
> > +* G2H message queue; we can release all the blocked waiters.
> > +*/
> > +   if (intel_guc_tlb_invalidation_is_available(guc)) {
> > +   xa_lock_irq(>tlb_lookup);
> > +   xa_for_each(>tlb_lookup, i, wait)
> > +   wake_up(>wq);
> > +   xa_unlock_irq(>tlb_lookup);
> > +   }
> > +
> This is not the right place for this. This is an init function but this
> comment is talking about resets and is doing things that assume the system
> has been running for some time.
> 
> Yes, hw init might happen as part of a reset but this is reset only
> processing and it should be in a reset specific code path.
> 
> What was wrong with the previous version that had the code in
> intel_guc_submission_reset?

It was wrongly placed there because at that point the gt reset
was not totally complete but it was still mid-way through.

The threads needed a bit more time in order to wait for the GT to
be completely alive after the reset.

The works need to be woken up at the end of the gt reset, where,
besides, we need to clear up the xa_array with work queues.

First Jonathan added it in driectly at the end of the
intel_gt_reset(), but that's not the right place as this is a UC
related operation, rather than GT.

Following the reset flow this looked like the right place, called
after the reset in the UC part.

What's wrong with placing it it here?

Andi


Re: [Intel-gfx] [PATCH v14 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines

2023-10-13 Thread John Harrison

On 10/13/2023 10:52, Jonathan Cavitt wrote:

From: Prathap Kumar Valsan 

The GuC firmware had defined the interface for Translation Look-Aside
Buffer (TLB) invalidation.  We should use this interface when
invalidating the engine and GuC TLBs.
Add additional functionality to intel_gt_invalidate_tlb, invalidating
the GuC TLBs and falling back to GT invalidation when the GuC is
disabled.
The invalidation is done by sending a request directly to the GuC
tlb_lookup that invalidates the table.  The invalidation is submitted as
a wait request and is performed in the CT event handler.  This means we
cannot perform this TLB invalidation path if the CT is not enabled.
If the request isn't fulfilled in two seconds, this would constitute
an error in the invalidation as that would constitute either a lost
request or a severe GuC overload.

With this new invalidation routine, we can perform GuC-based GGTT
invalidations.  GuC-based GGTT invalidation is incompatible with
MMIO invalidation so we should not perform MMIO invalidation when
GuC-based GGTT invalidation is expected.

The additional complexity incurred in this patch will be necessary for
range-based tlb invalidations, which will be platformed in the future.

Signed-off-by: Prathap Kumar Valsan 
Signed-off-by: Bruce Chang 
Signed-off-by: Chris Wilson 
Signed-off-by: Umesh Nerlige Ramappa 
Signed-off-by: Jonathan Cavitt 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Fei Yang 
CC: Andi Shyti 
Reviewed-by: Andi Shyti 
Acked-by: Tvrtko Ursulin 
Acked-by: Nirmoy Das 
Reviewed-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  30 ++-
  drivers/gpu/drm/i915/gt/intel_tlb.c   |  16 +-
  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  33 
  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  22 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  11 ++
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   1 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 183 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |  13 ++
  8 files changed, 298 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 4d7d88b92632b..1c93e84278a03 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -206,22 +206,36 @@ static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
  }
  
+static void guc_ggtt_ct_invalidate(struct intel_gt *gt)

+{
+   struct intel_uncore *uncore = gt->uncore;
+   intel_wakeref_t wakeref;
+
+   with_intel_runtime_pm_if_active(uncore->rpm, wakeref) {
+   struct intel_guc *guc = >uc.guc;
+
+   intel_guc_invalidate_tlb_guc(guc);
+   }
+}
+
  static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
  {
struct drm_i915_private *i915 = ggtt->vm.i915;
+   struct intel_gt *gt;
  
  	gen8_ggtt_invalidate(ggtt);
  
-	if (GRAPHICS_VER(i915) >= 12) {

-   struct intel_gt *gt;
-
-   list_for_each_entry(gt, >gt_list, ggtt_link)
+   list_for_each_entry(gt, >gt_list, ggtt_link) {
+   if (intel_guc_tlb_invalidation_is_available(>uc.guc)) {
+   guc_ggtt_ct_invalidate(gt);
+   } else if (GRAPHICS_VER(i915) >= 12) {
intel_uncore_write_fw(gt->uncore,
  GEN12_GUC_TLB_INV_CR,
  GEN12_GUC_TLB_INV_CR_INVALIDATE);
-   } else {
-   intel_uncore_write_fw(ggtt->vm.gt->uncore,
- GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   } else {
+   intel_uncore_write_fw(gt->uncore,
+ GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   }
}
  }
  
@@ -1243,7 +1257,7 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)

ggtt->vm.raw_insert_page = gen8_ggtt_insert_page;
}
  
-	if (intel_uc_wants_guc(>vm.gt->uc))

+   if (intel_uc_wants_guc_submission(>vm.gt->uc))
ggtt->invalidate = guc_ggtt_invalidate;
else
ggtt->invalidate = gen8_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c 
b/drivers/gpu/drm/i915/gt/intel_tlb.c
index 139608c30d978..4bb13d1890e37 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -12,6 +12,7 @@
  #include "intel_gt_print.h"
  #include "intel_gt_regs.h"
  #include "intel_tlb.h"
+#include "uc/intel_guc.h"
  
  /*

   * HW architecture suggest typical invalidation time at 40us,
@@ -131,11 +132,24 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, 
u32 seqno)
return;
  
  	with_intel_gt_pm_if_awake(gt, wakeref) {

+   struct intel_guc *guc = >uc.guc;
+
mutex_lock(>tlb.invalidate_lock);
if 

[Intel-gfx] [PATCH v14 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines

2023-10-13 Thread Jonathan Cavitt
From: Prathap Kumar Valsan 

The GuC firmware had defined the interface for Translation Look-Aside
Buffer (TLB) invalidation.  We should use this interface when
invalidating the engine and GuC TLBs.
Add additional functionality to intel_gt_invalidate_tlb, invalidating
the GuC TLBs and falling back to GT invalidation when the GuC is
disabled.
The invalidation is done by sending a request directly to the GuC
tlb_lookup that invalidates the table.  The invalidation is submitted as
a wait request and is performed in the CT event handler.  This means we
cannot perform this TLB invalidation path if the CT is not enabled.
If the request isn't fulfilled in two seconds, this would constitute
an error in the invalidation as that would constitute either a lost
request or a severe GuC overload.

With this new invalidation routine, we can perform GuC-based GGTT
invalidations.  GuC-based GGTT invalidation is incompatible with
MMIO invalidation so we should not perform MMIO invalidation when
GuC-based GGTT invalidation is expected.

The additional complexity incurred in this patch will be necessary for
range-based tlb invalidations, which will be platformed in the future.

Signed-off-by: Prathap Kumar Valsan 
Signed-off-by: Bruce Chang 
Signed-off-by: Chris Wilson 
Signed-off-by: Umesh Nerlige Ramappa 
Signed-off-by: Jonathan Cavitt 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Fei Yang 
CC: Andi Shyti 
Reviewed-by: Andi Shyti 
Acked-by: Tvrtko Ursulin 
Acked-by: Nirmoy Das 
Reviewed-by: John Harrison 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  30 ++-
 drivers/gpu/drm/i915/gt/intel_tlb.c   |  16 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  33 
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  22 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  11 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 183 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  13 ++
 8 files changed, 298 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 4d7d88b92632b..1c93e84278a03 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -206,22 +206,36 @@ static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
 }
 
+static void guc_ggtt_ct_invalidate(struct intel_gt *gt)
+{
+   struct intel_uncore *uncore = gt->uncore;
+   intel_wakeref_t wakeref;
+
+   with_intel_runtime_pm_if_active(uncore->rpm, wakeref) {
+   struct intel_guc *guc = >uc.guc;
+
+   intel_guc_invalidate_tlb_guc(guc);
+   }
+}
+
 static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
 {
struct drm_i915_private *i915 = ggtt->vm.i915;
+   struct intel_gt *gt;
 
gen8_ggtt_invalidate(ggtt);
 
-   if (GRAPHICS_VER(i915) >= 12) {
-   struct intel_gt *gt;
-
-   list_for_each_entry(gt, >gt_list, ggtt_link)
+   list_for_each_entry(gt, >gt_list, ggtt_link) {
+   if (intel_guc_tlb_invalidation_is_available(>uc.guc)) {
+   guc_ggtt_ct_invalidate(gt);
+   } else if (GRAPHICS_VER(i915) >= 12) {
intel_uncore_write_fw(gt->uncore,
  GEN12_GUC_TLB_INV_CR,
  GEN12_GUC_TLB_INV_CR_INVALIDATE);
-   } else {
-   intel_uncore_write_fw(ggtt->vm.gt->uncore,
- GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   } else {
+   intel_uncore_write_fw(gt->uncore,
+ GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   }
}
 }
 
@@ -1243,7 +1257,7 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.raw_insert_page = gen8_ggtt_insert_page;
}
 
-   if (intel_uc_wants_guc(>vm.gt->uc))
+   if (intel_uc_wants_guc_submission(>vm.gt->uc))
ggtt->invalidate = guc_ggtt_invalidate;
else
ggtt->invalidate = gen8_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c 
b/drivers/gpu/drm/i915/gt/intel_tlb.c
index 139608c30d978..4bb13d1890e37 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -12,6 +12,7 @@
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_tlb.h"
+#include "uc/intel_guc.h"
 
 /*
  * HW architecture suggest typical invalidation time at 40us,
@@ -131,11 +132,24 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, 
u32 seqno)
return;
 
with_intel_gt_pm_if_awake(gt, wakeref) {
+   struct intel_guc *guc = >uc.guc;
+
mutex_lock(>tlb.invalidate_lock);
if (tlb_seqno_passed(gt, seqno))
goto unlock;
 
-