Re: [PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads

2022-10-10 Thread John Harrison

On 9/30/2022 16:42, Ceraolo Spurio, Daniele wrote:

On 9/30/2022 4:24 PM, John Harrison wrote:

On 9/22/2022 15:11, Daniele Ceraolo Spurio wrote:

Our current FW loading process is the same for all FWs:

- Pin FW to GGTT at the start of the ggtt->uc_fw node
- Load the FW
- Unpin

This worked because we didn't have a case where 2 FWs would be 
loaded on
the same GGTT at the same time. On MTL, however, this can happend if 
both
The point being that the mapping is done using a single 'dummy' vma 
that can't be mapped to two different places at the same time? But 
isn't there a separate dummy object per uc instance. So there would 
be one for the GuC on the render GT and another for the GuC on the 
media GT. So each would be mapped separately to it's own unique 
address and there is no conflict? Or what am I missing?


The issue is that without this patch all the dummy vmas are inserted 
at the same address (start of the reserved node), which only works if 
they can't be mapped at the same time. Note that we don't use the 
generic vm functions to insert the dummy vma and we instead specify 
the exact offset we want it mapped at. This patch makes it so that 
each dummy vma has its own private offset.


Got it. I think it would be worth adding some documentation about the 
forced mapping address and why it is necessary.



GTs are reset at the same time, so we can't pin everything in the same
spot and we need to use separate offset. For simplicity, instead of
calculating the exact required size, we reserve a 2MB slot for each fw.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: John Harrison 
Cc: Alan Previn 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c

index b91ad4aede1f..d6ca772e9f4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
  return err;
  }
  +/*
+ * The reserved GGTT space is ~18 MBs. All our blobs are well below 
1MB, so for

+ * safety we reserve 2MB each.
+ */
+#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
  static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
  {
-    struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
+    struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
+    struct i915_ggtt *ggtt = gt->ggtt;
  struct drm_mm_node *node = >uc_fw;
+    u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
+
+    /*
+ * To keep the math simple, we use 8MB for the root tile and 
8MB for

+ * the media one.
+ */
+    BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW 
> SZ_8M);

Doesn't this need to be >= ?


No, this is a size check and 8MB is fine for a size.




+    if (gt->type == GT_MEDIA)
+    offset += SZ_8M;
    GEM_BUG_ON(!drm_mm_node_allocated(node));
  GEM_BUG_ON(upper_32_bits(node->start));
  GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
+    GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
+    GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
Given that the firmware blob is loaded from the disk and therefore 
under user control, is a BUG_ON appropriate? Although there doesn't 
look to be any obvious way to abort at this point. Could the size 
check be moved to where we actually load the firmware rather than 
where it is being mapped?


My idea was that we wouldn't release a blob that violates this without 
updating the kernel first, so the only way a user can violate this is 
if they try to load a bogus file. But I can still move this check to 
fetch time and fail the fetch if the size is too big, so we can 
fall-back to something sensible.
Can't trust those pesky users. They can download a HTML page of an ASCII 
dump of a firmware blob and try to load that. For example. I've seen 
that before. So yeah, I think there definitely needs to be a 'drm_err; 
goto fail' rather than a BUG_ON somewhere.


Otherwise, it all seems good.

John.







  -    return lower_32_bits(node->start);
+    return lower_32_bits(node->start + offset);
  }
    static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
@@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw 
*uc_fw)

  dummy->bi.pages = obj->mm.pages;
    GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
-    GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);

Why remove this?


The new GEM_BUG_ONs in the function above perform a more strict test, 
so this is redundant.


Daniele



John.

    /* uc_fw->obj cache domains were not controlled across 
suspend */

  if (i915_gem_object_has_struct_page(obj))








Re: [PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads

2022-09-30 Thread Ceraolo Spurio, Daniele




On 9/30/2022 4:24 PM, John Harrison wrote:



On 9/22/2022 15:11, Daniele Ceraolo Spurio wrote:

Our current FW loading process is the same for all FWs:

- Pin FW to GGTT at the start of the ggtt->uc_fw node
- Load the FW
- Unpin

This worked because we didn't have a case where 2 FWs would be loaded on
the same GGTT at the same time. On MTL, however, this can happend if 
both
The point being that the mapping is done using a single 'dummy' vma 
that can't be mapped to two different places at the same time? But 
isn't there a separate dummy object per uc instance. So there would be 
one for the GuC on the render GT and another for the GuC on the media 
GT. So each would be mapped separately to it's own unique address and 
there is no conflict? Or what am I missing?


The issue is that without this patch all the dummy vmas are inserted at 
the same address (start of the reserved node), which only works if they 
can't be mapped at the same time. Note that we don't use the generic vm 
functions to insert the dummy vma and we instead specify the exact 
offset we want it mapped at. This patch makes it so that each dummy vma 
has its own private offset.





GTs are reset at the same time, so we can't pin everything in the same
spot and we need to use separate offset. For simplicity, instead of
calculating the exact required size, we reserve a 2MB slot for each fw.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: John Harrison 
Cc: Alan Previn 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c

index b91ad4aede1f..d6ca772e9f4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
  return err;
  }
  +/*
+ * The reserved GGTT space is ~18 MBs. All our blobs are well below 
1MB, so for

+ * safety we reserve 2MB each.
+ */
+#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
  static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
  {
-    struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
+    struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
+    struct i915_ggtt *ggtt = gt->ggtt;
  struct drm_mm_node *node = >uc_fw;
+    u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
+
+    /*
+ * To keep the math simple, we use 8MB for the root tile and 8MB 
for

+ * the media one.
+ */
+    BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW > 
SZ_8M);

Doesn't this need to be >= ?


No, this is a size check and 8MB is fine for a size.




+    if (gt->type == GT_MEDIA)
+    offset += SZ_8M;
    GEM_BUG_ON(!drm_mm_node_allocated(node));
  GEM_BUG_ON(upper_32_bits(node->start));
  GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
+    GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
+    GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
Given that the firmware blob is loaded from the disk and therefore 
under user control, is a BUG_ON appropriate? Although there doesn't 
look to be any obvious way to abort at this point. Could the size 
check be moved to where we actually load the firmware rather than 
where it is being mapped?


My idea was that we wouldn't release a blob that violates this without 
updating the kernel first, so the only way a user can violate this is if 
they try to load a bogus file. But I can still move this check to fetch 
time and fail the fetch if the size is too big, so we can fall-back to 
something sensible.





  -    return lower_32_bits(node->start);
+    return lower_32_bits(node->start + offset);
  }
    static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
@@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw 
*uc_fw)

  dummy->bi.pages = obj->mm.pages;
    GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
-    GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);

Why remove this?


The new GEM_BUG_ONs in the function above perform a more strict test, so 
this is redundant.


Daniele



John.

    /* uc_fw->obj cache domains were not controlled across 
suspend */

  if (i915_gem_object_has_struct_page(obj))






Re: [PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads

2022-09-30 Thread John Harrison




On 9/22/2022 15:11, Daniele Ceraolo Spurio wrote:

Our current FW loading process is the same for all FWs:

- Pin FW to GGTT at the start of the ggtt->uc_fw node
- Load the FW
- Unpin

This worked because we didn't have a case where 2 FWs would be loaded on
the same GGTT at the same time. On MTL, however, this can happend if both
The point being that the mapping is done using a single 'dummy' vma that 
can't be mapped to two different places at the same time? But isn't 
there a separate dummy object per uc instance. So there would be one for 
the GuC on the render GT and another for the GuC on the media GT. So 
each would be mapped separately to it's own unique address and there is 
no conflict? Or what am I missing?



GTs are reset at the same time, so we can't pin everything in the same
spot and we need to use separate offset. For simplicity, instead of
calculating the exact required size, we reserve a 2MB slot for each fw.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: John Harrison 
Cc: Alan Previn 
---
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index b91ad4aede1f..d6ca772e9f4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
return err;
  }
  
+/*

+ * The reserved GGTT space is ~18 MBs. All our blobs are well below 1MB, so for
+ * safety we reserve 2MB each.
+ */
+#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
  static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
  {
-   struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
+   struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
+   struct i915_ggtt *ggtt = gt->ggtt;
struct drm_mm_node *node = >uc_fw;
+   u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
+
+   /*
+* To keep the math simple, we use 8MB for the root tile and 8MB for
+* the media one.
+*/
+   BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW > SZ_8M);

Doesn't this need to be >= ?


+   if (gt->type == GT_MEDIA)
+   offset += SZ_8M;
  
  	GEM_BUG_ON(!drm_mm_node_allocated(node));

GEM_BUG_ON(upper_32_bits(node->start));
GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
+   GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
+   GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
Given that the firmware blob is loaded from the disk and therefore under 
user control, is a BUG_ON appropriate? Although there doesn't look to be 
any obvious way to abort at this point. Could the size check be moved to 
where we actually load the firmware rather than where it is being mapped?


  
-	return lower_32_bits(node->start);

+   return lower_32_bits(node->start + offset);
  }
  
  static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)

@@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
dummy->bi.pages = obj->mm.pages;
  
  	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));

-   GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);

Why remove this?

John.

  
  	/* uc_fw->obj cache domains were not controlled across suspend */

if (i915_gem_object_has_struct_page(obj))




[PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads

2022-09-22 Thread Daniele Ceraolo Spurio
Our current FW loading process is the same for all FWs:

- Pin FW to GGTT at the start of the ggtt->uc_fw node
- Load the FW
- Unpin

This worked because we didn't have a case where 2 FWs would be loaded on
the same GGTT at the same time. On MTL, however, this can happend if both
GTs are reset at the same time, so we can't pin everything in the same
spot and we need to use separate offset. For simplicity, instead of
calculating the exact required size, we reserve a 2MB slot for each fw.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: John Harrison 
Cc: Alan Previn 
---
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index b91ad4aede1f..d6ca772e9f4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
return err;
 }
 
+/*
+ * The reserved GGTT space is ~18 MBs. All our blobs are well below 1MB, so for
+ * safety we reserve 2MB each.
+ */
+#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
 static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
 {
-   struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
+   struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
+   struct i915_ggtt *ggtt = gt->ggtt;
struct drm_mm_node *node = >uc_fw;
+   u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
+
+   /*
+* To keep the math simple, we use 8MB for the root tile and 8MB for
+* the media one.
+*/
+   BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW > SZ_8M);
+   if (gt->type == GT_MEDIA)
+   offset += SZ_8M;
 
GEM_BUG_ON(!drm_mm_node_allocated(node));
GEM_BUG_ON(upper_32_bits(node->start));
GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
+   GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
+   GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
 
-   return lower_32_bits(node->start);
+   return lower_32_bits(node->start + offset);
 }
 
 static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
@@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
dummy->bi.pages = obj->mm.pages;
 
GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
-   GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);
 
/* uc_fw->obj cache domains were not controlled across suspend */
if (i915_gem_object_has_struct_page(obj))
-- 
2.37.3